Evidence Engine for Product Decisions
Nov 2024
The Business Problem
Product teams waste billions building features nobody uses. Industry data shows only 6.4% of features drive 80% of usage (Pendo, 2024). Companies collectively spend $29.5 billion annually on features customers rarely touch.
The root cause isn't lack of data. It's systematic bias in how product managers interpret that data. Teams use frameworks like RICE (Reach, Impact, Confidence, Effort) that look rigorous but actually amplify bias rather than reduce it.
- HiPPO bias: Executive opinions override user research
- Confirmation bias: Teams seek evidence supporting predetermined decisions
- Anchoring bias: Initial estimates distort all subsequent judgments
The result: Only 8% of high-confidence product bets succeed. Frameworks become post-hoc rationalization tools instead of decision aids.
The Research: Validating the Problem
I conducted user research with 4 product managers and stakeholders to understand how prioritization actually happens in practice.
Research Methodology
- Semi-structured interviews with PMs at startups, growth companies, and financial services
- Survey on framework usage and bias recognition
- Analysis of 50+ academic papers on cognitive bias
- Review of industry data from Pendo, Microsoft, Amazon
All four participants confirmed experiencing bias in prioritization. Three explicitly described frameworks as abandoned or used only for stakeholder communication, not actual decision-making.
Key Findings
- "Frameworks break down due to stakeholder requirements" - Wesley, B2C PM
- "I adjust scores to make things work politically" - Adnan, Startup Founder and PM with 10+ years experience
- "Data is the biggest challenge. You largely go with gut/intuition" - Wesley
- "Relationships determine outcomes" - Jorge, Stakeholder at PC Financial
The research revealed two distinct problems: PMs need better evidence gathering for their own decisions, and they need better ways to defend those decisions to stakeholders. Current frameworks solve neither.
The Solution: Evidence Engine
I built an AI tool that transforms unstructured research into evidence-based decisions. Instead of asking PMs to score features numerically, it guides them through structured evidence gathering and generates transparent reasoning.
How it works:
- Extract: Parses interviews, feedback, and analytics into 6 evidence types (user quotes, behavioral observations, support tickets, analytics, stakeholder input, competitor intel)
- Analyze: Tests hypotheses by actively searching for counter-evidence, not just confirming evidence
- Synthesize: Generates reasoning traces showing exactly how conclusions were reached, with evidence counts and quality assessment
The key difference: traditional frameworks ask "what's your Impact score?" This tool asks "what happens if you don't build this?" and "what evidence supports that claim?"
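As a minimal sketch of the extraction stage's data model (the names here are illustrative, not the tool's actual API), each extracted item is tagged with one of the six evidence types:

```python
from dataclasses import dataclass
from enum import Enum

class EvidenceType(Enum):
    """The six evidence types the extraction stage classifies into."""
    USER_QUOTE = "user_quote"
    BEHAVIORAL_OBSERVATION = "behavioral_observation"
    SUPPORT_TICKET = "support_ticket"
    ANALYTICS = "analytics"
    STAKEHOLDER_INPUT = "stakeholder_input"
    COMPETITOR_INTEL = "competitor_intel"

@dataclass
class Evidence:
    """One extracted piece of evidence with its source and classified type."""
    text: str
    source: str
    etype: EvidenceType
    supports_hypothesis: bool  # set later, during the analyze stage

# Example: a user quote extracted from an interview transcript
e = Evidence(
    text="I never open the export panel; I copy-paste instead.",
    source="interview_2024-11-03.txt",
    etype=EvidenceType.USER_QUOTE,
    supports_hypothesis=False,
)
print(e.etype.value)  # → user_quote
```

Keeping the stance (`supports_hypothesis`) separate from the type is what later lets the analyze stage ask about counter-evidence rather than just tallying mentions.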
Technical Approach
Built with Python, Google Gemini API, and Streamlit. The architecture separates evidence extraction, intent classification, and reasoning generation into modular components.
A three-stage workflow processes unstructured inputs into structured, defensible recommendations with complete reasoning transparency.
Key features:
- Counter-evidence surfacing: Actively searches for data contradicting hypotheses
- Quality over quantity: Prioritizes strong evidence (user interviews, analytics) over weak evidence (stakeholder opinions)
- Transparent reasoning: Shows exactly which evidence led to which conclusions
- Conversational interface: Natural language queries instead of scoring spreadsheets
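The counter-evidence feature above can be sketched as a simple partition over extracted evidence, with an explicit warning when the contradicting side is empty (the record shape and warning text are illustrative assumptions, not the tool's real output):

```python
def surface_counter_evidence(evidence):
    """Partition evidence by stance and flag an empty counter-evidence set.

    `evidence` is a list of dicts with a 'supports' bool and 'text' str,
    a stand-in for the tool's extracted evidence records.
    """
    supporting = [e for e in evidence if e["supports"]]
    contradicting = [e for e in evidence if not e["supports"]]
    if not contradicting:
        # The tool would prompt: "what evidence contradicts this hypothesis?"
        return supporting, contradicting, "No counter-evidence found: search before concluding."
    return supporting, contradicting, None

evidence = [
    {"text": "Users asked for bulk export", "supports": True},
    {"text": "Only 2% of sessions touch export", "supports": False},
]
sup, con, warning = surface_counter_evidence(evidence)
print(len(sup), len(con), warning)  # → 1 1 None
```

The point of the warning path is structural: a hypothesis with zero contradicting items is treated as under-searched, not as confirmed.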
I chose Google Gemini over other LLMs because it offers free API access (making the tool accessible to small teams) while maintaining strong reasoning capabilities for evidence synthesis.
How It Reduces Bias
Structural Constraints, Not Awareness Training
Research shows bias awareness doesn't prevent biased decisions. Anchoring bias persists even after debiasing training (Cohen's d = 1.19 reduces to 0.72, still a substantial effect).
Evidence Engine addresses this through process design:
Each bias type is addressed through specific mechanisms built into the evidence gathering process.
- Confirmation bias: Forced counter-evidence search. System asks "what evidence contradicts this hypothesis?"
- HiPPO bias: Evidence classification shows when stakeholder input dominates user data
- Anchoring bias: Historical calibration compares current estimates to past accuracy
- Recency bias: Temporal weighting shows how much recent vs. historical data drives conclusions
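The HiPPO mechanism in the list above amounts to measuring the evidence mix: if stakeholder input makes up too large a share, the tool can flag it. A minimal sketch (the 50% threshold is an illustrative default, not a value from the tool):

```python
from collections import Counter

def hippo_check(evidence_types, threshold=0.5):
    """Flag when stakeholder input dominates the evidence mix.

    `evidence_types` is a list of evidence-type labels (strings);
    returns the stakeholder share and whether it exceeds the threshold.
    """
    counts = Counter(evidence_types)
    share = counts["stakeholder_input"] / len(evidence_types)
    return share, share > threshold

share, flagged = hippo_check(
    ["stakeholder_input", "stakeholder_input", "user_quote", "analytics"]
)
print(round(share, 2), flagged)  # → 0.5 False
```

The same counting pattern extends to the recency mechanism: replace type labels with timestamps and report how much of the evidence mass falls in a recent window.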
Transparency as Trust Mechanism
User research revealed PMs won't trust a "black box" AI. Both surveyed PMs required transparency and the ability to override conclusions. The tool addresses this by showing complete reasoning traces with evidence counts and allowing PMs to adjust or reject conclusions.
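A reasoning trace of this kind can be rendered from the evidence counts alone. This sketch shows the idea, not the tool's actual output format:

```python
def reasoning_trace(hypothesis, evidence, conclusion):
    """Render a transparent trace: counts per evidence type plus the conclusion.

    `evidence` is a list of dicts with a 'type' key (an illustrative shape).
    """
    counts = {}
    for e in evidence:
        counts[e["type"]] = counts.get(e["type"], 0) + 1
    lines = [f"Hypothesis: {hypothesis}"]
    for etype, n in sorted(counts.items()):
        lines.append(f"  {etype}: {n} item(s)")
    lines.append(f"Conclusion: {conclusion} (PM may override)")
    return "\n".join(lines)

trace = reasoning_trace(
    "Users need bulk export",
    [{"type": "user_quote"}, {"type": "analytics"}, {"type": "user_quote"}],
    "weak support",
)
print(trace)
```

Because the trace is plain text built from the underlying evidence records, a PM can inspect exactly which items drove the conclusion before deciding to accept or override it.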
Business Impact for Financial Services
Evidence Engine addresses challenges specific to banking and financial services product teams:
For Product Strategy:
- Regulatory compliance: Complete evidence trails for feature prioritization decisions support audit requirements
- Risk management: Counter-evidence surfacing identifies potential issues before launch
- Stakeholder alignment: Generated reasoning helps justify decisions to risk, compliance, and executive teams
For Decision Quality:
- Reduce feature waste: Industry data shows 80% of features are rarely used. Structured evidence gathering targets this waste directly
- Improve prediction accuracy: Only 8% of high-confidence bets currently succeed. Reducing bias aims to raise this rate
- Enable learning: Outcome tracking creates feedback loops missing in current processes
For Team Efficiency:
- Fast evidence synthesis: Designed to work in under 15 minutes per feature (research showed speed is critical for adoption)
- Reduce rework: Better initial decisions mean fewer pivots and feature deprecations
- Defensible recommendations: Auto-generated reasoning reduces time spent justifying decisions
What I Learned
From the Research
Frameworks fail because they ask PMs to assign numbers without grounding them in evidence. "What's your Impact score (0.25 to 3)?" invites bias. "What happens if you don't build this?" forces concrete thinking.
PMs don't want automation. They want support. As one PM said: "I want it to be more of a support tool, because I think actually providing the answers, that's like my job." Tools that try to replace judgment will fail.
Storytelling matters as much as scoring. The best data doesn't change decisions if you can't convince stakeholders. Tools must generate persuasive narratives, not just accurate numbers.
From the Implementation
LLMs are good at evidence synthesis but need structured prompts. Separating extraction, analysis, and synthesis into distinct phases with purpose-built prompts produces better results than single-shot generation.
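The phase separation described above can be sketched as three purpose-built prompts chained together. Here `call_llm` is a stub standing in for a real Gemini API call, so the sketch runs without credentials; the prompt templates are illustrative, not the tool's actual prompts:

```python
EXTRACT_PROMPT = "Extract evidence items from these notes: {raw}"
ANALYZE_PROMPT = "List evidence contradicting '{hypothesis}' among: {items}"
SYNTH_PROMPT = "Write a reasoning trace from this analysis: {analysis}"

def call_llm(prompt):
    """Stub LLM call; a real implementation would call the Gemini API here."""
    return f"[model output for: {prompt[:30]}...]"

def run_pipeline(raw_notes, hypothesis):
    """Chain three purpose-built prompts instead of one single-shot prompt."""
    items = call_llm(EXTRACT_PROMPT.format(raw=raw_notes))
    analysis = call_llm(ANALYZE_PROMPT.format(hypothesis=hypothesis, items=items))
    return call_llm(SYNTH_PROMPT.format(analysis=analysis))

out = run_pipeline("interview notes...", "users want bulk export")
print(out.startswith("[model output"))  # → True
```

Each stage sees only the previous stage's output, which keeps every prompt focused on one task and makes intermediate results inspectable, consistent with the transparency requirement above.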
Transparency is non-negotiable. PMs won't trust AI they can't inspect. Showing reasoning traces with evidence counts and quality assessments builds trust more than claiming high accuracy.
Next Steps
- User testing: Validate with 10+ PMs to measure actual usage patterns and adoption barriers
- Outcome tracking: Add post-launch measurement to compare predictions vs. actuals
- Integration: Build Slack/Teams bot and browser extensions to meet PMs in their workflow
- Calibration: Add company-specific historical data to improve prediction accuracy over time