Product Deep Dive
Inside the SOAR-V Pipeline, Part 3: The Content Grounding Analyzer — Three Stages of Truth
Admin Admin
2/6/2026
This is Part 3 of our series on the SOAR-V verification pipeline. The Claim Extraction Engine (CEE) finds every verifiable assertion. The Citation Resolution Service (CRS) retrieves the actual source content. Now we arrive at the module where verification actually happens: the Content Grounding Analyzer, or CGA.
If the CEE is the map and the CRS is the bridge, the CGA is the judge. It takes a factual claim from the report and the source content that claim cites, and determines — with quantified confidence — whether the source actually supports what the claim asserts.
This is the module that catches the failures other systems miss entirely. This is where we detect that a source is real, the citation is valid, the URL works — but the claim still doesn't accurately represent what the source says.
What Is the Content Grounding Analyzer?
The CGA is a three-stage verification engine that evaluates each claim-source pair through progressively deeper analysis. A claim must pass all three stages to be considered verified. Each stage catches a different type of content-grounding failure, and each stage is more computationally expensive than the previous one — so the pipeline is designed as a funnel that filters out obvious failures early before investing in deeper analysis.
The three stages, in order:
1. Relevance — Is the source even about the same topic as the claim?
2. Support — Does the source actually back up this specific assertion?
3. Fidelity — Does the claim accurately represent what the source says?
A claim that fails at any stage is flagged. A claim that passes all three receives a composite Verification Score. Let's walk through each stage.
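The funnel shape of the pipeline can be sketched as a short-circuiting loop: each stage only runs if the previous one passed. The stage functions and threshold values below are illustrative stand-ins, not the production implementation.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class StageResult:
    name: str
    score: float
    passed: bool

def run_cga(claim: str, excerpt: str,
            stages: list[tuple[str, Callable[[str, str], float], float]]) -> list[StageResult]:
    """Run each stage in order, stopping at the first failure (the funnel)."""
    results = []
    for name, evaluate, threshold in stages:
        score = evaluate(claim, excerpt)
        passed = score >= threshold
        results.append(StageResult(name, score, passed))
        if not passed:
            # Cheaper stages filter out failures before expensive ones run.
            break
    return results
```

Because the loop breaks on the first failure, the expensive Stage 3 evaluation is never paid for a claim that already failed relevance.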
Stage 1: Relevance — "Is the source about the right thing?"
The first check is the fastest and most fundamental: is the cited source actually talking about the same subject as the claim?
This catches a surprisingly common AI failure pattern. During the research and synthesis process, AI models sometimes misapply sources — citing a document that discusses a related but different topic. A report claiming "the cybersecurity market grew 18% in 2025" might cite a source that discusses IT spending broadly, or cybersecurity workforce growth, or cybersecurity in a specific vertical — adjacent topics, but not the same claim.
The relevance check uses embedding similarity. We generate vector embeddings for both the claim text and the source excerpt (identified by the CRS), then calculate the cosine similarity between them. If the similarity falls below the threshold, the source isn't sufficiently relevant to the claim.
Threshold: ≥ 0.70 cosine similarity
This is a computational check — no language model evaluation needed. It runs in milliseconds per claim and catches approximately 8-12% of claims in a typical report. These are the easiest failures to detect and the fastest to filter out.
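A minimal version of the relevance check, assuming the embedding vectors have already been produced by some model (the embedding model itself is outside this sketch):

```python
import math

RELEVANCE_THRESHOLD = 0.70  # the Stage 1 cutoff described above

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_relevant(claim_vec: list[float], source_vec: list[float]) -> bool:
    return cosine_similarity(claim_vec, source_vec) >= RELEVANCE_THRESHOLD
```

No model call is needed at this point, which is why the check runs in milliseconds per claim.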
What it catches:
• Sources about a related but different market segment
• Sources discussing the right company but a different metric
• Sources about the right topic but a different time period
• Sources that were relevant to the broader research but misapplied to a specific claim
What it doesn't catch: A source can be highly relevant (same topic, same company, same time period) and still fail to support the specific claim, or be misrepresented by it. That's what Stages 2 and 3 handle.
Stage 2: Support — "Does the source actually back this up?"
A source passing relevance means it's about the right topic. Stage 2 asks a harder question: does the source content actually support the specific assertion the claim makes?
This is an LLM-evaluated check. We present the claim text and the source excerpt to a language model with a specific evaluation prompt:
"Given this source excerpt, how strongly does it support this specific claim? Evaluate only whether the source provides evidence for the claim, not whether the claim is true in general."
The response is scored on a five-level scale:
| Score | Meaning | Example |
| --- | --- | --- |
| 1 — Contradicts | The source says the opposite of the claim | Claim says "grew 23%." Source says "declined 5%." |
| 2 — No support | Source discusses the topic but doesn't support this claim | Claim says "grew 23%." Source discusses the company but never mentions growth rates. |
| 3 — Weak support | Source provides indirect or partial support | Claim says "grew 23%." Source says "revenue increased significantly" without a specific number. |
| 4 — Strong support | Source clearly supports the claim | Claim says "grew 23%." Source says "year-over-year revenue growth was 23.1%." |
| 5 — Direct statement | Source states essentially the same thing | Claim says "grew 23%." Source says "grew 23% year-over-year." |
Threshold: ≥ 3 (at minimum weak support)
Claims scoring 1 or 2 fail immediately. A score of 1 (contradiction) is particularly serious — the report is asserting something the source directly refutes. This triggers an immediate flag and usually leads to either a claim correction or an escalation to human review.
This stage catches another 5-10% of claims: cases where the source is about the right topic (passes relevance) but simply doesn't contain evidence for the specific assertion. The most common pattern is overgeneralization — the report makes a broad claim and cites a source that supports only a narrow slice of that claim.
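The routing logic on top of the five-level scale is straightforward to sketch. The LLM call itself is stubbed out here, and the action names ("pass", "fail", "escalate") are illustrative labels, not the pipeline's actual vocabulary:

```python
from enum import IntEnum

class SupportScore(IntEnum):
    CONTRADICTS = 1
    NO_SUPPORT = 2
    WEAK_SUPPORT = 3
    STRONG_SUPPORT = 4
    DIRECT_STATEMENT = 5

SUPPORT_THRESHOLD = SupportScore.WEAK_SUPPORT  # >= 3 passes

def classify_support(score: SupportScore) -> str:
    """Turn a raw LLM support score into a pipeline action (illustrative)."""
    if score == SupportScore.CONTRADICTS:
        # The source refutes the claim: flag and route to human review.
        return "escalate"
    if score < SUPPORT_THRESHOLD:
        # On-topic source, but no evidence for this specific claim.
        return "fail"
    return "pass"
```

Treating contradiction as its own branch, rather than just the lowest failing score, reflects how seriously that outcome is handled.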
Stage 3: Fidelity — "Does the claim accurately represent the source?"
This is the deepest, most nuanced, and most important stage. A claim can be relevant and supported but still distort what the source actually says. Fidelity checks whether the claim faithfully represents the source content — without exaggeration, misattribution, false precision, or decontextualization.
This is where content-grounding failures — the persistent problem identified by HalluHard — are caught.
The fidelity evaluation uses a detailed LLM assessment that checks for six specific distortion patterns:
Exaggeration: The source says one thing; the claim inflates it. "Revenue grew approximately 15-20%" becomes "Revenue surged over 20%." The direction is right, but the magnitude and certainty are overstated.
Misattribution: The source attributes a finding to one entity; the claim attributes it to another. A finding from an industry group's survey gets attributed to a specific analyst firm. A company's self-reported data gets presented as an independent assessment.
False precision: The source provides a range or approximation; the claim presents a specific number. "Between 30-40% of enterprises" becomes "35% of enterprises." The claim implies a precision the source doesn't support.
Temporal misrepresentation: The source discusses one time period; the claim applies it to another. Q2 data gets presented as full-year data. A 2024 statistic gets cited as current in a 2026 context without noting its age.
Context stripping: The source makes a claim with important caveats, conditions, or limitations; the report presents the claim as absolute. "In the US enterprise segment, adoption reached 40%" becomes "40% adoption." The geographic and segment qualifiers disappear, making the claim appear more universal than the source supports.
Selective emphasis: The source presents a balanced or mixed picture; the claim emphasizes only the supportive data. An analyst report that says "growth was strong in cloud but flat in on-premises" gets cited to support a claim about "strong growth" without the qualifier.
The fidelity score ranges from 0 to 1, where 1 represents perfect accuracy and 0 represents complete distortion.
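One way to represent the outcome of a fidelity evaluation is with the six distortion patterns as named flags. The equal-penalty scoring rule below is invented for illustration; in the real pipeline the score comes from the LLM assessment, not a flat deduction:

```python
from dataclasses import dataclass, field

DISTORTION_PATTERNS = (
    "exaggeration", "misattribution", "false_precision",
    "temporal_misrepresentation", "context_stripping", "selective_emphasis",
)

@dataclass
class FidelityResult:
    # Subset of DISTORTION_PATTERNS detected in the claim-source pair.
    detected: list[str] = field(default_factory=list)

    @property
    def score(self) -> float:
        """Illustrative: each detected pattern deducts equally from a perfect 1.0."""
        penalty = len(self.detected) / len(DISTORTION_PATTERNS)
        return round(1.0 - penalty, 2)
```

The value of the named flags is in the user-facing explanation: a failure can say which distortion occurred, not just that the score fell short.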
Domain-Specific Thresholds
Not all domains require the same level of fidelity. A slight rounding in a general business report is less consequential than a slight rounding in a medical report that might influence treatment decisions. Our thresholds reflect this:
| Domain | Fidelity Threshold | Rationale |
| --- | --- | --- |
| Legal | 0.92 | Misquoting precedent or statute language can invalidate legal analysis |
| Healthcare | 0.90 | Clinical claims directly affect patient safety decisions |
| Financial | 0.88 | Regulatory implications for misrepresented financial data |
| Technology | 0.82 | Technical claims have more acceptable variation in phrasing |
| General Business | 0.80 | Broader interpretive range is acceptable for market analysis |
These thresholds were calibrated using feedback from domain experts and are continuously refined through our Recursive Self-Improvement Engine. When human experts in the Researcher Escalation Queue override a CGA decision — either catching something the CGA missed or approving something the CGA rejected — those decisions adjust the thresholds over time.
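The threshold table maps directly to a lookup. One detail the sketch has to assume: how unlisted domains are handled. Falling back to the strictest threshold is a conservative choice made here for illustration:

```python
FIDELITY_THRESHOLDS = {
    "legal": 0.92,
    "healthcare": 0.90,
    "financial": 0.88,
    "technology": 0.82,
    "general_business": 0.80,
}

def fidelity_passes(score: float, domain: str) -> bool:
    """Apply the domain-calibrated fidelity threshold to a 0-1 score."""
    # Unknown domains fall back to the strictest threshold (an assumption).
    threshold = FIDELITY_THRESHOLDS.get(domain.lower(),
                                        max(FIDELITY_THRESHOLDS.values()))
    return score >= threshold
```

A fidelity score of 0.85 would pass for a technology report but fail for a healthcare one, which is exactly the behavior the table describes.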
What Users See
All of this complexity is presented to users as an elegantly simple interface. When you click any claim in a verified report, you see:
The claim text — exactly what the report asserts.
The source — the title, author, and URL of the cited source.
The source excerpt — the specific passage the claim draws from, displayed as a blockquote.
Three verification bars:
• Relevance: 0.89 (green bar filled to 89%)
• Support: 4/5 (green bar filled to 80%)
• Fidelity: 0.93 (green bar filled to 93%)
The composite Verification Score with the formula breakdown.
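The three bar fills in the example above can be derived by a simple normalization: relevance and fidelity are already on a 0-1 scale, and the 1-5 support score divides out to a percentage (a sketch of the display logic, not the actual frontend code):

```python
def bar_percentages(relevance: float, support: int, fidelity: float) -> dict[str, int]:
    """Convert raw stage scores into bar-fill percentages for display."""
    return {
        "relevance": round(relevance * 100),
        "support": round(support / 5 * 100),  # 5-level scale → percent
        "fidelity": round(fidelity * 100),
    }
```

Feeding in the example's scores (0.89, 4/5, 0.93) reproduces the 89% / 80% / 93% fills shown above.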
For a claim that fails, the bars turn amber or red, and a note explains what went wrong: "Fidelity failure — claim states '23% growth' but source reports 'approximately 15-20% growth.' The claim overstates the source's figure."
This transparency is the point. Users don't have to trust us when we say a claim is verified. They can see the source text, read it themselves, and evaluate whether the verification result makes sense.
How This Exceeds Expectations
Catches what humans miss. Even careful human reviewers tend to catch obvious fabrications (wrong company name, impossible statistic) but miss subtle distortions (slight exaggeration of a growth rate, context-stripped caveat). The CGA's systematic check for all six distortion patterns catches failures that humans routinely overlook — especially in long reports where attention fades.
Domain-calibrated rigor. Users in regulated industries (healthcare, finance, legal) need tighter accuracy standards than users in general business research. The CGA automatically applies domain-appropriate thresholds without users having to configure anything. A healthcare report is held to a higher standard than a market overview because the consequences of error are higher.
Contradiction detection. A Score 1 (Contradicts) in the Support stage is one of the most valuable findings the CGA produces. It means the report is asserting something that its own cited source directly refutes. This is a dangerous failure mode that's nearly impossible to catch by scanning a report manually — you'd have to read every cited source and compare it against every claim. The CGA does this automatically.
Quantified confidence. Instead of a binary "verified/not verified," users get a continuous score that reflects nuance. A claim scoring 0.94 is very well supported. A claim scoring 0.72 is technically above threshold but weaker — users can make their own judgment about how much weight to give it. This continuous scoring is more honest and more useful than a simple pass/fail.
Real-World Use Cases
Use Case 1: Board Presentation Fact-Check. A VP of Strategy prepares a board deck with market data from a PromptReports analysis. The CGA catches that one slide claims "market growing at 18% CAGR" while the cited analyst report says "15-18% depending on segment definition." Fidelity stage flags the false precision. The VP corrects the slide to "15-18% CAGR" before the board meeting — avoiding a credibility-damaging question from a detail-oriented board member.
Use Case 2: Legal Research Verification. A legal team uses a report on emerging AI regulation. The CGA's legal threshold (0.92 fidelity) catches a subtle misrepresentation: the report claims a proposed regulation "requires annual audits" while the actual draft legislation says "may require periodic assessments." The difference between "requires" and "may require" and between "annual" and "periodic" matters enormously in legal analysis. Standard verification would miss this; the CGA's legal-calibrated fidelity check catches it.
Use Case 3: Competitive Pricing Analysis. A product team compares pricing against competitors. The CGA detects a context-stripping failure: the report claims "Competitor X charges $15/user/month" but the source page shows that price is for annual billing only, with monthly billing at $20/user/month. The claim is technically sourced but strips the billing-period context that makes the comparison misleading. The Support stage catches this as a Score 3 (weak support) because the source supports the number but not the implied comparison.
Use Case 4: Healthcare Market Report. A healthcare consulting firm requests a report on telehealth adoption trends. The CGA catches an exaggeration: the report claims "telehealth adoption doubled during 2025" while the cited study reports "telehealth utilization increased 68% year-over-year." Both describe strong growth, but "doubled" overstates the source's finding. The healthcare fidelity threshold of 0.90 catches this distortion that would pass in a general business context.
Use Case 5: Investment Research Note. An analyst writes a research note drawing from a PromptReports deliverable. The CGA catches a misattribution: the report attributes a market forecast to "IDC" when the source is actually from a different research firm that IDC later cited. This matters because the authority of the forecast depends on who originated it. The Support stage flags this as a Score 2 (no support from the attributed source) and the claim gets re-researched with a correct attribution.
The Heart of the Pipeline
The CGA is the module that makes PromptReports.ai fundamentally different from every other AI research tool. Other platforms generate content and include links. We generate content, extract every claim, retrieve every source, and check — through three independent analytical stages — whether each claim faithfully represents its evidence.
In Part 4, we'll examine the Verification Scoring Module — the system that takes all of the CGA's individual claim evaluations and produces the composite Verification Score that appears on every report.
Three stages. Six distortion checks. Every claim. [See it for yourself →](/register)