Product Deep Dive

The Verification Pipeline That Changes Everything

2/4/2026
How Claim-Level Verification Works: A Technical Deep Dive Into the PromptReports Verification Pipeline

 

When we tell people that PromptReports.ai verifies every factual claim in every report, the first question is usually: "How?"

 

It's a fair question. "Verification" has become a vague term in the AI industry. Some platforms mean "we included a link." Others mean "we searched the web." A few mean "a human skimmed it." None of these are what we mean.

 

This post is a technical walkthrough of our verification pipeline — what happens between the moment a report is written and the moment it reaches you with a Verification Score attached to every claim. We're publishing this because transparency about methodology is a core part of what makes verified intelligence trustworthy. If we can't explain how we verify, the verification itself doesn't mean much.

 

The Pipeline at a Glance

 

Our verification system — internally called SOAR-V (Structure, Organize, Analyze, Report, Verify) — consists of five modules that run sequentially on every generated report:

 

1. Claim Extraction Engine (CEE) — Finds every verifiable claim
2. Citation Resolution Service (CRS) — Retrieves the actual source content
3. Content Grounding Analyzer (CGA) — Verifies claims against sources in three stages
4. Verification Scoring Module (VSM) — Calculates composite scores
5. Researcher Escalation Queue (REQ) — Routes failures to human experts
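The five modules above can be sketched as a simple sequential pipeline. This is a minimal, self-contained illustration, not the real system: every function body here is an assumed stub, and the real modules are far more sophisticated.

```python
from dataclasses import dataclass

# A minimal sketch of the five-module flow. All function bodies are
# illustrative stand-ins, not PromptReports' actual implementation.

@dataclass
class Claim:
    text: str
    citation: str
    verified: bool = False
    score: float = 0.0

def extract_claims(report: str) -> list[Claim]:          # Module 1: CEE
    # Toy heuristic: treat sentences containing a number as factual claims.
    return [Claim(s.strip(), citation="src-1")
            for s in report.split(".") if any(c.isdigit() for c in s)]

def resolve_citation(claim: Claim) -> str:               # Module 2: CRS
    return "cached source text for " + claim.citation

def ground_claim(claim: Claim, source: str) -> bool:     # Module 3: CGA
    # Real system: relevance, support, and fidelity stages run here.
    return claim.citation in source

def composite_score(claim: Claim) -> float:              # Module 4: VSM
    return 0.9  # placeholder for the weighted composite formula

def run_pipeline(report: str) -> list[Claim]:
    claims = extract_claims(report)
    for c in claims:
        source = resolve_citation(c)
        if ground_claim(c, source):                      # passed all stages
            c.verified, c.score = True, composite_score(c)
        # failed claims would enter the escalation queue (Module 5: REQ)
    return claims
```

The key architectural point is the strict ordering: no claim receives a score until it has been extracted, resolved, and grounded, and failures fall through to escalation rather than being silently dropped.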

 

Let's walk through each one.

 

Module 1: Claim Extraction Engine

 

A report is not a monolithic block of text. It's a collection of individual assertions — some factual, some analytical, some hedged, some definitional. The Claim Extraction Engine's job is to identify every atomic factual claim that can be independently verified.

 

Consider this paragraph from a hypothetical report:

 

"The observability market is projected to reach $64B by 2028, driven primarily by cloud-native adoption. Cribl, which was founded in 2017, has emerged as a key player in the data routing segment, processing over 2 petabytes of data daily for enterprise customers according to their latest earnings report."

 

Our CEE identifies four distinct claims here:

 

| Claim | Type | Priority |
| --- | --- | --- |
| "The observability market is projected to reach $64B by 2028" | Statistical | Critical |
| "Cloud-native adoption is the primary driver" | Causal | High |
| "Cribl was founded in 2017" | Temporal | High |
| "Cribl processes over 2PB of data daily" | Statistical | Critical |

 

Notice what the CEE does not extract: hedged language ("has emerged as a key player" is analytical, not a verifiable fact), structural connectors, and meta-commentary. The engine is specifically tuned to find assertions that have a right/wrong answer traceable to a source.

 

We classify claims into seven types — statistical, attributive, causal, comparative, temporal, existential, and analytical — each with a priority level. Statistical and comparative claims are always marked Critical because they're the most commonly hallucinated and the most damaging when wrong. A report typically yields 20-30 extractable claims, of which 15-20 are Critical or High priority.
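The type-to-priority mapping can be expressed as a simple lookup. The seven types and the rule that statistical and comparative claims are always Critical come from the description above; the priorities assigned to the remaining types are illustrative assumptions, since the post doesn't specify them.

```python
# The seven claim types from the post. Statistical and comparative are
# always Critical per the post; the other priorities are assumptions.

CLAIM_TYPES = {
    "statistical": "Critical",   # numbers, percentages, market sizes
    "comparative": "Critical",   # "X is larger than Y"
    "attributive": "High",       # "according to their earnings report..."
    "causal":      "High",       # "driven primarily by..."
    "temporal":    "High",       # "founded in 2017"
    "existential": "Medium",     # assumed: "a product called X exists"
    "analytical":  "Low",        # assumed: hedged judgments, rarely extracted
}

def priority_for(claim_type: str) -> str:
    # Unknown types default to Medium rather than failing open to Low.
    return CLAIM_TYPES.get(claim_type, "Medium")
```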

 

Module 2: Citation Resolution Service

 

Each extracted claim has an inline citation pointing to a source in the report's research corpus. The CRS resolves that citation to actual retrievable content.

 

This sounds simple, but it's where many "fact-checking" systems fail. A URL being valid doesn't mean the content is accessible, parseable, or relevant. The CRS handles:

 

HTML pages: Fetched and cleaned using readability algorithms to strip navigation, ads, and chrome, leaving only the article content
PDF documents: Full text extraction with layout-aware parsing (critical for academic papers where columns, tables, and figures complicate extraction)
Paywalled content: If the source was accessible during research but is now paywalled, we use cached versions from the research phase
Dead links: Flagged immediately as a reference-grounding failure

 

For each resolved source, the CRS also identifies the most relevant excerpt — the specific paragraph or section that the claim is most likely drawing from. This is done via embedding similarity: we compare the claim's semantic embedding against sections of the source text and surface the closest match.

 

The output is a resolved citation package: the claim, the full source text, and the most relevant excerpt. This package is what the CGA evaluates.
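Excerpt selection by embedding similarity can be sketched as an argmax over source sections. The toy `embed` below is a bag-of-letters vector purely for illustration; a production system would use a transformer sentence encoder.

```python
import math

# Sketch of excerpt selection via embedding similarity. embed() is a toy
# bag-of-letters vectorizer, not a real sentence-embedding model.

def embed(text: str) -> list[float]:
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def best_excerpt(claim: str, sections: list[str]) -> str:
    # Surface the section whose embedding is closest to the claim's.
    claim_vec = embed(claim)
    return max(sections, key=lambda s: cosine(claim_vec, embed(s)))
```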

 

Module 3: Content Grounding Analyzer — The Heart of Verification

 

This is the module that does the actual verification work, and it's the most technically sophisticated component of the pipeline. The CGA runs three sequential stages on each claim-source pair, and a claim must pass all three to be considered verified.

 

Stage 1: Relevance (Threshold: ≥ 0.70)

 

The first check is whether the cited source is even about the same topic as the claim. This catches a common hallucination pattern where models cite real sources that have nothing to do with the assertion being made — grabbing a plausible-looking URL that's topically adjacent but not actually relevant.

 

This is a fast computational check using embedding similarity. We compare the vector embedding of the claim against the embedding of the source excerpt. If the cosine similarity falls below 0.70, the claim fails immediately. The source simply isn't talking about what the claim says it's talking about.

 

This stage catches approximately 8-12% of claims in a typical report — sources that were retrieved during research but misapplied during synthesis.

 

Stage 2: Support (Threshold: ≥ 3/5)

 

If the source passes relevance, we check whether it actually supports the claim being made. A source can be about the right topic but not actually back up the specific assertion.

 

This is an LLM-evaluated check. We present the claim and the source excerpt to a language model and ask: "On a scale of 1-5, how strongly does this source support this specific claim?"

 

The scale:
1 — Contradicts: The source says the opposite
2 — No support: The source discusses the topic but doesn't support this specific claim
3 — Weak support: The source provides indirect or partial support
4 — Strong support: The source clearly supports the claim
5 — Direct statement: The source states almost exactly what the claim asserts

 

Claims need a score of 3 or higher. This stage catches another 5-10% of claims — typically cases where the source discusses the right topic (passing relevance) but doesn't actually endorse the specific assertion the report makes.
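The support check can be sketched as a prompt-and-parse step. The 1-5 scale and the ≥3 threshold come from the description above; the prompt wording and the `call_llm` callable are assumptions for illustration.

```python
# Sketch of the Stage 2 support check. The scale and threshold are from
# the post; the prompt text and call_llm interface are assumptions.

SUPPORT_PROMPT = """\
On a scale of 1-5, how strongly does this source support this specific claim?
1=contradicts, 2=no support, 3=weak support, 4=strong support, 5=direct statement.
Claim: {claim}
Source excerpt: {excerpt}
Answer with a single digit."""

def support_score(claim: str, excerpt: str, call_llm) -> int:
    # call_llm is any callable that takes a prompt string and returns text.
    reply = call_llm(SUPPORT_PROMPT.format(claim=claim, excerpt=excerpt))
    return int(reply.strip()[0])

def passes_support(score: int) -> bool:
    return score >= 3
```

In practice a parser like this would also need to handle malformed model replies; that error handling is omitted here for brevity.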

 

Stage 3: Fidelity (Threshold: domain-specific, 0.80–0.92)

 

This is the most important and most nuanced check. Even if a source is relevant and generally supportive, the claim might still distort what the source actually says. Fidelity catches:

 

Exaggeration: Source says "grew by approximately 15-20%" → Claim says "grew by 23%"
Misattribution: Source attributes a finding to one researcher → Claim attributes it to another
False precision: Source gives a range → Claim picks a specific number
Temporal errors: Source reports Q2 data → Claim presents it as Q3
Context stripping: Source makes a claim with caveats → Claim presents it as absolute

 

Fidelity is scored 0 to 1, and the threshold varies by domain because different fields demand different levels of precision:

 

| Domain | Fidelity Threshold | Why |
| --- | --- | --- |
| Legal | 0.92 | Misquoting precedent has serious consequences |
| Healthcare | 0.90 | Clinical claims require near-exact accuracy |
| Financial | 0.88 | Regulatory implications for misrepresented data |
| Technology | 0.82 | Technical claims have more acceptable variation |
| General Business | 0.80 | Broader interpretive range acceptable |
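Putting the three stages together, the CGA gate reduces to a short function: each stage short-circuits, so a claim must clear relevance, then support, then its domain's fidelity threshold. The thresholds are the ones quoted above; the scoring inputs themselves would come from the embedding and LLM checks.

```python
# Sketch combining the three CGA stages. Thresholds are from the post;
# the relevance/support/fidelity inputs come from upstream checks.

DOMAIN_FIDELITY = {
    "legal": 0.92,
    "healthcare": 0.90,
    "financial": 0.88,
    "technology": 0.82,
    "general": 0.80,
}

def cga_verdict(relevance: float, support: int, fidelity: float,
                domain: str = "general") -> bool:
    """A claim must pass all three stages, in order, to be verified."""
    if relevance < 0.70:          # Stage 1: embedding similarity gate
        return False
    if support < 3:               # Stage 2: LLM support score, 1-5 scale
        return False
    # Stage 3: domain-calibrated fidelity threshold
    return fidelity >= DOMAIN_FIDELITY.get(domain, 0.80)
```

Note how the same fidelity score of 0.83 verifies a technology claim but fails a legal one: the gate encodes the judgment that different domains tolerate different amounts of interpretive drift.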

 

This stage catches the most dangerous errors — the content-grounding failures that HalluHard identified as the persistent problem even with web search. The source is real, the topic is right, the general direction is correct — but the specific claim doesn't faithfully represent what the source actually says.

 

Module 4: Verification Scoring

 

Claims that pass all three CGA stages receive a composite Verification Score using a weighted formula:

 

VS = (Source Quality × 0.25) + (Grounding Strength × 0.35) + (Fidelity Score × 0.25) + (Corroboration Bonus × 0.15)

 

Where:
Source Quality is the source's RSI (Reliability Source Index) score, evaluating publisher authority, recency, methodology, and corroboration
Grounding Strength is the normalized support score from CGA Stage 2
Fidelity Score is the CGA Stage 3 output
Corroboration Bonus rewards claims that are independently supported by multiple sources

 

The report-level Verification Score is a weighted average of all claim scores, where Critical claims count 3x and High claims count 2x. This ensures that the report's overall score is most heavily influenced by the claims that matter most.
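Both scoring steps are straightforward weighted sums. The claim-level weights and the Critical 3x / High 2x multipliers are from the formulas above; the 1x weights for Medium and Low claims, and the assumption that all inputs are normalized to 0-1, are mine for illustration.

```python
# Sketch of Module 4 scoring. Weights are from the post's formula;
# inputs are assumed normalized to the 0-1 range before weighting.

def verification_score(source_quality: float, grounding: float,
                       fidelity: float, corroboration: float) -> float:
    return (source_quality * 0.25 + grounding * 0.35
            + fidelity * 0.25 + corroboration * 0.15)

# Critical 3x and High 2x are from the post; Medium/Low 1x is assumed.
PRIORITY_WEIGHT = {"Critical": 3, "High": 2, "Medium": 1, "Low": 1}

def report_score(claims: list[tuple[float, str]]) -> float:
    """Weighted average of (score, priority) pairs across the report."""
    total = sum(PRIORITY_WEIGHT[p] * s for s, p in claims)
    weight = sum(PRIORITY_WEIGHT[p] for _, p in claims)
    return total / weight
```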

 

Module 5: Researcher Escalation Queue

 

Not every failed claim is an error. Sometimes sources are temporarily unavailable. Sometimes a claim is in a rapidly changing domain where yesterday's data is already outdated. Sometimes the verification itself might be wrong.

 

Claims that fail CGA don't just disappear. They enter a decision tree:

 

Reference failure (source unavailable): Automatically triggers re-research — our specialist agents search for alternative sources that can verify the claim
Grounding failure (score ≥ 0.5): Triggers re-research with a more targeted query
Grounding failure (score < 0.5): Escalated to human expert review
Contradiction detected: Always escalated to human review
Three re-research attempts failed: Escalated to human review
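The decision tree above can be written as a small routing function. The branch labels are from the post; the function itself, and the choice of 0.5 as an inclusive boundary for re-research, are assumptions.

```python
# Sketch of the REQ decision tree. Branch labels are from the post; the
# inclusive >= 0.5 boundary for re-research is an assumption.

def route_failure(kind: str, score: float = 0.0, attempts: int = 0) -> str:
    if kind == "contradiction":
        return "human_review"          # contradictions are always escalated
    if attempts >= 3:
        return "human_review"          # re-research attempts exhausted
    if kind == "reference":
        return "re_research"           # source unavailable: find alternatives
    if kind == "grounding":
        return "re_research" if score >= 0.5 else "human_review"
    return "human_review"              # unknown failure modes fail safe
```

The fail-safe default matters: an unrecognized failure mode routes to a human rather than silently retrying.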

 

Human experts review the claim, the source, and the CGA analysis, then make a final determination. Their decisions feed back into the system's training data, improving future verification accuracy.

 

What the User Sees

 

All of this happens behind the scenes. What you see as a user is elegantly simple: a report where every factual claim has a colored indicator — green for verified, amber for partially verified, red for flagged — and a clickable Verification Score.

 

Click any claim and you see: the cited source, the relevant excerpt, the three-stage breakdown, and the composite score. Full transparency into how every number was derived.

 

This is what we mean by "verified intelligence." Not "we included links." Not "we searched the web." Every claim extracted, every source retrieved, every assertion checked against what the source actually says, with domain-calibrated thresholds and human escalation for edge cases.

 

The verification pipeline adds processing time to report generation — typically 5-10 minutes on top of the research and synthesis phases. We think that's a worthwhile trade for knowing that every claim in your report is backed by evidence you can click through and see for yourself.

 

Want to see verification in action? [Generate a free report](/register) and click any claim to see the full verification breakdown.