Product Deep Dive

Inside the SOAR-V Pipeline, Part 4: The Verification Scoring Module — From Raw Data to Trust You Can Measure

2/6/2026

 

This is Part 4 of our series on the SOAR-V verification pipeline. The Claim Extraction Engine finds every verifiable claim. The Citation Resolution Service retrieves source content. The Content Grounding Analyzer evaluates each claim through three stages. Now we need to make sense of all that data.

 

A deep research report might contain 25 claims, each with three CGA stage scores, a source quality rating, and a corroboration count. That's over 100 individual data points. A user looking at the report doesn't want 100 data points. They want one clear answer: Can I trust this report?

 

That's the Verification Scoring Module's job.

 

What Is the Verification Scoring Module?

 

The VSM takes the raw output from the CGA — individual stage scores for every claim — and transforms it into two things users actually need: a per-claim Verification Score and an overall report-level Verification Score.

 

The per-claim score tells you how well an individual assertion is supported by evidence. The report-level score tells you how trustworthy the report is as a whole. Both are expressed as numbers between 0 and 1, and both are designed to be meaningful, comparable, and transparent.

 

This isn't a black-box confidence number. Every score can be decomposed into its components. Users can see exactly which factors pulled a score up or dragged it down. The VSM is an open formula, not a hidden algorithm.

 

The Claim-Level Formula

 

Each claim that passes all three CGA stages receives a composite Verification Score calculated from four weighted components:

 

VS = (Source Quality × 0.25) + (Grounding Strength × 0.35) + (Fidelity Score × 0.25) + (Corroboration Bonus × 0.15)

 

Let's break each component down.

 

Source Quality (Weight: 25%)

 

Not all sources are created equal. A statistic from a peer-reviewed journal carries more weight than one from a company blog post. A regulatory filing is more authoritative than a news article summarizing that filing.

 

Source Quality is the source's RSI (Reliability Source Index) composite score, which evaluates five dimensions:

 

Authority (25%): Publisher reputation, author credentials, peer review status. A Nature publication scores higher than a Medium blog post.
Recency (20%): How current the source is relative to the claim's domain. Technology sources decay faster than legal sources — a 2023 tech benchmark is dated; a 2023 Supreme Court ruling is still current.
Methodology (20%): For research-based sources, the quality of research design, sample size, and statistical rigor. An n=10,000 survey outscores an n=50 survey.
Corroboration (20%): How many other sources in the research corpus support similar findings. A data point confirmed by three independent sources scores higher than one found in a single report.
Relevance (15%): Semantic similarity between the source's overall content and the research question. A source specifically about the topic at hand outscores one that mentions it tangentially.
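As a rough sketch, the RSI composite can be read as a weighted sum of the five dimension scores listed above. The dimension names and weights come from the list; the function itself is illustrative, not the production implementation, and the sample dimension scores are hypothetical:

```python
# Weights for the five RSI dimensions described above (each score on a 0-1 scale).
RSI_WEIGHTS = {
    "authority": 0.25,
    "recency": 0.20,
    "methodology": 0.20,
    "corroboration": 0.20,
    "relevance": 0.15,
}

def rsi_composite(dimensions: dict[str, float]) -> float:
    """Combine the five RSI dimension scores into one 0-1 composite."""
    return sum(RSI_WEIGHTS[name] * dimensions[name] for name in RSI_WEIGHTS)

# A hypothetical high-authority, recent source:
score = rsi_composite({
    "authority": 0.9, "recency": 0.9, "methodology": 0.8,
    "corroboration": 0.8, "relevance": 0.85,
})
print(round(score, 4))  # → 0.8525
```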

 

Source Quality contributes 25% of the claim score. This means that even a perfectly grounded claim from a low-quality source gets a lower score than the same claim from a high-quality source — reflecting the real-world principle that the authority of your evidence matters, not just its existence.

 

Grounding Strength (Weight: 35%)

 

This is the highest-weighted component, and it comes directly from the CGA's Stage 2 (Support) evaluation, normalized to a 0-1 scale.

 

A Support score of 5/5 (direct statement) normalizes to 1.0. A score of 3/5 (weak support) normalizes to 0.6. This captures the difference between a source that explicitly states what the claim asserts and a source that only indirectly supports it.

 

Grounding Strength gets the highest weight (35%) because it answers the most important question: Does the evidence actually support the assertion? A claim can be from an excellent source with high fidelity, but if the source only weakly supports the specific claim, the overall confidence should be lower.

 

Fidelity Score (Weight: 25%)

 

This is the CGA's Stage 3 output — the 0-to-1 score reflecting how accurately the claim represents the source content, after checking for exaggeration, misattribution, false precision, temporal errors, context stripping, and selective emphasis.

 

A fidelity score of 0.95 means the claim is an almost-perfect representation of the source. A score of 0.82 means there's some interpretive drift but the claim is broadly accurate. Below the domain threshold (0.80-0.92 depending on domain), the claim fails verification entirely and doesn't receive a VS at all.
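The threshold gate can be sketched as follows. The per-domain threshold values here are hypothetical examples chosen from within the stated 0.80-0.92 range, not the platform's actual configuration:

```python
# Hypothetical per-domain fidelity thresholds within the stated 0.80-0.92 range.
FIDELITY_THRESHOLDS = {
    "finance": 0.92,
    "healthcare": 0.90,
    "technology": 0.85,
    "general": 0.80,
}

def passes_fidelity_gate(fidelity: float, domain: str) -> bool:
    """A claim below its domain threshold fails verification and gets no VS."""
    return fidelity >= FIDELITY_THRESHOLDS.get(domain, 0.80)

print(passes_fidelity_gate(0.95, "finance"))  # → True
print(passes_fidelity_gate(0.82, "finance"))  # → False: fails the strictest gate
print(passes_fidelity_gate(0.82, "general"))  # → True: clears the 0.80 floor
```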

 

Corroboration Bonus (Weight: 15%)

 

Independent corroboration is one of the strongest signals of claim reliability. If three different sources from three different publishers all support the same assertion, the probability that all three are wrong is much lower than if the claim rests on a single source.

 

The Corroboration Bonus is calculated as: min(1.0, independent_supporting_sources × 0.2). In plain language: each additional independent source that supports the claim adds 0.2 to the bonus, up to a maximum of 1.0 (five independent sources). A claim with zero additional corroboration gets a bonus of 0.0. A claim backed by five sources gets the full 1.0.

 

"Independent" means from different publishers or research groups. Two articles from the same news outlet quoting the same press release count as one source, not two.

 

Putting It Together: A Worked Example

 

Here's a concrete example. Consider the claim: "The global observability market is projected to reach $64 billion by 2028."

 

| Component | Score | Calculation |
| --- | --- | --- |
| Source Quality | 0.85 | Cited from a Gartner report (high authority, recent, strong methodology) |
| Grounding Strength | 0.80 | CGA Support: 4/5 (strong support, normalized to 0.80) |
| Fidelity Score | 0.92 | CGA Fidelity: 0.92 (claim says "$64B" and source says "$64.2B" — minor rounding, high fidelity) |
| Corroboration Bonus | 0.40 | Two additional independent sources (IDC and a peer-reviewed paper) confirm similar projections |

 

VS = (0.85 × 0.25) + (0.80 × 0.35) + (0.92 × 0.25) + (0.40 × 0.15)
VS = 0.2125 + 0.28 + 0.23 + 0.06 = 0.7825

 

Verification Score: 0.78
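The worked example above can be reproduced with a short sketch of the claim-level formula. This is an illustration of the published weights, not the production code:

```python
# Component weights from the claim-level formula above.
WEIGHTS = {
    "source_quality": 0.25,
    "grounding": 0.35,
    "fidelity": 0.25,
    "corroboration": 0.15,
}

def verification_score(source_quality: float, grounding: float,
                       fidelity: float, corroboration_bonus: float) -> float:
    """Claim-level VS: weighted sum of the four 0-1 components."""
    return (source_quality * WEIGHTS["source_quality"]
            + grounding * WEIGHTS["grounding"]
            + fidelity * WEIGHTS["fidelity"]
            + corroboration_bonus * WEIGHTS["corroboration"])

# The observability-market claim from the table above:
vs = verification_score(0.85, 0.80, 0.92, 0.40)
print(round(vs, 4))  # → 0.7825, displayed as 0.78
```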

 

This claim scores well but not perfectly. The main factor pulling it down is the Corroboration Bonus: only two additional sources, not the ideal five. A third independent source confirming the projection would raise the bonus to 0.60 and lift the score to 0.81.

 

Users can see this breakdown and understand exactly why the score is what it is. There's no mystery.

 

The Report-Level Score

 

Individual claim scores tell you about individual assertions. The report-level Verification Score tells you about the report as a whole.

 

The report score is a weighted average of all claim scores, where the weight depends on claim priority:

 

Critical claims (statistical, comparative): Weight = 3x
High claims (attributive, causal, temporal): Weight = 2x
Medium claims (existential): Weight = 1x
Low claims (analytical borderline): Weight = 0.5x
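The priority-weighted average can be sketched as follows. The claim records and their field names are hypothetical, used only to show the aggregation:

```python
# Priority weights from the list above.
PRIORITY_WEIGHTS = {"critical": 3.0, "high": 2.0, "medium": 1.0, "low": 0.5}

def report_score(claims: list[dict]) -> float:
    """Report-level VS: priority-weighted average of per-claim scores."""
    total_weight = sum(PRIORITY_WEIGHTS[c["priority"]] for c in claims)
    weighted = sum(PRIORITY_WEIGHTS[c["priority"]] * c["vs"] for c in claims)
    return weighted / total_weight

claims = [
    {"vs": 0.92, "priority": "critical"},  # statistical claim, counts 3x
    {"vs": 0.85, "priority": "high"},      # causal claim, counts 2x
    {"vs": 0.70, "priority": "medium"},    # existential claim, counts 1x
]
print(round(report_score(claims), 2))  # → 0.86
```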

 

This weighting reflects a simple principle: a report with 20 verified existential claims and 3 fabricated statistical claims is worse than a report with 20 verified statistical claims and 3 unverified existential claims. The claims that are most likely to be wrong and most damaging when wrong have the biggest influence on the overall score.

 

Status Mapping

 

The report score maps to a human-readable status:

 

| Score Range | Status | Badge Color | Meaning |
| --- | --- | --- | --- |
| ≥ 0.90 | Verified | Green | Strong evidence backing across all claims |
| 0.75 – 0.89 | Mostly Verified | Yellow-green | Good overall with some weaker claims |
| 0.60 – 0.74 | Partially Verified | Amber | Mixed — significant claims need attention |
| < 0.60 | Low Confidence | Red | Substantial verification failures |
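The mapping in the table reduces to a simple threshold function, sketched here for illustration:

```python
def report_status(score: float) -> str:
    """Map a report-level Verification Score to its human-readable status."""
    if score >= 0.90:
        return "Verified"
    if score >= 0.75:
        return "Mostly Verified"
    if score >= 0.60:
        return "Partially Verified"
    return "Low Confidence"  # below 0.60: not delivered in current form

print(report_status(0.91))  # → Verified
print(report_status(0.78))  # → Mostly Verified
print(report_status(0.55))  # → Low Confidence
```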

 

Reports scoring below 0.60 are not delivered to users in their current form. They're either sent back through the re-research and revision loop or flagged for human review through the Researcher Escalation Queue.

 

Why This Matters for Users

 

Meaningful Comparability

 

Because every report uses the same formula, scores are comparable across reports, domains, and time periods. A VP of Strategy can look at two competing reports and say "this one has a VS of 0.91 and this one has a VS of 0.78 — the first one is better supported." This kind of apples-to-apples quality comparison is impossible with unverified AI output.

 

Informed Risk Assessment

 

Different users have different risk tolerances. A general market overview for internal brainstorming can tolerate a VS of 0.75. A report supporting a regulatory filing needs 0.90+. A due diligence report for a $100M acquisition needs the highest scores possible. By making the score transparent and decomposable, the VSM lets users calibrate their trust to their use case.

 

Quality Tracking Over Time

 

Because the VSM produces a consistent numerical score, organizations can track quality trends. "Our average Verification Score on technology reports increased from 0.82 to 0.89 over the past quarter" is a meaningful metric. It demonstrates that the platform's Recursive Self-Improvement Engine is working and that report quality is genuinely improving.

 

Honest Uncertainty Communication

 

The VSM doesn't pretend that everything is perfectly verified. A score of 0.78 is honest about the fact that some claims have weaker support than others. The breakdown shows exactly where the weaknesses are. This honesty is actually trust-building — users learn to rely on reports that accurately communicate their own limitations rather than reports that present everything with false confidence.

 

Real-World Use Cases

 

Use Case 1: Report Quality SLA for Enterprise Customers. An enterprise customer requires that all intelligence reports meet a minimum Verification Score of 0.85. The VSM makes this SLA measurable and enforceable. Reports that don't meet threshold go through additional research and verification cycles before delivery. The customer gets quantified quality assurance — not a vague promise of accuracy, but a specific numerical commitment backed by a transparent methodology.

 

Use Case 2: Marketplace Trust Signals. On the PromptReports marketplace, reports from independent researchers display their Verification Score prominently. Buyers can filter by minimum VS, and reports with higher scores command higher prices. The VSM creates a quality signal that aligns incentives: researchers are motivated to produce well-sourced, verifiable reports because better-verified reports sell at premium prices.

 

Use Case 3: Audit Trail for Compliance. A financial services firm uses PromptReports for regulatory research. The VSM creates a complete audit trail: every claim, every score, every component breakdown, timestamped and stored. When a regulator asks "how did you arrive at this conclusion about market structure?" the firm can produce the complete verification chain from claim to source to score.

 

Use Case 4: Research Quality Benchmarking. A consulting firm uses PromptReports across multiple client engagements. The VSM allows the firm to benchmark research quality by domain, time period, and research depth. "Our healthcare reports average 0.88 VS while our technology reports average 0.91" provides actionable insight: healthcare research strategies might need refinement, or healthcare domain thresholds might be appropriately more stringent.

 

Use Case 5: Source Authority Calibration. A technology analyst notices that reports citing a particular industry source consistently receive lower Source Quality scores. The VSM's component breakdown reveals that the source scores well on recency and relevance but poorly on methodology and corroboration. This helps the analyst understand why certain sources produce weaker verification results and adjust their source preferences for future research requests.

 

The Score Is the Promise

 

The Verification Score is more than a number. It's a promise that we've done the work of checking every claim, that we've quantified our confidence in each one, and that we're transparent about how we arrived at every number.

 

In a landscape where AI tools generate content with unstated and unmeasured confidence levels, a quantified, decomposable, transparent trust score is a fundamentally different value proposition. You're not trusting our brand reputation. You're trusting an open formula applied to evidence you can inspect yourself.

 

In Part 5, we'll cover the final module: the Researcher Escalation Queue — what happens when automated verification can't make a confident determination and human expertise is needed.

 

Quantified trust. Transparent methodology. [Start generating verified reports →](/register)