Updated Daily at 1:00 AM GMT

Claim Verification Benchmark Leaderboard

Which AI model is best at citing its sources? Daily benchmark measuring Citation Coverage, Grounding Rate, and Ungrounded Claims across 50 research prompts.

Free and open data to help you choose the best AI model for research you can defend.

50
Research Prompts
Daily
Test Frequency
5
Research Domains
8
Time Periods

Claim Verification Leaderboard

Rankings by Citation Coverage and Grounding Rate.

# · Model
Score bands: ≥80% excellent · ≥60% good · ≥40% fair · <40% poor. Tokens and Cost are per-run averages.
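The score bands above amount to a simple threshold mapping. A minimal sketch (the function name is ours, not part of the site):

```python
def quality_band(score: float) -> str:
    """Map a percentage score to the leaderboard's quality band."""
    if score >= 80:
        return "excellent"
    if score >= 60:
        return "good"
    if score >= 40:
        return "fair"
    return "poor"

print(quality_band(72.5))  # good
```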

Performance Insights

Discover trends, compare models across time periods, and find the best value. Data updates as new benchmarks complete.

Recent Daily Results

Research prompts tested across all models today — click any card for full results

How It Works

Our automated system ensures fair, consistent citation testing across all models

Domain-Calibrated Research Prompts

50 structured research prompts across economic, medical, historical, technology, and policy domains — each calibrated to elicit 3–6 verifiable factual claims per response.

Claim Extraction & NLI Verification

Each response is analyzed to extract its factual claims. Every claim is checked for citation coverage and verified with natural-language-inference (NLI) analysis — does the cited source actually support the claim?
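As a rough sketch of how the two headline metrics fall out of per-claim verification — the claim structure and NLI labels here are illustrative assumptions, not the production pipeline:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Claim:
    text: str
    cited: bool                 # does the response attach a citation to this claim?
    nli_label: Optional[str]    # NLI verdict for cited claims: "entailment",
                                # "neutral", or "contradiction"; None if uncited

def citation_coverage(claims: list[Claim]) -> float:
    """Fraction of extracted claims that carry a citation."""
    return sum(c.cited for c in claims) / len(claims)

def grounding_rate(claims: list[Claim]) -> float:
    """Fraction of cited claims whose source actually entails the claim."""
    cited = [c for c in claims if c.cited]
    if not cited:
        return 0.0
    return sum(c.nli_label == "entailment" for c in cited) / len(cited)

claims = [
    Claim("GDP grew 2.1% in 2023", cited=True, nli_label="entailment"),
    Claim("The trial enrolled 400 patients", cited=True, nli_label="neutral"),
    Claim("The law passed in 1990", cited=False, nli_label=None),
]
print(citation_coverage(claims))  # 2 of 3 claims cited -> 0.666...
print(grounding_rate(claims))     # 1 of 2 cited claims entailed -> 0.5
```

A claim that is cited but not entailed by its source counts toward Citation Coverage yet drags down Grounding Rate — which is why the leaderboard reports both.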

Rolling Averages Updated Daily

Scores aggregate into rolling averages across 8 time periods and are published here. Citation Coverage shows which models cite their claims; Grounding Rate shows whether the cited sources actually support them.
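The rolling aggregation can be sketched as averaging the most recent N daily scores per window — the window sizes below are illustrative, not the site's actual 8 periods:

```python
def rolling_average(daily_scores: list[float], window: int) -> float:
    """Average of the most recent `window` daily scores.

    If fewer than `window` days exist, averages whatever is available.
    """
    recent = daily_scores[-window:]
    return sum(recent) / len(recent)

scores = [70.0, 75.0, 80.0, 85.0]  # one score per benchmark day, oldest first
print(rolling_average(scores, 2))  # 82.5
print(rolling_average(scores, 7))  # only 4 days exist, so this averages all 4
```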

Use the best-cited model for your research

PromptReports.ai routes your research through the model with the best citation accuracy for your domain.