Claim Verification Benchmark Leaderboard
Which AI model is best at citing its sources? Daily benchmark measuring Citation Coverage, Grounding Rate, and Ungrounded Claims across 50 research prompts.
Free and open data to help you choose the best AI model for research you can defend.
Claim Verification Leaderboard
Rankings by Citation Coverage and Grounding Rate.
| # | Model | Citation Coverage | Grounding Rate | Ungrounded Claims |
|---|---|---|---|---|
Performance Insights
Discover trends, compare models across time periods, and find the best value. Data updates as new benchmarks complete.
Recent Daily Results
Research prompts tested across all models today — click any card for full results
How It Works
Our automated system ensures fair, consistent citation testing across all models
Domain-Calibrated Research Prompts
50 structured research prompts across economic, medical, historical, technology, and policy domains — each calibrated to elicit 3–6 verifiable factual claims per response.
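One prompt in such a set could be represented as a record like the following (a hypothetical shape; the field names and the example prompt are illustrative, not the benchmark's actual schema):

```python
# Hypothetical shape of one benchmark prompt record (illustrative only).
PROMPT = {
    "id": "econ-007",                 # assumed ID scheme
    "domain": "economic",             # one of the five listed domains
    "prompt": (
        "What were the main drivers of U.S. inflation in 2022? "
        "Cite a source for each figure you give."
    ),
    "expected_claims": (3, 6),        # calibrated claim-count range per response
}
```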
Claim Extraction & NLI Verification
Each response is analyzed for factual claims. Every claim is checked for citation coverage and confirmed with NLI analysis — does the cited source actually support the claim?
Rolling Averages Updated Daily
Scores are aggregated into rolling averages across 8 time periods and published here. Citation Coverage and Grounding Rate reveal which models cite their claims, and which models' citations actually hold up under verification.
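A trailing rolling average over daily scores can be sketched as follows (the window lengths of the site's 8 time periods are not stated here, so the window size is an assumption passed in by the caller):

```python
from collections import deque

def rolling_average(daily_scores: list[float], window: int) -> list[float]:
    """Trailing mean over the most recent `window` daily scores.
    Early days average over however many scores exist so far."""
    buf = deque(maxlen=window)  # automatically drops scores older than `window`
    out = []
    for score in daily_scores:
        buf.append(score)
        out.append(sum(buf) / len(buf))
    return out
```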
Use the best-cited model for your research
PromptReports.ai routes your research through the model with the best citation accuracy for your domain.