Updated Daily at 2:00 AM GMT

PromptReports Quality Index

A single, weighted composite score across all four benchmark dimensions — the only AI quality index combining citation accuracy, hallucination resistance, source authority, and prose quality.

Which AI model delivers the highest overall research quality? Find out with verified daily benchmarks.

CV — Claim Verification: 30%
HD — Hallucination Detection: 30%
ST — Source Tracing: 25%
WQ — Writing Quality: 15%

Quality Index Leaderboard

Rankings by PQI Score (0–100, highest = best). Only models with data in all four pipelines appear here.

# · Model · PQI Score (0–100, higher = better)

Score bands: ≥80 excellent · ≥60 good · ≥40 fair · <40 poor
Abbreviations: CV = Claim Verification · HD = Hallucination Detection · ST = Source Tracing · WQ = Writing Quality

Performance Insights

Discover how composite scores evolve over time, compare models across all four quality dimensions, and find which AI delivers the most consistent research quality.

How PQI Is Calculated

Four daily pipelines, normalized to 0–100, combined into a single weighted score

Four Daily Benchmarks

Each pipeline runs every 24 hours: CV tests citation accuracy, HD tests hallucination resistance, ST tests source authority, WQ tests prose quality.

Normalize to 0–100

Every benchmark score is normalized to a 0–100 scale where higher is always better. Hallucination Rate is inverted: lower hallucination = higher HD score.
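The normalization step above can be sketched as follows. This is a minimal illustration, not the site's actual code; the function name and the example hallucination rate are hypothetical:

```python
# Hypothetical sketch: map raw benchmark scores onto a 0-100 scale,
# inverting metrics (like Hallucination Rate) where lower is better.
def normalize(value, lo, hi, invert=False):
    """Linearly scale a raw score in [lo, hi] to 0-100.

    If invert=True, a lower raw value yields a higher normalized score.
    """
    scaled = 100.0 * (value - lo) / (hi - lo)
    return 100.0 - scaled if invert else scaled

# Example: a 12% hallucination rate (lower is better) becomes a high HD score.
hd_score = normalize(0.12, 0.0, 1.0, invert=True)  # -> 88.0
```

The inversion is what makes "higher is always better" hold for every pipeline before weighting.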

Weighted Composite

PQI = CV×30% + HD×30% + ST×25% + WQ×15%. Only models with data across all four pipelines qualify for a composite ranking.
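The published formula reduces to a single weighted sum. A minimal sketch, using hypothetical example scores (the weights are from the formula above; everything else is illustrative):

```python
# Weights as published: CV 30%, HD 30%, ST 25%, WQ 15%.
WEIGHTS = {"CV": 0.30, "HD": 0.30, "ST": 0.25, "WQ": 0.15}

def pqi(scores):
    """Weighted composite of four normalized (0-100) pipeline scores.

    Returns None if any pipeline is missing, mirroring the rule that
    only models with data in all four pipelines qualify.
    """
    if set(scores) != set(WEIGHTS):
        return None
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

# Hypothetical model scores:
example = {"CV": 90.0, "HD": 85.0, "ST": 70.0, "WQ": 80.0}
print(pqi(example))  # 82.0
```

Because the weights sum to 1.0, the composite stays on the same 0–100 scale as the inputs.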

Use the model with the highest overall quality score

PromptReports.ai routes your research through the top-ranked model for your use case — verified daily across all four quality dimensions by this benchmark.