Updated Daily at 4:30 AM GMT

Writing Quality Benchmark Leaderboard

Automated stop-slop detection across 50 prompts, measuring AI prose quality along 5 dimensions: Directness, Rhythm, Trust, Authenticity, and Density.

Clean controls and intentional violation prompts — scored automatically.

50 test prompts · 5 scoring dimensions · 70/100 pass threshold · 12 prompt categories

Writing Quality Leaderboard

Rankings by Overall Score (highest = best). Click column headers to sort.

# · Model · Score

Score (0–100, higher is better): ≥80 excellent · ≥60 good · ≥40 fair · <40 poor. Ties on Overall score are broken by cost (cheaper ranks higher).

Writing Quality Scores

Pass rate and dimension breakdown from the latest benchmark run. A prompt passes if overall score ≥ 70/100.

Performance Insights

Discover trends in writing quality scores over time. Data updates as new benchmark runs complete.

Recent Daily Results

Pass rate and score breakdown from each benchmark run


How It Works

Automated stop-slop detection ensures consistent, objective prose quality measurement

50 Structured Test Prompts

12 categories including hedging, filler phrases, passive voice, generic adjectives, em-dashes, and clean controls across analyst, narrative, and edge-case formats.

5-Dimension Scoring

Each prompt is scored across Directness, Rhythm, Trust, Authenticity, and Density using regex-based violation detection. A prompt passes with an overall score ≥ 70/100.
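The scoring step above can be sketched in miniature. This is a minimal illustration, not the benchmark's actual detector: the regex patterns, per-violation penalty, and equal-weight averaging are all assumptions chosen for clarity; only the five dimension names and the ≥ 70/100 pass rule come from the page itself.

```python
import re

# Hypothetical per-dimension slop patterns; the real detector's rules
# are not published, so these are illustrative stand-ins.
DIMENSION_PATTERNS = {
    "Directness": [r"\bit is important to note\b", r"\bit's worth noting\b"],
    "Rhythm": [r"\bnot only\b.*\bbut also\b"],
    "Trust": [r"\barguably\b", r"\bperhaps\b"],
    "Authenticity": [r"\bdelve\b", r"\btapestry\b"],
    "Density": [r"\bvery\b", r"\breally\b", r"\bin order to\b"],
}

PASS_THRESHOLD = 70  # overall score out of 100, per the benchmark's pass rule


def score_text(text: str) -> dict:
    """Score one response: each dimension starts at 100 and loses points
    per regex violation; overall is the unweighted mean of the five."""
    scores = {}
    for dim, patterns in DIMENSION_PATTERNS.items():
        violations = sum(
            len(re.findall(p, text, flags=re.IGNORECASE)) for p in patterns
        )
        scores[dim] = max(0, 100 - 20 * violations)  # illustrative penalty
    overall = sum(scores.values()) / len(scores)
    return {
        "dimensions": scores,
        "overall": overall,
        "passed": overall >= PASS_THRESHOLD,
    }
```

A clean sentence keeps every dimension at 100 and passes; a response packed with hedges, filler, and generic vocabulary loses points across several dimensions and drops below the threshold.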

Automated Quality Tracking

Results track over time. Clean controls must pass. Violation prompts are expected to fail — a high fail rate on violation prompts confirms detector sensitivity.
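The sanity check described here, that clean controls should pass while violation prompts should fail, can be sketched as a small aggregation. The run data and category labels below are hypothetical placeholders for illustration only.

```python
# Hypothetical run summary: (prompt_category, passed) per prompt.
# Category names are illustrative, not the benchmark's actual labels.
results = [
    ("clean_control", True),
    ("clean_control", True),
    ("hedging_violation", False),
    ("filler_violation", False),
    ("passive_voice_violation", True),  # a violation the detector missed
]


def sensitivity_report(results):
    """Controls should all pass; a high fail rate on violation
    prompts indicates the detector is catching what it should."""
    controls = [p for cat, p in results if cat == "clean_control"]
    violations = [p for cat, p in results if cat != "clean_control"]
    return {
        "control_pass_rate": sum(controls) / len(controls),
        "violation_fail_rate": violations.count(False) / len(violations),
    }
```

In this toy run the controls pass at 100% and two of three violation prompts fail, so the detector looks sensitive; the one violation that slipped through would flag a pattern worth tightening.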

Write reports that don't sound like AI

PromptReports.ai uses the stop-slop engine to automatically improve prose quality across every research report.