App-Bench

A benchmark that evaluates how well AI coding agents and web-app builders generate real web applications from a single natural-language prompt, with no human edits.

Rows: 10
Primary metric: percentile_score
Sampled: 2026-05-06


Metrics

Percentile Score

Latest Results

Rows are ranked by percentile score, highest first.

Rank  Subject                                  Percentile Score  Model Match  Provenance  Sampled
1     Orchids                                  76.80                                      Imported    2026-05-06
2     Claude Code | Opus 4.5                   67.50                                      Imported    2026-05-06
3     v0                                       64.90                                      Imported    2026-05-06
4     Bolt                                     53.60                                      Imported    2026-05-06
5     Google AI Studio | Gemini 3 Pro Preview  50.30                                      Imported    2026-05-06
6     Codex | gpt-5.1-codex-max                38.40                                      Imported    2026-05-06
7     Replit                                   35.10                                      Imported    2026-05-06
8     Cursor | Composer 1                      27.80                                      Imported    2026-05-06
9     Lovable                                  25.80                                      Imported    2026-05-06
10    Gemini CLI | Gemini 2.5 Pro              0.00                                       Imported    2026-05-06
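The ranking rule stated above (sort by percentile score, highest first) can be reproduced from the leaderboard rows with a short Python sketch. The `Row` type and the `rank` helper are hypothetical names introduced for illustration and are not part of App-Bench. The page does not say how ties are broken, so this sketch assumes Python's stable sort, which keeps tied rows in their input order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One leaderboard entry (hypothetical type for illustration)."""
    subject: str
    percentile_score: float


# Scores copied from the leaderboard, deliberately listed out of rank order.
ROWS = [
    Row("Lovable", 25.80),
    Row("Bolt", 53.60),
    Row("Orchids", 76.80),
    Row("Gemini CLI | Gemini 2.5 Pro", 0.0),
    Row("Codex | gpt-5.1-codex-max", 38.40),
    Row("v0", 64.90),
    Row("Cursor | Composer 1", 27.80),
    Row("Claude Code | Opus 4.5", 67.50),
    Row("Replit", 35.10),
    Row("Google AI Studio | Gemini 3 Pro Preview", 50.30),
]


def rank(rows):
    """Return (rank, row) pairs ordered by percentile score, highest first.

    sorted() is stable, so rows with equal scores keep their input order
    (an assumption; the benchmark's own tie-break rule is not stated).
    """
    ordered = sorted(rows, key=lambda r: r.percentile_score, reverse=True)
    return list(enumerate(ordered, start=1))


if __name__ == "__main__":
    for position, row in rank(ROWS):
        print(f"{position:>2}  {row.subject:<40} {row.percentile_score:6.2f}")
```

Running the script prints the ten subjects in the same order as the table, from Orchids at rank 1 down to Gemini CLI | Gemini 2.5 Pro at rank 10.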