App-Bench
Benchmark evaluating how well AI coding agents and web-app builders generate real web applications from a single natural-language prompt, with no human edits.
Rows: 10. Primary metric: percentile_score. Sampled: 2026-05-06.
Percentile Score
| Rank | Subject | Percentile Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Orchids | 76.80 | — | Imported | 2026-05-06 |
| 2 | Claude Code (Opus 4.5) | 67.50 | — | Imported | 2026-05-06 |
| 3 | v0 | 64.90 | — | Imported | 2026-05-06 |
| 4 | Bolt | 53.60 | — | Imported | 2026-05-06 |
| 5 | Google AI Studio (Gemini 3 Pro Preview) | 50.30 | — | Imported | 2026-05-06 |
| 6 | Codex (gpt-5.1-codex-max) | 38.40 | — | Imported | 2026-05-06 |
| 7 | Replit | 35.10 | — | Imported | 2026-05-06 |
| 8 | Cursor (Composer 1) | 27.80 | — | Imported | 2026-05-06 |
| 9 | Lovable | 25.80 | — | Imported | 2026-05-06 |
| 10 | Gemini CLI (Gemini 2.5 Pro) | 0.00 | — | Imported | 2026-05-06 |