ProjDevBench
End-to-end project development benchmark that evaluates coding agents on building complete, executable software repositories from high-level specifications.
Metadata
Rows: 10
Primary metric: final_score
Sampled: 2026-05-06
Metrics
Final, Overall Exec., Overall CR, Easy Exec., Easy CR, Hard Exec., Hard CR
| Rank | Subject | Final | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Codex + GPT-5 | 77.85 | — | Imported | 2026-05-06 |
| 2 | Cursor + Gemini-3-Pro-Preview | 75.32 | — | Imported | 2026-05-06 |
| 3 | Augment + GPT-5 | 72.35 | — | Imported | 2026-05-06 |
| 4 | Cursor + GPT-5 | 71.85 | — | Imported | 2026-05-06 |
| 5 | Cursor + Sonnet-4.5 | 70.88 | — | Imported | 2026-05-06 |
| 6 | Augment + Sonnet-4.5 | 70.10 | — | Imported | 2026-05-06 |
| 7 | Claude Code + Sonnet-4.5 | 68.87 | — | Imported | 2026-05-06 |
| 8 | Gemini CLI + Gemini-3-Pro-Preview | 68.61 | — | Imported | 2026-05-06 |
| 9 | GitHub Copilot + Sonnet-4.5 | 67.18 | — | Imported | 2026-05-06 |
| 10 | Codex + Sonnet-4.5 | 60.41 | — | Imported | 2026-05-06 |
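The Rank column appears to follow the primary metric, final_score, in descending order. The Python sketch below shows that ordering for the ten rows above. It assumes ranks are assigned purely by sorting on Final, because the page does not state a tie-breaking rule. The `Row` type and the `rank_by_final` helper are hypothetical names chosen for illustration.

```python
from typing import List, NamedTuple, Tuple


class Row(NamedTuple):
    """One leaderboard entry: agent + model subject and its Final score."""
    subject: str
    final: float


# Rows copied from the table above, listed alphabetically on purpose
# so that the sort has real work to do.
ROWS: List[Row] = [
    Row("Augment + GPT-5", 72.35),
    Row("Augment + Sonnet-4.5", 70.10),
    Row("Claude Code + Sonnet-4.5", 68.87),
    Row("Codex + GPT-5", 77.85),
    Row("Codex + Sonnet-4.5", 60.41),
    Row("Cursor + GPT-5", 71.85),
    Row("Cursor + Gemini-3-Pro-Preview", 75.32),
    Row("Cursor + Sonnet-4.5", 70.88),
    Row("Gemini CLI + Gemini-3-Pro-Preview", 68.61),
    Row("GitHub Copilot + Sonnet-4.5", 67.18),
]


def rank_by_final(rows: List[Row]) -> List[Tuple[int, str, float]]:
    """Return (rank, subject, final) tuples, highest Final score first.

    Assumption: rank depends only on final_score. sorted() is stable,
    so equal scores would keep their input order.
    """
    ordered = sorted(rows, key=lambda r: r.final, reverse=True)
    return [(i, r.subject, r.final) for i, r in enumerate(ordered, start=1)]


if __name__ == "__main__":
    for rank, subject, final in rank_by_final(ROWS):
        print(f"{rank:>2}  {subject:<36} {final:.2f}")
```

Sorting these rows reproduces the Rank column shown in the table. If two subjects ever shared a Final score, this sketch would rank them in input order, and the actual leaderboard may break such ties differently.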