ProjDevBench

An end-to-end project-development benchmark that evaluates coding agents on building complete, executable software repositories from high-level specifications.

Rows: 10
Primary metric: final_score
Sampled: 2026-05-06

Metadata

Metrics

Final, Overall Exec., Overall CR, Easy Exec., Easy CR, Hard Exec., Hard CR

Latest Results

Rows are parsed from the public ProjDevBench project-page leaderboard. Source agent and model display names are preserved.

Rank  Subject                            Final  Model Match  Provenance  Sampled
1     Codex + GPT-5                      77.85               Imported    2026-05-06
2     Cursor + Gemini-3-Pro-Preview      75.32               Imported    2026-05-06
3     Augment + GPT-5                    72.35               Imported    2026-05-06
4     Cursor + GPT-5                     71.85               Imported    2026-05-06
5     Cursor + Sonnet-4.5                70.88               Imported    2026-05-06
6     Augment + Sonnet-4.5               70.10               Imported    2026-05-06
7     Claude Code + Sonnet-4.5           68.87               Imported    2026-05-06
8     Gemini CLI + Gemini-3-Pro-Preview  68.61               Imported    2026-05-06
9     GitHub Copilot + Sonnet-4.5        67.18               Imported    2026-05-06
10    Codex + Sonnet-4.5                 60.41               Imported    2026-05-06
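
For readers who want to work with these rows programmatically, the following is a minimal sketch of one way to parse them, not the site's actual import pipeline. It assumes each row carries five whitespace-separated fields (rank, subject, final score, provenance, sampled date) and that the Model Match column is empty, as it is in every row above. The names `Row`, `parse_row`, and `ranks_match_scores` are hypothetical and introduced only for illustration.

```python
import re
from typing import List, NamedTuple


class Row(NamedTuple):
    """One leaderboard row (hypothetical record type for this sketch)."""
    rank: int
    subject: str
    final: float
    provenance: str
    sampled: str


# Assumed layout: rank, free-text subject, final score, provenance word, ISO date.
ROW_RE = re.compile(
    r"^\s*(\d+)\s+(.+?)\s+(\d+(?:\.\d+)?)\s+(\S+)\s+(\d{4}-\d{2}-\d{2})\s*$"
)


def parse_row(line: str) -> Row:
    """Parse a single text row; raise ValueError if it does not fit the layout."""
    m = ROW_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognized leaderboard row: {line!r}")
    rank, subject, final, provenance, sampled = m.groups()
    return Row(int(rank), subject, float(final), provenance, sampled)


def ranks_match_scores(rows: List[Row]) -> bool:
    """Return True if ranks 1..n follow descending Final score."""
    ordered = sorted(rows, key=lambda r: r.final, reverse=True)
    return [r.rank for r in ordered] == list(range(1, len(rows) + 1))


# A few rows copied from the table above.
SAMPLE = [
    "1 Codex + GPT-5 77.85 Imported 2026-05-06",
    "8 Gemini CLI + Gemini-3-Pro-Preview 68.61 Imported 2026-05-06",
    "10 Codex + Sonnet-4.5 60.41 Imported 2026-05-06",
]

if __name__ == "__main__":
    for line in SAMPLE:
        print(parse_row(line))
```

The subject is matched lazily so that agent and model names containing spaces, plus signs, or version numbers stay in one field. The same pattern tolerates both single-space and column-aligned rows because field separators are matched as runs of whitespace.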