CCBench
A coding-agent benchmark that evaluates real-world tasks in small, private CodeCrafters codebases (under 10k lines of code). Official test runners check whether each task succeeds.
Rows: 8
Primary metric: success_rate
Sampled: 2026-05-06
Success Rate
| Rank | Subject | Success Rate | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Codex CLI w/ GPT 5.2-codex | 75.40 | — | Imported | 2026-05-06 |
| 2 | Claude Code w/ Opus 4.6 | 72.70 | — | Imported | 2026-05-06 |
| 3 | Claude Code w/ Opus 4.5 | 58.30 | — | Imported | 2026-05-06 |
| 4 | Gemini CLI w/ Gemini 3 Flash Preview | 51.30 | — | Imported | 2026-05-06 |
| 5 | Gemini CLI w/ Gemini 3 Pro Preview | 47.60 | — | Imported | 2026-05-06 |
| 6 | Codex CLI w/ GPT 5.1-codex-mini | 42.20 | — | Imported | 2026-05-06 |
| 7 | Claude Code w/ Sonnet 4.5 | 34.20 | — | Imported | 2026-05-06 |
| 8 | Claude Code w/ Haiku 4.5 | 21.90 | — | Imported | 2026-05-06 |
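The page does not state how the success rate is computed. As a minimal sketch, assuming it is the percentage of tasks whose official test run passes, a figure of this kind could be derived as follows. The `TaskResult` type, the task identifiers, and the sample outcomes are hypothetical and are not taken from the benchmark.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Outcome of one benchmark task (hypothetical structure)."""
    task_id: str   # illustrative identifier, not a real CCBench task name
    passed: bool   # True if the test runner reported success


def success_rate(results: list[TaskResult]) -> float:
    """Return the percentage of passed tasks, rounded to two decimals.

    Assumption: success rate = 100 * passed tasks / total tasks.
    """
    if not results:
        raise ValueError("no task results to score")
    passed = sum(1 for r in results if r.passed)
    return round(100.0 * passed / len(results), 2)


# Made-up outcomes for illustration only.
results = [
    TaskResult("task-1", True),
    TaskResult("task-2", False),
    TaskResult("task-3", True),
    TaskResult("task-4", True),
]
print(success_rate(results))  # 75.0
```

Under this assumption, each score in the table would be read as a percentage of tasks passed, so a higher value means more tasks solved.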