CCBench

Coding-agent benchmark evaluating real-world tasks in small private CodeCrafters codebases under 10k lines of code, with official test runners checking task success.

Rows: 8
Primary metric: success_rate
Sampled: 2026-05-06

Metadata

Metrics

Success Rate

Latest Results

Rows are parsed from the public CCBench results table. Source agent and model display names are preserved.

Rank  Subject                               Success Rate  Model Match  Provenance  Sampled
1     Codex CLI w/ GPT 5.2-codex            75.40                      Imported    2026-05-06
2     Claude Code w/ Opus 4.6               72.70                      Imported    2026-05-06
3     Claude Code w/ Opus 4.5               58.30                      Imported    2026-05-06
4     Gemini CLI w/ Gemini 3 Flash Preview  51.30                      Imported    2026-05-06
5     Gemini CLI w/ Gemini 3 Pro Preview    47.60                      Imported    2026-05-06
6     Codex CLI w/ GPT 5.1-codex-mini       42.20                      Imported    2026-05-06
7     Claude Code w/ Sonnet 4.5             34.20                      Imported    2026-05-06
8     Claude Code w/ Haiku 4.5              21.90                      Imported    2026-05-06
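
For readers who want to work with these rows programmatically, the following is a minimal sketch of splitting one result line into its fields. It assumes each row follows the pattern "rank, agent w/ model, success rate, provenance, sampled date" with the Model Match column left empty, as in the rows above. The names `ResultRow`, `ROW_PATTERN`, and `parse_row` are hypothetical and are not part of any CCBench tooling; the actual import pipeline may work differently.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern: assumes "<rank> <agent> w/ <model> <rate> <provenance> <date>"
# separated by runs of whitespace, with no value in the Model Match column.
ROW_PATTERN = re.compile(
    r"^(?P<rank>\d+)\s+"
    r"(?P<agent>.+?)\s+w/\s+(?P<model>.+?)\s+"
    r"(?P<success_rate>\d+(?:\.\d+)?)\s+"
    r"(?P<provenance>\S+)\s+"
    r"(?P<sampled>\d{4}-\d{2}-\d{2})$"
)


@dataclass(frozen=True)
class ResultRow:
    rank: int
    agent: str          # display name preserved as written in the source
    model: str          # display name preserved as written in the source
    success_rate: float
    provenance: str
    sampled: str        # kept as an ISO date string


def parse_row(line: str) -> ResultRow:
    """Parse one leaderboard line; raise ValueError if it does not fit the assumed shape."""
    match = ROW_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized row: {line!r}")
    return ResultRow(
        rank=int(match["rank"]),
        agent=match["agent"],
        model=match["model"],
        success_rate=float(match["success_rate"]),
        provenance=match["provenance"],
        sampled=match["sampled"],
    )


# Sample lines copied from the table above.
SAMPLE_LINES = [
    "1 Codex CLI w/ GPT 5.2-codex 75.40 Imported 2026-05-06",
    "4 Gemini CLI w/ Gemini 3 Flash Preview 51.30 Imported 2026-05-06",
    "8 Claude Code w/ Haiku 4.5 21.90 Imported 2026-05-06",
]

rows = [parse_row(line) for line in SAMPLE_LINES]
```

Because the pattern tolerates any amount of whitespace between fields, it should accept both single-space and column-aligned renderings of the same rows. A row with a populated Model Match value would not fit this shape and would raise `ValueError` rather than be silently misparsed.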