CRUXEval
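Code-reasoning benchmark in which models either predict the output of a short Python function for a given input or predict an input that produces a given output; results are reported as pass@1 and pass@5 for a range of code language models.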
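To make the two task directions concrete, here is a minimal sketch of how a single item could be scored. The function `f` and the helpers `check_output_prediction` and `check_input_prediction` are hypothetical illustrations, not CRUXEval's released code. The sketch assumes a single-argument function and counts a prediction as correct when executing it reproduces the target. Under that check, input prediction can accept more than one answer.

```python
from typing import Any, Callable


def f(s: str) -> str:
    """Hypothetical benchmark-style function: keep letters only and upper-case them."""
    return "".join(ch.upper() for ch in s if ch.isalpha())


def check_output_prediction(fn: Callable[[Any], Any], given_input: Any, predicted_output: Any) -> bool:
    # Output prediction: the model sees fn and an input and must state fn(input).
    return fn(given_input) == predicted_output


def check_input_prediction(fn: Callable[[Any], Any], predicted_input: Any, given_output: Any) -> bool:
    # Input prediction: the model sees fn and an output and must supply any input
    # that produces it; correctness is checked by execution, not string matching.
    return fn(predicted_input) == given_output


print(check_output_prediction(f, "a1b2c3", "ABC"))  # True
print(check_input_prediction(f, "abc", "ABC"))      # True: one of many valid inputs
print(check_input_prediction(f, "ab", "ABC"))       # False
```

Because input prediction is verified by running `fn`, a different answer such as `"a-b-c"` would also be accepted for the target `"ABC"`.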
Rows: 25
Primary metric: average_pass_at_1 (Avg. pass@1)
Sampled: 2026-05-05
Metrics: Avg. pass@1, Input pass@1, Output pass@1, Input pass@5, Output pass@5 (the table lists only Avg. pass@1).
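pass@k is commonly computed with the unbiased estimator popularized by the HumanEval/Codex work: draw n samples per problem, count the c that are correct, and average 1 - C(n-c, k)/C(n, k) over problems. The sketch below assumes CRUXEval follows this convention. It also assumes that Avg. pass@1 is the plain mean of Input pass@1 and Output pass@1, as the metric name average_pass_at_1 suggests. The per-problem counts are invented for illustration and are not leaderboard data.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        # Every size-k subset of the samples contains at least one correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Mean pass@k over problems, expressed as a percentage."""
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical (n, c) counts per problem; purely illustrative.
input_results = [(10, 10), (10, 3), (10, 0)]
output_results = [(10, 8), (10, 5), (10, 2)]

input_p1 = benchmark_pass_at_k(input_results, k=1)
output_p1 = benchmark_pass_at_k(output_results, k=1)
avg_p1 = (input_p1 + output_p1) / 2  # assumed definition of "Avg. pass@1"

input_p5 = benchmark_pass_at_k(input_results, k=5)
print(f"Input pass@1 {input_p1:.2f}, Output pass@1 {output_p1:.2f}, Avg. {avg_p1:.2f}")
print(f"Input pass@5 {input_p5:.2f}")
```

With n = 10 samples, pass@1 reduces to c/10 per problem, while pass@5 rewards problems solved in only a few samples: for c = 3 it is 1 - C(7,5)/C(10,5) = 1 - 21/252, roughly 0.917.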
| Rank | Subject | Avg. pass@1 (%) | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | gpt-4-turbo-2024-04-09+cot (n=3) | 78.85 | GPT-4 Turbo (openai-gpt-4-turbo) | Imported | 2026-05-05 |
| 2 | claude-3-opus+cot (n=1) | 77.70 | — | Imported | 2026-05-05 |
| 3 | gpt-4-0613+cot | 76.30 | GPT-4 (openai-gpt-4) | Imported | 2026-05-05 |
| 4 | gpt-4o+cot (n=3) | 75.80 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-05 |
| 5 | gpt-4-0613 | 69.25 | GPT-4 (openai-gpt-4) | Imported | 2026-05-05 |
| 6 | gpt-4-turbo-2024-04-09 (n=3) | 68.10 | GPT-4 Turbo (openai-gpt-4-turbo) | Imported | 2026-05-05 |
| 7 | gpt-4o (n=3) | 67.55 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-05 |
| 8 | claude-3-opus (n=1) | 65.00 | — | Imported | 2026-05-05 |
| 9 | semcoder-s-6.7b+cot (under verification) | 63.60 | — | Imported | 2026-05-05 |
| 10 | semcoder-6.7b+cot (under verification) | 63.45 | — | Imported | 2026-05-05 |
| 11 | gpt-3.5-turbo-0613+cot | 54.65 | GPT-3.5 Turbo (openai-gpt-3.5-turbo) | Imported | 2026-05-05 |
| 12 | gpt-3.5-turbo-0613 | 49.20 | GPT-3.5 Turbo, older v0613 (openai-gpt-3.5-turbo-0613) | Imported | 2026-05-05 |
| 13 | deepseek-instruct-33b | 48.20 | — | Imported | 2026-05-05 |
| 14 | starcoder2-15b | 47.60 | — | Imported | 2026-05-05 |
| 15 | deepseek-base-33b | 47.55 | — | Imported | 2026-05-05 |
| 16 | codetulu-2-34b | 47.55 | — | Imported | 2026-05-05 |
| 17 | codellama-34b+cot | 46.85 | — | Imported | 2026-05-05 |
| 18 | codellama-34b | 44.80 | — | Imported | 2026-05-05 |
| 19 | phind | 43.45 | — | Imported | 2026-05-05 |
| 20 | magicoder-ds-6.7b | 43.05 | — | Imported | 2026-05-05 |
| 21 | wizard-34b | 43.05 | — | Imported | 2026-05-05 |
| 22 | deepseek-base-6.7b | 42.70 | — | Imported | 2026-05-05 |
| 23 | codellama-python-34b | 42.65 | — | Imported | 2026-05-05 |
| 24 | codellama-13b+cot | 41.70 | — | Imported | 2026-05-05 |
| 25 | codellama-13b | 41.10 | — | Imported | 2026-05-05 |
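Several models appear both with and without chain-of-thought prompting (the +cot suffix). As a small worked example, the sketch below copies those Avg. pass@1 pairs from the table and computes the difference for each model. The dictionary layout and the `cot_gains` helper are illustrative choices, and the n= and verification annotations are dropped from the keys. semcoder-6.7b is included only to show that a +cot row without a plain counterpart is skipped.

```python
# Avg. pass@1 values copied from the table above (annotations dropped from names).
with_cot = {
    "gpt-4-turbo-2024-04-09": 78.85,
    "claude-3-opus": 77.70,
    "gpt-4-0613": 76.30,
    "gpt-4o": 75.80,
    "semcoder-6.7b": 63.45,  # no plain (non-CoT) row in the table
    "gpt-3.5-turbo-0613": 54.65,
    "codellama-34b": 46.85,
    "codellama-13b": 41.70,
}
without_cot = {
    "gpt-4-turbo-2024-04-09": 68.10,
    "claude-3-opus": 65.00,
    "gpt-4-0613": 69.25,
    "gpt-4o": 67.55,
    "gpt-3.5-turbo-0613": 49.20,
    "codellama-34b": 44.80,
    "codellama-13b": 41.10,
}


def cot_gains(cot: dict[str, float], plain: dict[str, float]) -> dict[str, float]:
    """Difference in Avg. pass@1 points for models present in both variants."""
    return {m: round(cot[m] - plain[m], 2) for m in cot if m in plain}


gains = cot_gains(with_cot, without_cot)
for model, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model:<24} +{gain:.2f}")
```

On these rows the gap ranges from 0.60 points (codellama-13b) to 12.70 points (claude-3-opus).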