EvalPlus
EvalPlus leaderboard aggregating HumanEval+ and MBPP+ code-generation pass@1 scores.
Rows: 25
Primary metric: evalplus_average
Sampled: 2026-05-05
Metadata
Metrics: EvalPlus Avg., HumanEval+ pass@1, MBPP+ pass@1
| Rank | Subject | EvalPlus Avg. | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | O1 Preview (Sept 2024) | 84.60 | o1-preview openai-o1-preview | Imported | 2026-05-05 |
| 2 | O1 Mini (Sept 2024) | 83.90 | — | Imported | 2026-05-05 |
| 3 | Qwen2.5-Coder-32B-Instruct | 82.10 | Qwen2.5 Coder 32B Instruct qwen-qwen-2.5-coder-32b-instruct | Imported | 2026-05-05 |
| 4 | DeepSeek-V3 (Nov 2024) | 79.80 | DeepSeek V3 deepseek-deepseek-chat | Imported | 2026-05-05 |
| 5 | GPT 4o (Aug 2024) | 79.70 | GPT-4o openai-gpt-4o | Imported | 2026-05-05 |
| 6 | DeepSeek-V2.5 (Nov 2024) | 78.80 | — | Imported | 2026-05-05 |
| 7 | DeepSeek-Coder-V2-Instruct | 78.70 | — | Imported | 2026-05-05 |
| 8 | Claude Sonnet 3.5 (June 2024) | 78.00 | — | Imported | 2026-05-05 |
| 9 | GPT 4o Mini (July 2024) | 77.85 | GPT-4o-mini openai-gpt-4o-mini | Imported | 2026-05-05 |
| 10 | GPT-4-Turbo (Nov 2023) | 77.50 | GPT-4 Turbo openai-gpt-4-turbo | Imported | 2026-05-05 |
| 11 | Gemini 1.5 Pro 002 | 76.95 | — | Imported | 2026-05-05 |
| 12 | claude-3-opus (Mar 2024) | 75.35 | — | Imported | 2026-05-05 |
| 13 | OpenCoder-8B-Instruct | 74.40 | — | Imported | 2026-05-05 |
| 14 | CodeQwen1.5-7B-Chat | 73.85 | — | Imported | 2026-05-05 |
| 15 | Grok Beta | 73.05 | — | Imported | 2026-05-05 |
| 16 | DeepSeek-Coder-33B-instruct | 72.55 | — | Imported | 2026-05-05 |
| 17 | Gemini 1.5 Flash 002 | 71.55 | — | Imported | 2026-05-05 |
| 18 | OpenCodeInterpreter-DS-33B | 71.15 | — | Imported | 2026-05-05 |
| 19 | Artigenz-Coder-DS-6.7B | 71.10 | — | Imported | 2026-05-05 |
| 20 | Llama3-70B-instruct | 70.50 | Llama 3 70B Instruct meta-llama-llama-3-70b-instruct | Imported | 2026-05-05 |
| 21 | GPT-3.5-Turbo (Nov 2023) | 70.20 | GPT-3.5 Turbo openai-gpt-3.5-turbo | Imported | 2026-05-05 |
| 22 | Magicoder-S-DS-6.7B | 70.15 | — | Imported | 2026-05-05 |
| 23 | OpenCodeInterpreter-DS-6.7B | 69.20 | — | Imported | 2026-05-05 |
| 24 | claude-3-haiku (Mar 2024) | 68.85 | Claude 3 Haiku anthropic-claude-3-haiku | Imported | 2026-05-05 |
| 25 | DeepSeek-Coder-6.7B-instruct | 68.45 | — | Imported | 2026-05-05 |
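
The page does not spell out how EvalPlus Avg. is derived from the two listed pass@1 metrics. As a rough sketch, assuming it is the unweighted mean of the HumanEval+ and MBPP+ pass@1 percentages (an assumption, not something this page confirms), the calculation might look like the following. The function names and the sample outcomes are hypothetical and are not taken from the leaderboard.

```python
from typing import Sequence


def pass_at_1(passed: Sequence[bool]) -> float:
    """Percentage of problems whose single sampled solution passed all tests."""
    if not passed:
        raise ValueError("need at least one problem result")
    return 100.0 * sum(passed) / len(passed)


def evalplus_average(humaneval_plus: float, mbpp_plus: float) -> float:
    """Assumed aggregation: unweighted mean of the two pass@1 percentages."""
    return (humaneval_plus + mbpp_plus) / 2.0


# Hypothetical per-problem outcomes (True = generated solution passed).
humaneval_plus_score = pass_at_1([True, True, False, True])
mbpp_plus_score = pass_at_1([True, False])

print(f"{evalplus_average(humaneval_plus_score, mbpp_plus_score):.2f}")
```

If the real aggregation weights the two benchmarks by problem count, only `evalplus_average` would need to change.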