HumanEval+

HumanEval+ code-generation leaderboard from EvalPlus.

Rows: 25
Primary metric: humaneval_plus
Sampled: 2026-05-05

Metadata

Metrics

HumanEval+ pass@1, HumanEval pass@1
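pass@1 is the percentage of problems for which a generated solution passes every test. The sketch below assumes the standard unbiased pass@k estimator (n samples per problem, c of which pass), which reduces to the share of problems solved when each problem gets a single greedy sample (n = 1, k = 1). The function names and toy results are hypothetical. For HumanEval+, "passes" is assumed to mean passing EvalPlus's extended test suite; HumanEval pass@1 uses only the original HumanEval tests.

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them pass all tests."""
    if n - c < k:
        # Every size-k subset of the samples contains at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: Sequence[tuple[int, int]], k: int = 1) -> float:
    """Average per-problem pass@k over (n, c) pairs, reported as a percentage."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return 100.0 * sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical greedy run: one sample per problem; c = 1 passed, c = 0 failed.
    greedy = [(1, 1), (1, 0), (1, 1), (1, 1)]
    print(benchmark_pass_at_k(greedy))  # 75.0

    # Hypothetical sampled run: 10 samples per problem.
    sampled = [(10, 3), (10, 10), (10, 0)]
    print(round(benchmark_pass_at_k(sampled), 2))  # 43.33
```

Under greedy decoding, each of HumanEval's 164 problems contributes either 0 or 1. Every problem is therefore worth about 0.61 points, which is consistent with the frequent ties in the table (for example, 89 ≈ 146/164).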

Latest Results

Rank  Subject                        HumanEval+ pass@1  Provenance  Sampled     Model Match
1     O1 Preview (Sept 2024)         89.00              Imported    2026-05-05  o1-preview (openai-o1-preview)
2     O1 Mini (Sept 2024)            89.00              Imported    2026-05-05  -
3     Qwen2.5-Coder-32B-Instruct     87.20              Imported    2026-05-05  Qwen2.5 Coder 32B Instruct (qwen-qwen-2.5-coder-32b-instruct)
4     GPT 4o (Aug 2024)              87.20              Imported    2026-05-05  GPT-4o (openai-gpt-4o)
5     DeepSeek-V3 (Nov 2024)         86.60              Imported    2026-05-05  DeepSeek V3 (deepseek-deepseek-chat)
6     GPT-4-Turbo (April 2024)       86.60              Imported    2026-05-05  GPT-4 Turbo (openai-gpt-4-turbo)
7     DeepSeek-V2.5 (Nov 2024)       83.50              Imported    2026-05-05  -
8     GPT 4o Mini (July 2024)        83.50              Imported    2026-05-05  GPT-4o-mini (openai-gpt-4o-mini)
9     DeepSeek-Coder-V2-Instruct     82.30              Imported    2026-05-05  -
10    Claude Sonnet 3.5 (June 2024)  81.70              Imported    2026-05-05  -
11    GPT-4-Turbo (Nov 2023)         81.70              Imported    2026-05-05  GPT-4 Turbo (openai-gpt-4-turbo)
12    Grok Beta                      80.50              Imported    2026-05-05  -
13    Gemini 1.5 Pro 002             79.30              Imported    2026-05-05  -
14    GPT-4 (May 2023)               79.30              Imported    2026-05-05  GPT-4 (openai-gpt-4)
15    CodeQwen1.5-7B-Chat            78.70              Imported    2026-05-05  -
16    OpenCoder-8B-Instruct          77.40              Imported    2026-05-05  -
17    claude-3-opus (Mar 2024)       77.40              Imported    2026-05-05  -
18    Gemini 1.5 Flash 002           75.60              Imported    2026-05-05  -
19    DeepSeek-Coder-33B-instruct    75.00              Imported    2026-05-05  -
20    Codestral-22B-v0.1             73.80              Imported    2026-05-05  -
21    OpenCodeInterpreter-DS-33B     73.80              Imported    2026-05-05  -
22    WizardCoder-33B-V1.1           73.20              Imported    2026-05-05  -
23    Artigenz-Coder-DS-6.7B         72.60              Imported    2026-05-05  -
24    Llama3-70B-instruct            72.00              Imported    2026-05-05  Llama 3 70B Instruct (meta-llama-llama-3-70b-instruct)
25    Mixtral-8x22B-Instruct-v0.1    72.00              Imported    2026-05-05  -