MBPP+

MBPP+ code-generation leaderboard from EvalPlus.

25 rows
Primary metric: mbpp_plus
Sampled: 2026-05-05

Metadata

Metrics

MBPP+ pass@1, MBPP pass@1
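
For readers unfamiliar with the metric, the snippet below is a minimal sketch of how a pass@1 score of this kind is commonly computed, using the widely used unbiased pass@k estimator (per problem, 1 - C(n-c, k) / C(n, k) for n samples of which c pass). It is not taken from this leaderboard's pipeline: the function names `pass_at_k` and `benchmark_pass_at_1`, and the assumption that scores are reported as a percentage averaged over problems, are illustrative only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of samples generated for the problem
    c: number of those samples that pass every test
    k: the k in pass@k
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # With fewer than k failing samples, any draw of k contains a passing one.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Mean pass@1 over all problems, as a percentage (assumed reporting unit).

    results: one (n, c) pair per problem.
    """
    if not results:
        raise ValueError("results must not be empty")
    return 100.0 * sum(pass_at_k(n, c, 1) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical outcome: one greedy sample per problem, 3 of 4 problems solved.
    print(benchmark_pass_at_1([(1, 1), (1, 0), (1, 1), (1, 1)]))
```

With a single sample per problem, pass@1 reduces to the fraction of problems whose one generated solution passes all tests, which is why a single number per model is enough for the table.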

Latest Results

| Rank | Subject | MBPP+ pass@1 | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | O1 Preview (Sept 2024) | 80.20 | o1-preview (openai-o1-preview) | Imported | 2026-05-05 |
| 2 | O1 Mini (Sept 2024) | 78.80 | | Imported | 2026-05-05 |
| 3 | Qwen2.5-Coder-32B-Instruct | 77 | Qwen2.5 Coder 32B Instruct (qwen-qwen-2.5-coder-32b-instruct) | Imported | 2026-05-05 |
| 4 | DeepSeek-Coder-V2-Instruct | 75.10 | | Imported | 2026-05-05 |
| 5 | Gemini 1.5 Pro 002 | 74.60 | | Imported | 2026-05-05 |
| 6 | Claude Sonnet 3.5 (June 2024) | 74.30 | | Imported | 2026-05-05 |
| 7 | DeepSeek-V2.5 (Nov 2024) | 74.10 | | Imported | 2026-05-05 |
| 8 | GPT-4-Turbo (Nov 2023) | 73.30 | GPT-4 Turbo (openai-gpt-4-turbo) | Imported | 2026-05-05 |
| 9 | claude-3-opus (Mar 2024) | 73.30 | | Imported | 2026-05-05 |
| 10 | DeepSeek-V3 (Nov 2024) | 73 | DeepSeek V3 (deepseek-deepseek-chat) | Imported | 2026-05-05 |
| 11 | GPT 4o (Aug 2024) | 72.20 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-05 |
| 12 | GPT 4o Mini (July 2024) | 72.20 | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-05 |
| 13 | OpenCoder-8B-Instruct | 71.40 | | Imported | 2026-05-05 |
| 14 | DeepSeek-Coder-33B-instruct | 70.10 | | Imported | 2026-05-05 |
| 15 | GPT-3.5-Turbo (Nov 2023) | 69.70 | GPT-3.5 Turbo (openai-gpt-3.5-turbo) | Imported | 2026-05-05 |
| 16 | Artigenz-Coder-DS-6.7B | 69.60 | | Imported | 2026-05-05 |
| 17 | claude-3-sonnet (Mar 2024) | 69.30 | | Imported | 2026-05-05 |
| 18 | CodeQwen1.5-7B-Chat | 69 | | Imported | 2026-05-05 |
| 19 | Llama3-70B-instruct | 69 | Llama 3 70B Instruct (meta-llama-llama-3-70b-instruct) | Imported | 2026-05-05 |
| 20 | Magicoder-S-DS-6.7B | 69 | | Imported | 2026-05-05 |
| 21 | claude-3-haiku (Mar 2024) | 68.80 | Claude 3 Haiku (anthropic-claude-3-haiku) | Imported | 2026-05-05 |
| 22 | OpenCodeInterpreter-DS-33B | 68.50 | | Imported | 2026-05-05 |
| 23 | Gemini 1.5 Flash 002 | 67.50 | | Imported | 2026-05-05 |
| 24 | WhiteRabbitNeo-33B-v1 | 66.90 | | Imported | 2026-05-05 |
| 25 | OpenCodeInterpreter-DS-6.7B | 66.40 | | Imported | 2026-05-05 |