BigCodeBench-Hard

BigCodeBench-Hard evaluates code generation on the harder BigCodeBench subset, reporting pass@1 in complete and instruct settings.

Rows: 25
Primary metric: instruct_pass_at_1
Sampled: 2026-05-05

Metadata

Metrics

Instruct pass@1, Complete pass@1
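
As a rough illustration of what a pass@1 figure means, the sketch below computes the commonly used unbiased pass@k estimate for a single task and a simple benchmark-level percentage. It assumes each task is judged pass/fail by its tests; the function names `pass_at_k` and `benchmark_pass_at_1` are hypothetical, and this page does not state how many samples per task were drawn or how the listed scores were aggregated.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task.

    n: samples generated for the task
    c: samples that passed all tests
    k: number of attempts allowed
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples failed, every size-k draw contains a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that all k drawn samples are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(task_passed: list[bool]) -> float:
    """Percentage of tasks solved, assuming one sample per task (an assumption)."""
    if not task_passed:
        raise ValueError("no tasks given")
    return 100.0 * sum(task_passed) / len(task_passed)
```

With one sample per task, pass@1 reduces to the fraction of tasks whose single generated solution passes, which is what `benchmark_pass_at_1` reports as a percentage.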

Latest Results

| Rank | Subject | Instruct pass@1 | Model Match | Provenance | Sampled |
| --- | --- | --- | --- | --- | --- |
| 1 | o3-mini-2025-01-31 (temperature=1, reasoning=medium) | 33.10 | o3-mini (openai-o3-mini) | Imported | 2026-05-05 |
| 2 | o1-2024-12-17 (temperature=1, reasoning=high) | 32.40 | o1 (openai-o1) | Imported | 2026-05-05 |
| 3 | o3-mini-2025-01-31 (temperature=1, reasoning=high) | 32.40 | o3-mini (openai-o3-mini) | Imported | 2026-05-05 |
| 4 | Claude-3.7-Sonnet-20250219 (temperature=1, length=12800, reasoning=3200) | 32.40 | Claude 3.7 Sonnet (anthropic-claude-3.7-sonnet) | Imported | 2026-05-05 |
| 5 | Claude-3.7-Sonnet-20250219 | 31.80 | Claude 3.7 Sonnet (anthropic-claude-3.7-sonnet) | Imported | 2026-05-05 |
| 6 | Quasar-Alpha | 31.80 | | Imported | 2026-05-05 |
| 7 | GPT-4.1-2025-04-14 | 31.80 | GPT-4.1 (openai-gpt-4.1) | Imported | 2026-05-05 |
| 8 | GPT-4.1-Mini-2025-04-14 | 31.80 | GPT-4.1 Mini (openai-gpt-4.1-mini) | Imported | 2026-05-05 |
| 9 | o3-mini-2025-01-31 (temperature=1, reasoning=low) | 31.10 | o3-mini (openai-o3-mini) | Imported | 2026-05-05 |
| 10 | Grok-3-Mini-Beta (temperature=1, reasoning=low) | 31.10 | Grok 3 Mini Beta (x-ai-grok-3-mini-beta) | Imported | 2026-05-05 |
| 11 | Optimus-Alpha | 30.40 | | Imported | 2026-05-05 |
| 12 | Athene-V2-Agent | 29.70 | | Imported | 2026-05-05 |
| 13 | o1-2024-12-17 (temperature=1, reasoning=low) | 29.70 | o1 (openai-o1) | Imported | 2026-05-05 |
| 14 | DeepSeek-R1 | 29.70 | R1 (deepseek-r1) | Imported | 2026-05-05 |
| 15 | QwQ-32B (w/ Reasoning) | 29.70 | | Imported | 2026-05-05 |
| 16 | Gemini-2.5-Pro-Exp-03-25 | 29.70 | | Imported | 2026-05-05 |
| 17 | GPT-4-Turbo-2024-04-09 | 29.10 | GPT-4 Turbo (openai-gpt-4-turbo) | Imported | 2026-05-05 |
| 18 | Qwen2.5-Max | 29.10 | | Imported | 2026-05-05 |
| 19 | Llama-3.3-70B-Instruct | 28.40 | Llama 3.3 70B Instruct (meta-llama-llama-3.3-70b-instruct) | Imported | 2026-05-05 |
| 20 | o1-2024-12-17 (temperature=1, reasoning=medium) | 28.40 | o1 (openai-o1) | Imported | 2026-05-05 |
| 21 | DeepSeek-V3 | 28.40 | DeepSeek V3 (deepseek-deepseek-chat) | Imported | 2026-05-05 |
| 22 | GPT-4.1-Nano-2025-04-14 | 28.40 | GPT-4.1 Nano (openai-gpt-4.1-nano) | Imported | 2026-05-05 |
| 23 | o1-Mini-2024-09-12 (temperature=1) | 27.70 | | Imported | 2026-05-05 |
| 24 | Qwen2.5-Coder-32B-Instruct | 27.70 | Qwen2.5 Coder 32B Instruct (qwen-qwen-2.5-coder-32b-instruct) | Imported | 2026-05-05 |
| 25 | GPT-4o-2024-11-20 | 27.70 | GPT-4o (2024-11-20) (openai-gpt-4o-2024-11-20) | Imported | 2026-05-05 |