ALE-Bench

Score-based algorithmic programming benchmark built from AtCoder Heuristic Contest tasks, evaluating AI systems on hard optimization problems with hidden/private test evaluation.

Rows: 90
Primary metric: performance_self_refine_1
Sampled: 2026-05-06

Metadata

Metrics

Performance (Self-Refine x1), Rank (Self-Refine x1) (lower is better), Cost (Self-Refine x1) (lower is better), Performance (Self-Refine x16), Rank (Self-Refine x16) (lower is better), Cost (Self-Refine x16) (lower is better)
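Because Rank and Cost are marked lower-is-better while Performance is not, any re-sorting of the rows has to take the metric's direction into account. The following is a minimal sketch of that idea. The subjects, values, and the short keys `performance` and `cost` are hypothetical placeholders, not data or field names from ALE-Bench.

```python
# Hypothetical rows; subjects and numbers are illustrative only.
rows = [
    {"subject": "subject_a", "performance": 1500.0, "cost": 4.0},
    {"subject": "subject_b", "performance": 1200.0, "cost": 1.5},
    {"subject": "subject_c", "performance": 1800.0, "cost": 9.0},
]

# Direction per metric, following the "(lower is better)" notes above.
# Performance carries no such note, so it is assumed higher-is-better.
LOWER_IS_BETTER = {"performance": False, "cost": True}


def order_by(rows: list[dict], metric: str) -> list[str]:
    """Return subject names ordered best-first for the given metric."""
    ascending = LOWER_IS_BETTER[metric]
    ordered = sorted(rows, key=lambda r: r[metric], reverse=not ascending)
    return [r["subject"] for r in ordered]


print(order_by(rows, "performance"))
print(order_by(rows, "cost"))
```

The same subject can sit at opposite ends of the two orderings, which is why the direction note matters when comparing metrics.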

Latest Results

Rows are imported from the ALE-Bench results_summary.json file. The score is the mean performance for Self-Refine x1 over all problems; the other self-refinement aggregate means are kept as additional metrics.
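As a rough illustration of how a mean over all problems works, the sketch below averages per-problem performance values with equal weight per problem. The problem IDs, the numbers, and the dictionary layout are assumptions made for this example; the actual structure of results_summary.json is not shown on this page.

```python
from statistics import mean

# Hypothetical per-problem performance values for a single subject.
# IDs and numbers are made up for illustration.
per_problem_performance = {
    "problem_a": 1200.0,
    "problem_b": 900.0,
    "problem_c": 1500.0,
}


def mean_performance(scores: dict[str, float]) -> float:
    """Average per-problem performance, weighting every problem equally."""
    if not scores:
        raise ValueError("no per-problem scores to average")
    return mean(scores.values())


score = mean_performance(per_problem_performance)
print(round(score, 2))  # 1200.0
```

With equal weighting, one very strong or very weak problem shifts the aggregate by the same amount regardless of which problem it is.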

| Rank | Subject | Performance (Self-Refine x1) | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | gpt-5.5-xhigh | 1942.97 | GPT-5.5 (openai-gpt-5.5) | Imported | 2026-05-06 |
| 2 | gpt-5.3-codex-xhigh | 1655.22 | GPT-5.3-Codex (openai-gpt-5.3-codex) | Imported | 2026-05-06 |
| 3 | gpt-5.4-high | 1607 | GPT-5.4 (openai-gpt-5.4) | Imported | 2026-05-06 |
| 4 | gpt-5.5-medium | 1589.38 | GPT-5.5 (openai-gpt-5.5) | Imported | 2026-05-06 |
| 5 | gpt-5.4-medium | 1520.72 | GPT-5.4 (openai-gpt-5.4) | Imported | 2026-05-06 |
| 6 | gemini-3-flash-preview-high | 1367.20 | Gemini 3 Flash Preview (google-gemini-3-flash-preview) | Imported | 2026-05-06 |
| 7 | claude-4.6-sonnet-medium | 1327.30 | | Imported | 2026-05-06 |
| 8 | claude-4.7-opus-no-thinking | 1323.05 | | Imported | 2026-05-06 |
| 9 | gpt-5.2-codex-xhigh | 1299.90 | GPT-5.2-Codex (openai-gpt-5.2-codex) | Imported | 2026-05-06 |
| 10 | gpt-5.2-high | 1293.55 | GPT-5.2 (openai-gpt-5.2) | Imported | 2026-05-06 |
| 11 | gpt-5.2-medium | 1249.83 | GPT-5.2 (openai-gpt-5.2) | Imported | 2026-05-06 |
| 12 | gpt-5.1-codex-high | 1244.92 | GPT-5.1-Codex (openai-gpt-5.1-codex) | Imported | 2026-05-06 |
| 13 | gpt-5.1-codex-max-xhigh | 1228.25 | GPT-5.1-Codex-Max (openai-gpt-5.1-codex-max) | Imported | 2026-05-06 |
| 14 | gpt-5.1-codex-max-high | 1208.83 | GPT-5.1-Codex-Max (openai-gpt-5.1-codex-max) | Imported | 2026-05-06 |
| 15 | gpt-5.1-thinking | 1192.15 | GPT-5.1 (openai-gpt-5.1) | Imported | 2026-05-06 |
| 16 | gpt-5.4-mini-high | 1188.58 | GPT-5.4 Mini (openai-gpt-5.4-mini) | Imported | 2026-05-06 |
| 17 | gemini-3-pro-preview-high | 1176.75 | Gemini 3 (google-gemini-3) | Imported | 2026-05-06 |
| 18 | gpt-5-thinking | 1162.45 | GPT-5 (openai-gpt-5) | Imported | 2026-05-06 |
| 19 | gemini-3.1-pro-preview-high | 1160.60 | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Imported | 2026-05-06 |
| 20 | grok-4.20-beta | 1150.28 | Grok 4.20 (x-ai-grok-4.20) | Imported | 2026-05-06 |
| 21 | gpt-5.5-none | 1127.58 | GPT-5.5 (openai-gpt-5.5) | Imported | 2026-05-06 |
| 22 | kimi-k2.6 | 1092.67 | MoonshotAI: Kimi K2.6 (moonshotai-kimi-k2.6) | Imported | 2026-05-06 |
| 23 | gpt-5.4-none | 1086.03 | GPT-5.4 (openai-gpt-5.4) | Imported | 2026-05-06 |
| 24 | gemini-3.1-pro-preview-low | 1054.78 | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Imported | 2026-05-06 |
| 25 | claude-4.5-opus | 1025.38 | | Imported | 2026-05-06 |
| 26 | deepseek-v4-pro-high | 1006.08 | DeepSeek V4 Pro (deepseek-deepseek-v4-pro) | Imported | 2026-05-06 |
| 27 | gpt-5.4-nano-high | 1004.52 | GPT-5.4 Nano (openai-gpt-5.4-nano) | Imported | 2026-05-06 |
| 28 | claude-4.6-opus-no-thinking | 996.50 | | Imported | 2026-05-06 |
| 29 | gemini-3-pro-preview-low | 988.23 | Gemini 3 (google-gemini-3) | Imported | 2026-05-06 |
| 30 | grok-4.3 | 944.17 | Grok 4.3 (x-ai-grok-4.3) | Imported | 2026-05-06 |
| 31 | o3-high | 933.55 | | Imported | 2026-05-06 |
| 32 | gemma-4-26b-a4b-it | 927.17 | Gemma 4 26B A4B (google-gemma-4-26b-a4b-it) | Imported | 2026-05-06 |
| 33 | gemma-4-31b-it | 925.50 | Gemma 4 31B (google-gemma-4-31b-it) | Imported | 2026-05-06 |
| 34 | mimo-v2.5-pro | 899.80 | MiMo-V2.5-Pro (xiaomi-mimo-v2.5-pro) | Imported | 2026-05-06 |
| 35 | glm-5.1 | 887.10 | GLM 5.1 (z-ai-glm-5.1) | Imported | 2026-05-06 |
| 36 | o4-mini-high | 826.17 | o4 Mini High (openai-o4-mini-high) | Imported | 2026-05-06 |
| 37 | kimi-k2.5 | 821.65 | MoonshotAI: Kimi K2.5 (moonshotai-kimi-k2.5) | Imported | 2026-05-06 |
| 38 | gpt-5 | 807.65 | GPT-5 (openai-gpt-5) | Imported | 2026-05-06 |
| 39 | deepseek-r1-0528 | 804.13 | R1 0528 (deepseek-deepseek-r1-0528) | Imported | 2026-05-06 |
| 40 | gpt-5-mini-thinking | 799.77 | GPT-5 Mini (openai-gpt-5-mini) | Imported | 2026-05-06 |
| 41 | gemini-3.1-flash-lite-preview-high | 797.73 | | Imported | 2026-05-06 |
| 42 | claude-4.5-sonnet | 796.15 | | Imported | 2026-05-06 |
| 43 | mercury-2 | 785.58 | Mercury 2 (inception-mercury-2) | Imported | 2026-05-06 |
| 44 | gemini-2.5-pro-thinking | 785.52 | | Imported | 2026-05-06 |
| 45 | mimo-v2-pro | 785.17 | MiMo-V2-Pro (xiaomi-mimo-v2-pro) | Imported | 2026-05-06 |
| 46 | glm-5 | 765.63 | GLM 5 (z-ai-glm-5) | Imported | 2026-05-06 |
| 47 | deepseek-v3.1-terminus | 745.17 | DeepSeek V3.1 Terminus (deepseek-deepseek-v3.1-terminus) | Imported | 2026-05-06 |
| 48 | mimo-v2-flash | 737.95 | MiMo-V2-Flash (xiaomi-mimo-v2-flash) | Imported | 2026-05-06 |
| 49 | gpt-5-nano-thinking | 718.67 | GPT-5 Nano (openai-gpt-5-nano) | Imported | 2026-05-06 |
| 50 | deepseek-v4-flash-high | 678.20 | DeepSeek V4 Flash (deepseek-deepseek-v4-flash) | Imported | 2026-05-06 |
| 51 | claude-4.1-opus | 674.77 | | Imported | 2026-05-06 |
| 52 | qwen3.6-plus | 670.15 | Qwen3.6 Plus (qwen-qwen3.6-plus) | Imported | 2026-05-06 |
| 53 | gemini-2.5-flash-thinking | 661.88 | | Imported | 2026-05-06 |
| 54 | claude-4-sonnet | 655.35 | | Imported | 2026-05-06 |
| 55 | claude-4.5-haiku | 653.48 | | Imported | 2026-05-06 |
| 56 | glm-5-turbo | 633.98 | GLM 5 Turbo (z-ai-glm-5-turbo) | Imported | 2026-05-06 |
| 57 | minimax-m2.1 | 623.83 | MiniMax M2.1 (minimax-minimax-m2.1) | Imported | 2026-05-06 |
| 58 | qwen3.5-397b-a17b | 621.92 | Qwen3.5 397B A17B (qwen-qwen3.5-397b-a17b) | Imported | 2026-05-06 |
| 59 | minimax-m2.5 | 618.17 | MiniMax M2.5 (minimax-minimax-m2.5) | Imported | 2026-05-06 |
| 60 | qwen3.5-plus | 608.92 | Qwen3.5 Plus 2026-04-20 (qwen-qwen3.5-plus-20260420) | Imported | 2026-05-06 |
| 61 | minimax-m2.7 | 599.25 | MiniMax M2.7 (minimax-minimax-m2.7) | Imported | 2026-05-06 |
| 62 | kimi-k2-thinking | 597.50 | MoonshotAI: Kimi K2 Thinking (moonshotai-kimi-k2-thinking) | Imported | 2026-05-06 |
| 63 | grok-code-fast-1 | 587.73 | Grok Code Fast 1 (x-ai-grok-code-fast-1) | Imported | 2026-05-06 |
| 64 | gpt-oss-120b | 575.63 | gpt-oss-120b (openai-gpt-oss-120b) | Imported | 2026-05-06 |
| 65 | gpt-oss-20b | 566.05 | gpt-oss-20b (openai-gpt-oss-20b) | Imported | 2026-05-06 |
| 66 | gpt-4.1 | 558.10 | GPT-4.1 (openai-gpt-4.1) | Imported | 2026-05-06 |
| 67 | deepseek-v4-pro-no-thinking | 521.67 | DeepSeek V4 Pro (deepseek-deepseek-v4-pro) | Imported | 2026-05-06 |
| 68 | mimo-v2.5 | 513.95 | MiMo-V2.5 (xiaomi-mimo-v2.5) | Imported | 2026-05-06 |
| 69 | mistral-small-4 | 497.63 | Mistral: Mistral Small 4 (mistralai-mistral-small-2603) | Imported | 2026-05-06 |
| 70 | qwen3-coder | 461.45 | Qwen3 Coder 480B A35B (qwen-qwen3-coder) | Imported | 2026-05-06 |
| 71 | qwen3-coder-plus | 456.50 | Qwen3 Coder Plus (qwen-qwen3-coder-plus) | Imported | 2026-05-06 |
| 72 | glm-4.7 | 399.48 | GLM 4.7 (z-ai-glm-4.7) | Imported | 2026-05-06 |
| 73 | grok-4.1-fast | 394.93 | Grok 4.1 Fast (x-ai-grok-4.1-fast) | Imported | 2026-05-06 |
| 74 | qwen3-max | 370.45 | Qwen3 Max (qwen-qwen3-max) | Imported | 2026-05-06 |
| 75 | qwen3.5-27b | 349.45 | Qwen3.5-27B (qwen-qwen3.5-27b) | Imported | 2026-05-06 |
| 76 | glm-4.5 | 344.82 | GLM 4.5 (z-ai-glm-4.5) | Imported | 2026-05-06 |
| 77 | glm-4.6 | 340.82 | GLM 4.6 (z-ai-glm-4.6) | Imported | 2026-05-06 |
| 78 | qwen3.6-flash | 326.40 | Qwen3.6 Flash (qwen-qwen3.6-flash) | Imported | 2026-05-06 |
| 79 | gemini-2.5-flash-lite-thinking | 325.90 | | Imported | 2026-05-06 |
| 80 | deepseek-v4-flash-no-thinking | 324.98 | DeepSeek V4 Flash (deepseek-deepseek-v4-flash) | Imported | 2026-05-06 |
| 81 | kimi-k2-0905 | 267.13 | MoonshotAI: Kimi K2 0905 (moonshotai-kimi-k2-0905) | Imported | 2026-05-06 |
| 82 | qwen3.5-flash | 265.93 | Qwen3.5-Flash (qwen-qwen3.5-flash-02-23) | Imported | 2026-05-06 |
| 83 | mistral-large-3-2512 | 264.70 | | Imported | 2026-05-06 |
| 84 | nova-2-lite-v1 | 236.25 | Nova 2 Lite (amazon-nova-2-lite-v1) | Imported | 2026-05-06 |
| 85 | qwen3.5-35b-a3b | 221.80 | Qwen3.5-35B-A3B (qwen-qwen3.5-35b-a3b) | Imported | 2026-05-06 |
| 86 | nemotron-3-super | 213.90 | Nemotron 3 Super (nvidia-nemotron-3-super-120b-a12b) | Imported | 2026-05-06 |
| 87 | mistral-medium-3.1 | 210.18 | Mistral: Mistral Medium 3.1 (mistralai-mistral-medium-3.1) | Imported | 2026-05-06 |
| 88 | llama-4-maverick | 172.97 | Llama 4 Maverick (meta-llama-4-maverick) | Imported | 2026-05-06 |
| 89 | nova-premier-v1 | 147.38 | Nova Premier 1.0 (amazon-nova-premier-v1) | Imported | 2026-05-06 |
| 90 | codestral-2508 | 137.78 | Mistral: Codestral 2508 (mistralai-codestral-2508) | Imported | 2026-05-06 |