ALE-Bench
A score-based algorithmic programming benchmark built from AtCoder Heuristic Contest (AHC) tasks. It evaluates AI systems on hard optimization problems and scores their solutions on hidden (private) test cases.
Rows: 90
Primary metric: performance_self_refine_1 (Performance (Self-Refine x1))
Sampled: 2026-05-06

Metrics: Performance (Self-Refine x1), Rank (Self-Refine x1) (lower is better), Cost (Self-Refine x1) (lower is better), Performance (Self-Refine x16), Rank (Self-Refine x16) (lower is better), Cost (Self-Refine x16) (lower is better)
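The metrics listed here do not all point the same way: Performance is better when higher, while Rank and Cost are marked "lower is better". A minimal sketch of ordering subjects by any one of these metrics while respecting its direction is shown below. The names `LOWER_IS_BETTER` and `leaderboard` are hypothetical helpers for illustration, not part of ALE-Bench; the sample values are copied from the table on this page.

```python
# Direction of each metric as labelled on this page (hypothetical helper table):
# Performance is higher-is-better; Rank and Cost are "(lower is better)".
LOWER_IS_BETTER = {
    "Performance (Self-Refine x1)": False,
    "Rank (Self-Refine x1)": True,
    "Cost (Self-Refine x1)": True,
    "Performance (Self-Refine x16)": False,
    "Rank (Self-Refine x16)": True,
    "Cost (Self-Refine x16)": True,
}


def leaderboard(scores: dict[str, float], metric: str) -> list[tuple[int, str, float]]:
    """Return (position, subject, value) tuples for one metric, best first.

    Ties are not merged into a shared position; Python's stable sort simply
    keeps tied subjects in their input order.
    """
    ordered = sorted(
        scores.items(),
        key=lambda kv: kv[1],
        reverse=not LOWER_IS_BETTER[metric],  # descending when higher is better
    )
    return [(pos, subject, value) for pos, (subject, value) in enumerate(ordered, start=1)]


# Usage: three Performance (Self-Refine x1) values taken from the table.
sample = {
    "gpt-5.4-high": 1607.0,
    "gpt-5.5-xhigh": 1942.97,
    "gpt-5.3-codex-xhigh": 1655.22,
}
for row in leaderboard(sample, "Performance (Self-Refine x1)"):
    print(row)
# (1, 'gpt-5.5-xhigh', 1942.97)
# (2, 'gpt-5.3-codex-xhigh', 1655.22)
# (3, 'gpt-5.4-high', 1607.0)
```

Passing a Rank or Cost metric instead flips the sort to ascending, so the smallest value lands in position 1.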
| Position | Subject | Performance (Self-Refine x1) | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | gpt-5.5-xhigh | 1942.97 | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-06 |
| 2 | gpt-5.3-codex-xhigh | 1655.22 | GPT-5.3-Codex openai-gpt-5.3-codex | Imported | 2026-05-06 |
| 3 | gpt-5.4-high | 1607.00 | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-06 |
| 4 | gpt-5.5-medium | 1589.38 | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-06 |
| 5 | gpt-5.4-medium | 1520.72 | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-06 |
| 6 | gemini-3-flash-preview-high | 1367.20 | Gemini 3 Flash Preview google-gemini-3-flash-preview | Imported | 2026-05-06 |
| 7 | claude-4.6-sonnet-medium | 1327.30 | — | Imported | 2026-05-06 |
| 8 | claude-4.7-opus-no-thinking | 1323.05 | — | Imported | 2026-05-06 |
| 9 | gpt-5.2-codex-xhigh | 1299.90 | GPT-5.2-Codex openai-gpt-5.2-codex | Imported | 2026-05-06 |
| 10 | gpt-5.2-high | 1293.55 | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-06 |
| 11 | gpt-5.2-medium | 1249.83 | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-06 |
| 12 | gpt-5.1-codex-high | 1244.92 | GPT-5.1-Codex openai-gpt-5.1-codex | Imported | 2026-05-06 |
| 13 | gpt-5.1-codex-max-xhigh | 1228.25 | GPT-5.1-Codex-Max openai-gpt-5.1-codex-max | Imported | 2026-05-06 |
| 14 | gpt-5.1-codex-max-high | 1208.83 | GPT-5.1-Codex-Max openai-gpt-5.1-codex-max | Imported | 2026-05-06 |
| 15 | gpt-5.1-thinking | 1192.15 | GPT-5.1 openai-gpt-5.1 | Imported | 2026-05-06 |
| 16 | gpt-5.4-mini-high | 1188.58 | GPT-5.4 Mini openai-gpt-5.4-mini | Imported | 2026-05-06 |
| 17 | gemini-3-pro-preview-high | 1176.75 | Gemini 3 google-gemini-3 | Imported | 2026-05-06 |
| 18 | gpt-5-thinking | 1162.45 | GPT-5 openai-gpt-5 | Imported | 2026-05-06 |
| 19 | gemini-3.1-pro-preview-high | 1160.60 | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Imported | 2026-05-06 |
| 20 | grok-4.20-beta | 1150.28 | Grok 4.20 x-ai-grok-4.20 | Imported | 2026-05-06 |
| 21 | gpt-5.5-none | 1127.58 | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-06 |
| 22 | kimi-k2.6 | 1092.67 | MoonshotAI: Kimi K2.6 moonshotai-kimi-k2.6 | Imported | 2026-05-06 |
| 23 | gpt-5.4-none | 1086.03 | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-06 |
| 24 | gemini-3.1-pro-preview-low | 1054.78 | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Imported | 2026-05-06 |
| 25 | claude-4.5-opus | 1025.38 | — | Imported | 2026-05-06 |
| 26 | deepseek-v4-pro-high | 1006.08 | DeepSeek V4 Pro deepseek-deepseek-v4-pro | Imported | 2026-05-06 |
| 27 | gpt-5.4-nano-high | 1004.52 | GPT-5.4 Nano openai-gpt-5.4-nano | Imported | 2026-05-06 |
| 28 | claude-4.6-opus-no-thinking | 996.50 | — | Imported | 2026-05-06 |
| 29 | gemini-3-pro-preview-low | 988.23 | Gemini 3 google-gemini-3 | Imported | 2026-05-06 |
| 30 | grok-4.3 | 944.17 | Grok 4.3 x-ai-grok-4.3 | Imported | 2026-05-06 |
| 31 | o3-high | 933.55 | — | Imported | 2026-05-06 |
| 32 | gemma-4-26b-a4b-it | 927.17 | Gemma 4 26B A4B google-gemma-4-26b-a4b-it | Imported | 2026-05-06 |
| 33 | gemma-4-31b-it | 925.50 | Gemma 4 31B google-gemma-4-31b-it | Imported | 2026-05-06 |
| 34 | mimo-v2.5-pro | 899.80 | MiMo-V2.5-Pro xiaomi-mimo-v2.5-pro | Imported | 2026-05-06 |
| 35 | glm-5.1 | 887.10 | GLM 5.1 z-ai-glm-5.1 | Imported | 2026-05-06 |
| 36 | o4-mini-high | 826.17 | o4 Mini High openai-o4-mini-high | Imported | 2026-05-06 |
| 37 | kimi-k2.5 | 821.65 | MoonshotAI: Kimi K2.5 moonshotai-kimi-k2.5 | Imported | 2026-05-06 |
| 38 | gpt-5 | 807.65 | GPT-5 openai-gpt-5 | Imported | 2026-05-06 |
| 39 | deepseek-r1-0528 | 804.13 | R1 0528 deepseek-deepseek-r1-0528 | Imported | 2026-05-06 |
| 40 | gpt-5-mini-thinking | 799.77 | GPT-5 Mini openai-gpt-5-mini | Imported | 2026-05-06 |
| 41 | gemini-3.1-flash-lite-preview-high | 797.73 | — | Imported | 2026-05-06 |
| 42 | claude-4.5-sonnet | 796.15 | — | Imported | 2026-05-06 |
| 43 | mercury-2 | 785.58 | Mercury 2 inception-mercury-2 | Imported | 2026-05-06 |
| 44 | gemini-2.5-pro-thinking | 785.52 | — | Imported | 2026-05-06 |
| 45 | mimo-v2-pro | 785.17 | MiMo-V2-Pro xiaomi-mimo-v2-pro | Imported | 2026-05-06 |
| 46 | glm-5 | 765.63 | GLM 5 z-ai-glm-5 | Imported | 2026-05-06 |
| 47 | deepseek-v3.1-terminus | 745.17 | DeepSeek V3.1 Terminus deepseek-deepseek-v3.1-terminus | Imported | 2026-05-06 |
| 48 | mimo-v2-flash | 737.95 | MiMo-V2-Flash xiaomi-mimo-v2-flash | Imported | 2026-05-06 |
| 49 | gpt-5-nano-thinking | 718.67 | GPT-5 Nano openai-gpt-5-nano | Imported | 2026-05-06 |
| 50 | deepseek-v4-flash-high | 678.20 | DeepSeek V4 Flash deepseek-deepseek-v4-flash | Imported | 2026-05-06 |
| 51 | claude-4.1-opus | 674.77 | — | Imported | 2026-05-06 |
| 52 | qwen3.6-plus | 670.15 | Qwen3.6 Plus qwen-qwen3.6-plus | Imported | 2026-05-06 |
| 53 | gemini-2.5-flash-thinking | 661.88 | — | Imported | 2026-05-06 |
| 54 | claude-4-sonnet | 655.35 | — | Imported | 2026-05-06 |
| 55 | claude-4.5-haiku | 653.48 | — | Imported | 2026-05-06 |
| 56 | glm-5-turbo | 633.98 | GLM 5 Turbo z-ai-glm-5-turbo | Imported | 2026-05-06 |
| 57 | minimax-m2.1 | 623.83 | MiniMax M2.1 minimax-minimax-m2.1 | Imported | 2026-05-06 |
| 58 | qwen3.5-397b-a17b | 621.92 | Qwen3.5 397B A17B qwen-qwen3.5-397b-a17b | Imported | 2026-05-06 |
| 59 | minimax-m2.5 | 618.17 | MiniMax M2.5 minimax-minimax-m2.5 | Imported | 2026-05-06 |
| 60 | qwen3.5-plus | 608.92 | Qwen3.5 Plus 2026-04-20 qwen-qwen3.5-plus-20260420 | Imported | 2026-05-06 |
| 61 | minimax-m2.7 | 599.25 | MiniMax M2.7 minimax-minimax-m2.7 | Imported | 2026-05-06 |
| 62 | kimi-k2-thinking | 597.50 | MoonshotAI: Kimi K2 Thinking moonshotai-kimi-k2-thinking | Imported | 2026-05-06 |
| 63 | grok-code-fast-1 | 587.73 | Grok Code Fast 1 x-ai-grok-code-fast-1 | Imported | 2026-05-06 |
| 64 | gpt-oss-120b | 575.63 | gpt-oss-120b openai-gpt-oss-120b | Imported | 2026-05-06 |
| 65 | gpt-oss-20b | 566.05 | gpt-oss-20b openai-gpt-oss-20b | Imported | 2026-05-06 |
| 66 | gpt-4.1 | 558.10 | GPT-4.1 openai-gpt-4.1 | Imported | 2026-05-06 |
| 67 | deepseek-v4-pro-no-thinking | 521.67 | DeepSeek V4 Pro deepseek-deepseek-v4-pro | Imported | 2026-05-06 |
| 68 | mimo-v2.5 | 513.95 | MiMo-V2.5 xiaomi-mimo-v2.5 | Imported | 2026-05-06 |
| 69 | mistral-small-4 | 497.63 | Mistral: Mistral Small 4 mistralai-mistral-small-2603 | Imported | 2026-05-06 |
| 70 | qwen3-coder | 461.45 | Qwen3 Coder 480B A35B qwen-qwen3-coder | Imported | 2026-05-06 |
| 71 | qwen3-coder-plus | 456.50 | Qwen3 Coder Plus qwen-qwen3-coder-plus | Imported | 2026-05-06 |
| 72 | glm-4.7 | 399.48 | GLM 4.7 z-ai-glm-4.7 | Imported | 2026-05-06 |
| 73 | grok-4.1-fast | 394.93 | Grok 4.1 Fast x-ai-grok-4.1-fast | Imported | 2026-05-06 |
| 74 | qwen3-max | 370.45 | Qwen3 Max qwen-qwen3-max | Imported | 2026-05-06 |
| 75 | qwen3.5-27b | 349.45 | Qwen3.5-27B qwen-qwen3.5-27b | Imported | 2026-05-06 |
| 76 | glm-4.5 | 344.82 | GLM 4.5 z-ai-glm-4.5 | Imported | 2026-05-06 |
| 77 | glm-4.6 | 340.82 | GLM 4.6 z-ai-glm-4.6 | Imported | 2026-05-06 |
| 78 | qwen3.6-flash | 326.40 | Qwen3.6 Flash qwen-qwen3.6-flash | Imported | 2026-05-06 |
| 79 | gemini-2.5-flash-lite-thinking | 325.90 | — | Imported | 2026-05-06 |
| 80 | deepseek-v4-flash-no-thinking | 324.98 | DeepSeek V4 Flash deepseek-deepseek-v4-flash | Imported | 2026-05-06 |
| 81 | kimi-k2-0905 | 267.13 | MoonshotAI: Kimi K2 0905 moonshotai-kimi-k2-0905 | Imported | 2026-05-06 |
| 82 | qwen3.5-flash | 265.93 | Qwen3.5-Flash qwen-qwen3.5-flash-02-23 | Imported | 2026-05-06 |
| 83 | mistral-large-3-2512 | 264.70 | — | Imported | 2026-05-06 |
| 84 | nova-2-lite-v1 | 236.25 | Nova 2 Lite amazon-nova-2-lite-v1 | Imported | 2026-05-06 |
| 85 | qwen3.5-35b-a3b | 221.80 | Qwen3.5-35B-A3B qwen-qwen3.5-35b-a3b | Imported | 2026-05-06 |
| 86 | nemotron-3-super | 213.90 | Nemotron 3 Super nvidia-nemotron-3-super-120b-a12b | Imported | 2026-05-06 |
| 87 | mistral-medium-3.1 | 210.18 | Mistral: Mistral Medium 3.1 mistralai-mistral-medium-3.1 | Imported | 2026-05-06 |
| 88 | llama-4-maverick | 172.97 | Llama 4 Maverick meta-llama-4-maverick | Imported | 2026-05-06 |
| 89 | nova-premier-v1 | 147.38 | Nova Premier 1.0 amazon-nova-premier-v1 | Imported | 2026-05-06 |
| 90 | codestral-2508 | 137.78 | Mistral: Codestral 2508 mistralai-codestral-2508 | Imported | 2026-05-06 |
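To consume this table programmatically, the first three columns (position, subject, Performance (Self-Refine x1)) are enough to rebuild the ordering. The sketch below parses rows in the pipe-delimited layout used here and checks that positions run 1..n with performance never increasing down the table. `parse_rows` and `is_consistently_ranked` are hypothetical helpers written for illustration; they assume every data row starts with an integer position and a plain decimal score.

```python
import re

# A data row: "| <position> | <subject> | <performance> | ...".
# Header and "|---|" separator lines do not match because their first cell is not an integer.
ROW = re.compile(r"^\|\s*(\d+)\s*\|\s*([^|]+?)\s*\|\s*([0-9.]+)\s*\|")


def parse_rows(markdown: str) -> list[tuple[int, str, float]]:
    """Extract (position, subject, performance) from each matching table row."""
    rows = []
    for line in markdown.splitlines():
        m = ROW.match(line.strip())
        if m:
            rows.append((int(m.group(1)), m.group(2), float(m.group(3))))
    return rows


def is_consistently_ranked(rows: list[tuple[int, str, float]]) -> bool:
    """True if positions are 1..n in order and performance is non-increasing."""
    positions_ok = [p for p, _, _ in rows] == list(range(1, len(rows) + 1))
    scores = [v for _, _, v in rows]
    return positions_ok and all(a >= b for a, b in zip(scores, scores[1:]))


# Usage: the first three rows of the table, copied as text.
sample = """| 1 | gpt-5.5-xhigh | 1942.97 | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-06 |
| 2 | gpt-5.3-codex-xhigh | 1655.22 | GPT-5.3-Codex openai-gpt-5.3-codex | Imported | 2026-05-06 |
| 3 | gpt-5.4-high | 1607 | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-06 |"""
rows = parse_rows(sample)
print(rows[2])                       # (3, 'gpt-5.4-high', 1607.0)
print(is_consistently_ranked(rows))  # True
```

Because the score is parsed with `float`, a value written as `1607` and one written as `1607.00` compare equal.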