AIME 2026
Official Hugging Face benchmark of model performance on problems from the 2026 AIME (American Invitational Mathematics Examination).
12 rows
Primary metric: score
Sampled: 2026-05-06
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | stepfun-ai/Step-3.5-Flash | 96.67 | — | Imported | 2026-05-06 |
| 2 | moonshotai/Kimi-K2.5 | 95.83 | MoonshotAI: Kimi K2.5 (`moonshotai-kimi-k2.5`) | Imported | 2026-05-06 |
| 3 | zai-org/GLM-5 | 95.83 | GLM 5 (`z-ai-glm-5`) | Imported | 2026-05-06 |
| 4 | deepseek-ai/DeepSeek-V3.2 | 94.17 | DeepSeek V3.2 (`deepseek-deepseek-v3.2`) | Imported | 2026-05-06 |
| 5 | Qwen/Qwen3.5-397B-A17B | 93.33 | Qwen3.5 397B A17B (`qwen-qwen3.5-397b-a17b`) | Imported | 2026-05-06 |
| 6 | Qwen/Qwen3.5-35B-A3B | 93.33 | Qwen3.5-35B-A3B (`qwen-qwen3.5-35b-a3b`) | Imported | 2026-05-06 |
| 7 | Qwen/Qwen3.5-9B | 92.50 | Qwen3.5-9B (`qwen-qwen3.5-9b`) | Imported | 2026-05-06 |
| 8 | Qwen/Qwen3.5-27B | 90.83 | Qwen3.5-27B (`qwen-qwen3.5-27b`) | Imported | 2026-05-06 |
| 9 | nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 | 90.00 | Nemotron 3 Super (`nvidia-nemotron-3-super-120b-a12b`) | Imported | 2026-05-06 |
| 10 | Qwen/Qwen3-30B-A3B-Thinking-2507 | 87.50 | Qwen3 30B A3B Thinking 2507 (`qwen-qwen3-30b-a3b-thinking-2507`) | Imported | 2026-05-06 |
| 11 | Qwen/Qwen3-4B-Thinking-2507 | 82.50 | — | Imported | 2026-05-06 |
| 12 | lm-provers/QED-Nano | 82.50 | — | Imported | 2026-05-06 |
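This page does not say how Score is computed. One reading that is consistent with the listed values (for example, 95.83 ≈ 115/120 and 96.67 ≈ 29/30) is mean accuracy, as a percentage, over the 30 AIME 2026 problems (AIME I and AIME II, 15 each) with one or more samples per problem. The sketch below assumes exactly that. The helper names `mean_accuracy` and `implied_correct` are hypothetical and are not part of any Hugging Face API.

```python
from typing import Optional

# Assumption (not stated on this page): 30 problems, each answered
# `samples_per_problem` times, and Score = percentage of correct answers.
NUM_PROBLEMS = 30


def mean_accuracy(correct: int, samples_per_problem: int,
                  num_problems: int = NUM_PROBLEMS) -> float:
    """Percentage of correct answers over all (problem, sample) pairs."""
    total = num_problems * samples_per_problem
    if not 0 <= correct <= total:
        raise ValueError("correct must be in [0, num_problems * samples_per_problem]")
    return 100 * correct / total


def implied_correct(score: float, samples_per_problem: int,
                    num_problems: int = NUM_PROBLEMS) -> Optional[int]:
    """Smallest correct-answer count whose 2-decimal score equals `score`, or None."""
    total = num_problems * samples_per_problem
    for correct in range(total + 1):
        if round(100 * correct / total, 2) == round(score, 2):
            return correct
    return None


print(implied_correct(95.83, samples_per_problem=4))  # 115 correct out of 120
print(implied_correct(95.83, samples_per_problem=1))  # None: not reachable with one sample
```

If this assumption holds, scores such as 95.83 and 90.83 cannot come from a single pass over 30 problems. They point to averaging over several samples, so small score gaps between neighbouring rows may reflect sampling noise.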