AIME 2026

Official Hugging Face benchmark of model performance on the 2026 American Invitational Mathematics Examination (AIME) problems.

Rows: 12
Primary metric: Score
Sampled: 2026-05-06

Latest Results

Rows are ranked by benchmark score, highest first. Model display names are preserved from the OpenEvals source dataset.

| Rank | Subject                                       | Score | Model match                 | Match ID                          | Provenance | Sampled    |
|------|-----------------------------------------------|-------|-----------------------------|-----------------------------------|------------|------------|
| 1    | stepfun-ai/Step-3.5-Flash                     | 96.67 | -                           | -                                 | Imported   | 2026-05-06 |
| 2    | moonshotai/Kimi-K2.5                          | 95.83 | MoonshotAI: Kimi K2.5       | moonshotai-kimi-k2.5              | Imported   | 2026-05-06 |
| 3    | zai-org/GLM-5                                 | 95.83 | GLM 5                       | z-ai-glm-5                        | Imported   | 2026-05-06 |
| 4    | deepseek-ai/DeepSeek-V3.2                     | 94.17 | DeepSeek V3.2               | deepseek-deepseek-v3.2            | Imported   | 2026-05-06 |
| 5    | Qwen/Qwen3.5-397B-A17B                        | 93.33 | Qwen3.5 397B A17B           | qwen-qwen3.5-397b-a17b            | Imported   | 2026-05-06 |
| 6    | Qwen/Qwen3.5-35B-A3B                          | 93.33 | Qwen3.5-35B-A3B             | qwen-qwen3.5-35b-a3b              | Imported   | 2026-05-06 |
| 7    | Qwen/Qwen3.5-9B                               | 92.50 | Qwen3.5-9B                  | qwen-qwen3.5-9b                   | Imported   | 2026-05-06 |
| 8    | Qwen/Qwen3.5-27B                              | 90.83 | Qwen3.5-27B                 | qwen-qwen3.5-27b                  | Imported   | 2026-05-06 |
| 9    | nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 | 90.00 | Nemotron 3 Super            | nvidia-nemotron-3-super-120b-a12b | Imported   | 2026-05-06 |
| 10   | Qwen/Qwen3-30B-A3B-Thinking-2507              | 87.50 | Qwen3 30B A3B Thinking 2507 | qwen-qwen3-30b-a3b-thinking-2507  | Imported   | 2026-05-06 |
| 11   | Qwen/Qwen3-4B-Thinking-2507                   | 82.50 | -                           | -                                 | Imported   | 2026-05-06 |
| 12   | lm-provers/QED-Nano                           | 82.50 | -                           | -                                 | Imported   | 2026-05-06 |
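The table assigns consecutive ranks even where scores tie (ranks 2 and 3 both score 95.83, as do ranks 5 and 6, and ranks 11 and 12). The Python sketch below contrasts that ordinal ranking with standard competition ranking ("1224"), where tied scores share a rank. The `Row` type and both function names are hypothetical, and tied rows are assumed to keep their input order because the leaderboard does not document a tie-break rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    subject: str
    score: float  # benchmark score as displayed on the leaderboard


def ordinal_ranks(rows: list[Row]) -> list[tuple[int, Row]]:
    """Consecutive ranks 1..n, highest score first.

    Python's sort is stable (also with reverse=True), so tied rows keep
    their input order. Assumption: the leaderboard's own tie-break rule
    is not documented.
    """
    ordered = sorted(rows, key=lambda r: r.score, reverse=True)
    return [(i + 1, row) for i, row in enumerate(ordered)]


def competition_ranks(rows: list[Row]) -> list[tuple[int, Row]]:
    """Standard competition ranking ("1224"): tied scores share a rank."""
    ordered = sorted(rows, key=lambda r: r.score, reverse=True)
    ranked: list[tuple[int, Row]] = []
    for i, row in enumerate(ordered):
        if ranked and row.score == ranked[-1][1].score:
            ranked.append((ranked[-1][0], row))  # tie: reuse the previous rank
        else:
            ranked.append((i + 1, row))  # new score: rank equals position
    return ranked


# A few rows from the table above, deliberately out of order.
rows = [
    Row("deepseek-ai/DeepSeek-V3.2", 94.17),
    Row("stepfun-ai/Step-3.5-Flash", 96.67),
    Row("moonshotai/Kimi-K2.5", 95.83),
    Row("zai-org/GLM-5", 95.83),
]

for rank, row in competition_ranks(rows):
    print(rank, row.subject, row.score)
```

Run as written, the loop prints ranks 1, 2, 2, 4, while `ordinal_ranks(rows)` yields 1, 2, 3, 4, matching how the leaderboard numbers its tied rows.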