IMO-AnswerBench
IMO-AnswerBench is a benchmark for evaluating mathematical reasoning capabilities on International Mathematical Olympiad (IMO) problems, focusing on answer generation and verification.
Rows: 20
Primary metric: score
Sampled: 2026-05-28
Metadata
Metrics: Score, Normalized Score
Showing 2 latest source slices.

Slice sampled 2026-05-28

| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Qwen3.7 Max | 90% | Qwen3.7 Max (`qwen-qwen3.7-max`) | Self-reported | 2026-05-28 |
| 2 | DeepSeek V4 Pro Max | 89.8% | DeepSeek V4 Pro (`deepseek-deepseek-v4-pro`) | Self-reported | 2026-05-28 |
| 3 | Kimi K2.6 Thinking | 86% | MoonshotAI: Kimi K2.6 (`moonshotai-kimi-k2.6`) | Self-reported | 2026-05-28 |
| 4 | GLM-5.1 Thinking | 83.8% | GLM 5.1 (`z-ai-glm-5.1`) | Self-reported | 2026-05-28 |
| 5 | Qwen3.6 Plus | 83.8% | Qwen3.6 Plus (`qwen-qwen3.6-plus`) | Self-reported | 2026-05-28 |
| 6 | Claude Opus 4.6 Max | 75.3% | Claude Opus 4.6 (`anthropic-claude-opus-4.6`) | Self-reported | 2026-05-28 |

Slice sampled 2026-05-06

| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | DeepSeek-V4-Pro-Max | 0.90 | DeepSeek V4 Pro (`deepseek-deepseek-v4-pro`) | Self-reported | 2026-05-06 |
| 2 | DeepSeek-V4-Flash-Max | 0.88 | DeepSeek V4 Flash (`deepseek-deepseek-v4-flash`) | Self-reported | 2026-05-06 |
| 3 | Kimi K2.6 | 0.86 | MoonshotAI: Kimi K2.6 (`moonshotai-kimi-k2.6`) | Self-reported | 2026-05-06 |
| 4 | Step-3.5-Flash | 0.85 | Step 3.5 Flash (`stepfun-step-3.5-flash`) | Self-reported | 2026-05-06 |
| 5 | GLM-5.1 | 0.84 | GLM 5.1 (`z-ai-glm-5.1`) | Self-reported | 2026-05-06 |
| 5 | Qwen3.6 Plus | 0.84 | Qwen3.6 Plus (`qwen-qwen3.6-plus`) | Self-reported | 2026-05-06 |
| 7 | GLM-4.7 | 0.82 | GLM 4.7 (`z-ai-glm-4.7`) | Self-reported | 2026-05-06 |
| 8 | Kimi K2.5 | 0.82 | MoonshotAI: Kimi K2.5 (`moonshotai-kimi-k2.5`) | Self-reported | 2026-05-06 |
| 9 | Qwen3.5-397B-A17B | 0.81 | Qwen3.5 397B A17B (`qwen-qwen3.5-397b-a17b`) | Self-reported | 2026-05-06 |
| 10 | Qwen3.6-27B | 0.81 | Qwen3.6 27B (`qwen-qwen3.6-27b`) | Self-reported | 2026-05-06 |
| 11 | Qwen3.6-35B-A3B | 0.79 | Qwen3.6 35B A3B (`qwen-qwen3.6-35b-a3b`) | Self-reported | 2026-05-06 |
| 12 | Kimi K2-Thinking-0905 | 0.79 | MoonshotAI: Kimi K2 Thinking (`moonshotai-kimi-k2-thinking`) | Self-reported | 2026-05-06 |
| 12 | LongCat-Flash-Thinking-2601 | 0.79 | — | Self-reported | 2026-05-06 |
| 14 | DeepSeek-V3.2 | 0.78 | DeepSeek V3.2 (`deepseek-deepseek-v3.2`) | Self-reported | 2026-05-06 |
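
The two slices report scores in different notations (percentages such as `83.8%` in the 2026-05-28 rows, fractions such as `0.84` in the 2026-05-06 rows). A minimal sketch of how a reader might put them on one scale before comparing is shown below. The helper names `parse_score` and `competition_ranks` are hypothetical and are not part of the leaderboard. The sketch assumes both notations express the same underlying metric, which the page does not confirm. Displayed values may also be rounded, so two rows that show the same score are not necessarily true ties.

```python
def parse_score(raw: str) -> float:
    """Convert a displayed score ("90%" or "0.90") to a fraction in [0, 1].

    Assumption: a trailing "%" means the value is a percentage; anything
    else is already a fraction.
    """
    text = raw.strip()
    if text.endswith("%"):
        return float(text[:-1]) / 100.0
    return float(text)


def competition_ranks(scores: list[float]) -> list[int]:
    """Rank scores from highest to lowest; ties share a rank and the next rank is skipped."""
    ordered = sorted(scores, reverse=True)
    # index() returns the first position of a value, so equal scores get the same rank.
    return [ordered.index(s) + 1 for s in scores]


# Displayed scores from the top five rows sampled 2026-05-28.
displayed = ["90%", "89.8%", "86%", "83.8%", "83.8%"]
fractions = [parse_score(s) for s in displayed]
print(competition_ranks(fractions))  # [1, 2, 3, 4, 4]
```

Ranking on displayed values alone puts the two 83.8% rows in a tie, whereas the leaderboard lists them as ranks 4 and 5. That is consistent with the page ranking on unrounded scores, though the source does not say so.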