MATH-500
MATH-500 is a 500-problem subset of the MATH dataset, made up of challenging competition mathematics problems from AMC 10, AMC 12, AIME, and other mathematics competitions. Each problem includes a full step-by-step solution. The problems span multiple difficulty levels and seven mathematical subjects: Prealgebra, Algebra, Number Theory, Counting and Probability, Geometry, Intermediate Algebra, and Precalculus.
Rows: 32
Primary metric: score
Sampled: 2026-05-06
Metadata
Metrics: Score, Normalized Score
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | LongCat-Flash-Thinking | 0.99 | — | Self-reported | 2026-05-06 |
| 2 | Sarvam-105B | 0.99 | — | Self-reported | 2026-05-06 |
| 3 | GLM-4.5 | 0.98 | GLM 4.5 z-ai-glm-4.5 | Self-reported | 2026-05-06 |
| 4 | GLM-4.5-Air | 0.98 | GLM 4.5 Air z-ai-glm-4.5-air | Self-reported | 2026-05-06 |
| 5 | Nemotron Nano 9B v2 | 0.98 | Nemotron Nano 9B V2 nvidia-nemotron-nano-9b-v2 | Self-reported | 2026-05-06 |
| 6 | Kimi K2-Instruct-0905 | 0.97 | MoonshotAI: Kimi K2 0905 moonshotai-kimi-k2-0905 | Self-reported | 2026-05-06 |
| 6 | Kimi K2 Instruct | 0.97 | MoonshotAI: Kimi K2 0711 moonshotai-kimi-k2 | Self-reported | 2026-05-06 |
| 8 | Sarvam-30B | 0.97 | — | Self-reported | 2026-05-06 |
| 8 | Llama 3.1 Nemotron Ultra 253B v1 | 0.97 | — | Self-reported | 2026-05-06 |
| 10 | MiniMax M1 80K | 0.97 | — | Self-reported | 2026-05-06 |
| 10 | LongCat-Flash-Lite | 0.97 | — | Self-reported | 2026-05-06 |
| 12 | Llama-3.3 Nemotron Super 49B v1 | 0.97 | — | Self-reported | 2026-05-06 |
| 13 | LongCat-Flash-Chat | 0.96 | — | Self-reported | 2026-05-06 |
| 14 | Claude 3.7 Sonnet | 0.96 | Claude 3.7 Sonnet anthropic-claude-3.7-sonnet | Self-reported | 2026-05-06 |
| 14 | Kimi-k1.5 | 0.96 | — | Self-reported | 2026-05-06 |
| 16 | MiniMax M1 40K | 0.96 | — | Self-reported | 2026-05-06 |
| 17 | DeepSeek R1 Zero | 0.96 | — | Self-reported | 2026-05-06 |
| 18 | Llama 3.1 Nemotron Nano 8B V1 | 0.95 | — | Self-reported | 2026-05-06 |
| 19 | Phi 4 Mini Reasoning | 0.95 | — | Self-reported | 2026-05-06 |
| 20 | DeepSeek R1 Distill Llama 70B | 0.94 | R1 Distill Llama 70B deepseek-deepseek-r1-distill-llama-70b | Self-reported | 2026-05-06 |
| 21 | DeepSeek R1 Distill Qwen 32B | 0.94 | R1 Distill Qwen 32B deepseek-deepseek-r1-distill-qwen-32b | Self-reported | 2026-05-06 |
| 22 | DeepSeek-V3 0324 | 0.94 | DeepSeek V3 0324 deepseek-deepseek-chat-v3-0324 | Self-reported | 2026-05-06 |
| 23 | DeepSeek R1 Distill Qwen 14B | 0.94 | — | Self-reported | 2026-05-06 |
| 24 | DeepSeek R1 Distill Qwen 7B | 0.93 | — | Self-reported | 2026-05-06 |
| 25 | QwQ-32B | 0.91 | — | Self-reported | 2026-05-06 |
| 25 | QwQ-32B-Preview | 0.91 | — | Self-reported | 2026-05-06 |
| 27 | DeepSeek-V3 | 0.90 | DeepSeek V3 deepseek-deepseek-chat | Self-reported | 2026-05-06 |
| 28 | o1-mini | 0.90 | — | Self-reported | 2026-05-06 |
| 29 | DeepSeek R1 Distill Llama 8B | 0.89 | — | Self-reported | 2026-05-06 |
| 30 | DeepSeek R1 Distill Qwen 1.5B | 0.84 | — | Self-reported | 2026-05-06 |
| 31 | Granite 3.3 8B Base | 0.69 | — | Self-reported | 2026-05-06 |
| 31 | Granite 3.3 8B Instruct | 0.69 | — | Self-reported | 2026-05-06 |
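
The shared and skipped ranks in the table (two entries at rank 6 followed by rank 8, for example) look consistent with standard competition ranking applied to scores before they are rounded to two decimals. The source does not state its ranking rule, so the following is only a minimal sketch under that assumption. The function name `competition_rank` and the example scores are hypothetical and are not taken from the leaderboard.

```python
from typing import List, Tuple


def competition_rank(entries: List[Tuple[str, float]]) -> List[Tuple[int, str, float]]:
    """Assign standard competition ranks ("1224" style).

    Entries with equal scores share a rank, and the next distinct score
    takes its 1-based position in the sorted list, so ranks are skipped
    after a tie. This is an assumed scheme, not a documented one.
    """
    # Sort by score, highest first; sorted() is stable, so ties keep input order.
    ordered = sorted(entries, key=lambda e: e[1], reverse=True)
    ranked: List[Tuple[int, str, float]] = []
    for position, (name, score) in enumerate(ordered, start=1):
        if ranked and score == ranked[-1][2]:
            rank = ranked[-1][0]  # same score as previous entry: share its rank
        else:
            rank = position  # new score: rank equals position in the list
        ranked.append((rank, name, score))
    return ranked


# Hypothetical unrounded scores, for illustration only (not from the table).
example = [
    ("model-a", 0.974),
    ("model-b", 0.971),
    ("model-c", 0.971),
    ("model-d", 0.968),
]
ranks = competition_rank(example)
for rank, name, score in ranks:
    # All four print as 0.97 once rounded, yet the ranks differ.
    print(f"{rank}\t{name}\t{score:.2f}")
```

Under this assumption, entries that display the same rounded score can still hold different ranks, because ranking happens before rounding.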