MATH-500

MATH-500 is a subset of the MATH dataset containing 500 challenging competition mathematics problems drawn from AMC 10, AMC 12, AIME, and other mathematics competitions. Each problem includes a full step-by-step solution. The problems span multiple difficulty levels across seven mathematical subjects: Prealgebra, Algebra, Number Theory, Counting and Probability, Geometry, Intermediate Algebra, and Precalculus.

Rows: 32
Primary metric: score
Sampled: 2026-05-06

Metadata

Metrics

Score, Normalized Score

Latest Results

| Rank | Subject | Score | Model Match | Provenance | Sampled |
| --- | --- | --- | --- | --- | --- |
| 1 | LongCat-Flash-Thinking | 0.99 | | Self-reported | 2026-05-06 |
| 2 | Sarvam-105B | 0.99 | | Self-reported | 2026-05-06 |
| 3 | GLM-4.5 | 0.98 | GLM GLM 4.5 (z-ai-glm-4.5) | Self-reported | 2026-05-06 |
| 4 | GLM-4.5-Air | 0.98 | GLM GLM 4.5 Air (z-ai-glm-4.5-air) | Self-reported | 2026-05-06 |
| 5 | Nemotron Nano 9B v2 | 0.98 | Nemotron Nano 9B V2 (nvidia-nemotron-nano-9b-v2) | Self-reported | 2026-05-06 |
| 6 | Kimi K2-Instruct-0905 | 0.97 | KIMI MoonshotAI: Kimi K2 0905 (moonshotai-kimi-k2-0905) | Self-reported | 2026-05-06 |
| 6 | Kimi K2 Instruct | 0.97 | KIMI MoonshotAI: Kimi K2 0711 (moonshotai-kimi-k2) | Self-reported | 2026-05-06 |
| 8 | Sarvam-30B | 0.97 | | Self-reported | 2026-05-06 |
| 8 | Llama 3.1 Nemotron Ultra 253B v1 | 0.97 | | Self-reported | 2026-05-06 |
| 10 | MiniMax M1 80K | 0.97 | | Self-reported | 2026-05-06 |
| 10 | LongCat-Flash-Lite | 0.97 | | Self-reported | 2026-05-06 |
| 12 | Llama-3.3 Nemotron Super 49B v1 | 0.97 | | Self-reported | 2026-05-06 |
| 13 | LongCat-Flash-Chat | 0.96 | | Self-reported | 2026-05-06 |
| 14 | Claude 3.7 Sonnet | 0.96 | Claude 3.7 Sonnet (anthropic-claude-3.7-sonnet) | Self-reported | 2026-05-06 |
| 14 | Kimi-k1.5 | 0.96 | | Self-reported | 2026-05-06 |
| 16 | MiniMax M1 40K | 0.96 | | Self-reported | 2026-05-06 |
| 17 | DeepSeek R1 Zero | 0.96 | | Self-reported | 2026-05-06 |
| 18 | Llama 3.1 Nemotron Nano 8B V1 | 0.95 | | Self-reported | 2026-05-06 |
| 19 | Phi 4 Mini Reasoning | 0.95 | | Self-reported | 2026-05-06 |
| 20 | DeepSeek R1 Distill Llama 70B | 0.94 | R1 Distill Llama 70B (deepseek-deepseek-r1-distill-llama-70b) | Self-reported | 2026-05-06 |
| 21 | DeepSeek R1 Distill Qwen 32B | 0.94 | R1 Distill Qwen 32B (deepseek-deepseek-r1-distill-qwen-32b) | Self-reported | 2026-05-06 |
| 22 | DeepSeek-V3 0324 | 0.94 | DeepSeek V3 0324 (deepseek-deepseek-chat-v3-0324) | Self-reported | 2026-05-06 |
| 23 | DeepSeek R1 Distill Qwen 14B | 0.94 | | Self-reported | 2026-05-06 |
| 24 | DeepSeek R1 Distill Qwen 7B | 0.93 | | Self-reported | 2026-05-06 |
| 25 | QwQ-32B | 0.91 | | Self-reported | 2026-05-06 |
| 25 | QwQ-32B-Preview | 0.91 | | Self-reported | 2026-05-06 |
| 27 | DeepSeek-V3 | 0.90 | DeepSeek V3 (deepseek-deepseek-chat) | Self-reported | 2026-05-06 |
| 28 | o1-mini | 0.90 | | Self-reported | 2026-05-06 |
| 29 | DeepSeek R1 Distill Llama 8B | 0.89 | | Self-reported | 2026-05-06 |
| 30 | DeepSeek R1 Distill Qwen 1.5B | 0.84 | | Self-reported | 2026-05-06 |
| 31 | Granite 3.3 8B Base | 0.69 | | Self-reported | 2026-05-06 |
| 31 | Granite 3.3 8B Instruct | 0.69 | | Self-reported | 2026-05-06 |
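
The Rank column appears to follow standard competition ranking: entries with equal scores share a rank, and the following rank is skipped (for example 6, 6, 8). Because some entries with the same displayed score hold different ranks, ties are presumably decided on unrounded scores that the table does not show. The sketch below illustrates that ranking rule under those assumptions. The function name `competition_rank` and the sample entries are hypothetical and are not taken from the leaderboard.

```python
from typing import List, Tuple


def competition_rank(entries: List[Tuple[str, float]]) -> List[Tuple[int, str, float]]:
    """Assign standard competition ranks (1, 1, 3, ...) by descending score.

    Assumption: two entries tie only when their scores are exactly equal.
    """
    # sorted() is stable, so tied entries keep their input order.
    ordered = sorted(entries, key=lambda item: item[1], reverse=True)

    ranked: List[Tuple[int, str, float]] = []
    prev_score = None
    prev_rank = 0
    for position, (name, score) in enumerate(ordered, start=1):
        # A tie reuses the previous rank; otherwise the rank is the 1-based position.
        rank = prev_rank if score == prev_score else position
        ranked.append((rank, name, score))
        prev_score, prev_rank = score, rank
    return ranked


if __name__ == "__main__":
    # Hypothetical unrounded scores, chosen only to show the tie behaviour.
    sample = [("model-a", 0.974), ("model-b", 0.974), ("model-c", 0.972)]
    for rank, name, score in competition_rank(sample):
        print(rank, name, score)
```

With the hypothetical sample above, the first two entries share rank 1 and the third receives rank 3, matching the 6, 6, 8 pattern seen in the table.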