MATH 500

Academic math benchmark on probability, algebra, and trigonometry

60 rows
Primary metric: score
Sampled: 2026-01-09

Metadata

Metrics

Score, Std. error (lower is better), Latency (lower is better), Cost per test (lower is better)
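The page lists a standard error next to each score but does not say how it is computed. As a rough guide only, the sketch below treats accuracy as a binomial proportion. It assumes 500 independently graded problems (inferred from the benchmark name, not stated on the page), and `binomial_standard_error` is a hypothetical helper, not part of any Vals.ai tooling.

```python
import math


def binomial_standard_error(accuracy_pct: float, n_problems: int = 500) -> float:
    """Standard error of an accuracy score, in percentage points.

    Assumes each problem is an independent pass/fail trial; n_problems=500
    is an assumption based on the benchmark name.
    """
    p = accuracy_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_problems)


# Compare the sampling noise at the top score with the gap to rank 2.
se_top = binomial_standard_error(96.4)
gap = 96.4 - 96.2
print(f"SE at 96.4%: {se_top:.2f} pp; gap between ranks 1 and 2: {gap:.1f} pp")
```

Under these assumptions the standard error near the top of the table is about 0.83 percentage points, so a 0.2-point gap between adjacent ranks is well inside the sampling noise.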

Latest Results

Full leaderboard rows decoded from the Vals.ai benchmark detail page. The primary score is the overall accuracy percentage.

Rank | Subject | Score | Model Match | Provenance | Sampled
1 | Gemini 3 Pro Preview | 96.4% | Gemini 3 / google-gemini-3 | Imported | 2026-01-09
2 | Grok 4.0709 | 96.2% | Grok 4 / x-ai-grok-4 | Imported | 2026-01-09
3 | GPT 5.2025-08-07 | 96% | GPT-5 / openai-gpt-5 | Imported | 2026-01-09
4 | Claude Opus 4.1 20250805 Thinking | 95.4% | Claude Opus 4.1 / anthropic-claude-opus-4.1 | Imported | 2026-01-09
5 | Gemini 2.5 Pro Exp 03 25 | 95.2% |  | Imported | 2026-01-09
6 | GPT Oss 120B | 94.8% | gpt-oss-120b / openai-gpt-oss-120b | Imported | 2026-01-09
7 | GPT 5 Mini 2025-08-07 | 94.8% | GPT-5 Mini / openai-gpt-5-mini | Imported | 2026-01-09
8 | Qwen 3 235B A22b | 94.6% | Qwen3 235B A22B / qwen-qwen3-235b-a22b | Imported | 2026-01-09
9 | O3 2025-04-16 | 94.6% | o3 / openai-o3 | Imported | 2026-01-09
10 | GPT Oss 20B | 94.2% | gpt-oss-20b / openai-gpt-oss-20b | Imported | 2026-01-09
11 | Grok 3 Mini Fast High Reasoning | 94.2% |  | Imported | 2026-01-09
12 | O4 Mini 2025-04-16 | 94.2% | o4 Mini / openai-o4-mini | Imported | 2026-01-09
13 | Kimi K2 Instruct | 94.2% | MoonshotAI: Kimi K2 0711 / moonshotai-kimi-k2 | Imported | 2026-01-09
14 | GLM 4.5 | 94% | GLM 4.5 / z-ai-glm-4.5 | Imported | 2026-01-09
15 | Claude Sonnet 4.20250514 Thinking | 93.8% |  | Imported | 2026-01-09
16 | GPT 5 Nano 2025-08-07 | 93.8% | GPT-5 Nano / openai-gpt-5-nano | Imported | 2026-01-09
17 | Claude Opus 4.1 20250805 | 93% | Claude Opus 4.1 / anthropic-claude-opus-4.1 | Imported | 2026-01-09
18 | DeepSeek R1 | 92.2% | R1 / deepseek-r1 | Imported | 2026-01-09
19 | Gemini 2.5 Flash Preview 04 17 Thinking | 91.8% |  | Imported | 2026-01-09
20 | O3 Mini 2025-01-31 | 91.8% | o3-mini / openai-o3-mini | Imported | 2026-01-09
21 | Claude 3 7 Sonnet 20250219 Thinking | 91.6% |  | Imported | 2026-01-09
22 | Gemini 2.5 Flash Preview 04 17 | 91.6% |  | Imported | 2026-01-09
23 | Llama 3.3 Nemotron Super 49B V1 42e84561 Thinking | 91.4% |  | Imported | 2026-01-09
24 | Claude Opus 4.20250514 | 90.4% | Claude Opus 4 / anthropic-claude-opus-4 | Imported | 2026-01-09
25 | O1 2024-12-17 | 90.4% | o1 / openai-o1 | Imported | 2026-01-09
26 | Claude Sonnet 4.20250514 | 90.323% | Claude Sonnet 4 / anthropic-claude-sonnet-4 | Imported | 2026-01-09
27 | Grok 3 | 89.8% | Grok 3 / xaigrok-3 | Imported | 2026-01-09
28 | Gemini 2.0 Flash Exp | 89% |  | Imported | 2026-01-09
29 | MiniMax M2.1 | 89% | MiniMax M2.1 / minimax-minimax-m2.1 | Imported | 2026-01-09
30 | DeepSeek V3 0324 | 88.6% | DeepSeek V3 0324 / deepseek-deepseek-chat-v3-0324 | Imported | 2026-01-09
31 | Gemini 2.0 Flash 001 | 88% | Gemini 2.0 Flash / google-gemini-2.0-flash | Imported | 2026-01-09
32 | GPT 4.1 Mini 2025-04-14 | 88% | GPT-4.1 Mini / openai-gpt-4.1-mini | Imported | 2026-01-09
33 | GPT 4.1 2025-04-14 | 87.2% | GPT-4.1 / openai-gpt-4.1 | Imported | 2026-01-09
34 | Mistral Medium 2505 | 87% |  | Imported | 2026-01-09
35 | Llama4 Maverick Instruct Basic | 85.2% |  | Imported | 2026-01-09
36 | Gemini 2.0 Flash Thinking Exp 01 21 | 84.6% |  | Imported | 2026-01-09
37 | Gemini 1.5 Pro 002 | 82.8% |  | Imported | 2026-01-09
38 | DeepSeek V3 | 80.4% | DeepSeek V3 / deepseek-deepseek-chat | Imported | 2026-01-09
39 | GPT 4.1 Nano 2025-04-14 | 80.2% | GPT-4.1 Nano / openai-gpt-4.1-nano | Imported | 2026-01-09
40 | Llama 4 Scout 17B 16E Instruct | 79.2% | Llama 4 Scout / meta-llama-llama-4-scout | Imported | 2026-01-09
41 | Gemini 1.5 Flash 002 | 78.8% |  | Imported | 2026-01-09
42 | Grok 2.1212 | 78.4% |  | Imported | 2026-01-09
43 | Claude 3 7 Sonnet 20250219 | 76.8% | Claude 3.7 Sonnet / anthropic-claude-3.7-sonnet | Imported | 2026-01-09
44 | Command A 03 2025 | 76.2% | Command A / cohere-command-a | Imported | 2026-01-09
45 | GPT 4O 2024-08-06 | 75.2% | GPT-4o (2024-08-06) / openai-gpt-4o-2024-08-06 | Imported | 2026-01-09
46 | Mistral Large 2411 | 74.4% | Mistral Large 2411 / mistralai-mistral-large-2411 | Imported | 2026-01-09
47 | GPT 4O 2024-11-20 | 74% | GPT-4o (2024-11-20) / openai-gpt-4o-2024-11-20 | Imported | 2026-01-09
48 | Llama 3.3 70B Instruct Turbo | 73.4% |  | Imported | 2026-01-09
49 | GPT 4O Mini 2024-07-18 | 72.6% | GPT-4o-mini (2024-07-18) / openai-gpt-4o-mini-2024-07-18 | Imported | 2026-01-09
50 | Claude 3 5 Sonnet 20241022 | 72.4% | Claude 3.5 Sonnet / anthropic-claude-3.5-sonnet | Imported | 2026-01-09
51 | Meta Llama 3.1 405B Instruct Turbo | 71.4% |  | Imported | 2026-01-09
52 | Llama 3.3 Nemotron Super 49B V1 42e84561 | 71.2% |  | Imported | 2026-01-09
53 | Mistral Small 2402 | 70.6% |  | Imported | 2026-01-09
54 | Grok 3 Mini Fast Low Reasoning | 70.2% |  | Imported | 2026-01-09
55 | Mistral Small 2503 | 68.4% |  | Imported | 2026-01-09
56 | Meta Llama 3.1 70B Instruct Turbo | 65% |  | Imported | 2026-01-09
57 | Claude 3 5 Haiku 20241022 | 64.2% |  | Imported | 2026-01-09
58 | Jamba Large 1.6 | 54.8% |  | Imported | 2026-01-09
59 | Meta Llama 3.1 8B Instruct Turbo | 44.4% |  | Imported | 2026-01-09
60 | Jamba Mini 1.6 | 25.4% |  | Imported | 2026-01-09
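
Because the primary score is an accuracy percentage, each score can be turned back into an implied count of correct answers. The sketch below assumes a 500-problem test set (inferred from the benchmark name, not stated on the page). `implied_correct` is a hypothetical helper written for this illustration.

```python
def implied_correct(score_pct: float, n_problems: int = 500, tol: float = 1e-6):
    """Return the whole number of correct answers implied by score_pct.

    Returns None when no whole number out of n_problems matches the score.
    n_problems=500 is an assumption based on the benchmark name.
    """
    exact = score_pct / 100.0 * n_problems
    nearest = round(exact)
    return nearest if abs(exact - nearest) < tol else None


# Scores taken from ranks 1, 26, and 60 of the leaderboard above.
for label, score in [("rank 1", 96.4), ("rank 26", 90.323), ("rank 60", 25.4)]:
    print(label, score, implied_correct(score))
```

Under this assumption, 96.4% corresponds to 482 correct answers and 25.4% to 127, while the rank 26 score of 90.323% does not map to a whole number out of 500. The page does not explain that row, so it may have been scored over a different number of problems.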