FrontierMath 2025-02-28 Private

Private FrontierMath research-level mathematics benchmark snapshot.

Rows: 31
Primary metric: score
Sampled: 2026-05-06

Metadata

Metrics

Score, Standard error (lower is better)

Showing the 2 latest source slices.

Latest Results

Rows parsed from the public leaderboard table.

| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-5.2 | 40.70 | GPT-5.2 (openai-gpt-5.2) | Imported | 2026-05-06 |
| 2 | Gemini 3 Pro | 37.60 | Gemini 3 (google-gemini-3) | Imported | 2026-05-06 |
| 3 | Gemini 2.5 Pro (Jun 2025) | 29 | Gemini 2.5 Pro (google-gemini-2.5-pro) | Imported | 2026-05-06 |
| 4 | o4-mini (high) | 24.83 | o4 Mini High (openai-o4-mini-high) | Imported | 2026-05-06 |
| 5 | DeepSeek V3 | 22.10 | DeepSeek V3 (deepseek-deepseek-chat) | Imported | 2026-05-06 |
| 6 | Claude Opus 4.5 | 20.69 | Claude Opus 4.5 (anthropic-claude-opus-4.5) | Imported | 2026-05-06 |
| 7 | Grok 4 | 19.66 | Grok 4 (x-ai-grok-4) | Imported | 2026-05-06 |
| 8 | o4-mini-2025-04-16 medium | 18.97 | o4 Mini (openai-o4-mini) | Imported | 2026-05-06 |
| 9 | o3 | 18.69 | o3 (openai-o3) | Imported | 2026-05-06 |
| 10 | Claude Sonnet 4.5 | 13.49 | Claude Sonnet 4.5 (anthropic-claude-sonnet-4.5) | Imported | 2026-05-06 |
| 11 | o1 | 9.31 | o1 (openai-o1) | Imported | 2026-05-06 |
| 12 | Qwen 3 235B | 8.48 | Qwen3 235B A22B (qwen-qwen3-235b-a22b) | Imported | 2026-05-06 |
| 13 | Claude Haiku 4.5 | 5.90 | Claude Haiku 4.5 (anthropic-claude-haiku-4.5) | Imported | 2026-05-06 |
| 14 | Grok-3 mini | 5.86 | Grok 3 Mini (x-ai-grok-3-mini) | Imported | 2026-05-06 |
| 15 | GPT-4.1 | 5.52 | GPT-4.1 (openai-gpt-4.1) | Imported | 2026-05-06 |
| 16 | GPT-4.1 mini | 4.48 | GPT-4.1 Mini (openai-gpt-4.1-mini) | Imported | 2026-05-06 |
| 17 | Claude 3.7 Sonnet | 4.14 | Claude 3.7 Sonnet (anthropic-claude-3.7-sonnet) | Imported | 2026-05-06 |
| 18 | Qwen Plus | 1.72 | Qwen-Plus (qwen-qwen-plus) | Imported | 2026-05-06 |
| 19 | Qwen2.5-Max | 1.03 | - | Imported | 2026-05-06 |
| 20 | Llama 4 Maverick (FP8) | 0.69 | Llama 4 Maverick (meta-llama-4-maverick) | Imported | 2026-05-06 |
| 21 | Mistral Large | 0.35 | Mistral Large (mistralai-mistral-large) | Imported | 2026-05-06 |
| 22 | GPT-4o | 0.34 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-06 |
| 23 | Claude 3.5 Haiku | 0.34 | Claude 3.5 Haiku (anthropic-claude-3.5-haiku) | Imported | 2026-05-06 |
| 24 | Llama 4 Scout | 0 | Llama 4 Scout (meta-llama-llama-4-scout) | Imported | 2026-05-06 |
| 25 | Gemini 1.5 Flash | 0 | - | Imported | 2026-05-06 |

| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-5.5 Pro | 52.4% | GPT-5.5 Pro (openai-gpt-5.5-pro) | Launch post | 2026-04-23 |
| 2 | GPT-5.5 | 51.7% | GPT-5.5 (openai-gpt-5.5) | Launch post | 2026-04-23 |
| 3 | GPT-5.4 Pro | 50% | GPT-5.4 Pro (openai-gpt-5.4-pro) | Launch post | 2026-04-23 |
| 4 | GPT-5.4 | 47.6% | GPT-5.4 (openai-gpt-5.4) | Launch post | 2026-04-23 |
| 5 | Claude Opus 4.7 | 43.8% | Claude Opus 4.7 (anthropic-claude-opus-4.7) | Launch post | 2026-04-23 |
| 6 | Gemini 3.1 Pro Preview | 36.9% | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Launch post | 2026-04-23 |
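
The two source slices write the score differently: the imported rows use bare numbers such as 40.70 or 29, while the launch-post rows carry a percent sign, as in 52.4%. Below is a minimal sketch of how a reader might parse both forms into one numeric type before working with them. The names `Score` and `parse_score` are hypothetical and not part of the leaderboard. Whether the bare numbers are on the same 0-100 scale as the percentages is an assumption the page does not confirm, so the sketch records whether a percent sign was present instead of merging the two silently.

```python
from typing import NamedTuple


class Score(NamedTuple):
    """Hypothetical container for one parsed score cell."""
    value: float            # numeric part of the cell
    had_percent_sign: bool  # True if the source cell ended with '%'


def parse_score(text: str) -> Score:
    """Parse a score cell such as '40.70', '29', or '52.4%'."""
    cleaned = text.strip()
    had_percent = cleaned.endswith("%")
    if had_percent:
        cleaned = cleaned[:-1].strip()
    return Score(float(cleaned), had_percent)


# A few (subject, score) cells copied from the results above.
rows = [
    ("GPT-5.2", "40.70"),
    ("Gemini 2.5 Pro (Jun 2025)", "29"),
    ("GPT-5.5 Pro", "52.4%"),
    ("GPT-5.4 Pro", "50%"),
]
parsed = {subject: parse_score(cell) for subject, cell in rows}

# Assumption: the slices are kept apart because the page does not state
# that bare numbers and percentages share the same scale.
imported = {s: p.value for s, p in parsed.items() if not p.had_percent_sign}
launch_post = {s: p.value for s, p in parsed.items() if p.had_percent_sign}
```

Keeping the percent-sign flag makes it possible to group rows by slice, as the last two lines do, without claiming that scores from different provenance are directly comparable.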