FrontierMath Tier 4 2025-07-01 Private

Private Tier 4 FrontierMath problems at research-level mathematical difficulty.

Rows: 20
Primary metric: score
Sampled: 2026-05-06

Metrics

Score (primary); Standard error (lower is better)

Showing the 2 latest source slices.

Latest Results

Rows parsed from the public leaderboard table.

Imported (2026-05-06)

Rank | Subject | Score | Model Match | Provenance | Sampled
1 | GPT-5.2 | 29.20 | GPT-5.2 (openai-gpt-5.2) | Imported | 2026-05-06
2 | Gemini 3 Pro | 18.75 | Gemini 3 (google-gemini-3) | Imported | 2026-05-06
3 | Gemini 2.5 Pro (Jun 2025) | 10.40 | Gemini 2.5 Pro (google-gemini-2.5-pro) | Imported | 2026-05-06
4 | o4-mini (high) | 6.25 | o4 Mini High (openai-o4-mini-high) | Imported | 2026-05-06
5 | o3 | 4.17 | o3 (openai-o3) | Imported | 2026-05-06
6 | Claude Opus 4.5 | 4.17 | Claude Opus 4.5 (anthropic-claude-opus-4.5) | Imported | 2026-05-06
7 | DeepSeek V3 | 2.10 | DeepSeek V3 (deepseek-deepseek-chat) | Imported | 2026-05-06
8 | Grok 4 | 2.08 | Grok 4 (x-ai-grok-4) | Imported | 2026-05-06
9 | Claude Haiku 4.5 | 2.08 | Claude Haiku 4.5 (anthropic-claude-haiku-4.5) | Imported | 2026-05-06
10 | o4-mini-2025-04-16 medium | 2.08 | o4 Mini (openai-o4-mini) | Imported | 2026-05-06
11 | GPT-4.1 | 0 | GPT-4.1 (openai-gpt-4.1) | Imported | 2026-05-06
12 | Claude 3.7 Sonnet | 0 | Claude 3.7 Sonnet (anthropic-claude-3.7-sonnet) | Imported | 2026-05-06
13 | Qwen 3 235B | 0 | Qwen3 235B A22B (qwen-qwen3-235b-a22b) | Imported | 2026-05-06
14 | Grok-3 mini | 0 | Grok 3 Mini (x-ai-grok-3-mini) | Imported | 2026-05-06

Launch post (2026-04-23)

Rank | Subject | Score | Model Match | Provenance | Sampled
1 | GPT-5.5 Pro | 39.6% | GPT-5.5 Pro (openai-gpt-5.5-pro) | Launch post | 2026-04-23
2 | GPT-5.4 Pro | 38% | GPT-5.4 Pro (openai-gpt-5.4-pro) | Launch post | 2026-04-23
3 | GPT-5.5 | 35.4% | GPT-5.5 (openai-gpt-5.5) | Launch post | 2026-04-23
4 | GPT-5.4 | 27.1% | GPT-5.4 (openai-gpt-5.4) | Launch post | 2026-04-23
5 | Claude Opus 4.7 | 22.9% | Claude Opus 4.7 (anthropic-claude-opus-4.7) | Launch post | 2026-04-23
6 | Gemini 3.1 Pro Preview | 16.7% | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Launch post | 2026-04-23
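
The rank column restarts at 1 for each source slice, which suggests rows are ranked by descending score within a slice rather than across the whole table. A minimal Python sketch of that grouping follows, using a few rows copied from the table above. The tuple layout and the function name `rank_by_slice` are hypothetical, and the tie handling (equal scores keep their input order) is an assumption inferred from how the tied rows are listed.

```python
from collections import defaultdict

# (subject, score, provenance, sampled): a subset of rows from the table above.
ROWS = [
    ("GPT-5.2", 29.20, "Imported", "2026-05-06"),
    ("Gemini 3 Pro", 18.75, "Imported", "2026-05-06"),
    ("o3", 4.17, "Imported", "2026-05-06"),
    ("Claude Opus 4.5", 4.17, "Imported", "2026-05-06"),
    ("GPT-5.5 Pro", 39.6, "Launch post", "2026-04-23"),
    ("GPT-5.4 Pro", 38.0, "Launch post", "2026-04-23"),
    ("GPT-5.5", 35.4, "Launch post", "2026-04-23"),
]


def rank_by_slice(rows):
    """Group rows by (provenance, sampled) and rank each group by score.

    Hypothetical helper: ranks are sequential, highest score first.
    Python's sort is stable, so tied scores keep their input order.
    """
    slices = defaultdict(list)
    for subject, score, provenance, sampled in rows:
        slices[(provenance, sampled)].append((subject, score))

    ranked = {}
    for key, entries in slices.items():
        ordered = sorted(entries, key=lambda entry: entry[1], reverse=True)
        ranked[key] = [
            (rank, subject, score)
            for rank, (subject, score) in enumerate(ordered, start=1)
        ]
    return ranked


if __name__ == "__main__":
    for (provenance, sampled), entries in rank_by_slice(ROWS).items():
        print(f"{provenance} ({sampled})")
        for rank, subject, score in entries:
            print(f"  {rank} | {subject} | {score}")
```

Keeping the slices separate avoids comparing scores across sources directly; whether the two slices report scores on the same scale is not stated on the page.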