FrontierMath Tier 4 2025-07-01 Private
Private Tier 4 FrontierMath problems at research-level mathematical difficulty.
20 rows
Primary metric: score
Sampled: 2026-05-06
Metrics: Score (higher is better), Standard error (lower is better)
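The page lists a standard error but does not say how it is computed. The sketch below is illustrative only. It assumes each problem is scored pass/fail and that the standard error is the usual binomial one, `SE = sqrt(p * (1 - p) / n)`, where `p` is the solve rate and `n` is the number of problems. The problem count is not given on this page, so `N_PROBLEMS = 48` is a hypothetical value, and `solve_rate_standard_error` is a hypothetical helper, not the leaderboard's method.

```python
import math


def solve_rate_standard_error(score_pct: float, n_problems: int) -> float:
    """Binomial standard error, in percentage points, of a solve rate.

    Assumes (illustratively) that each of n_problems is scored pass/fail
    independently; this is not the leaderboard's documented method.
    """
    if n_problems <= 0:
        raise ValueError("n_problems must be positive")
    p = score_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_problems)


# Hypothetical problem count, chosen for illustration only.
N_PROBLEMS = 48

for model, score in [("GPT-5.2", 29.20), ("o3", 4.17), ("GPT-4.1", 0.0)]:
    se = solve_rate_standard_error(score, N_PROBLEMS)
    print(f"{model}: {score:.2f}% +/- {se:.2f} pp")
```

This formula gives a standard error of exactly 0 for a 0% (or 100%) score, which understates uncertainty for small problem sets; intervals such as the Wilson score interval are often preferred in that regime.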
Showing 2 latest source slices.
Imported results (sampled 2026-05-06):

| Rank | Subject | Score (%) | Matched model | Model ID |
|---|---|---|---|---|
| 1 | GPT-5.2 | 29.20 | GPT-5.2 | openai-gpt-5.2 |
| 2 | Gemini 3 Pro | 18.75 | Gemini 3 | google-gemini-3 |
| 3 | Gemini 2.5 Pro (Jun 2025) | 10.40 | Gemini 2.5 Pro | google-gemini-2.5-pro |
| 4 | o4-mini (high) | 6.25 | o4 Mini High | openai-o4-mini-high |
| 5 | o3 | 4.17 | o3 | openai-o3 |
| 6 | Claude Opus 4.5 | 4.17 | Claude Opus 4.5 | anthropic-claude-opus-4.5 |
| 7 | DeepSeek V3 | 2.10 | DeepSeek V3 | deepseek-deepseek-chat |
| 8 | Grok 4 | 2.08 | Grok 4 | x-ai-grok-4 |
| 9 | Claude Haiku 4.5 | 2.08 | Claude Haiku 4.5 | anthropic-claude-haiku-4.5 |
| 10 | o4-mini-2025-04-16 medium | 2.08 | o4 Mini | openai-o4-mini |
| 11 | GPT-4.1 | 0.00 | GPT-4.1 | openai-gpt-4.1 |
| 12 | Claude 3.7 Sonnet | 0.00 | Claude 3.7 Sonnet | anthropic-claude-3.7-sonnet |
| 13 | Qwen 3 235B | 0.00 | Qwen3 235B A22B | qwen-qwen3-235b-a22b |
| 14 | Grok-3 mini | 0.00 | Grok 3 Mini | x-ai-grok-3-mini |

Launch-post results (sampled 2026-04-23):

| Rank | Subject | Score (%) | Matched model | Model ID |
|---|---|---|---|---|
| 1 | GPT-5.5 Pro | 39.6 | GPT-5.5 Pro | openai-gpt-5.5-pro |
| 2 | GPT-5.4 Pro | 38 | GPT-5.4 Pro | openai-gpt-5.4-pro |
| 3 | GPT-5.5 | 35.4 | GPT-5.5 | openai-gpt-5.5 |
| 4 | GPT-5.4 | 27.1 | GPT-5.4 | openai-gpt-5.4 |
| 5 | Claude Opus 4.7 | 22.9 | Claude Opus 4.7 | anthropic-claude-opus-4.7 |
| 6 | Gemini 3.1 Pro Preview | 16.7 | Gemini 3.1 Pro Preview | google-gemini-3.1-pro-preview |
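The imported slice numbers tied rows consecutively (o3 and Claude Opus 4.5 both score 4.17 but are ranked 5 and 6), and the two slices format scores differently ("29.20" versus "39.6%"). The sketch below is a hypothetical illustration, not the leaderboard's own ranking code: `parse_score` normalizes both cell formats to a percentage, and `competition_rank` applies standard competition ("1224") ranking, in which tied scores share a rank. The sample rows are a subset of the imported slice.

```python
def parse_score(cell: str) -> float:
    """Parse a score cell such as "29.20" or "39.6%" into a percentage."""
    return float(cell.strip().rstrip("%"))


def competition_rank(rows: list[tuple[str, str]]) -> list[tuple[int, str, float]]:
    """Rank (subject, raw_score) rows by descending score.

    Tied scores share a rank and the next distinct score skips ahead
    ("1224" ranking). sorted() is stable, so tied rows keep input order.
    """
    parsed = [(subject, parse_score(raw)) for subject, raw in rows]
    ordered = sorted(parsed, key=lambda row: row[1], reverse=True)
    ranked: list[tuple[int, str, float]] = []
    rank = 0
    prev_score = None
    for position, (subject, score) in enumerate(ordered, start=1):
        if score != prev_score:
            rank = position  # new distinct score starts at its position
            prev_score = score
        ranked.append((rank, subject, score))
    return ranked


# Subset of the imported slice, used only as sample input.
sample = [
    ("Gemini 3 Pro", "18.75"),
    ("o3", "4.17"),
    ("Claude Opus 4.5", "4.17"),
    ("Grok 4", "2.08"),
]
for rank, subject, score in competition_rank(sample):
    print(rank, subject, score)
```

Which tie convention the leaderboard intends is not stated on the page; consecutive numbering and competition ranking differ only where scores are equal.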