o3 Mini High
o-series / OpenAI
21 scores
21 benchmarks
$1.1 / $4.4 per 1M tokens (cost in/out)
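As a rough illustration of how the per-1M-token prices above translate into a per-request cost, here is a minimal sketch. It assumes the two figures are USD prices for input and output tokens respectively, and that billing is linear in token count; the function name `estimate_cost_usd` and the token counts are hypothetical, not part of any API.

```python
# Listed prices, assumed to be USD per 1M input / output tokens.
INPUT_PRICE_PER_M = 1.1
OUTPUT_PRICE_PER_M = 4.4


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a request under simple linear pricing.

    Hypothetical helper: ignores any discounts, caching, or minimum
    charges a provider might apply.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Example: 2M input tokens and 500k output tokens.
    print(f"${estimate_cost_usd(2_000_000, 500_000):.2f}")
```

Because output tokens are priced at four times the input rate, output-heavy workloads dominate the total under these assumptions.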
Metadata
o-series Closed/API
Aliases: o3-mini-high, o3-mini-high-2025-01-31, openai-o3-mini-high, openai-o3-mini-high-2025-01-31, openai/o3-mini-high, openai/o3-mini-high-2025-01-31
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| ARC-AGI-1 | Agentic | 85 | 34.50 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 80 | 2.99 | 2026-05-05 |
| Tau2-Bench Telecom | Agentic | 237 | 31.3% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 256 | 6.1% | 2026-05-11 |
| LiveCodeBench | Coding | 9 | 67.40 | 2026-05-06 |
| SciCode | Coding | 103 | 39.8% | 2026-05-11 |
| Arena-Hard | Generalization | 6 | 66.1% | 2026-05-27 |
| Artificial Analysis Intelligence Index | Intelligence | 190 | 25.21 | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 122 | 12.3% | 2026-05-11 |
| MMLU-Pro | Intelligence | 112 | 80.2% | 2026-05-11 |
| SimpleQA | Intelligence | 18 | 13.8% | 2026-05-27 |
| SuperGPQA | Intelligence | 4 | 55.22 | 2026-05-06 |
| CAIS Text Capabilities Index | Reasoning | 36 | 7.8 | 2026-05-27 |
| GPQA Diamond | Reasoning | 130 | 77.3% | 2026-05-11 |
| Humanity's Last Exam (Text Only) | Reasoning | 20 | 13.37 | 2026-05-06 |
| LingOly-TOO | Reasoning | 7 | 0.31 | 2026-05-06 |
| ZebraLogic | Reasoning | 1 | 91.70 | 2026-05-06 |
| CAIS Risk Index | Safety | 32 | 60.1 | 2026-05-27 |
| CritPt | Science | 146 | 0.3% | 2026-05-11 |
| Defects4J | Software Engineering | 2 | 0.488 | 2026-05-27 |
| RepairBench | Software Engineering | 2 | 0.464 | 2026-05-27 |