o3-mini
o-series / OpenAI
66 scores
51 benchmarks
$1.10 / $4.40 per 1M tokens (input / output cost)
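As a rough illustration of what the listed prices mean per request, the sketch below converts token counts into a dollar estimate. The function name `estimate_cost` and the example token counts are hypothetical. The only values taken from this page are the two per-million-token prices, and the calculation assumes billing is strictly linear in token count.

```python
# Prices from the listing above, in USD per 1M tokens.
INPUT_PRICE_PER_M = 1.10
OUTPUT_PRICE_PER_M = 4.40


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumes cost scales linearly with token count; any discounts,
    caching, or minimum charges are not modeled here.
    """
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical request: 20,000 input tokens and 5,000 output tokens.
cost = estimate_cost(20_000, 5_000)
print(f"${cost:.4f}")
```

Because the output price is four times the input price, a request that produces few but long responses can cost as much as one with a much larger prompt.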
Metadata
o-series Closed/API
Aliases: o3-mini, o3-mini-2025-01-31, openai-o3-mini, openai-o3-mini-2025-01-31, openai/o3-mini, openai/o3-mini-2025-01-31
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| ARC-AGI-1 | Agentic | 106 | 22.33 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 120 | 14.50 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 91 | 2.08 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 137 | 0 | 2026-05-05 |
| Tau2-Bench Telecom | Agentic | 249 | 28.7% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 245 | 6.8% | 2026-05-11 |
| AgentBench FC | Agents | 17 | 40.90 | 2026-05-06 |
| TextClass Benchmark | Classification | 20 | 1684.99 | 2026-05-06 |
| BigCodeBench-Hard | Coding | 1 | 33.10 | 2026-05-05 |
| BigCodeBench-Hard | Coding | 3 | 32.40 | 2026-05-05 |
| BigCodeBench-Hard | Coding | 9 | 31.10 | 2026-05-05 |
| LiveCodeBench | Coding | 15 | 63 | 2026-05-06 |
| LiveCodeBench | Coding | 18 | 57 | 2026-05-06 |
| LiveCodeBench | Coding | 58 | 71.484% | 2026-05-28 |
| Natural Language to Mongosh | Coding | 21 | 0.85 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 23 | 0.85 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 27 | 0.84 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 31 | 0.84 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 32 | 0.84 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 33 | 0.84 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 34 | 0.84 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 51 | 0.82 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 61 | 0.80 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 65 | 0.80 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 66 | 0.80 | 2026-05-06 |
| SciCode | Coding | 99 | 39.9% | 2026-05-11 |
| AIRTBench | Cybersecurity | 4 | 28.43 | 2026-05-06 |
| CorpFin v2 | Finance | 93 | 45.299% | 2026-05-28 |
| TaxEval v2 | Finance | 70 | 69.42% | 2026-05-28 |
| BenchLM | General Knowledge | 57 | 56 | 2026-05-06 |
| Arena-Hard | Generalization | 13 | 50.0% | 2026-05-27 |
| HELM AIR-Bench | Generalization | 29 | 0.748858 | 2026-05-28 |
| HELM Safety | Generalization | 12 | 0.961961 | 2026-05-28 |
| HELM MedQA | Healthcare | 5 | 0.920477 | 2026-05-28 |
| MedAgentBench | Healthcare | 6 | 51.67% | 2026-05-27 |
| MedQA | Healthcare | 14 | 94.833% | 2026-04-16 |
| HUMAINE | Human Preference | 35 | 3.38 | 2026-05-06 |
| Multi-IF | Instruction Following | 2 | 0.80 | 2026-05-06 |
| Artificial Analysis Intelligence Index | Intelligence | 186 | 25.86 | 2026-05-11 |
| GPQA Diamond | Intelligence | 53 | 75.505% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 176 | 8.7% | 2026-05-11 |
| MMLU Pro | Intelligence | 73 | 78.689% | 2026-05-28 |
| MMLU-Pro | Intelligence | 129 | 79.1% | 2026-05-11 |
| SimpleQA | Intelligence | 19 | 13.4% | 2026-05-27 |
| SuperGPQA | Intelligence | 6 | 52.69 | 2026-05-06 |
| AraGen v3 | Language | 17 | 59.81 | 2026-05-06 |
| HindiGen v1 | Language | 22 | 55.14 | 2026-05-06 |
| LegalBench | Legal | 84 | 71.539% | 2026-05-28 |
| LEXam | Legal | 16 | 48.13% open / 44.22% MCQ | 2026-05-28 |
| OpenAI-MRCR: 2 needle 128k | Long Context | 9 | 0.19 | 2026-05-06 |
| AIME | Math | 32 | 86.458% | 2026-04-16 |
| IneqMath | Math | 19 | 9.50 | 2026-05-06 |
| MATH 500 | Math | 20 | 91.8% | 2026-01-09 |
| MGSM | Math | 29 | 91.346% | 2026-01-09 |
| MedHELM | Medical | 2 | 0.6410714285714286 | 2026-05-27 |
| GPQA Diamond | Reasoning | 163 | 74.8% | 2026-05-11 |
| Graphwalks BFS <128k | Reasoning | 9 | 0.51 | 2026-05-06 |
| Graphwalks parents <128k | Reasoning | 7 | 0.58 | 2026-05-06 |
| Humanity's Last Exam (Text Only) | Reasoning | 27 | 10.31 | 2026-05-06 |
| LingOly-TOO | Reasoning | 13 | 0.12 | 2026-05-06 |
| ZebraLogic | Reasoning | 2 | 88.90 | 2026-05-06 |
| X-Risks Leaderboard | Safety | 2 | 27.73 | 2026-05-06 |
| SciPredict | Science | 4 | 19.84 | 2026-05-06 |
| LiveSQLBench | Text to SQL | 10 | 31.15 | 2026-05-06 |
| ComplexFuncBench | Tool Use | 5 | 0.18 | 2026-05-06 |
| COLLIE | Writing | 2 | 0.99 | 2026-05-06 |