o4 Mini
o-series / OpenAI
75 scores
64 benchmarks
$1.1 / $4.4 per 1M tokens (cost in/out)
Metadata
o-series Closed/API
Aliases: o4-mini, o4-mini-2025-04-16, openai-o4-mini, openai-o4-mini-2025-04-16, openai/o4-mini, openai/o4-mini-2025-04-16
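As a rough illustration of how the listed input/output pricing translates into a per-request cost, here is a minimal sketch. The helper name `estimate_cost_usd` and the token counts are hypothetical, and the prices are simply the figures shown on this page, which may change.

```python
# Prices as listed on this page, in USD per 1M tokens (assumed current; may change).
INPUT_PRICE_PER_M = 1.10
OUTPUT_PRICE_PER_M = 4.40


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: estimated USD cost of a single request."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Illustrative token counts, not measurements.
cost = estimate_cost_usd(input_tokens=20_000, output_tokens=5_000)
print(f"${cost:.4f}")
```
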
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| ARC-AGI-1 | Agentic | 52 | 58.67 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 73 | 41.83 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 108 | 21.33 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 61 | 6.11 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 87 | 2.36 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 101 | 1.67 | 2026-05-05 |
| Berkeley Function-Calling Leaderboard | Agentic | 21 | 53.24% | 2026-05-27 |
| Berkeley Function-Calling Leaderboard | Agentic | 28 | 50.26% | 2026-05-27 |
| DEEPSYNTH | Agentic | 11 | 3.05 | 2026-05-27 |
| MCPMark | Agentic | 26 | 0.17 | 2026-05-06 |
| Tau2-Bench Telecom | Agentic | 165 | 55.6% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 176 | 15.2% | 2026-05-11 |
| VitaBench | Agentic | 9 | 19.50 | 2026-05-06 |
| AgentBench FC | Agents | 19 | 39.70 | 2026-05-06 |
| OpenUGI | Alignment | 446 | 38.21 | 2026-05-06 |
| OpenUGI | Alignment | 495 | 36.83 | 2026-05-06 |
| OpenUGI | Alignment | 543 | 35.60 | 2026-05-06 |
| TextClass Benchmark | Classification | 58 | 1538.33 | 2026-05-06 |
| CadEval | Coding | 3 | 62 | 2026-05-06 |
| IOI | Coding | 38 | 4.834% | 2026-05-26 |
| LiveCodeBench | Coding | 3 | 74.20 | 2026-05-06 |
| LiveCodeBench | Coding | 12 | 65.90 | 2026-05-06 |
| LiveCodeBench | Coding | 34 | 82.208% | 2026-05-28 |
| SciCode | Coding | 34 | 46.5% | 2026-05-11 |
| MMTU | Data | 5 | 0.66 | 2026-05-06 |
| GSMA Open Telco Leaderboard | Domain | 26 | 63.07 | 2026-05-06 |
| SAGE | Education | 30 | 41.061% | 2026-05-28 |
| CorpFin v2 | Finance | 58 | 58.974% | 2026-05-28 |
| FinanceArena | Finance | 3 | 48.6 | 2026-05-27 |
| MortgageTax | Finance | 29 | 64.826% | 2026-05-28 |
| PRBench Finance | Finance | 16 | 39.22 | 2026-05-06 |
| TaxEval v2 | Finance | 15 | 74.776% | 2026-05-28 |
| Arena-Hard | Generalization | 4 | 74.6% | 2026-05-27 |
| GDPval | Generalization | 4 | 29.1% | 2025-09-25 |
| HELM AIR-Bench | Generalization | 26 | 0.784861 | 2026-05-28 |
| HELM Safety | Generalization | 6 | 0.973247 | 2026-05-28 |
| HELM MedQA | Healthcare | 3 | 0.948310 | 2026-05-28 |
| MedCode | Healthcare | 45 | 33.791% | 2026-05-28 |
| MedQA | Healthcare | 9 | 96.017% | 2026-04-16 |
| MedScribe | Healthcare | 52 | 69.139% | 2026-05-28 |
| HUMAINE | Human Preference | 18 | 3.58 | 2026-05-06 |
| Artificial Analysis Intelligence Index | Intelligence | 121 | 33.06 | 2026-05-11 |
| GPQA Diamond | Intelligence | 56 | 74.495% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 83 | 17.5% | 2026-05-11 |
| MathVision | Intelligence | 44 | 58 | 2026-05-06 |
| MMLU Pro | Intelligence | 58 | 80.561% | 2026-05-28 |
| MMLU-Pro | Intelligence | 58 | 83.2% | 2026-05-11 |
| MMMU Pro | Intelligence | 27 | 79.665% | 2026-05-28 |
| AraGen v3 | Language | 11 | 70.60 | 2026-05-06 |
| HindiGen v1 | Language | 4 | 75.52 | 2026-05-06 |
| LegalBench | Legal | 65 | 79.185% | 2026-05-28 |
| Professional Reasoning Bench - Legal | Legal | 18 | 38.11 | 2026-05-06 |
| Fiction.LiveBench | Long Context | 9 | 62.50 | 2026-05-06 |
| AIME | Math | 41 | 83.667% | 2026-04-16 |
| AIME 2025 | Math | 24 | 90.7% | 2026-05-11 |
| IneqMath | Math | 14 | 15.50 | 2026-05-06 |
| MATH 500 | Math | 12 | 94.2% | 2026-01-09 |
| MGSM | Math | 9 | 93.418% | 2026-01-09 |
| FrontierMath 2025-02-28 Private | Mathematics | 8 | 18.97 | 2026-05-06 |
| FrontierMath Tier 4 2025-07-01 Private | Mathematics | 10 | 2.08 | 2026-05-06 |
| MEDIC Benchmark | Medical | 3 | 90.5 (average normalized public table score) | 2026-05-27 |
| CharXiv-R | Multimodal | 19 | 0.72 | 2026-05-06 |
| Video SimpleQA | Multimodal | 5 | 54 | 2026-05-06 |
| Visual-Language Understanding | Multimodal | 3 | 51.79 | 2026-05-06 |
| Visual-Language Understanding | Multimodal | 3 | 51.66 | 2026-05-06 |
| VPCT | Multimodal | 3 | 57.50 | 2026-05-06 |
| VTB | Multimodal | 8 | 11.12 | 2026-05-06 |
| EnigmaEval | Reasoning | 7 | 9.21 | 2026-05-06 |
| EnigmaEval | Reasoning | 12 | 6.81 | 2026-05-06 |
| GPQA Diamond | Reasoning | 118 | 78.4% | 2026-05-11 |
| Humanity's Last Exam (Text Only) | Reasoning | 12 | 18.90 | 2026-05-06 |
| Humanity's Last Exam (Text Only) | Reasoning | 20 | 14.53 | 2026-05-06 |
| CritPt | Science | 116 | 0.6% | 2026-05-11 |
| LiveSQLBench | Text to SQL | 14 | 29.54 | 2026-05-06 |
| Lech Mazur Writing | Writing | 19 | 7.50 | 2026-05-06 |