o4 Mini

o-series / OpenAI

75 scores
64 benchmarks
Cost (input / output): $1.1 / $4.4 per 1M tokens
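The listed rates can be used to estimate the cost of a single request. The sketch below assumes the snapshot prices above and flat per-token billing. It ignores cached-input discounts, batch pricing, and any later price changes. Because o4-mini is a reasoning model, its hidden reasoning tokens are billed as output tokens, so `output_tokens` should include them. The function name `estimate_cost_usd` is hypothetical.

```python
# Snapshot of the per-1M-token prices listed on this page (USD).
# Assumption: flat per-token billing, with no caching or batch discounts.
INPUT_PRICE_PER_M = 1.1
OUTPUT_PRICE_PER_M = 4.4


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: estimate one request's cost at the listed o4-mini rates.

    output_tokens should include reasoning tokens, which are billed as output.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (
        input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000


if __name__ == "__main__":
    # Example: 10,000 prompt tokens and 2,000 output (including reasoning) tokens.
    print(f"${estimate_cost_usd(10_000, 2_000):.4f}")
```

At these rates the example request costs about two cents. Output tokens dominate quickly for reasoning-heavy workloads because they are priced 4x higher than input tokens.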

Metadata

Tags: o-series, Closed/API

Aliases: o4-mini, o4-mini-2025-04-16, openai-o4-mini, openai-o4-mini-2025-04-16, openai/o4-mini, openai/o4-mini-2025-04-16

Benchmark Results

| Benchmark | Category | Rank | Score | Sampled (date) |
|---|---|---|---|---|
| ARC-AGI-1 | Agentic | 52 | 58.67 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 73 | 41.83 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 108 | 21.33 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 61 | 6.11 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 87 | 2.36 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 101 | 1.67 | 2026-05-05 |
| Berkeley Function-Calling Leaderboard | Agentic | 21 | 53.24% | 2026-05-27 |
| Berkeley Function-Calling Leaderboard | Agentic | 28 | 50.26% | 2026-05-27 |
| DEEPSYNTH | Agentic | 11 | 3.05 | 2026-05-27 |
| MCPMark | Agentic | 26 | 0.17 | 2026-05-06 |
| Tau2-Bench Telecom | Agentic | 165 | 55.6% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 176 | 15.2% | 2026-05-11 |
| VitaBench | Agentic | 9 | 19.50 | 2026-05-06 |
| AgentBench FC | Agents | 19 | 39.70 | 2026-05-06 |
| OpenUGI | Alignment | 446 | 38.21 | 2026-05-06 |
| OpenUGI | Alignment | 495 | 36.83 | 2026-05-06 |
| OpenUGI | Alignment | 543 | 35.60 | 2026-05-06 |
| TextClass Benchmark | Classification | 58 | 1538.33 | 2026-05-06 |
| CadEval | Coding | 3 | 62 | 2026-05-06 |
| IOI | Coding | 38 | 4.834% | 2026-05-26 |
| LiveCodeBench | Coding | 3 | 74.20 | 2026-05-06 |
| LiveCodeBench | Coding | 12 | 65.90 | 2026-05-06 |
| LiveCodeBench | Coding | 34 | 82.208% | 2026-05-28 |
| SciCode | Coding | 34 | 46.5% | 2026-05-11 |
| MMTU | Data | 5 | 0.66 | 2026-05-06 |
| GSMA Open Telco Leaderboard | Domain | 26 | 63.07 | 2026-05-06 |
| SAGE | Education | 30 | 41.061% | 2026-05-28 |
| CorpFin v2 | Finance | 58 | 58.974% | 2026-05-28 |
| FinanceArena | Finance | 3 | 48.6 | 2026-05-27 |
| MortgageTax | Finance | 29 | 64.826% | 2026-05-28 |
| PRBench Finance | Finance | 16 | 39.22 | 2026-05-06 |
| TaxEval v2 | Finance | 15 | 74.776% | 2026-05-28 |
| Arena-Hard | Generalization | 4 | 74.6% | 2026-05-27 |
| GDPval | Generalization | 4 | 29.1% | 2025-09-25 |
| HELM AIR-Bench | Generalization | 26 | 0.784861 | 2026-05-28 |
| HELM Safety | Generalization | 6 | 0.973247 | 2026-05-28 |
| HELM MedQA | Healthcare | 3 | 0.948310 | 2026-05-28 |
| MedCode | Healthcare | 45 | 33.791% | 2026-05-28 |
| MedQA | Healthcare | 9 | 96.017% | 2026-04-16 |
| MedScribe | Healthcare | 52 | 69.139% | 2026-05-28 |
| HUMAINE | Human Preference | 18 | 3.58 | 2026-05-06 |
| Artificial Analysis Intelligence Index | Intelligence | 121 | 33.06 | 2026-05-11 |
| GPQA Diamond | Intelligence | 56 | 74.495% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 83 | 17.5% | 2026-05-11 |
| MathVision | Intelligence | 44 | 58 | 2026-05-06 |
| MMLU Pro | Intelligence | 58 | 80.561% | 2026-05-28 |
| MMLU-Pro | Intelligence | 58 | 83.2% | 2026-05-11 |
| MMMU Pro | Intelligence | 27 | 79.665% | 2026-05-28 |
| AraGen v3 | Language | 11 | 70.60 | 2026-05-06 |
| HindiGen v1 | Language | 4 | 75.52 | 2026-05-06 |
| LegalBench | Legal | 65 | 79.185% | 2026-05-28 |
| Professional Reasoning Bench - Legal | Legal | 18 | 38.11 | 2026-05-06 |
| Fiction.LiveBench | Long Context | 9 | 62.50 | 2026-05-06 |
| AIME | Math | 41 | 83.667% | 2026-04-16 |
| AIME 2025 | Math | 24 | 90.7% | 2026-05-11 |
| IneqMath | Math | 14 | 15.50 | 2026-05-06 |
| MATH 500 | Math | 12 | 94.2% | 2026-01-09 |
| MGSM | Math | 9 | 93.418% | 2026-01-09 |
| FrontierMath 2025-02-28 Private | Mathematics | 8 | 18.97 | 2026-05-06 |
| FrontierMath Tier 4 2025-07-01 Private | Mathematics | 10 | 2.08 | 2026-05-06 |
| MEDIC Benchmark | Medical | 3 | 90.5 (average normalized public table score) | 2026-05-27 |
| CharXiv-R | Multimodal | 19 | 0.72 | 2026-05-06 |
| Video SimpleQA | Multimodal | 5 | 54 | 2026-05-06 |
| Visual-Language Understanding | Multimodal | 3 | 51.79 | 2026-05-06 |
| Visual-Language Understanding | Multimodal | 3 | 51.66 | 2026-05-06 |
| VPCT | Multimodal | 3 | 57.50 | 2026-05-06 |
| VTB | Multimodal | 8 | 11.12 | 2026-05-06 |
| EnigmaEval | Reasoning | 7 | 9.21 | 2026-05-06 |
| EnigmaEval | Reasoning | 12 | 6.81 | 2026-05-06 |
| GPQA Diamond | Reasoning | 118 | 78.4% | 2026-05-11 |
| Humanity's Last Exam (Text Only) | Reasoning | 12 | 18.90 | 2026-05-06 |
| Humanity's Last Exam (Text Only) | Reasoning | 20 | 14.53 | 2026-05-06 |
| CritPt | Science | 116 | 0.6% | 2026-05-11 |
| LiveSQLBench | Text to SQL | 14 | 29.54 | 2026-05-06 |
| Lech Mazur Writing | Writing | 19 | 7.50 | 2026-05-06 |