o3-mini

o-series / OpenAI

Scores: 66
Benchmarks: 51
Cost (input / output): $1.10 / $4.40 per 1M tokens
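The listed prices translate into a per-request cost by simple proportional arithmetic. Below is a minimal sketch, assuming linear per-token billing at exactly the listed rates, no caching or batch discounts, and that any hidden reasoning tokens are already included in `output_tokens`. The names `estimate_cost`, `INPUT_PRICE_PER_M`, and `OUTPUT_PRICE_PER_M` are illustrative, not part of any OpenAI SDK.

```python
# Hypothetical helper: estimate the USD cost of one o3-mini request from the
# listed prices. Assumes linear per-token billing with no discounts, and that
# reasoning tokens (if any) are counted inside output_tokens.
INPUT_PRICE_PER_M = 1.10   # USD per 1M input tokens (listed price)
OUTPUT_PRICE_PER_M = 4.40  # USD per 1M output tokens (listed price)


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Example: 10,000 input tokens and 2,000 output tokens
print(f"${estimate_cost(10_000, 2_000):.4f}")  # prints $0.0198
```

Because output tokens cost four times as much as input tokens here, long reasoning traces can dominate the bill even for short prompts.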

Metadata

Family: o-series
Access: Closed (API)

Aliases: o3-mini, o3-mini-2025-01-31, openai-o3-mini, openai-o3-mini-2025-01-31, openai/o3-mini, openai/o3-mini-2025-01-31

Benchmark Results

| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| ARC-AGI-1 | Agentic | 106 | 22.33 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 120 | 14.50 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 91 | 2.08 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 137 | 0 | 2026-05-05 |
| Tau2-Bench Telecom | Agentic | 249 | 28.7% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 245 | 6.8% | 2026-05-11 |
| AgentBench FC | Agents | 17 | 40.90 | 2026-05-06 |
| TextClass Benchmark | Classification | 20 | 1684.99 | 2026-05-06 |
| BigCodeBench-Hard | Coding | 1 | 33.10 | 2026-05-05 |
| BigCodeBench-Hard | Coding | 3 | 32.40 | 2026-05-05 |
| BigCodeBench-Hard | Coding | 9 | 31.10 | 2026-05-05 |
| LiveCodeBench | Coding | 15 | 63 | 2026-05-06 |
| LiveCodeBench | Coding | 18 | 57 | 2026-05-06 |
| LiveCodeBench | Coding | 58 | 71.484% | 2026-05-28 |
| Natural Language to Mongosh | Coding | 21 | 0.85 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 23 | 0.85 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 27 | 0.84 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 31 | 0.84 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 32 | 0.84 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 33 | 0.84 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 34 | 0.84 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 51 | 0.82 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 61 | 0.80 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 65 | 0.80 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 66 | 0.80 | 2026-05-06 |
| SciCode | Coding | 99 | 39.9% | 2026-05-11 |
| AIRTBench | Cybersecurity | 4 | 28.43 | 2026-05-06 |
| CorpFin v2 | Finance | 93 | 45.299% | 2026-05-28 |
| TaxEval v2 | Finance | 70 | 69.42% | 2026-05-28 |
| BenchLM | General Knowledge | 57 | 56 | 2026-05-06 |
| Arena-Hard | Generalization | 13 | 50.0% | 2026-05-27 |
| HELM AIR-Bench | Generalization | 29 | 0.748858 | 2026-05-28 |
| HELM Safety | Generalization | 12 | 0.961961 | 2026-05-28 |
| HELM MedQA | Healthcare | 5 | 0.920477 | 2026-05-28 |
| MedAgentBench | Healthcare | 6 | 51.67% | 2026-05-27 |
| MedQA | Healthcare | 14 | 94.833% | 2026-04-16 |
| HUMAINE | Human Preference | 35 | 3.38 | 2026-05-06 |
| Multi-IF | Instruction Following | 2 | 0.80 | 2026-05-06 |
| Artificial Analysis Intelligence Index | Intelligence | 186 | 25.86 | 2026-05-11 |
| GPQA Diamond | Intelligence | 53 | 75.505% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 176 | 8.7% | 2026-05-11 |
| MMLU Pro | Intelligence | 73 | 78.689% | 2026-05-28 |
| MMLU-Pro | Intelligence | 129 | 79.1% | 2026-05-11 |
| SimpleQA | Intelligence | 19 | 13.4% | 2026-05-27 |
| SuperGPQA | Intelligence | 6 | 52.69 | 2026-05-06 |
| AraGen v3 | Language | 17 | 59.81 | 2026-05-06 |
| HindiGen v1 | Language | 22 | 55.14 | 2026-05-06 |
| LegalBench | Legal | 84 | 71.539% | 2026-05-28 |
| LEXam | Legal | 16 | 48.13% (open-ended) / 44.22% (MCQ) | 2026-05-28 |
| OpenAI-MRCR (2-needle, 128k) | Long Context | 9 | 0.19 | 2026-05-06 |
| AIME | Math | 32 | 86.458% | 2026-04-16 |
| IneqMath | Math | 19 | 9.50 | 2026-05-06 |
| MATH 500 | Math | 20 | 91.8% | 2026-01-09 |
| MGSM | Math | 29 | 91.346% | 2026-01-09 |
| MedHELM | Medical | 2 | 0.641071 | 2026-05-27 |
| GPQA Diamond | Reasoning | 163 | 74.8% | 2026-05-11 |
| Graphwalks BFS <128k | Reasoning | 9 | 0.51 | 2026-05-06 |
| Graphwalks parents <128k | Reasoning | 7 | 0.58 | 2026-05-06 |
| Humanity's Last Exam (Text Only) | Reasoning | 27 | 10.31 | 2026-05-06 |
| LingOly-TOO | Reasoning | 13 | 0.12 | 2026-05-06 |
| ZebraLogic | Reasoning | 2 | 88.90 | 2026-05-06 |
| X-Risks Leaderboard | Safety | 2 | 27.73 | 2026-05-06 |
| SciPredict | Science | 4 | 19.84 | 2026-05-06 |
| LiveSQLBench | Text to SQL | 10 | 31.15 | 2026-05-06 |
| ComplexFuncBench | Tool Use | 5 | 0.18 | 2026-05-06 |
| COLLIE | Writing | 2 | 0.99 | 2026-05-06 |