o1

o-series / OpenAI

54 scores
50 benchmarks
$15 / $60 per 1M tokens (cost in/out)

Metadata

o-series Closed/API

Aliases: o1, o1-2024-12-17, openai-o1, openai-o1-2024-12-17, openai/o1, openai/o1-2024-12-17

Benchmark Results

Benchmark | Category | Rank | Score | Sampled
Tau2-Bench Telecom | Agentic | 157 | 62.6% | 2026-05-11
Terminal-Bench Hard | Agentic | 192 | 12.9% | 2026-05-11
OpenUGI | Alignment | 299 | 42.25 | 2026-05-06
OpenUGI | Alignment | 481 | 37.24 | 2026-05-06
TextClass Benchmark | Classification | 6 | 1768.81 | 2026-05-06
BigCodeBench-Hard | Coding | 2 | 32.40 | 2026-05-05
BigCodeBench-Hard | Coding | 13 | 29.70 | 2026-05-05
BigCodeBench-Hard | Coding | 20 | 28.40 | 2026-05-05
CadEval | Coding | 4 | 56 | 2026-05-06
LiveCodeBench | Coding | 87 | 50.264% | 2026-05-28
SciCode | Coding | 182 | 35.8% | 2026-05-11
GSMA Open Telco Leaderboard | Domain | 13 | 68.08 | 2026-05-06
TaxEval v2 | Finance | 22 | 74.284% | 2026-05-28
BenchLM | General Knowledge | 55 | 58 | 2026-05-06
Arena-Hard | Generalization | 11 | 55.9% | 2026-05-27
HELM AIR-Bench | Generalization | 23 | 0.799614 | 2026-05-28
HELM Safety | Generalization | 4 | 0.975800 | 2026-05-28
WeirdML | Generalization | 8 | 47.56 | 2026-05-06
HealthBench | Healthcare | 3 | 0.4200 | 2026-05-27
MedQA | Healthcare | 1 | 96.517% | 2026-04-16
HUMAINE | Human Preference | 30 | 3.44 | 2026-05-06
AIIQ Composite IQ | Intelligence | 36 | 91 | 2026-05-12
Artificial Analysis Intelligence Index | Intelligence | 143 | 30.75 | 2026-05-11
GPQA Diamond | Intelligence | 59 | 73.232% | 2026-05-28
Humanity's Last Exam | Intelligence | 196 | 7.7% | 2026-05-11
MathVision | Intelligence | 39 | 60.30 | 2026-05-06
MathVista | Intelligence | 8 | 73.90 | 2026-05-06
MMLU Pro | Intelligence | 46 | 83.488% | 2026-05-28
MMLU-Pro | Intelligence | 42 | 84.1% | 2026-05-11
MMMU Pro | Intelligence | 33 | 77.412% | 2026-05-28
SimpleQA | Intelligence | 5 | 42.6% | 2026-05-27
SuperGPQA | Intelligence | 2 | 60.24 | 2026-05-06
AraGen v3 | Language | 1 | 84.29 | 2026-05-06
HindiGen v1 | Language | 2 | 79.64 | 2026-05-06
LegalBench | Legal | 54 | 80.393% | 2026-05-28
Fiction.LiveBench | Long Context | 11 | 53.10 | 2026-05-06
AIME | Math | 53 | 71.458% | 2026-04-16
IneqMath | Math | 22 | 8 | 2026-05-06
IneqMath | Math | 23 | 7.50 | 2026-05-06
MATH 500 | Math | 25 | 90.4% | 2026-01-09
MGSM | Math | 49 | 89.309% | 2026-01-09
FrontierMath 2025-02-28 Private | Mathematics | 11 | 9.31 | 2026-05-06
OTIS Mock AIME 2024-2025 | Mathematics | 15 | 73.33 | 2026-05-06
Visual-Language Understanding | Multimodal | 23 | 45.25 | 2026-05-06
VPCT | Multimodal | 10 | 37 | 2026-05-06
EnigmaEval | Reasoning | 13 | 5.65 | 2026-05-06
GPQA Diamond | Reasoning | 164 | 74.7% | 2026-05-11
Humanity's Last Exam (Text Only) | Reasoning | 34 | 7.75 | 2026-05-06
SimpleBench | Reasoning | 8 | 41.70 | 2026-05-06
ZebraLogic | Reasoning | 3 | 81 | 2026-05-06
X-Risks Leaderboard | Safety | 1 | 29.09 | 2026-05-06
CritPt | Science | 145 | 0.3% | 2026-05-11
SWE-Lancer | Software Engineering | 1 | 28.4% | 2025-07-17
Lech Mazur Writing | Writing | 23 | 7.02 | 2026-05-06