o1
o-series / OpenAI
54 scores
50 benchmarks
Cost (in/out): $15 / $60 per 1M tokens
Metadata
o-series Closed/API
Aliases: o1, o1-2024-12-17, openai-o1, openai-o1-2024-12-17, openai/o1, openai/o1-2024-12-17
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| Tau2-Bench Telecom | Agentic | 157 | 62.6% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 192 | 12.9% | 2026-05-11 |
| OpenUGI | Alignment | 299 | 42.25 | 2026-05-06 |
| OpenUGI | Alignment | 481 | 37.24 | 2026-05-06 |
| TextClass Benchmark | Classification | 6 | 1768.81 | 2026-05-06 |
| BigCodeBench-Hard | Coding | 2 | 32.40 | 2026-05-05 |
| BigCodeBench-Hard | Coding | 13 | 29.70 | 2026-05-05 |
| BigCodeBench-Hard | Coding | 20 | 28.40 | 2026-05-05 |
| CadEval | Coding | 4 | 56 | 2026-05-06 |
| LiveCodeBench | Coding | 87 | 50.264% | 2026-05-28 |
| SciCode | Coding | 182 | 35.8% | 2026-05-11 |
| GSMA Open Telco Leaderboard | Domain | 13 | 68.08 | 2026-05-06 |
| TaxEval v2 | Finance | 22 | 74.284% | 2026-05-28 |
| BenchLM | General Knowledge | 55 | 58 | 2026-05-06 |
| Arena-Hard | Generalization | 11 | 55.9% | 2026-05-27 |
| HELM AIR-Bench | Generalization | 23 | 0.799614 | 2026-05-28 |
| HELM Safety | Generalization | 4 | 0.975800 | 2026-05-28 |
| WeirdML | Generalization | 8 | 47.56 | 2026-05-06 |
| HealthBench | Healthcare | 3 | 0.4200 | 2026-05-27 |
| MedQA | Healthcare | 1 | 96.517% | 2026-04-16 |
| HUMAINE | Human Preference | 30 | 3.44 | 2026-05-06 |
| AIIQ Composite IQ | Intelligence | 36 | 91 | 2026-05-12 |
| Artificial Analysis Intelligence Index | Intelligence | 143 | 30.75 | 2026-05-11 |
| GPQA Diamond | Intelligence | 59 | 73.232% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 196 | 7.7% | 2026-05-11 |
| MathVision | Intelligence | 39 | 60.30 | 2026-05-06 |
| MathVista | Intelligence | 8 | 73.90 | 2026-05-06 |
| MMLU Pro | Intelligence | 46 | 83.488% | 2026-05-28 |
| MMLU-Pro | Intelligence | 42 | 84.1% | 2026-05-11 |
| MMMU Pro | Intelligence | 33 | 77.412% | 2026-05-28 |
| SimpleQA | Intelligence | 5 | 42.6% | 2026-05-27 |
| SuperGPQA | Intelligence | 2 | 60.24 | 2026-05-06 |
| AraGen v3 | Language | 1 | 84.29 | 2026-05-06 |
| HindiGen v1 | Language | 2 | 79.64 | 2026-05-06 |
| LegalBench | Legal | 54 | 80.393% | 2026-05-28 |
| Fiction.LiveBench | Long Context | 11 | 53.10 | 2026-05-06 |
| AIME | Math | 53 | 71.458% | 2026-04-16 |
| IneqMath | Math | 22 | 8 | 2026-05-06 |
| IneqMath | Math | 23 | 7.50 | 2026-05-06 |
| MATH 500 | Math | 25 | 90.4% | 2026-01-09 |
| MGSM | Math | 49 | 89.309% | 2026-01-09 |
| FrontierMath 2025-02-28 Private | Mathematics | 11 | 9.31 | 2026-05-06 |
| OTIS Mock AIME 2024-2025 | Mathematics | 15 | 73.33 | 2026-05-06 |
| Visual-Language Understanding | Multimodal | 23 | 45.25 | 2026-05-06 |
| VPCT | Multimodal | 10 | 37 | 2026-05-06 |
| EnigmaEval | Reasoning | 13 | 5.65 | 2026-05-06 |
| GPQA Diamond | Reasoning | 164 | 74.7% | 2026-05-11 |
| Humanity's Last Exam (Text Only) | Reasoning | 34 | 7.75 | 2026-05-06 |
| SimpleBench | Reasoning | 8 | 41.70 | 2026-05-06 |
| ZebraLogic | Reasoning | 3 | 81 | 2026-05-06 |
| X-Risks Leaderboard | Safety | 1 | 29.09 | 2026-05-06 |
| CritPt | Science | 145 | 0.3% | 2026-05-11 |
| SWE-Lancer | Software Engineering | 1 | 28.4% | 2025-07-17 |
| Lech Mazur Writing | Writing | 23 | 7.02 | 2026-05-06 |
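
The listed price of $15 / $60 per 1M tokens (input / output) can be turned into a rough per-request estimate. The sketch below is only an illustration of that arithmetic: the function name `estimate_cost_usd` and the example token counts are hypothetical. This page does not say how hidden reasoning tokens are billed, so the sketch assumes the output count already includes everything charged at the output rate.

```python
# Listed prices from this page, in USD per 1M tokens.
INPUT_PRICE_PER_M = 15.0
OUTPUT_PRICE_PER_M = 60.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the listed per-1M-token prices.

    Assumption: output_tokens already includes any tokens billed as output.
    """
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical request: 2,000 input tokens and 500 output tokens.
print(f"${estimate_cost_usd(2_000, 500):.4f}")  # prints "$0.0600"
```

At these rates an output token costs four times as much as an input token, so long generations tend to dominate the bill.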