GPT-4o-mini
GPT / OpenAI
76 scores
55 benchmarks
$0.15 / $0.60 per 1M tokens (cost in / out)
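
As a rough illustration of what the per-token pricing above means for a single request, the arithmetic can be sketched as follows. The function name `estimate_cost` and the constant names are hypothetical, introduced only for this example. The prices are copied from the listing and may not match current billing.

```python
# Hypothetical helper: names are illustrative, not part of any official SDK.
# Prices are taken from the listing above (USD per 1M tokens) and are assumed
# to apply linearly, with no caching or batch discounts.
INPUT_PRICE_PER_M = 0.15
OUTPUT_PRICE_PER_M = 0.60


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated request cost in USD."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # 2,000 prompt tokens and 500 completion tokens
    print(f"${estimate_cost(2_000, 500):.4f}")
```

Under these assumptions, a request with 2,000 input tokens and 500 output tokens comes to about $0.0006.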
Metadata
GPT Closed/API
Aliases: gpt-4o-mini, openai-gpt-4o-mini, openai/gpt-4o-mini
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| Hindsight LLM Memory Leaderboard | Agentic | 17 | 81 | 2026-05-06 |
| PinchBench | Agentic | 50 | 0.75 | 2026-05-06 |
| RealDataAgentBench | Agentic | 9 | 0.78 | 2026-04-28 |
| TERMS-Bench | Agentic | 15 | 18.9% SE+ | 2026-05-28 |
| Speech Arena | Audio | 2 | 1593 | 2026-05-06 |
| TextClass Benchmark | Classification | 25 | 1674.64 | 2026-05-06 |
| EvalPlus | Coding | 9 | 77.85 | 2026-05-05 |
| HumanEval+ | Coding | 8 | 83.50 | 2026-05-05 |
| MBPP+ | Coding | 12 | 72.20 | 2026-05-05 |
| Natural Language to Mongosh | Coding | 39 | 0.83 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 42 | 0.83 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 56 | 0.81 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 57 | 0.81 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 62 | 0.80 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 72 | 0.79 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 74 | 0.79 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 78 | 0.79 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 81 | 0.78 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 83 | 0.78 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 87 | 0.77 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 88 | 0.77 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 90 | 0.77 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 93 | 0.75 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 94 | 0.75 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 96 | 0.74 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 97 | 0.74 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 100 | 0.73 | 2026-05-06 |
| SciCode | Coding | 345 | 22.9% | 2026-05-11 |
| MMTU | Data | 21 | 0.40 | 2026-05-06 |
| GSMA Open Telco Leaderboard | Domain | 42 | 51.25 | 2026-05-06 |
| RoboBench | Embodied | 11 | 34.40 | 2026-05-27 |
| FinEval | Finance | 24 | 66.2 | 2026-05-27 |
| Open FinLLM Leaderboard | Finance | 11 | 28.32187% | 2026-05-27 |
| SECQUE | Finance | 3 | 0.64 | 2026-05-28 |
| MageBench Season 1 | Game | 28 | 1546 rating / 4 games | 2026-05-28 |
| BenchLM | General Knowledge | 63 | 50 | 2026-05-06 |
| MixEval Chat | General Knowledge | 18 | 51.60 | 2026-05-06 |
| AgentHarm | Generalization | 28 | 62.5% | 2026-05-27 |
| AgentHarm | Generalization | 30 | 68.4% | 2026-05-27 |
| AgentHarm | Generalization | 32 | 68.8% | 2026-05-27 |
| HELM AIR-Bench | Generalization | 62 | 0.562610 | 2026-05-28 |
| HELM Safety | Generalization | 24 | 0.930425 | 2026-05-28 |
| LongBench v2 | Generalization | 33 | 32.4% | 2026-05-27 |
| WildBench | Generalization | 3 | 7.86328125 | 2026-05-27 |
| CHOICE | Geospatial | 12 | 0.6133 | 2026-05-27 |
| GeoCode Leaderboard | Geospatial | 18 | 55.02% pass@1 | 2026-05-28 |
| HealthBench Hard | Healthcare | 31 | 0.33 | 2026-05-27 |
| HELM MedQA | Healthcare | 13 | 0.749503 | 2026-05-28 |
| MedAgentBench | Healthcare | 5 | 56.33% | 2026-05-27 |
| Artificial Analysis Intelligence Index | Intelligence | 377 | 12.65 | 2026-05-11 |
| HELM Lite | Intelligence | 14 | 0.756818 | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 412 | 4% | 2026-05-11 |
| MMLU-Pro | Intelligence | 253 | 64.8% | 2026-05-11 |
| SimpleQA | Intelligence | 21 | 9.5% | 2026-05-27 |
| OpenHuEval | Language | 6 | 49.33 | 2026-05-06 |
| LEXam | Legal | 23 | 42.55% open / 40.96% MCQ | 2026-05-28 |
| AIME 2025 | Math | 219 | 14.7% | 2026-05-11 |
| IneqMath | Math | 46 | 2 | 2026-05-06 |
| MedHELM | Medical | 7 | 0.39285714285714285 | 2026-05-27 |
| MEDIC Benchmark | Medical | 95 | 19 average normalized public table score | 2026-05-27 |
| MedSafe-Dx | Medical | 4 | 90.4 | 2026-05-27 |
| LanguageBench | Multilingual | 13 | 0.55 | 2026-05-06 |
| MMMU-Pro | Multimodal | 50 | 37.60 | 2026-05-06 |
| Video-MME | Multimodal | 24 | 68.90 | 2026-05-06 |
| GPQA Diamond | Reasoning | 382 | 42.6% | 2026-05-11 |
| AgentLeak | Safety | 2 | 76.30 | 2026-05-06 |
| Halluverse-M3 | Safety | 6 | 73.39% | 2026-05-28 |
| ChemBench | Science | 28 | 0.50 | 2026-05-06 |
| SWE-PRBench | Software Engineering | 6 | 0.108 | 2026-05-27 |
| SWT-Bench | Software Engineering | 28 | 9.8% | 2026-05-27 |
| JSONSchemaBench | Structured Output | 2 | 95.8% schema compliance | 2026-05-28 |
| JSONSchemaBench | Structured Output | 13 | 86.2% schema compliance | 2026-05-28 |
| JSONSchemaBench | Structured Output | 24 | 68.5% schema compliance | 2026-05-28 |
| StructEval | Structured Output | 4 | 73.19% | 2026-05-28 |
| Generate README Eval | Summarization | 11 | 32.16 | 2026-05-06 |
| VNTL Leaderboard | Translation | 6 | 72.23 | 2026-05-06 |