GPT-5 Mini
GPT / OpenAI
106 scores
75 benchmarks
$0.25 / $2 per 1M tokens (cost in/out)
Metadata
GPT, Closed/API
Aliases: gpt-5-mini, gpt-5-mini-2025-08-07, openai-gpt-5-mini, openai-gpt-5-mini-2025-08-07, openai/gpt-5-mini, openai/gpt-5-mini-2025-08-07
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| AMA-Bench | Agentic | 2 | 0.67 | 2026-05-06 |
| ARC-AGI-1 | Agentic | 62 | 54.33 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 81 | 37.33 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 100 | 26.33 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 135 | 5.33 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 71 | 4.44 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 74 | 4.03 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 102 | 1.67 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 121 | 0.83 | 2026-05-05 |
| Berkeley Function-Calling Leaderboard | Agentic | 17 | 55.46% | 2026-05-27 |
| Berkeley Function-Calling Leaderboard | Agentic | 77 | 27.83% | 2026-05-27 |
| EnterpriseOps-Gym | Agentic | 18 | 20.6% | 2026-05-05 |
| Hindsight LLM Memory Leaderboard | Agentic | 1 | 89.70 | 2026-05-06 |
| LLM-WikiRace | Agentic | 10 | 46 | 2026-05-06 |
| MCPMark | Agentic | 11 | 0.30 | 2026-05-06 |
| MCPMark | Agentic | 17 | 0.27 | 2026-05-06 |
| MCPMark | Agentic | 32 | 0.08 | 2026-05-06 |
| MultiChallenge | Agentic | 5 | 58.99 | 2026-05-06 |
| PinchBench | Agentic | 40 | 0.80 | 2026-05-06 |
| Tau2-Bench Telecom | Agentic | 133 | 71.1% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 142 | 68.4% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 231 | 31.9% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 74 | 33.3% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 102 | 28.8% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 182 | 14.4% | 2026-05-11 |
| Vending-Bench 2 | Agentic | 41 | -31.18 | 2026-05-28 |
| ALE-Bench | Coding | 40 | 799.77 | 2026-05-06 |
| IOI | Coding | 32 | 6.75% | 2026-05-26 |
| LiveCodeBench | Coding | 9 | 86.605% | 2026-05-28 |
| SciCode | Coding | 78 | 41% | 2026-05-11 |
| SciCode | Coding | 115 | 39.2% | 2026-05-11 |
| SciCode | Coding | 155 | 36.9% | 2026-05-11 |
| SWE-bench Verified | Coding | 42 | 60.8% | 2026-05-28 |
| Terminal-Bench 2.0 | Coding | 47 | 26.966% | 2026-05-28 |
| Vibe Code Bench v1.1 | Coding | 32 | 14.171% | 2026-05-28 |
| MMTU | Data | 3 | 0.67 | 2026-05-06 |
| GSMA Open Telco Leaderboard | Domain | 46 | 50.20 | 2026-05-06 |
| SAGE | Education | 25 | 42.988% | 2026-05-28 |
| From Perception to Action | Embodied AI | 7 | 11% | 2026-05-28 |
| Vectara HHEM Hallucination Leaderboard | Factuality | 81 | 87.10 | 2026-05-06 |
| CorpFin v2 | Finance | 49 | 60.179% | 2026-05-28 |
| Finance Agent v1.1 | Finance | 26 | 51.928% | 2026-05-04 |
| FinChain | Finance | 6 | 57.38 ChainEval | 2026-05-28 |
| MortgageTax | Finance | 21 | 66.892% | 2026-05-28 |
| TaxEval v2 | Finance | 10 | 75.225% | 2026-05-28 |
| MageBench Season 1 | Game | 32 | 1516 rating / 8 games | 2026-05-28 |
| Xent Games | Game | 8 | 49.22 overall | 2026-05-28 |
| HELM AIR-Bench | Generalization | 13 | 0.857130 | 2026-05-28 |
| HELM MedQA | Healthcare | 2 | 0.956262 | 2026-05-28 |
| MedCode | Healthcare | 21 | 43.045% | 2026-05-28 |
| MedQA | Healthcare | 6 | 96.058% | 2026-04-16 |
| MedScribe | Healthcare | 19 | 80.577% | 2026-05-28 |
| PlaceboBench | Healthcare | 6 | 39.1304 | 2026-05-27 |
| HUMAINE | Human Preference | 15 | 3.63 | 2026-05-06 |
| Artificial Analysis Intelligence Index | Intelligence | 69 | 41.17 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 83 | 38.94 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 236 | 20.68 | 2026-05-11 |
| GPQA Diamond | Intelligence | 40 | 80.303% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 71 | 19.7% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 100 | 14.6% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 304 | 5% | 2026-05-11 |
| LiveBench | Intelligence | 42 | 66.60 | 2026-05-05 |
| MathVision | Intelligence | 25 | 71.90 | 2026-05-06 |
| MMLU Pro | Intelligence | 51 | 82.226% | 2026-05-28 |
| MMLU-Pro | Intelligence | 50 | 83.7% | 2026-05-11 |
| MMLU-Pro | Intelligence | 63 | 82.8% | 2026-05-11 |
| MMLU-Pro | Intelligence | 148 | 77.5% | 2026-05-11 |
| MMMU Pro | Intelligence | 31 | 78.914% | 2026-05-28 |
| Seneca-TRBench | Language | 3 | 92.40 | 2026-05-06 |
| CaseLaw v2 | Legal | 4 | 68.489% | 2026-05-04 |
| LegalBench | Legal | 48 | 81.77% | 2026-05-28 |
| LEXam | Legal | 5 | 60.32% open / 54.82% MCQ | 2026-05-28 |
| AIME | Math | 24 | 91.458% | 2026-04-16 |
| AIME 2025 | Math | 23 | 90.7% | 2026-05-11 |
| AIME 2025 | Math | 47 | 85% | 2026-05-11 |
| AIME 2025 | Math | 146 | 46.7% | 2026-05-11 |
| IneqMath | Math | 7 | 30.50 | 2026-05-06 |
| MATH 500 | Math | 7 | 94.8% | 2026-01-09 |
| MGSM | Math | 16 | 92.582% | 2026-01-09 |
| ProofBench | Math | 25 | 9% | 2026-05-28 |
| HMMT 2025 | Mathematics | 19 | 0.88 | 2026-05-06 |
| MedSafe-Dx | Medical | 9 | 84.8 | 2026-05-27 |
| Design Arena | Multimodal | 72 | 1177 | 2026-05-06 |
| IDP Leaderboard | Multimodal | 13 | 75.23 | 2026-05-06 |
| Visual-Language Understanding | Multimodal | 3 | 50.39 | 2026-05-06 |
| Artificial Analysis Openness Index | Openness | 222 | 5.56 | 2026-05-11 |
| Artificial Analysis Openness Index | Openness | 223 | 5.56 | 2026-05-11 |
| Artificial Analysis Openness Index | Openness | 224 | 5.56 | 2026-05-11 |
| CAIS Text Capabilities Index | Reasoning | 31 | 14.3 | 2026-05-27 |
| EnigmaEval | Reasoning | 9 | 8.19 | 2026-05-06 |
| GPQA Diamond | Reasoning | 77 | 82.8% | 2026-05-11 |
| GPQA Diamond | Reasoning | 100 | 80.3% | 2026-05-11 |
| GPQA Diamond | Reasoning | 215 | 68.7% | 2026-05-11 |
| Humanity's Last Exam (Text Only) | Reasoning | 12 | 19.74 | 2026-05-06 |
| MultiNRC | Reasoning | 22 | 23.89 | 2026-05-06 |
| CAIS Risk Index | Safety | 18 | 51.1 | 2026-05-27 |
| InvisibleBench | Safety | 1 | 0 | 2026-05-06 |
| CritPt | Science | 72 | 1.4% | 2026-05-11 |
| CritPt | Science | 220 | 0% | 2026-05-11 |
| CritPt | Science | 221 | 0% | 2026-05-11 |
| ProgramBench | Software Engineering | 9 | 0% | 2026-05-05 |
| SWT-Bench | Software Engineering | 5 | 69.7% | 2026-05-27 |
| SWT-Bench | Software Engineering | 6 | 62.4% | 2026-05-27 |
| SWT-Bench | Software Engineering | 8 | 56.2% | 2026-05-27 |
| Structured Output Benchmark | Structured Output | 20 | 83.50 | 2026-05-06 |
| CAIS Vision Capabilities Index | Vision | 11 | 53.6 | 2026-05-27 |