GPT-5 Mini

GPT / OpenAI

106 scores
75 benchmarks
$0.25 / $2 per 1M tokens (cost in/out)

Metadata

GPT Closed/API

Aliases: gpt-5-mini, gpt-5-mini-2025-08-07, openai-gpt-5-mini, openai-gpt-5-mini-2025-08-07, openai/gpt-5-mini, openai/gpt-5-mini-2025-08-07

Benchmark Results

Benchmark | Category | Rank | Score | Sampled
AMA-Bench | Agentic | 2 | 0.67 | 2026-05-06
ARC-AGI-1 | Agentic | 62 | 54.33 | 2026-05-05
ARC-AGI-1 | Agentic | 81 | 37.33 | 2026-05-05
ARC-AGI-1 | Agentic | 100 | 26.33 | 2026-05-05
ARC-AGI-1 | Agentic | 135 | 5.33 | 2026-05-05
ARC-AGI-2 | Agentic | 71 | 4.44 | 2026-05-05
ARC-AGI-2 | Agentic | 74 | 4.03 | 2026-05-05
ARC-AGI-2 | Agentic | 102 | 1.67 | 2026-05-05
ARC-AGI-2 | Agentic | 121 | 0.83 | 2026-05-05
Berkeley Function-Calling Leaderboard | Agentic | 17 | 55.46% | 2026-05-27
Berkeley Function-Calling Leaderboard | Agentic | 77 | 27.83% | 2026-05-27
EnterpriseOps-Gym | Agentic | 18 | 20.6% | 2026-05-05
Hindsight LLM Memory Leaderboard | Agentic | 1 | 89.70 | 2026-05-06
LLM-WikiRace | Agentic | 10 | 46 | 2026-05-06
MCPMark | Agentic | 11 | 0.30 | 2026-05-06
MCPMark | Agentic | 17 | 0.27 | 2026-05-06
MCPMark | Agentic | 32 | 0.08 | 2026-05-06
MultiChallenge | Agentic | 5 | 58.99 | 2026-05-06
PinchBench | Agentic | 40 | 0.80 | 2026-05-06
Tau2-Bench Telecom | Agentic | 133 | 71.1% | 2026-05-11
Tau2-Bench Telecom | Agentic | 142 | 68.4% | 2026-05-11
Tau2-Bench Telecom | Agentic | 231 | 31.9% | 2026-05-11
Terminal-Bench Hard | Agentic | 74 | 33.3% | 2026-05-11
Terminal-Bench Hard | Agentic | 102 | 28.8% | 2026-05-11
Terminal-Bench Hard | Agentic | 182 | 14.4% | 2026-05-11
Vending-Bench 2 | Agentic | 41 | -31.18 | 2026-05-28
ALE-Bench | Coding | 40 | 799.77 | 2026-05-06
IOI | Coding | 32 | 6.75% | 2026-05-26
LiveCodeBench | Coding | 9 | 86.605% | 2026-05-28
SciCode | Coding | 78 | 41% | 2026-05-11
SciCode | Coding | 115 | 39.2% | 2026-05-11
SciCode | Coding | 155 | 36.9% | 2026-05-11
SWE-bench Verified | Coding | 42 | 60.8% | 2026-05-28
Terminal-Bench 2.0 | Coding | 47 | 26.966% | 2026-05-28
Vibe Code Bench v1.1 | Coding | 32 | 14.171% | 2026-05-28
MMTU | Data | 3 | 0.67 | 2026-05-06
GSMA Open Telco Leaderboard | Domain | 46 | 50.20 | 2026-05-06
SAGE | Education | 25 | 42.988% | 2026-05-28
From Perception to Action | Embodied AI | 7 | 11% | 2026-05-28
Vectara HHEM Hallucination Leaderboard | Factuality | 81 | 87.10 | 2026-05-06
CorpFin v2 | Finance | 49 | 60.179% | 2026-05-28
Finance Agent v1.1 | Finance | 26 | 51.928% | 2026-05-04
FinChain | Finance | 6 | 57.38 ChainEval | 2026-05-28
MortgageTax | Finance | 21 | 66.892% | 2026-05-28
TaxEval v2 | Finance | 10 | 75.225% | 2026-05-28
MageBench Season 1 | Game | 32 | 1516 rating / 8 games | 2026-05-28
Xent Games | Game | 8 | 49.22 overall | 2026-05-28
HELM AIR-Bench | Generalization | 13 | 0.857130 | 2026-05-28
HELM MedQA | Healthcare | 2 | 0.956262 | 2026-05-28
MedCode | Healthcare | 21 | 43.045% | 2026-05-28
MedQA | Healthcare | 6 | 96.058% | 2026-04-16
MedScribe | Healthcare | 19 | 80.577% | 2026-05-28
PlaceboBench | Healthcare | 6 | 39.1304 | 2026-05-27
HUMAINE | Human Preference | 15 | 3.63 | 2026-05-06
Artificial Analysis Intelligence Index | Intelligence | 69 | 41.17 | 2026-05-11
Artificial Analysis Intelligence Index | Intelligence | 83 | 38.94 | 2026-05-11
Artificial Analysis Intelligence Index | Intelligence | 236 | 20.68 | 2026-05-11
GPQA Diamond | Intelligence | 40 | 80.303% | 2026-05-28
Humanity's Last Exam | Intelligence | 71 | 19.7% | 2026-05-11
Humanity's Last Exam | Intelligence | 100 | 14.6% | 2026-05-11
Humanity's Last Exam | Intelligence | 304 | 5% | 2026-05-11
LiveBench | Intelligence | 42 | 66.60 | 2026-05-05
MathVision | Intelligence | 25 | 71.90 | 2026-05-06
MMLU Pro | Intelligence | 51 | 82.226% | 2026-05-28
MMLU-Pro | Intelligence | 50 | 83.7% | 2026-05-11
MMLU-Pro | Intelligence | 63 | 82.8% | 2026-05-11
MMLU-Pro | Intelligence | 148 | 77.5% | 2026-05-11
MMMU Pro | Intelligence | 31 | 78.914% | 2026-05-28
Seneca-TRBench | Language | 3 | 92.40 | 2026-05-06
CaseLaw v2 | Legal | 4 | 68.489% | 2026-05-04
LegalBench | Legal | 48 | 81.77% | 2026-05-28
LEXam | Legal | 5 | 60.32% open / 54.82% MCQ | 2026-05-28
AIME | Math | 24 | 91.458% | 2026-04-16
AIME 2025 | Math | 23 | 90.7% | 2026-05-11
AIME 2025 | Math | 47 | 85% | 2026-05-11
AIME 2025 | Math | 146 | 46.7% | 2026-05-11
IneqMath | Math | 7 | 30.50 | 2026-05-06
MATH 500 | Math | 7 | 94.8% | 2026-01-09
MGSM | Math | 16 | 92.582% | 2026-01-09
ProofBench | Math | 25 | 9% | 2026-05-28
HMMT 2025 | Mathematics | 19 | 0.88 | 2026-05-06
MedSafe-Dx | Medical | 9 | 84.8 | 2026-05-27
Design Arena | Multimodal | 72 | 1177 | 2026-05-06
IDP Leaderboard | Multimodal | 13 | 75.23 | 2026-05-06
Visual-Language Understanding | Multimodal | 3 | 50.39 | 2026-05-06
Artificial Analysis Openness Index | Openness | 222 | 5.56 | 2026-05-11
Artificial Analysis Openness Index | Openness | 223 | 5.56 | 2026-05-11
Artificial Analysis Openness Index | Openness | 224 | 5.56 | 2026-05-11
CAIS Text Capabilities Index | Reasoning | 31 | 14.3 | 2026-05-27
EnigmaEval | Reasoning | 9 | 8.19 | 2026-05-06
GPQA Diamond | Reasoning | 77 | 82.8% | 2026-05-11
GPQA Diamond | Reasoning | 100 | 80.3% | 2026-05-11
GPQA Diamond | Reasoning | 215 | 68.7% | 2026-05-11
Humanity's Last Exam (Text Only) | Reasoning | 12 | 19.74 | 2026-05-06
MultiNRC | Reasoning | 22 | 23.89 | 2026-05-06
CAIS Risk Index | Safety | 18 | 51.1 | 2026-05-27
InvisibleBench | Safety | 1 | 0 | 2026-05-06
CritPt | Science | 72 | 1.4% | 2026-05-11
CritPt | Science | 220 | 0% | 2026-05-11
CritPt | Science | 221 | 0% | 2026-05-11
ProgramBench | Software Engineering | 9 | 0% | 2026-05-05
SWT-Bench | Software Engineering | 5 | 69.7% | 2026-05-27
SWT-Bench | Software Engineering | 6 | 62.4% | 2026-05-27
SWT-Bench | Software Engineering | 8 | 56.2% | 2026-05-27
Structured Output Benchmark | Structured Output | 20 | 83.50 | 2026-05-06
CAIS Vision Capabilities Index | Vision | 11 | 53.6 | 2026-05-27