GPT-5 Mini | BenchmarkList

Metadata

GPT Closed/API

Aliases: gpt-5-mini, gpt-5-mini-2025-08-07, openai-gpt-5-mini, openai-gpt-5-mini-2025-08-07, openai/gpt-5-mini, openai/gpt-5-mini-2025-08-07

Benchmark	Category	Rank	Score	Sampled
AMA-Bench	Agentic	2	0.67	2026-05-06
ARC-AGI-1	Agentic	62	54.33	2026-05-05
ARC-AGI-1	Agentic	81	37.33	2026-05-05
ARC-AGI-1	Agentic	100	26.33	2026-05-05
ARC-AGI-1	Agentic	135	5.33	2026-05-05
ARC-AGI-2	Agentic	71	4.44	2026-05-05
ARC-AGI-2	Agentic	74	4.03	2026-05-05
ARC-AGI-2	Agentic	102	1.67	2026-05-05
ARC-AGI-2	Agentic	121	0.83	2026-05-05
Berkeley Function-Calling Leaderboard	Agentic	17	55.46%	2026-05-27
Berkeley Function-Calling Leaderboard	Agentic	77	27.83%	2026-05-27
EnterpriseOps-Gym	Agentic	18	20.6%	2026-05-05
Hindsight LLM Memory Leaderboard	Agentic	1	89.70	2026-05-06
LLM-WikiRace	Agentic	10	46	2026-05-06
MCPMark	Agentic	11	0.30	2026-05-06
MCPMark	Agentic	17	0.27	2026-05-06
MCPMark	Agentic	32	0.08	2026-05-06
MultiChallenge	Agentic	5	58.99	2026-05-06
PinchBench	Agentic	40	0.80	2026-05-06
Tau2-Bench Telecom	Agentic	133	71.1%	2026-05-11
Tau2-Bench Telecom	Agentic	142	68.4%	2026-05-11
Tau2-Bench Telecom	Agentic	231	31.9%	2026-05-11
Terminal-Bench Hard	Agentic	74	33.3%	2026-05-11
Terminal-Bench Hard	Agentic	102	28.8%	2026-05-11
Terminal-Bench Hard	Agentic	182	14.4%	2026-05-11
Vending-Bench 2	Agentic	41	-31.18	2026-05-28
ALE-Bench	Coding	40	799.77	2026-05-06
IOI	Coding	32	6.75%	2026-05-26
LiveCodeBench	Coding	9	86.605%	2026-05-28
SciCode	Coding	78	41%	2026-05-11
SciCode	Coding	115	39.2%	2026-05-11
SciCode	Coding	155	36.9%	2026-05-11
SWE-bench Verified	Coding	42	60.8%	2026-05-28
Terminal-Bench 2.0	Coding	47	26.966%	2026-05-28
Vibe Code Bench v1.1	Coding	32	14.171%	2026-05-28
MMTU	Data	3	0.67	2026-05-06
GSMA Open Telco Leaderboard	Domain	46	50.20	2026-05-06
SAGE	Education	25	42.988%	2026-05-28
From Perception to Action	Embodied AI	7	11%	2026-05-28
Vectara HHEM Hallucination Leaderboard	Factuality	81	87.10	2026-05-06
CorpFin v2	Finance	49	60.179%	2026-05-28
Finance Agent v1.1	Finance	26	51.928%	2026-05-04
FinChain	Finance	6	57.38 ChainEval	2026-05-28
MortgageTax	Finance	21	66.892%	2026-05-28
TaxEval v2	Finance	10	75.225%	2026-05-28
MageBench Season 1	Game	32	1516 rating / 8 games	2026-05-28
Xent Games	Game	8	49.22 overall	2026-05-28
HELM AIR-Bench	Generalization	13	0.857130	2026-05-28
HELM MedQA	Healthcare	2	0.956262	2026-05-28
MedCode	Healthcare	21	43.045%	2026-05-28
MedQA	Healthcare	6	96.058%	2026-04-16
MedScribe	Healthcare	19	80.577%	2026-05-28
PlaceboBench	Healthcare	6	39.1304	2026-05-27
HUMAINE	Human Preference	15	3.63	2026-05-06
Artificial Analysis Intelligence Index	Intelligence	69	41.17	2026-05-11
Artificial Analysis Intelligence Index	Intelligence	83	38.94	2026-05-11
Artificial Analysis Intelligence Index	Intelligence	236	20.68	2026-05-11
GPQA Diamond	Intelligence	40	80.303%	2026-05-28
Humanity's Last Exam	Intelligence	71	19.7%	2026-05-11
Humanity's Last Exam	Intelligence	100	14.6%	2026-05-11
Humanity's Last Exam	Intelligence	304	5%	2026-05-11
LiveBench	Intelligence	42	66.60	2026-05-05
MathVision	Intelligence	25	71.90	2026-05-06
MMLU Pro	Intelligence	51	82.226%	2026-05-28
MMLU-Pro	Intelligence	50	83.7%	2026-05-11
MMLU-Pro	Intelligence	63	82.8%	2026-05-11
MMLU-Pro	Intelligence	148	77.5%	2026-05-11
MMMU Pro	Intelligence	31	78.914%	2026-05-28
Seneca-TRBench	Language	3	92.40	2026-05-06
CaseLaw v2	Legal	4	68.489%	2026-05-04
LegalBench	Legal	48	81.77%	2026-05-28
LEXam	Legal	5	60.32% open / 54.82% MCQ	2026-05-28
AIME	Math	24	91.458%	2026-04-16
AIME 2025	Math	23	90.7%	2026-05-11
AIME 2025	Math	47	85%	2026-05-11
AIME 2025	Math	146	46.7%	2026-05-11
IneqMath	Math	7	30.50	2026-05-06
MATH 500	Math	7	94.8%	2026-01-09
MGSM	Math	16	92.582%	2026-01-09
ProofBench	Math	25	9%	2026-05-28
HMMT 2025	Mathematics	19	0.88	2026-05-06
MedSafe-Dx	Medical	9	84.8	2026-05-27
Design Arena	Multimodal	72	1177	2026-05-06
IDP Leaderboard	Multimodal	13	75.23	2026-05-06
Visual-Language Understanding	Multimodal	3	50.39	2026-05-06
Artificial Analysis Openness Index	Openness	222	5.56	2026-05-11
Artificial Analysis Openness Index	Openness	223	5.56	2026-05-11
Artificial Analysis Openness Index	Openness	224	5.56	2026-05-11
CAIS Text Capabilities Index	Reasoning	31	14.3	2026-05-27
EnigmaEval	Reasoning	9	8.19	2026-05-06
GPQA Diamond	Reasoning	77	82.8%	2026-05-11
GPQA Diamond	Reasoning	100	80.3%	2026-05-11
GPQA Diamond	Reasoning	215	68.7%	2026-05-11
Humanity's Last Exam (Text Only)	Reasoning	12	19.74	2026-05-06
MultiNRC	Reasoning	22	23.89	2026-05-06
CAIS Risk Index	Safety	18	51.1	2026-05-27
InvisibleBench	Safety	1	0	2026-05-06
CritPt	Science	72	1.4%	2026-05-11
CritPt	Science	220	0%	2026-05-11
CritPt	Science	221	0%	2026-05-11
ProgramBench	Software Engineering	9	0%	2026-05-05
SWT-Bench	Software Engineering	5	69.7%	2026-05-27
SWT-Bench	Software Engineering	6	62.4%	2026-05-27
SWT-Bench	Software Engineering	8	56.2%	2026-05-27
Structured Output Benchmark	Structured Output	20	83.50	2026-05-06
CAIS Vision Capabilities Index	Vision	11	53.6	2026-05-27
