GPT-5.4 Nano | BenchmarkList

Metadata

GPT Closed/API

Aliases: gpt-5.4-nano, gpt-5.4-nano-20260317, openai-gpt-5.4-nano, openai-gpt-5.4-nano-20260317, openai/gpt-5.4-nano, openai/gpt-5.4-nano-20260317

Benchmark	Category	Rank	Score	Date sampled
APEX-Agents-AA	Agentic	8	24.9%	2026-05-11
ARC-AGI-1	Agentic	64	51.50	2026-05-05
ARC-AGI-1	Agentic	80	38.17	2026-05-05
ARC-AGI-1	Agentic	89	33	2026-05-05
ARC-AGI-1	Agentic	112	18.33	2026-05-05
ARC-AGI-2	Agentic	64	5.69	2026-05-05
ARC-AGI-2	Agentic	78	3.61	2026-05-05
ARC-AGI-2	Agentic	99	1.94	2026-05-05
ARC-AGI-2	Agentic	105	1.53	2026-05-05
AutoBench	Agentic	23	2.78	2026-05-06
Hindsight LLM Memory Leaderboard	Agentic	14	83.90	2026-05-06
ITBench-AA	Agentic	20	24.4%	2026-05-28
OSWorld-Verified	Agentic	12	0.39	2026-05-06
PinchBench	Agentic	42	0.79	2026-05-06
RuneBench	Agentic	13	2.30	2026-05-05
Tau2-Bench Telecom	Agentic	116	76%	2026-05-11
Tau2-Bench Telecom	Agentic	173	52.6%	2026-05-11
Tau2-Bench Telecom	Agentic	217	34.8%	2026-05-11
Terminal-Bench Hard	Agentic	31	42.4%	2026-05-11
Terminal-Bench Hard	Agentic	76	33.3%	2026-05-11
Terminal-Bench Hard	Agentic	125	24.2%	2026-05-11
Toolathlon	Agentic	14	0.35	2026-05-06
ALE-Bench	Coding	27	1004.52	2026-05-06
IOI	Coding	22	15.25%	2026-05-26
LiveCodeBench	Coding	25	84.009%	2026-05-28
MLX Benchmark V2	Coding	5	75.19	2026-05-06
SciCode	Coding	31	46.9%	2026-05-11
SciCode	Coding	131	38.4%	2026-05-11
SciCode	Coding	191	35.2%	2026-05-11
SWE-bench Verified	Coding	33	69.8%	2026-05-28
Terminal-Bench 2.0	Coding	33	39.888%	2026-05-28
Vibe Code Bench v1.1	Coding	17	26.097%	2026-05-28
OmniDocBench 1.5	Document Understanding	9	0.76	2026-05-06
SAGE	Education	34	38.081%	2026-05-28
Vectara HHEM Hallucination Leaderboard	Factuality	2	96.90	2026-05-06
CorpFin v2	Finance	37	61.189%	2026-05-28
Finance Agent v1.1	Finance	30	47.801%	2026-05-04
Finance Agent v2	Finance	14	38.217%	2026-05-28
MortgageTax	Finance	46	59.102%	2026-05-28
TaxEval v2	Finance	79	67.416%	2026-05-28
InfiniteBM Heads-Up No-Limit Hold'em	Game	10	1282.53 Elo / 18 games	2026-05-28
InfiniteBM Heads-Up No-Limit Hold'em	Game	32	974.38 Elo / 126 games	2026-05-28
InfiniteBM Liar's Dice	Game	11	1304.64 Elo / 40 games	2026-05-28
InfiniteBM Liar's Dice	Game	37	795.51 Elo / 130 games	2026-05-28
InfiniteBM Werewolf	Game	9	902.42 Elo / 6 games	2026-05-28
MedCode	Healthcare	25	41.029%	2026-05-28
MedScribe	Healthcare	31	77.09%	2026-05-28
Artificial Analysis Intelligence Index	Intelligence	47	43.98	2026-05-11
Artificial Analysis Intelligence Index	Intelligence	90	38.11	2026-05-11
Artificial Analysis Intelligence Index	Intelligence	201	24.36	2026-05-11
GPQA Diamond	Intelligence	49	77.526%	2026-05-28
Humanity's Last Exam	Intelligence	40	26.5%	2026-05-11
Humanity's Last Exam	Intelligence	98	14.7%	2026-05-11
Humanity's Last Exam	Intelligence	391	4.2%	2026-05-11
LiveBench	Intelligence	26	71.31	2026-05-05
LiveBench	Intelligence	47	63.64	2026-05-05
MMLU Pro	Intelligence	79	77.172%	2026-05-28
MMMU Pro	Intelligence	39	73.584%	2026-05-28
Vals Index	Intelligence	15	46.461%	2026-05-28
Vals Multimodal Index	Intelligence	11	47.484%	2026-05-28
CaseLaw v2	Legal	46	51.875%	2026-05-04
LegalBench	Legal	72	77.92%	2026-05-28
MRCR v2 (8-needle)	Long Context	5	0.33	2026-05-06
AIME	Math	29	88.75%	2026-04-16
ProofBench	Math	30	5%	2026-05-28
CAIS Text Capabilities Index	Reasoning	27	17.9	2026-05-27
Context Arena	Reasoning	24	48.78	2026-05-06
Context Arena	Reasoning	37	36.17	2026-05-06
Context Arena	Reasoning	46	29.90	2026-05-06
Context Arena	Reasoning	62	21.87	2026-05-06
Context Arena	Reasoning	70	12.31	2026-05-06
GPQA Diamond	Reasoning	88	81.7%	2026-05-11
GPQA Diamond	Reasoning	148	76.1%	2026-05-11
GPQA Diamond	Reasoning	309	55.8%	2026-05-11
Graphwalks BFS <128k	Reasoning	5	0.73	2026-05-06
Graphwalks parents <128k	Reasoning	9	0.51	2026-05-06
CAIS Risk Index	Safety	16	48.7	2026-05-27
CritPt	Science	17	9.3%	2026-05-11
CritPt	Science	34	5.1%	2026-05-11
CritPt	Science	228	0%	2026-05-11
CAIS Vision Capabilities Index	Vision	25	44.7	2026-05-27