GPT-5.4 Mini

GPT / OpenAI

92 scores
64 benchmarks
$0.75 / $4.5 per 1M tokens (cost in/out)
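
As a rough illustration of how the listed prices translate into a per-request cost, the sketch below assumes that "in/out" means input and output tokens billed at $0.75 and $4.5 per 1M tokens respectively. The function name `estimate_cost_usd` is hypothetical, and the sketch ignores any caching discounts, batch pricing, or tiering that may apply in practice.

```python
from decimal import Decimal

# Prices taken from the listing above (USD per 1M tokens).
# Assumption: "in/out" refers to input (prompt) and output (completion) tokens.
INPUT_USD_PER_MILLION = Decimal("0.75")
OUTPUT_USD_PER_MILLION = Decimal("4.5")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request at the listed per-token prices.

    Hypothetical helper for illustration only; it is not part of any SDK.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_USD_PER_MILLION
        + Decimal(output_tokens) * OUTPUT_USD_PER_MILLION
    )
    return total / TOKENS_PER_MILLION


if __name__ == "__main__":
    # 200k input tokens and 50k output tokens.
    print(estimate_cost_usd(200_000, 50_000))
```

For 200,000 input tokens and 50,000 output tokens this works out to 0.15 + 0.225 = 0.375 USD. `Decimal` is used rather than floats so that small per-request amounts do not accumulate rounding error when summed over many requests.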

Metadata

GPT Closed/API

Aliases: gpt-5.4-mini, gpt-5.4-mini-20260317, openai-gpt-5.4-mini, openai-gpt-5.4-mini-20260317, openai/gpt-5.4-mini, openai/gpt-5.4-mini-20260317

Benchmark Results

Benchmark | Category | Rank | Score | Sampled
APEX-Agents-AA | Agentic | 5 | 28.2% | 2026-05-11
ARC-AGI-1 | Agentic | 49 | 63.67 | 2026-05-05
ARC-AGI-1 | Agentic | 54 | 58 | 2026-05-05
ARC-AGI-1 | Agentic | 76 | 40.83 | 2026-05-05
ARC-AGI-1 | Agentic | 124 | 13 | 2026-05-05
ARC-AGI-2 | Agentic | 41 | 18.90 | 2026-05-05
ARC-AGI-2 | Agentic | 47 | 13.19 | 2026-05-05
ARC-AGI-2 | Agentic | 72 | 4.44 | 2026-05-05
ARC-AGI-2 | Agentic | 116 | 1.11 | 2026-05-05
AutoBench | Agentic | 15 | 2.91 | 2026-05-06
Hindsight LLM Memory Leaderboard | Agentic | 5 | 86.40 | 2026-05-06
ITBench-AA | Agentic | 11 | 35.2% | 2026-05-28
MCP Atlas | Agentic | 13 | 56.70 | 2026-05-06
OSWorld-Verified | Agentic | 6 | 0.72 | 2026-05-06
PinchBench | Agentic | 48 | 0.76 | 2026-05-06
RuneBench | Agentic | 8 | 4.10 | 2026-05-05
Tau2-Bench Telecom | Agentic | 99 | 83.3% | 2026-05-11
Tau2-Bench Telecom | Agentic | 212 | 36.5% | 2026-05-11
Tau2-Bench Telecom | Agentic | 290 | 23.4% | 2026-05-11
Terminal-Bench Hard | Agentic | 9 | 52.3% | 2026-05-11
Terminal-Bench Hard | Agentic | 71 | 34.1% | 2026-05-11
Terminal-Bench Hard | Agentic | 154 | 18.2% | 2026-05-11
Toolathlon | Agentic | 10 | 0.43 | 2026-05-06
ALE-Bench | Coding | 16 | 1188.58 | 2026-05-06
Arena AI Code | Coding | 31 | 1401 | 2026-05-06
DeepSWE | Coding | 7 | 24.34 | 2026-05-26
IOI | Coding | 35 | 6.417% | 2026-05-26
LiveCodeBench | Coding | 38 | 81.465% | 2026-05-28
SciCode | Coding | 22 | 49.9% | 2026-05-11
SciCode | Coding | 45 | 44.2% | 2026-05-11
SciCode | Coding | 105 | 39.6% | 2026-05-11
SWE-bench Verified | Coding | 22 | 73% | 2026-05-28
Terminal-Bench 2.0 | Coding | 26 | 44.944% | 2026-05-28
Vibe Code Bench v1.1 | Coding | 12 | 47.969% | 2026-05-28
DAXBench | Data | 4 | 96.2% | 2026-05-28
OmniDocBench 1.5 | Document Understanding | 8 | 0.87 | 2026-05-06
SAGE | Education | 8 | 50.813% | 2026-05-28
AA-Omniscience | Factuality | 17 | -18.68 | 2026-05-11
Vectara HHEM Hallucination Leaderboard | Factuality | 20 | 94.50 | 2026-05-06
CorpFin v2 | Finance | 42 | 60.917% | 2026-05-28
Finance Agent v1.1 | Finance | 20 | 53.405% | 2026-05-04
Finance Agent v2 | Finance | 7 | 45.36% | 2026-05-28
MortgageTax | Finance | 32 | 63.514% | 2026-05-28
Rogo Big Finance Bench | Finance | 10 | 22% rubric / 7% final | 2026-05-28
TaxEval v2 | Finance | 55 | 71.218% | 2026-05-28
InfiniteBM Chess | Game | 5 | 765.37 Elo / 8 games | 2026-05-28
InfiniteBM Coup | Game | 6 | 1428.2 Elo / 14 games | 2026-05-28
InfiniteBM Heads-Up No-Limit Hold'em | Game | 30 | 996.02 Elo / 13 games | 2026-05-28
InfiniteBM Heads-Up No-Limit Hold'em | Game | 34 | 864.17 Elo / 115 games | 2026-05-28
InfiniteBM Liar's Dice | Game | 8 | 1328.16 Elo / 40 games | 2026-05-28
InfiniteBM Liar's Dice | Game | 32 | 1034.14 Elo / 118 games | 2026-05-28
InfiniteBM Settlers of Catan | Game | 5 | 590.44 Elo / 11 games | 2026-05-28
InfiniteBM Werewolf | Game | 2 | 1385.83 Elo / 10 games | 2026-05-28
Artificial Analysis Intelligence Index | Intelligence | 29 | 48.9 | 2026-05-11
Artificial Analysis Intelligence Index | Intelligence | 92 | 37.73 | 2026-05-11
Artificial Analysis Intelligence Index | Intelligence | 212 | 23.28 | 2026-05-11
GPQA Diamond | Intelligence | 32 | 83.08% | 2026-05-28
Humanity's Last Exam | Intelligence | 37 | 26.6% | 2026-05-11
Humanity's Last Exam | Intelligence | 86 | 17.1% | 2026-05-11
Humanity's Last Exam | Intelligence | 254 | 5.7% | 2026-05-11
LiveBench | Intelligence | 39 | 67.74 | 2026-05-05
LiveBench | Intelligence | 46 | 63.65 | 2026-05-05
MMLU Pro | Intelligence | 37 | 84.554% | 2026-05-28
MMMU Pro | Intelligence | 30 | 79.249% | 2026-05-28
Vals Index | Intelligence | 11 | 51.422% | 2026-05-28
Vals Multimodal Index | Intelligence | 8 | 53.298% | 2026-05-28
CaseLaw v2 | Legal | 47 | 51.661% | 2026-05-04
MRCR v2 (8-needle) | Long Context | 4 | 0.34 | 2026-05-06
AIME | Math | 11 | 95.625% | 2026-04-16
ProofBench | Math | 13 | 21% | 2026-05-28
Medical Chronology LLM Benchmark | Medical | 5 | 0.91 | 2026-05-06
LMArena Vision Arena | Multimodal | 25 | 1248.44 | 2026-05-06
Altered Riddles | Reasoning | 3 | 0.3058 | 2026-05-27
Altered Riddles | Reasoning | 11 | 0.4010 | 2026-05-27
CAIS Text Capabilities Index | Reasoning | 20 | 24.2 | 2026-05-27
Context Arena | Reasoning | 30 | 45.67 | 2026-05-06
Context Arena | Reasoning | 31 | 44.79 | 2026-05-06
Context Arena | Reasoning | 32 | 42.08 | 2026-05-06
Context Arena | Reasoning | 40 | 34.47 | 2026-05-06
Context Arena | Reasoning | 63 | 20.83 | 2026-05-06
GPQA Diamond | Reasoning | 30 | 87.5% | 2026-05-11
GPQA Diamond | Reasoning | 83 | 82.3% | 2026-05-11
GPQA Diamond | Reasoning | 274 | 60.6% | 2026-05-11
Graphwalks BFS <128k | Reasoning | 4 | 0.76 | 2026-05-06
Graphwalks parents <128k | Reasoning | 5 | 0.71 | 2026-05-06
CAIS Risk Index | Safety | 11 | 44.9 | 2026-05-27
HarmActionsEval | Safety | 6 | 0.71 | 2026-05-06
CritPt | Science | 16 | 10% | 2026-05-11
CritPt | Science | 48 | 2.9% | 2026-05-11
CritPt | Science | 227 | 0% | 2026-05-11
ProgramBench | Software Engineering | 8 | 0% | 2026-05-05
CAIS Vision Capabilities Index | Vision | 14 | 51.2 | 2026-05-27