GPT-5.4 Mini
GPT / OpenAI
92 scores
64 benchmarks
$0.75 / $4.5 per 1M tokens (cost in/out)
Metadata
GPT Closed/API
Aliases: gpt-5.4-mini, gpt-5.4-mini-20260317, openai-gpt-5.4-mini, openai-gpt-5.4-mini-20260317, openai/gpt-5.4-mini, openai/gpt-5.4-mini-20260317
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| APEX-Agents-AA | Agentic | 5 | 28.2% | 2026-05-11 |
| ARC-AGI-1 | Agentic | 49 | 63.67 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 54 | 58 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 76 | 40.83 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 124 | 13 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 41 | 18.90 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 47 | 13.19 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 72 | 4.44 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 116 | 1.11 | 2026-05-05 |
| AutoBench | Agentic | 15 | 2.91 | 2026-05-06 |
| Hindsight LLM Memory Leaderboard | Agentic | 5 | 86.40 | 2026-05-06 |
| ITBench-AA | Agentic | 11 | 35.2% | 2026-05-28 |
| MCP Atlas | Agentic | 13 | 56.70 | 2026-05-06 |
| OSWorld-Verified | Agentic | 6 | 0.72 | 2026-05-06 |
| PinchBench | Agentic | 48 | 0.76 | 2026-05-06 |
| RuneBench | Agentic | 8 | 4.10 | 2026-05-05 |
| Tau2-Bench Telecom | Agentic | 99 | 83.3% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 212 | 36.5% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 290 | 23.4% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 9 | 52.3% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 71 | 34.1% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 154 | 18.2% | 2026-05-11 |
| Toolathlon | Agentic | 10 | 0.43 | 2026-05-06 |
| ALE-Bench | Coding | 16 | 1188.58 | 2026-05-06 |
| Arena AI Code | Coding | 31 | 1401 | 2026-05-06 |
| DeepSWE | Coding | 7 | 24.34 | 2026-05-26 |
| IOI | Coding | 35 | 6.417% | 2026-05-26 |
| LiveCodeBench | Coding | 38 | 81.465% | 2026-05-28 |
| SciCode | Coding | 22 | 49.9% | 2026-05-11 |
| SciCode | Coding | 45 | 44.2% | 2026-05-11 |
| SciCode | Coding | 105 | 39.6% | 2026-05-11 |
| SWE-bench Verified | Coding | 22 | 73% | 2026-05-28 |
| Terminal-Bench 2.0 | Coding | 26 | 44.944% | 2026-05-28 |
| Vibe Code Bench v1.1 | Coding | 12 | 47.969% | 2026-05-28 |
| DAXBench | Data | 4 | 96.2% | 2026-05-28 |
| OmniDocBench 1.5 | Document Understanding | 8 | 0.87 | 2026-05-06 |
| SAGE | Education | 8 | 50.813% | 2026-05-28 |
| AA-Omniscience | Factuality | 17 | -18.68 | 2026-05-11 |
| Vectara HHEM Hallucination Leaderboard | Factuality | 20 | 94.50 | 2026-05-06 |
| CorpFin v2 | Finance | 42 | 60.917% | 2026-05-28 |
| Finance Agent v1.1 | Finance | 20 | 53.405% | 2026-05-04 |
| Finance Agent v2 | Finance | 7 | 45.36% | 2026-05-28 |
| MortgageTax | Finance | 32 | 63.514% | 2026-05-28 |
| Rogo Big Finance Bench | Finance | 10 | 22% rubric / 7% final | 2026-05-28 |
| TaxEval v2 | Finance | 55 | 71.218% | 2026-05-28 |
| InfiniteBM Chess | Game | 5 | 765.37 Elo / 8 games | 2026-05-28 |
| InfiniteBM Coup | Game | 6 | 1428.2 Elo / 14 games | 2026-05-28 |
| InfiniteBM Heads-Up No-Limit Hold'em | Game | 30 | 996.02 Elo / 13 games | 2026-05-28 |
| InfiniteBM Heads-Up No-Limit Hold'em | Game | 34 | 864.17 Elo / 115 games | 2026-05-28 |
| InfiniteBM Liar's Dice | Game | 8 | 1328.16 Elo / 40 games | 2026-05-28 |
| InfiniteBM Liar's Dice | Game | 32 | 1034.14 Elo / 118 games | 2026-05-28 |
| InfiniteBM Settlers of Catan | Game | 5 | 590.44 Elo / 11 games | 2026-05-28 |
| InfiniteBM Werewolf | Game | 2 | 1385.83 Elo / 10 games | 2026-05-28 |
| Artificial Analysis Intelligence Index | Intelligence | 29 | 48.9 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 92 | 37.73 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 212 | 23.28 | 2026-05-11 |
| GPQA Diamond | Intelligence | 32 | 83.08% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 37 | 26.6% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 86 | 17.1% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 254 | 5.7% | 2026-05-11 |
| LiveBench | Intelligence | 39 | 67.74 | 2026-05-05 |
| LiveBench | Intelligence | 46 | 63.65 | 2026-05-05 |
| MMLU Pro | Intelligence | 37 | 84.554% | 2026-05-28 |
| MMMU Pro | Intelligence | 30 | 79.249% | 2026-05-28 |
| Vals Index | Intelligence | 11 | 51.422% | 2026-05-28 |
| Vals Multimodal Index | Intelligence | 8 | 53.298% | 2026-05-28 |
| CaseLaw v2 | Legal | 47 | 51.661% | 2026-05-04 |
| MRCR v2 (8-needle) | Long Context | 4 | 0.34 | 2026-05-06 |
| AIME | Math | 11 | 95.625% | 2026-04-16 |
| ProofBench | Math | 13 | 21% | 2026-05-28 |
| Medical Chronology LLM Benchmark | Medical | 5 | 0.91 | 2026-05-06 |
| LMArena Vision Arena | Multimodal | 25 | 1248.44 | 2026-05-06 |
| Altered Riddles | Reasoning | 3 | 0.3058 | 2026-05-27 |
| Altered Riddles | Reasoning | 11 | 0.4010 | 2026-05-27 |
| CAIS Text Capabilities Index | Reasoning | 20 | 24.2 | 2026-05-27 |
| Context Arena | Reasoning | 30 | 45.67 | 2026-05-06 |
| Context Arena | Reasoning | 31 | 44.79 | 2026-05-06 |
| Context Arena | Reasoning | 32 | 42.08 | 2026-05-06 |
| Context Arena | Reasoning | 40 | 34.47 | 2026-05-06 |
| Context Arena | Reasoning | 63 | 20.83 | 2026-05-06 |
| GPQA Diamond | Reasoning | 30 | 87.5% | 2026-05-11 |
| GPQA Diamond | Reasoning | 83 | 82.3% | 2026-05-11 |
| GPQA Diamond | Reasoning | 274 | 60.6% | 2026-05-11 |
| Graphwalks BFS <128k | Reasoning | 4 | 0.76 | 2026-05-06 |
| Graphwalks parents <128k | Reasoning | 5 | 0.71 | 2026-05-06 |
| CAIS Risk Index | Safety | 11 | 44.9 | 2026-05-27 |
| HarmActionsEval | Safety | 6 | 0.71 | 2026-05-06 |
| CritPt | Science | 16 | 10% | 2026-05-11 |
| CritPt | Science | 48 | 2.9% | 2026-05-11 |
| CritPt | Science | 227 | 0% | 2026-05-11 |
| ProgramBench | Software Engineering | 8 | 0% | 2026-05-05 |
| CAIS Vision Capabilities Index | Vision | 14 | 51.2 | 2026-05-27 |