GPT-5.4 Nano
GPT / OpenAI
81 scores
54 benchmarks
Cost (in/out): $0.2 / $1.25 per 1M tokens
Metadata
GPT Closed/API
Aliases: gpt-5.4-nano, gpt-5.4-nano-20260317, openai-gpt-5.4-nano, openai-gpt-5.4-nano-20260317, openai/gpt-5.4-nano, openai/gpt-5.4-nano-20260317
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| APEX-Agents-AA | Agentic | 8 | 24.9% | 2026-05-11 |
| ARC-AGI-1 | Agentic | 64 | 51.50 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 80 | 38.17 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 89 | 33 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 112 | 18.33 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 64 | 5.69 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 78 | 3.61 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 99 | 1.94 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 105 | 1.53 | 2026-05-05 |
| AutoBench | Agentic | 23 | 2.78 | 2026-05-06 |
| Hindsight LLM Memory Leaderboard | Agentic | 14 | 83.90 | 2026-05-06 |
| ITBench-AA | Agentic | 20 | 24.4% | 2026-05-28 |
| OSWorld-Verified | Agentic | 12 | 0.39 | 2026-05-06 |
| PinchBench | Agentic | 42 | 0.79 | 2026-05-06 |
| RuneBench | Agentic | 13 | 2.30 | 2026-05-05 |
| Tau2-Bench Telecom | Agentic | 116 | 76% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 173 | 52.6% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 217 | 34.8% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 31 | 42.4% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 76 | 33.3% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 125 | 24.2% | 2026-05-11 |
| Toolathlon | Agentic | 14 | 0.35 | 2026-05-06 |
| ALE-Bench | Coding | 27 | 1004.52 | 2026-05-06 |
| IOI | Coding | 22 | 15.25% | 2026-05-26 |
| LiveCodeBench | Coding | 25 | 84.009% | 2026-05-28 |
| MLX Benchmark V2 | Coding | 5 | 75.19 | 2026-05-06 |
| SciCode | Coding | 31 | 46.9% | 2026-05-11 |
| SciCode | Coding | 131 | 38.4% | 2026-05-11 |
| SciCode | Coding | 191 | 35.2% | 2026-05-11 |
| SWE-bench Verified | Coding | 33 | 69.8% | 2026-05-28 |
| Terminal-Bench 2.0 | Coding | 33 | 39.888% | 2026-05-28 |
| Vibe Code Bench v1.1 | Coding | 17 | 26.097% | 2026-05-28 |
| OmniDocBench 1.5 | Document Understanding | 9 | 0.76 | 2026-05-06 |
| SAGE | Education | 34 | 38.081% | 2026-05-28 |
| Vectara HHEM Hallucination Leaderboard | Factuality | 2 | 96.90 | 2026-05-06 |
| CorpFin v2 | Finance | 37 | 61.189% | 2026-05-28 |
| Finance Agent v1.1 | Finance | 30 | 47.801% | 2026-05-04 |
| Finance Agent v2 | Finance | 14 | 38.217% | 2026-05-28 |
| MortgageTax | Finance | 46 | 59.102% | 2026-05-28 |
| TaxEval v2 | Finance | 79 | 67.416% | 2026-05-28 |
| InfiniteBM Heads-Up No-Limit Hold'em | Game | 10 | 1282.53 Elo / 18 games | 2026-05-28 |
| InfiniteBM Heads-Up No-Limit Hold'em | Game | 32 | 974.38 Elo / 126 games | 2026-05-28 |
| InfiniteBM Liar's Dice | Game | 11 | 1304.64 Elo / 40 games | 2026-05-28 |
| InfiniteBM Liar's Dice | Game | 37 | 795.51 Elo / 130 games | 2026-05-28 |
| InfiniteBM Werewolf | Game | 9 | 902.42 Elo / 6 games | 2026-05-28 |
| MedCode | Healthcare | 25 | 41.029% | 2026-05-28 |
| MedScribe | Healthcare | 31 | 77.09% | 2026-05-28 |
| Artificial Analysis Intelligence Index | Intelligence | 47 | 43.98 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 90 | 38.11 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 201 | 24.36 | 2026-05-11 |
| GPQA Diamond | Intelligence | 49 | 77.526% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 40 | 26.5% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 98 | 14.7% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 391 | 4.2% | 2026-05-11 |
| LiveBench | Intelligence | 26 | 71.31 | 2026-05-05 |
| LiveBench | Intelligence | 47 | 63.64 | 2026-05-05 |
| MMLU Pro | Intelligence | 79 | 77.172% | 2026-05-28 |
| MMMU Pro | Intelligence | 39 | 73.584% | 2026-05-28 |
| Vals Index | Intelligence | 15 | 46.461% | 2026-05-28 |
| Vals Multimodal Index | Intelligence | 11 | 47.484% | 2026-05-28 |
| CaseLaw v2 | Legal | 46 | 51.875% | 2026-05-04 |
| LegalBench | Legal | 72 | 77.92% | 2026-05-28 |
| MRCR v2 (8-needle) | Long Context | 5 | 0.33 | 2026-05-06 |
| AIME | Math | 29 | 88.75% | 2026-04-16 |
| ProofBench | Math | 30 | 5% | 2026-05-28 |
| CAIS Text Capabilities Index | Reasoning | 27 | 17.9 | 2026-05-27 |
| Context Arena | Reasoning | 24 | 48.78 | 2026-05-06 |
| Context Arena | Reasoning | 37 | 36.17 | 2026-05-06 |
| Context Arena | Reasoning | 46 | 29.90 | 2026-05-06 |
| Context Arena | Reasoning | 62 | 21.87 | 2026-05-06 |
| Context Arena | Reasoning | 70 | 12.31 | 2026-05-06 |
| GPQA Diamond | Reasoning | 88 | 81.7% | 2026-05-11 |
| GPQA Diamond | Reasoning | 148 | 76.1% | 2026-05-11 |
| GPQA Diamond | Reasoning | 309 | 55.8% | 2026-05-11 |
| Graphwalks BFS <128k | Reasoning | 5 | 0.73 | 2026-05-06 |
| Graphwalks parents <128k | Reasoning | 9 | 0.51 | 2026-05-06 |
| CAIS Risk Index | Safety | 16 | 48.7 | 2026-05-27 |
| CritPt | Science | 17 | 9.3% | 2026-05-11 |
| CritPt | Science | 34 | 5.1% | 2026-05-11 |
| CritPt | Science | 228 | 0% | 2026-05-11 |
| CAIS Vision Capabilities Index | Vision | 25 | 44.7 | 2026-05-27 |