GPT-4.1 Nano
GPT / OpenAI
54 scores
53 benchmarks
$0.1 / $0.4 per 1M tokens (cost in/out)
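
To make the pricing line concrete, here is a minimal sketch that estimates the cost of a single request from the listed rates. It assumes "cost in/out" means USD per 1M input tokens and per 1M output tokens respectively; the function name `estimate_cost` and the example token counts are hypothetical and chosen only for illustration.

```python
# Rates taken from the listing above, in USD per 1M tokens.
# Assumption: the first figure is the input rate, the second the output rate.
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.40


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Example: a 20,000-token prompt that produces a 2,000-token reply.
cost = estimate_cost(input_tokens=20_000, output_tokens=2_000)
print(f"${cost:.4f}")
```

Under these assumptions, output tokens cost four times as much as input tokens, so long replies dominate the bill sooner than long prompts do.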
Metadata
GPT Closed/API
Aliases: gpt-4.1-nano, gpt-4.1-nano-2025-04-14, openai-gpt-4.1-nano, openai-gpt-4.1-nano-2025-04-14, openai/gpt-4.1-nano, openai/gpt-4.1-nano-2025-04-14
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| ARC-AGI-1 | Agentic | 143 | 0 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 135 | 0 | 2026-05-05 |
| Berkeley Function-Calling Leaderboard | Agentic | 58 | 33.05% | 2026-05-27 |
| Berkeley Function-Calling Leaderboard | Agentic | 90 | 24.88% | 2026-05-27 |
| Galileo Agent Leaderboard | Agentic | 14 | 0.38 | 2026-05-06 |
| Hindsight LLM Memory Leaderboard | Agentic | 2 | 87.20 | 2026-05-06 |
| MCPMark | Agentic | 39 | 0 | 2026-05-06 |
| RealDataAgentBench | Agentic | 12 | 0.62 | 2026-04-28 |
| Tau2-Bench Telecom | Agentic | 338 | 17.3% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 286 | 3.8% | 2026-05-11 |
| TextClass Benchmark | Classification | 61 | 1533.06 | 2026-05-06 |
| BigCodeBench-Hard | Coding | 22 | 28.40 | 2026-05-05 |
| LiveCodeBench | Coding | 96 | 42.718% | 2026-05-28 |
| SciCode | Coding | 313 | 25.9% | 2026-05-11 |
| GSMA Open Telco Leaderboard | Domain | 50 | 48.28 | 2026-05-06 |
| CorpFin v2 | Finance | 97 | 42.075% | 2026-05-28 |
| MortgageTax | Finance | 60 | 52.822% | 2026-05-28 |
| TaxEval v2 | Finance | 98 | 60.752% | 2026-05-28 |
| BenchLM | General Knowledge | 95 | 27 | 2026-05-06 |
| Arena-Hard | Generalization | 25 | 13.7% | 2026-05-27 |
| HELM AIR-Bench | Generalization | 55 | 0.615297 | 2026-05-28 |
| HELM Safety | Generalization | 20 | 0.937650 | 2026-05-28 |
| MedQA | Healthcare | 83 | 68.225% | 2026-04-16 |
| Multi-IF | Instruction Following | 20 | 0.57 | 2026-05-06 |
| Artificial Analysis Intelligence Index | Intelligence | 366 | 13.04 | 2026-05-11 |
| GPQA Diamond | Intelligence | 93 | 50.758% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 425 | 3.9% | 2026-05-11 |
| MMLU Pro | Intelligence | 102 | 63.479% | 2026-05-28 |
| MMLU-Pro | Intelligence | 249 | 65.7% | 2026-05-11 |
| MMMU Pro | Intelligence | 69 | 55.055% | 2026-05-28 |
| SimpleQA | Intelligence | 23 | 7.6% | 2026-05-27 |
| HindiGen v1 | Language | 20 | 56.89 | 2026-05-06 |
| LegalBench | Legal | 103 | 61.056% | 2026-05-28 |
| LEXam | Legal | 20 | 43.68% open / 39.22% MCQ | 2026-05-28 |
| Graphwalks BFS >128k | Long Context | 7 | 0.03 | 2026-05-06 |
| Graphwalks parents >128k | Long Context | 6 | 0.06 | 2026-05-06 |
| OpenAI-MRCR: 2 needle 128k | Long Context | 7 | 0.37 | 2026-05-06 |
| OpenAI-MRCR: 2 needle 1M | Long Context | 5 | 0.12 | 2026-05-06 |
| AIME | Math | 76 | 26.458% | 2026-04-16 |
| AIME 2025 | Math | 201 | 24% | 2026-05-11 |
| MATH 500 | Math | 39 | 80.2% | 2026-01-09 |
| MGSM | Math | 74 | 69.273% | 2026-01-09 |
| LanguageBench | Multilingual | 15 | 0.52 | 2026-05-06 |
| CharXiv-D | Multimodal | 13 | 0.74 | 2026-05-06 |
| CharXiv-R | Multimodal | 33 | 0.41 | 2026-05-06 |
| Design Arena | Multimodal | 111 | 1021 | 2026-05-06 |
| Math-VR | Multimodal | 26 | 9.1 | 2026-05-27 |
| Visual-Language Understanding | Multimodal | 57 | 26.55 | 2026-05-06 |
| GPQA Diamond | Reasoning | 339 | 51.2% | 2026-05-11 |
| Graphwalks BFS <128k | Reasoning | 11 | 0.25 | 2026-05-06 |
| Graphwalks parents <128k | Reasoning | 11 | 0.09 | 2026-05-06 |
| CritPt | Science | 215 | 0% | 2026-05-11 |
| ComplexFuncBench | Tool Use | 6 | 0.06 | 2026-05-06 |
| COLLIE | Writing | 9 | 0.42 | 2026-05-06 |