GPT-4.1 Nano | BenchmarkList

Metadata

Family: GPT | Access: Closed/API

Aliases: gpt-4.1-nano, gpt-4.1-nano-2025-04-14, openai-gpt-4.1-nano, openai-gpt-4.1-nano-2025-04-14, openai/gpt-4.1-nano, openai/gpt-4.1-nano-2025-04-14

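Benchmark Results

Scores use each benchmark's native scale (percentages, 0-to-1 fractions, or rating points), so they are not comparable across rows. Rank is the model's position on that benchmark's leaderboard.

Benchmark	Category	Rank	Score	Date sampled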
ARC-AGI-1	Agentic	143	0	2026-05-05
ARC-AGI-2	Agentic	135	0	2026-05-05
Berkeley Function-Calling Leaderboard	Agentic	58	33.05%	2026-05-27
Berkeley Function-Calling Leaderboard	Agentic	90	24.88%	2026-05-27
Galileo Agent Leaderboard	Agentic	14	0.38	2026-05-06
Hindsight LLM Memory Leaderboard	Agentic	2	87.20	2026-05-06
MCPMark	Agentic	39	0	2026-05-06
RealDataAgentBench	Agentic	12	0.62	2026-04-28
Tau2-Bench Telecom	Agentic	338	17.3%	2026-05-11
Terminal-Bench Hard	Agentic	286	3.8%	2026-05-11
TextClass Benchmark	Classification	61	1533.06	2026-05-06
BigCodeBench-Hard	Coding	22	28.40	2026-05-05
LiveCodeBench	Coding	96	42.718%	2026-05-28
SciCode	Coding	313	25.9%	2026-05-11
GSMA Open Telco Leaderboard	Domain	50	48.28	2026-05-06
CorpFin v2	Finance	97	42.075%	2026-05-28
MortgageTax	Finance	60	52.822%	2026-05-28
TaxEval v2	Finance	98	60.752%	2026-05-28
BenchLM	General Knowledge	95	27	2026-05-06
Arena-Hard	Generalization	25	13.7%	2026-05-27
HELM AIR-Bench	Generalization	55	0.615297	2026-05-28
HELM Safety	Generalization	20	0.937650	2026-05-28
MedQA	Healthcare	83	68.225%	2026-04-16
Multi-IF	Instruction Following	20	0.57	2026-05-06
Artificial Analysis Intelligence Index	Intelligence	366	13.04	2026-05-11
GPQA Diamond	Intelligence	93	50.758%	2026-05-28
Humanity's Last Exam	Intelligence	425	3.9%	2026-05-11
MMLU Pro	Intelligence	102	63.479%	2026-05-28
MMLU-Pro	Intelligence	249	65.7%	2026-05-11
MMMU Pro	Intelligence	69	55.055%	2026-05-28
SimpleQA	Intelligence	23	7.6%	2026-05-27
HindiGen v1	Language	20	56.89	2026-05-06
LegalBench	Legal	103	61.056%	2026-05-28
LEXam	Legal	20	43.68% open / 39.22% MCQ	2026-05-28
Graphwalks BFS >128k	Long Context	7	0.03	2026-05-06
Graphwalks parents >128k	Long Context	6	0.06	2026-05-06
OpenAI-MRCR: 2 needle 128k	Long Context	7	0.37	2026-05-06
OpenAI-MRCR: 2 needle 1M	Long Context	5	0.12	2026-05-06
AIME	Math	76	26.458%	2026-04-16
AIME 2025	Math	201	24%	2026-05-11
MATH 500	Math	39	80.2%	2026-01-09
MGSM	Math	74	69.273%	2026-01-09
LanguageBench	Multilingual	15	0.52	2026-05-06
CharXiv-D	Multimodal	13	0.74	2026-05-06
CharXiv-R	Multimodal	33	0.41	2026-05-06
Design Arena	Multimodal	111	1021	2026-05-06
Math-VR	Multimodal	26	9.1	2026-05-27
Visual-Language Understanding	Multimodal	57	26.55	2026-05-06
GPQA Diamond	Reasoning	339	51.2%	2026-05-11
Graphwalks BFS <128k	Reasoning	11	0.25	2026-05-06
Graphwalks parents <128k	Reasoning	11	0.09	2026-05-06
CritPt	Science	215	0%	2026-05-11
ComplexFuncBench	Tool Use	6	0.06	2026-05-06
COLLIE	Writing	9	0.42	2026-05-06
