GPT-4.5 | BenchmarkList

Metadata

GPT Closed/API

Benchmark	Category	Rank	Score	Date Sampled
ARC-AGI-1	Agentic	128	10.30	2026-05-05
ARC-AGI-2	Agentic	124	0.80	2026-05-05
TextClass Benchmark	Classification	7	1767.86	2026-05-06
AIRTBench	Cybersecurity	2	36.89	2026-05-06
Spider	Data	5	85.30	2026-05-06
Open FinLLM Leaderboard	Finance	3	43.403043%	2026-05-27
Arena-Hard	Generalization	12	50.0%	2026-05-27
HELM AIR-Bench	Generalization	30	0.741482	2026-05-28
HELM Safety	Generalization	10	0.964672	2026-05-28
MMLU Medical Genetics	Healthcare	1	92.0%	2026-05-27
MMLU Professional Medicine	Healthcare	1	93.75%	2026-05-27
MultiMedQA	Healthcare	1	82.405833%	2026-05-27
Multi-IF	Instruction Following	15	0.71	2026-05-06
Artificial Analysis Intelligence Index	Intelligence	243	19.96	2026-05-11
MathVision	Intelligence	65	47.30	2026-05-06
SimpleQA	Intelligence	1	62.5%	2026-05-27
HindiGen v1	Language	30	15.46	2026-05-06
OpenAI-MRCR: 2 needle 128k	Long Context	6	0.39	2026-05-06
CharXiv-D	Multimodal	3	0.90	2026-05-06
CharXiv-R	Multimodal	28	0.55	2026-05-06
MMSI-Bench	Multimodal	6	40.3%	2026-05-28
Video SimpleQA	Multimodal	4	54.10	2026-05-06
Visual-Language Understanding	Multimodal	34	42.11	2026-05-06
EnigmaEval	Reasoning	23	3.18	2026-05-06
Graphwalks BFS <128k	Reasoning	6	0.72	2026-05-06
Graphwalks parents <128k	Reasoning	4	0.73	2026-05-06
Humanity's Last Exam (Text Only)	Reasoning	39	5.80	2026-05-06
LingOly-TOO	Reasoning	10	0.25	2026-05-06
ComplexFuncBench	Tool Use	3	0.63	2026-05-06
COLLIE	Writing	4	0.72	2026-05-06