DeepSeek V4 Pro | BenchmarkList

Metadata

DeepSeek (open source)

Aliases: deepseek-deepseek-v4-pro, deepseek-deepseek-v4-pro-20260423, deepseek-v4-pro, deepseek-v4-pro-20260423, deepseek/deepseek-v4-pro, deepseek/deepseek-v4-pro-20260423, DS-V4-Pro Max, DeepSeek V4 Pro Max, DeepSeek-V4-Pro-Max, deepseek-v4-pro-max

Benchmark Results

Benchmark	Category	Rank	Score	Sampled
CoWorkBench	Agentic	3	66.3%	2026-05-28
GDPval-AA	Agentic	3	1554	2026-05-06
Gert Labs Rankings	Agentic	15	0.55	2026-05-11
ITBench-AA	Agentic	7	38.3%	2026-05-28
MCP Atlas	Agentic	4	73.6%	2026-05-28
MCPMark	Agentic	3	57.1%	2026-05-28
QwenClawBench	Agentic	3	59.2%	2026-05-28
QwenWorldBench	Agentic	3	52.3%	2026-05-28
Tau2-Bench Telecom	Agentic	11	96.2%	2026-05-11
Tau2-Bench Telecom	Agentic	26	94.2%	2026-05-11
Tau2-Bench Telecom	Agentic	51	91.2%	2026-05-11
Terminal-Bench Hard	Agentic	19	46.2%	2026-05-11
Terminal-Bench Hard	Agentic	32	41.7%	2026-05-11
Terminal-Bench Hard	Agentic	52	36.4%	2026-05-11
TERMS-Bench	Agentic	6	61.8% SE+	2026-05-28
Toolathlon	Agentic	3	0.52	2026-05-06
Vending-Bench 2	Agentic	21	3284.52	2026-05-28
VitaBench	Agentic	1	51.9%	2026-05-28
YC-Bench	Agentic	3	1066426	2026-05-06
OpenUGI	Alignment	12	62.26	2026-05-06
OpenUGI	Alignment	136	48.55	2026-05-06
ALE-Bench	Coding	26	1006.08	2026-05-06
ALE-Bench	Coding	67	521.67	2026-05-06
Arena AI Code	Coding	15	1455	2026-05-06
BLXBench	Coding	21	15.20	2026-05-06
Claw-Eval	Coding	5	58.4%	2026-05-28
Codeforces	Coding	1	1	2026-05-28
DeepSWE	Coding	12	7.52	2026-05-26
IOI	Coding	8	35.833%	2026-05-26
Kernel Bench L3	Coding	5	1.07/54%	2026-05-28
LiveCodeBench	Coding	1	93.5%	2026-05-28
LiveCodeBench	Coding	5	87.484%	2026-05-28
LMArena WebDev Arena	Coding	16	1454.67	2026-05-06
NL2Repo	Coding	5	35.5%	2026-05-28
QwenSVG	Coding	4	1506	2026-05-28
QwenWebDev	Coding	2	1570	2026-05-28
SciCode	Coding	19	50%	2026-05-11
SciCode	Coding	35	46.4%	2026-05-11
SciCode	Coding	65	42.4%	2026-05-11
SkillsBench	Coding	4	52.3%	2026-05-28
SWE-bench Verified	Coding	10	77.4%	2026-05-28
Terminal-Bench 2.0	Coding	14	56.18%	2026-05-28
Terminal-Bench 2.0	Coding	2	67.9%	2026-05-28
Terminal-Bench 2.1	Coding	11	50.187%	2026-05-28
Vibe Code Bench v1.1	Coding	10	49.931%	2026-05-28
AA-Omniscience	Factuality	15	-10.02	2026-05-11
CorpFin v2	Finance	33	61.383%	2026-05-28
Finance Agent v1.1	Finance	4	60.389%	2026-05-04
Finance Agent v2	Finance	10	44.083%	2026-05-28
TaxEval v2	Finance	45	72.077%	2026-05-28
InfiniteBM Heads-Up No-Limit Hold'em	Game	11	1259.82 Elo / 13 games	2026-05-28
InfiniteBM Heads-Up No-Limit Hold'em	Game	26	1035.68 Elo / 114 games	2026-05-28
InfiniteBM Liar's Dice	Game	19	1193.32 Elo / 27 games	2026-05-28
InfiniteBM Liar's Dice	Game	20	1192.38 Elo / 1714 games	2026-05-28
BenchLM	General Knowledge	9	88	2026-05-06
BenchLM	General Knowledge	13	84	2026-05-06
BenchLM	General Knowledge	32	70	2026-05-06
CSimpleQA	General Knowledge	1	0.84	2026-05-06
MAXIFE	General Knowledge	2	88.9%	2026-05-28
MMLU-ProX	General Knowledge	4	83.9%	2026-05-28
MMLU-Redux	General Knowledge	4	94.8%	2026-05-28
NOVA-63	General Knowledge	6	52.8%	2026-05-28
MedCode	Healthcare	28	40.455%	2026-05-28
MedScribe	Healthcare	38	75.144%	2026-05-28
PhysicianBench	Healthcare	6	18.7 +/- 2.9	2026-05-27
IFBench	Instruction Following	2	77%	2026-05-28
IFEval	Instruction Following	6	91.9%	2026-05-28
AIIQ Composite IQ	Intelligence	15	117	2026-05-12
Artificial Analysis Intelligence Index	Intelligence	16	51.51	2026-05-11
Artificial Analysis Intelligence Index	Intelligence	21	49.79	2026-05-11
Artificial Analysis Intelligence Index	Intelligence	77	39.27	2026-05-11
GPQA Diamond	Intelligence	13	89.394%	2026-05-28
HLE w/ tools	Intelligence	6	48.2%	2026-05-28
Humanity's Last Exam	Intelligence	3	37.7%	2026-05-28
Humanity's Last Exam	Intelligence	11	35.9%	2026-05-11
Humanity's Last Exam	Intelligence	17	33.5%	2026-05-11
Humanity's Last Exam	Intelligence	194	7.7%	2026-05-11
LiveBench	Intelligence	13	74.39	2026-05-05
MMLU Pro	Intelligence	18	87.249%	2026-05-28
MMLU-Pro	Intelligence	4	87.5%	2026-05-28
SuperGPQA	Intelligence	5	69.9%	2026-05-28
Vals Index	Intelligence	7	56.231%	2026-05-28
CaseLaw v2	Legal	27	59.378%	2026-05-04
LegalBench	Legal	56	80.323%	2026-05-28
CorpusQA 1M	Long Context	1	0.62	2026-05-06
MRCR 1M	Long Context	1	0.83	2026-05-06
MRCR-v2 128k	Long Context	4	74.4%	2026-05-28
needle-1M-bench	Long Context	1	100	2026-05-06
needle-1M-bench	Long Context	2	100	2026-05-06
needle-1M-bench	Long Context	6	100	2026-05-06
needle-1M-bench	Long Context	7	94	2026-05-06
ProofBench	Math	24	10%	2026-05-28
GSM8K	Mathematics	4	92.60	2026-05-06
HMMT February 2026	Mathematics	3	95.2%	2026-05-28
IMO-AnswerBench	Mathematics	2	89.8%	2026-05-28
IMO-AnswerBench	Mathematics	1	0.90	2026-05-06
MathArena Apex	Mathematics	2	38.3%	2026-05-28
MathArena Apex	Mathematics	1	0.90	2026-05-06
INCLUDE	Multilingual	3	86.1%	2026-05-28
MMMLU	Multilingual	4	87.9%	2026-05-28
Design Arena	Multimodal	10	1313	2026-05-06
Artificial Analysis Openness Index	Openness	47	50	2026-05-11
Artificial Analysis Openness Index	Openness	48	50	2026-05-11
CAIS Text Capabilities Index	Reasoning	13	32.1	2026-05-27
Context Arena	Reasoning	18	55.99	2026-05-06
Context Arena	Reasoning	55	26.31	2026-05-06
Global PIQA	Reasoning	3	90.5%	2026-05-28
GPQA Diamond	Reasoning	5	90.1%	2026-05-28
GPQA Diamond	Reasoning	12	90.5%	2026-05-11
GPQA Diamond	Reasoning	20	88.8%	2026-05-11
GPQA Diamond	Reasoning	189	71.7%	2026-05-11
CAIS Risk Index	Safety	21	54.1	2026-05-27
CritPt	Science	1	12.9%	2026-05-28
CritPt	Science	10	12.9%	2026-05-11
CritPt	Science	15	10%	2026-05-11
CritPt	Science	94	0.9%	2026-05-11
SWE-bench Multilingual	Software Engineering	4	76.2%	2026-05-28
SWE-bench Pro	Software Engineering	3	59%	2026-05-28
SWE-bench Verified	Software Engineering	2	80.6%	2026-05-28
SpreadsheetBench	Spreadsheets	4	84.9%	2026-05-28
Structured Output Benchmark	Structured Output	13	85.30	2026-05-06
BFCL-V4	Tool Use	5	70.6%	2026-05-28
WMT24++	Translation	4	82.2%	2026-05-28
