DeepSeek V4 Pro

DeepSeek / DeepSeek

123 scores
92 benchmarks
$0.435 / $0.87 per 1M tokens (cost in/out)
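
As a rough illustration of what the listed per-1M-token prices mean for a single request, the arithmetic can be sketched as below. This assumes the first figure ($0.435) is the input price and the second ($0.87) the output price, as the "in/out" label suggests, and it ignores any caching discounts or provider-specific surcharges. The function name and the example token counts are hypothetical.

```python
# Prices taken from the listing, in USD per 1M tokens.
# Assumption: first figure = input tokens, second = output tokens.
INPUT_PRICE_PER_M = 0.435
OUTPUT_PRICE_PER_M = 0.87


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed rates."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 200k prompt tokens, 20k completion tokens.
    print(f"${estimate_cost(200_000, 20_000):.4f}")  # prints $0.1044
```

Because output tokens are priced at twice the input rate here, long completions dominate the bill faster than long prompts do.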

Metadata

DeepSeek Open source

Aliases: deepseek-deepseek-v4-pro, deepseek-deepseek-v4-pro-20260423, deepseek-v4-pro, deepseek-v4-pro-20260423, deepseek/deepseek-v4-pro, deepseek/deepseek-v4-pro-20260423, DS-V4-Pro Max, DeepSeek V4 Pro Max, DeepSeek-V4-Pro-Max, deepseek-v4-pro-max

Benchmark Results

Benchmark Category Rank Score Sampled
CoWorkBench Agentic 3 66.3% 2026-05-28
GDPval-AA Agentic 3 1554 2026-05-06
Gert Labs Rankings Agentic 15 0.55 2026-05-11
ITBench-AA Agentic 7 38.3% 2026-05-28
MCP Atlas Agentic 4 73.6% 2026-05-28
MCPMark Agentic 3 57.1% 2026-05-28
QwenClawBench Agentic 3 59.2% 2026-05-28
QwenWorldBench Agentic 3 52.3% 2026-05-28
Tau2-Bench Telecom Agentic 11 96.2% 2026-05-11
Tau2-Bench Telecom Agentic 26 94.2% 2026-05-11
Tau2-Bench Telecom Agentic 51 91.2% 2026-05-11
Terminal-Bench Hard Agentic 19 46.2% 2026-05-11
Terminal-Bench Hard Agentic 32 41.7% 2026-05-11
Terminal-Bench Hard Agentic 52 36.4% 2026-05-11
TERMS-Bench Agentic 6 61.8% SE+ 2026-05-28
Toolathlon Agentic 3 0.52 2026-05-06
Vending-Bench 2 Agentic 21 3284.52 2026-05-28
VitaBench Agentic 1 51.9% 2026-05-28
YC-Bench Agentic 3 1066426 2026-05-06
OpenUGI Alignment 12 62.26 2026-05-06
OpenUGI Alignment 136 48.55 2026-05-06
ALE-Bench Coding 26 1006.08 2026-05-06
ALE-Bench Coding 67 521.67 2026-05-06
Arena AI Code Coding 15 1455 2026-05-06
BLXBench Coding 21 15.20 2026-05-06
Claw-Eval Coding 5 58.4% 2026-05-28
Codeforces Coding 1 1 2026-05-28
DeepSWE Coding 12 7.52 2026-05-26
IOI Coding 8 35.833% 2026-05-26
Kernel Bench L3 Coding 5 1.07/54% 2026-05-28
LiveCodeBench Coding 1 93.5% 2026-05-28
LiveCodeBench Coding 5 87.484% 2026-05-28
LMArena WebDev Arena Coding 16 1454.67 2026-05-06
NL2Repo Coding 5 35.5% 2026-05-28
QwenSVG Coding 4 1506 2026-05-28
QwenWebDev Coding 2 1570 2026-05-28
SciCode Coding 19 50% 2026-05-11
SciCode Coding 35 46.4% 2026-05-11
SciCode Coding 65 42.4% 2026-05-11
SkillsBench Coding 4 52.3% 2026-05-28
SWE-bench Verified Coding 10 77.4% 2026-05-28
Terminal-Bench 2.0 Coding 14 56.18% 2026-05-28
Terminal-Bench 2.0 Coding 2 67.9% 2026-05-28
Terminal-Bench 2.1 Coding 11 50.187% 2026-05-28
Vibe Code Bench v1.1 Coding 10 49.931% 2026-05-28
AA-Omniscience Factuality 15 -10.02 2026-05-11
CorpFin v2 Finance 33 61.383% 2026-05-28
Finance Agent v1.1 Finance 4 60.389% 2026-05-04
Finance Agent v2 Finance 10 44.083% 2026-05-28
TaxEval v2 Finance 45 72.077% 2026-05-28
InfiniteBM Heads-Up No-Limit Hold'em Game 11 1259.82 Elo / 13 games 2026-05-28
InfiniteBM Heads-Up No-Limit Hold'em Game 26 1035.68 Elo / 114 games 2026-05-28
InfiniteBM Liar's Dice Game 19 1193.32 Elo / 27 games 2026-05-28
InfiniteBM Liar's Dice Game 20 1192.38 Elo / 1714 games 2026-05-28
BenchLM General Knowledge 9 88 2026-05-06
BenchLM General Knowledge 13 84 2026-05-06
BenchLM General Knowledge 32 70 2026-05-06
CSimpleQA General Knowledge 1 0.84 2026-05-06
MAXIFE General Knowledge 2 88.9% 2026-05-28
MMLU-ProX General Knowledge 4 83.9% 2026-05-28
MMLU-Redux General Knowledge 4 94.8% 2026-05-28
NOVA-63 General Knowledge 6 52.8% 2026-05-28
MedCode Healthcare 28 40.455% 2026-05-28
MedScribe Healthcare 38 75.144% 2026-05-28
PhysicianBench Healthcare 6 18.7 +/- 2.9 2026-05-27
IFBench Instruction Following 2 77% 2026-05-28
IFEval Instruction Following 6 91.9% 2026-05-28
AIIQ Composite IQ Intelligence 15 117 2026-05-12
Artificial Analysis Intelligence Index Intelligence 16 51.51 2026-05-11
Artificial Analysis Intelligence Index Intelligence 21 49.79 2026-05-11
Artificial Analysis Intelligence Index Intelligence 77 39.27 2026-05-11
GPQA Diamond Intelligence 13 89.394% 2026-05-28
HLE w/ tools Intelligence 6 48.2% 2026-05-28
Humanity's Last Exam Intelligence 3 37.7% 2026-05-28
Humanity's Last Exam Intelligence 11 35.9% 2026-05-11
Humanity's Last Exam Intelligence 17 33.5% 2026-05-11
Humanity's Last Exam Intelligence 194 7.7% 2026-05-11
LiveBench Intelligence 13 74.39 2026-05-05
MMLU Pro Intelligence 18 87.249% 2026-05-28
MMLU-Pro Intelligence 4 87.5% 2026-05-28
SuperGPQA Intelligence 5 69.9% 2026-05-28
Vals Index Intelligence 7 56.231% 2026-05-28
CaseLaw v2 Legal 27 59.378% 2026-05-04
LegalBench Legal 56 80.323% 2026-05-28
CorpusQA 1M Long Context 1 0.62 2026-05-06
MRCR 1M Long Context 1 0.83 2026-05-06
MRCR-v2 128k Long Context 4 74.4% 2026-05-28
needle-1M-bench Long Context 1 100 2026-05-06
needle-1M-bench Long Context 2 100 2026-05-06
needle-1M-bench Long Context 6 100 2026-05-06
needle-1M-bench Long Context 7 94 2026-05-06
ProofBench Math 24 10% 2026-05-28
GSM8K Mathematics 4 92.60 2026-05-06
HMMT February 2026 Mathematics 3 95.2% 2026-05-28
IMO-AnswerBench Mathematics 2 89.8% 2026-05-28
IMO-AnswerBench Mathematics 1 0.90 2026-05-06
MathArena Apex Mathematics 2 38.3% 2026-05-28
MathArena Apex Mathematics 1 0.90 2026-05-06
INCLUDE Multilingual 3 86.1% 2026-05-28
MMMLU Multilingual 4 87.9% 2026-05-28
Design Arena Multimodal 10 1313 2026-05-06
Artificial Analysis Openness Index Openness 47 50 2026-05-11
Artificial Analysis Openness Index Openness 48 50 2026-05-11
CAIS Text Capabilities Index Reasoning 13 32.1 2026-05-27
Context Arena Reasoning 18 55.99 2026-05-06
Context Arena Reasoning 55 26.31 2026-05-06
Global PIQA Reasoning 3 90.5% 2026-05-28
GPQA Diamond Reasoning 5 90.1% 2026-05-28
GPQA Diamond Reasoning 12 90.5% 2026-05-11
GPQA Diamond Reasoning 20 88.8% 2026-05-11
GPQA Diamond Reasoning 189 71.7% 2026-05-11
CAIS Risk Index Safety 21 54.1 2026-05-27
CritPt Science 1 12.9% 2026-05-28
CritPt Science 10 12.9% 2026-05-11
CritPt Science 15 10% 2026-05-11
CritPt Science 94 0.9% 2026-05-11
SWE-bench Multilingual Software Engineering 4 76.2% 2026-05-28
SWE-bench Pro Software Engineering 3 59% 2026-05-28
SWE-bench Verified Software Engineering 2 80.6% 2026-05-28
SpreadsheetBench Spreadsheets 4 84.9% 2026-05-28
Structured Output Benchmark Structured Output 13 85.30 2026-05-06
BFCL-V4 Tool Use 5 70.6% 2026-05-28
WMT24++ Translation 4 82.2% 2026-05-28