DeepSeek V4 Pro
DeepSeek / DeepSeek
123 scores
92 benchmarks
$0.435 / $0.87 per 1M tokens (cost in/out)
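As a rough illustration of how the listed prices translate into per-request cost, the sketch below assumes the first figure is the input price and the second the output price (as the "cost in/out" label suggests), both in USD per 1M tokens. The helper name `estimate_cost` and the token counts are hypothetical, and the sketch ignores any caching discounts or provider-specific surcharges.

```python
# Prices taken from the listing, in USD per 1M tokens.
# Assumption: first figure is input, second is output.
INPUT_PRICE_PER_M = 0.435
OUTPUT_PRICE_PER_M = 0.87


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper)."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Example: a request with 200k input tokens and 50k output tokens.
cost = estimate_cost(200_000, 50_000)
print(f"${cost:.4f}")
```

Under these assumptions, output tokens cost twice as much as input tokens, so long generations dominate the bill faster than long prompts do.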
Metadata
DeepSeek Open source
Aliases: deepseek-deepseek-v4-pro, deepseek-deepseek-v4-pro-20260423, deepseek-v4-pro, deepseek-v4-pro-20260423, deepseek/deepseek-v4-pro, deepseek/deepseek-v4-pro-20260423, DS-V4-Pro Max, DeepSeek V4 Pro Max, DeepSeek-V4-Pro-Max, deepseek-v4-pro-max
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| CoWorkBench | Agentic | 3 | 66.3% | 2026-05-28 |
| GDPval-AA | Agentic | 3 | 1554 | 2026-05-06 |
| Gert Labs Rankings | Agentic | 15 | 0.55 | 2026-05-11 |
| ITBench-AA | Agentic | 7 | 38.3% | 2026-05-28 |
| MCP Atlas | Agentic | 4 | 73.6% | 2026-05-28 |
| MCPMark | Agentic | 3 | 57.1% | 2026-05-28 |
| QwenClawBench | Agentic | 3 | 59.2% | 2026-05-28 |
| QwenWorldBench | Agentic | 3 | 52.3% | 2026-05-28 |
| Tau2-Bench Telecom | Agentic | 11 | 96.2% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 26 | 94.2% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 51 | 91.2% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 19 | 46.2% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 32 | 41.7% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 52 | 36.4% | 2026-05-11 |
| TERMS-Bench | Agentic | 6 | 61.8% SE+ | 2026-05-28 |
| Toolathlon | Agentic | 3 | 0.52 | 2026-05-06 |
| Vending-Bench 2 | Agentic | 21 | 3284.52 | 2026-05-28 |
| VitaBench | Agentic | 1 | 51.9% | 2026-05-28 |
| YC-Bench | Agentic | 3 | 1066426 | 2026-05-06 |
| OpenUGI | Alignment | 12 | 62.26 | 2026-05-06 |
| OpenUGI | Alignment | 136 | 48.55 | 2026-05-06 |
| ALE-Bench | Coding | 26 | 1006.08 | 2026-05-06 |
| ALE-Bench | Coding | 67 | 521.67 | 2026-05-06 |
| Arena AI Code | Coding | 15 | 1455 | 2026-05-06 |
| BLXBench | Coding | 21 | 15.20 | 2026-05-06 |
| Claw-Eval | Coding | 5 | 58.4% | 2026-05-28 |
| Codeforces | Coding | 1 | 1 | 2026-05-28 |
| DeepSWE | Coding | 12 | 7.52 | 2026-05-26 |
| IOI | Coding | 8 | 35.833% | 2026-05-26 |
| Kernel Bench L3 | Coding | 5 | 1.07/54% | 2026-05-28 |
| LiveCodeBench | Coding | 1 | 93.5% | 2026-05-28 |
| LiveCodeBench | Coding | 5 | 87.484% | 2026-05-28 |
| LMArena WebDev Arena | Coding | 16 | 1454.67 | 2026-05-06 |
| NL2Repo | Coding | 5 | 35.5% | 2026-05-28 |
| QwenSVG | Coding | 4 | 1506 | 2026-05-28 |
| QwenWebDev | Coding | 2 | 1570 | 2026-05-28 |
| SciCode | Coding | 19 | 50% | 2026-05-11 |
| SciCode | Coding | 35 | 46.4% | 2026-05-11 |
| SciCode | Coding | 65 | 42.4% | 2026-05-11 |
| SkillsBench | Coding | 4 | 52.3% | 2026-05-28 |
| SWE-bench Verified | Coding | 10 | 77.4% | 2026-05-28 |
| Terminal-Bench 2.0 | Coding | 14 | 56.18% | 2026-05-28 |
| Terminal-Bench 2.0 | Coding | 2 | 67.9% | 2026-05-28 |
| Terminal-Bench 2.1 | Coding | 11 | 50.187% | 2026-05-28 |
| Vibe Code Bench v1.1 | Coding | 10 | 49.931% | 2026-05-28 |
| AA-Omniscience | Factuality | 15 | -10.02 | 2026-05-11 |
| CorpFin v2 | Finance | 33 | 61.383% | 2026-05-28 |
| Finance Agent v1.1 | Finance | 4 | 60.389% | 2026-05-04 |
| Finance Agent v2 | Finance | 10 | 44.083% | 2026-05-28 |
| TaxEval v2 | Finance | 45 | 72.077% | 2026-05-28 |
| InfiniteBM Heads-Up No-Limit Hold'em | Game | 11 | 1259.82 Elo / 13 games | 2026-05-28 |
| InfiniteBM Heads-Up No-Limit Hold'em | Game | 26 | 1035.68 Elo / 114 games | 2026-05-28 |
| InfiniteBM Liar's Dice | Game | 19 | 1193.32 Elo / 27 games | 2026-05-28 |
| InfiniteBM Liar's Dice | Game | 20 | 1192.38 Elo / 1714 games | 2026-05-28 |
| BenchLM | General Knowledge | 9 | 88 | 2026-05-06 |
| BenchLM | General Knowledge | 13 | 84 | 2026-05-06 |
| BenchLM | General Knowledge | 32 | 70 | 2026-05-06 |
| CSimpleQA | General Knowledge | 1 | 0.84 | 2026-05-06 |
| MAXIFE | General Knowledge | 2 | 88.9% | 2026-05-28 |
| MMLU-ProX | General Knowledge | 4 | 83.9% | 2026-05-28 |
| MMLU-Redux | General Knowledge | 4 | 94.8% | 2026-05-28 |
| NOVA-63 | General Knowledge | 6 | 52.8% | 2026-05-28 |
| MedCode | Healthcare | 28 | 40.455% | 2026-05-28 |
| MedScribe | Healthcare | 38 | 75.144% | 2026-05-28 |
| PhysicianBench | Healthcare | 6 | 18.7 +/- 2.9 | 2026-05-27 |
| IFBench | Instruction Following | 2 | 77% | 2026-05-28 |
| IFEval | Instruction Following | 6 | 91.9% | 2026-05-28 |
| AIIQ Composite IQ | Intelligence | 15 | 117 | 2026-05-12 |
| Artificial Analysis Intelligence Index | Intelligence | 16 | 51.51 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 21 | 49.79 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 77 | 39.27 | 2026-05-11 |
| GPQA Diamond | Intelligence | 13 | 89.394% | 2026-05-28 |
| HLE w/ tools | Intelligence | 6 | 48.2% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 3 | 37.7% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 11 | 35.9% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 17 | 33.5% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 194 | 7.7% | 2026-05-11 |
| LiveBench | Intelligence | 13 | 74.39 | 2026-05-05 |
| MMLU Pro | Intelligence | 18 | 87.249% | 2026-05-28 |
| MMLU-Pro | Intelligence | 4 | 87.5% | 2026-05-28 |
| SuperGPQA | Intelligence | 5 | 69.9% | 2026-05-28 |
| Vals Index | Intelligence | 7 | 56.231% | 2026-05-28 |
| CaseLaw v2 | Legal | 27 | 59.378% | 2026-05-04 |
| LegalBench | Legal | 56 | 80.323% | 2026-05-28 |
| CorpusQA 1M | Long Context | 1 | 0.62 | 2026-05-06 |
| MRCR 1M | Long Context | 1 | 0.83 | 2026-05-06 |
| MRCR-v2 128k | Long Context | 4 | 74.4% | 2026-05-28 |
| needle-1M-bench | Long Context | 1 | 100 | 2026-05-06 |
| needle-1M-bench | Long Context | 2 | 100 | 2026-05-06 |
| needle-1M-bench | Long Context | 6 | 100 | 2026-05-06 |
| needle-1M-bench | Long Context | 7 | 94 | 2026-05-06 |
| ProofBench | Math | 24 | 10% | 2026-05-28 |
| GSM8K | Mathematics | 4 | 92.60 | 2026-05-06 |
| HMMT February 2026 | Mathematics | 3 | 95.2% | 2026-05-28 |
| IMO-AnswerBench | Mathematics | 2 | 89.8% | 2026-05-28 |
| IMO-AnswerBench | Mathematics | 1 | 0.90 | 2026-05-06 |
| MathArena Apex | Mathematics | 2 | 38.3% | 2026-05-28 |
| MathArena Apex | Mathematics | 1 | 0.90 | 2026-05-06 |
| INCLUDE | Multilingual | 3 | 86.1% | 2026-05-28 |
| MMMLU | Multilingual | 4 | 87.9% | 2026-05-28 |
| Design Arena | Multimodal | 10 | 1313 | 2026-05-06 |
| Artificial Analysis Openness Index | Openness | 47 | 50 | 2026-05-11 |
| Artificial Analysis Openness Index | Openness | 48 | 50 | 2026-05-11 |
| CAIS Text Capabilities Index | Reasoning | 13 | 32.1 | 2026-05-27 |
| Context Arena | Reasoning | 18 | 55.99 | 2026-05-06 |
| Context Arena | Reasoning | 55 | 26.31 | 2026-05-06 |
| Global PIQA | Reasoning | 3 | 90.5% | 2026-05-28 |
| GPQA Diamond | Reasoning | 5 | 90.1% | 2026-05-28 |
| GPQA Diamond | Reasoning | 12 | 90.5% | 2026-05-11 |
| GPQA Diamond | Reasoning | 20 | 88.8% | 2026-05-11 |
| GPQA Diamond | Reasoning | 189 | 71.7% | 2026-05-11 |
| CAIS Risk Index | Safety | 21 | 54.1 | 2026-05-27 |
| CritPt | Science | 1 | 12.9% | 2026-05-28 |
| CritPt | Science | 10 | 12.9% | 2026-05-11 |
| CritPt | Science | 15 | 10% | 2026-05-11 |
| CritPt | Science | 94 | 0.9% | 2026-05-11 |
| SWE-bench Multilingual | Software Engineering | 4 | 76.2% | 2026-05-28 |
| SWE-bench Pro | Software Engineering | 3 | 59% | 2026-05-28 |
| SWE-bench Verified | Software Engineering | 2 | 80.6% | 2026-05-28 |
| SpreadsheetBench | Spreadsheets | 4 | 84.9% | 2026-05-28 |
| Structured Output Benchmark | Structured Output | 13 | 85.30 | 2026-05-06 |
| BFCL-V4 | Tool Use | 5 | 70.6% | 2026-05-28 |
| WMT24++ | Translation | 4 | 82.2% | 2026-05-28 |