DeepSeek V3

DeepSeek / DeepSeek

74 scores
74 benchmarks
$0.32 / $0.89 per 1M tokens (cost in/out)
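As a rough illustration of how the per-token pricing above translates into a per-request cost, here is a minimal sketch. It assumes the first figure ($0.32) is the input price and the second ($0.89) is the output price, following the "in/out" label; the function name `estimate_cost_usd` is hypothetical, and any caching discounts or provider-specific surcharges are ignored.

```python
# Prices taken from the listing above, in USD per 1M tokens.
# Assumption: first figure = input tokens, second = output tokens.
INPUT_USD_PER_MTOK = 0.32
OUTPUT_USD_PER_MTOK = 0.89


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD (hypothetical helper)."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 2,000-token prompt with a 500-token completion.
    print(f"{estimate_cost_usd(2_000, 500):.6f} USD")
```

Under these assumptions, output tokens cost roughly 2.8 times as much as input tokens, so long completions dominate the bill.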

Metadata

DeepSeek Open source

Aliases: deepseek-chat, deepseek-chat-v3, deepseek-deepseek-chat, deepseek-deepseek-chat-v3, deepseek/deepseek-chat, deepseek/deepseek-chat-v3

Benchmark Results

| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| ADBench | Agentic | 6 | 80 | 2026-05-06 |
| AgentIF | Agentic | 7 | 56.7 | 2026-05-27 |
| Galileo Agent Leaderboard | Agentic | 12 | 0.40 | 2026-05-06 |
| MCP-Universe | Agentic | 28 | 14.29 | 2026-05-06 |
| MCPMark | Agentic | 27 | 0.17 | 2026-05-06 |
| PinchBench | Agentic | 54 | 0.72 | 2026-05-06 |
| Tau2-Bench Telecom | Agentic | 295 | 22.8% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 233 | 6.8% | 2026-05-11 |
| AgentBench FC | Agents | 23 | 36.10 | 2026-05-06 |
| TextClass Benchmark | Classification | 13 | 1732.54 | 2026-05-06 |
| BigCodeBench | Coding | 2 | 50 | 2026-05-06 |
| BigCodeBench-Hard | Coding | 21 | 28.40 | 2026-05-05 |
| EvalPlus | Coding | 4 | 79.80 | 2026-05-05 |
| HumanEval-Mul | Coding | 1 | 0.83 | 2026-05-06 |
| HumanEval+ | Coding | 5 | 86.60 | 2026-05-05 |
| LiveCodeBench | Coding | 27 | 27.20 | 2026-05-06 |
| MBPP+ | Coding | 10 | 73 | 2026-05-05 |
| SciCode | Coding | 188 | 35.4% | 2026-05-11 |
| EduGuardBench | Education | 4 | 0.73 | 2026-05-27 |
| K-12EduBench | Education | 2 | 79.67 | 2026-05-27 |
| BizFinBench | Finance | 4 | 71.57 | 2026-05-27 |
| CorpFin v2 | Finance | 77 | 52.486% | 2026-05-28 |
| Fin-RATE | Finance | 14 | 9.81% | 2026-05-28 |
| Open FinLLM Leaderboard | Finance | 9 | 29.494986% | 2026-05-27 |
| TaxEval v2 | Finance | 76 | 67.907% | 2026-05-28 |
| Xent Games | Game | 11 | 35.48 overall | 2026-05-28 |
| BenchLM | General Knowledge | 82 | 36 | 2026-05-06 |
| CSimpleQA | General Knowledge | 7 | 0.65 | 2026-05-06 |
| MMLU-Redux | General Knowledge | 24 | 0.89 | 2026-05-06 |
| HELM AIR-Bench | Generalization | 80 | 0.407885 | 2026-05-28 |
| HELM Safety | Generalization | 45 | 0.871772 | 2026-05-28 |
| WeirdML | Generalization | 12 | 41.63 | 2026-05-06 |
| MedAgentBench | Healthcare | 3 | 62.67% | 2026-05-27 |
| MedQA | Healthcare | 71 | 80.9% | 2026-04-16 |
| Artificial Analysis Intelligence Index | Intelligence | 293 | 16.46 | 2026-05-11 |
| GPQA Diamond | Intelligence | 89 | 54.546% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 450 | 3.6% | 2026-05-11 |
| MMLU Pro | Intelligence | 87 | 73.82% | 2026-05-28 |
| MMLU-Pro | Intelligence | 173 | 75.2% | 2026-05-11 |
| HellaSwag | Language | 4 | 88.90 | 2026-05-06 |
| OpenHuEval | Language | 3 | 57.10 | 2026-05-06 |
| PIQA | Language | 6 | 84.70 | 2026-05-06 |
| WinoGrande | Language | 5 | 86.30 | 2026-05-06 |
| LegalBench | Legal | 51 | 80.762% | 2026-05-28 |
| LEXam | Legal | 14 | 52.53% open / 46.57% MCQ | 2026-05-28 |
| ConStory-Bench | Long Context | 28 | CED 2.422 | 2026-05-28 |
| Fiction.LiveBench | Long Context | 12 | 53.10 | 2026-05-06 |
| AIME | Math | 74 | 27.5% | 2026-04-16 |
| AIME 2025 | Math | 195 | 26% | 2026-05-11 |
| MATH 500 | Math | 38 | 80.4% | 2026-01-09 |
| MGSM | Math | 23 | 92.146% | 2026-01-09 |
| CNMO 2024 | Mathematics | 3 | 0.43 | 2026-05-06 |
| FrontierMath 2025-02-28 Private | Mathematics | 5 | 22.10 | 2026-05-06 |
| FrontierMath Tier 4 2025-07-01 Private | Mathematics | 7 | 2.10 | 2026-05-06 |
| MATH-500 | Mathematics | 27 | 0.90 | 2026-05-06 |
| OTIS Mock AIME 2024-2025 | Mathematics | 4 | 87.82 | 2026-05-06 |
| LanguageBench | Multilingual | 8 | 0.64 | 2026-05-06 |
| Design Arena | Multimodal | 77 | 1166 | 2026-05-06 |
| Balrog | Reasoning | 11 | 19.50 | 2026-05-06 |
| BBH | Reasoning | 1 | 87.50 | 2026-05-06 |
| CLUEWSC | Reasoning | 2 | 0.91 | 2026-05-06 |
| DROP | Reasoning | 1 | 0.92 | 2026-05-06 |
| GPQA Diamond | Reasoning | 310 | 55.7% | 2026-05-11 |
| Humanity's Last Exam (Text Only) | Reasoning | 47 | 4.55 | 2026-05-06 |
| SimpleBench | Reasoning | 9 | 40.80 | 2026-05-06 |
| ZebraLogic | Reasoning | 11 | 42.10 | 2026-05-06 |
| CritPt | Science | 169 | 0% | 2026-05-11 |
| SciPredict | Science | 6 | 19.18 | 2026-05-06 |
| FRAMES | Search | 2 | 0.73 | 2026-05-06 |
| Defects4J | Software Engineering | 11 | 0.399 | 2026-05-27 |
| RepairBench | Software Engineering | 11 | 0.371 | 2026-05-27 |
| SWE-PRBench | Software Engineering | 3 | 0.15 | 2026-05-27 |
| LiveSQLBench | Text to SQL | 24 | 23.68 | 2026-05-06 |
| Lech Mazur Writing | Writing | 7 | 8.52 | 2026-05-06 |