DeepSeek V3.1 | BenchmarkList

Metadata

DeepSeek | Open source

Aliases: deepseek-chat-v3.1, deepseek-deepseek-chat-v3.1, deepseek/deepseek-chat-v3.1

Benchmark	Category	Rank	Score	Sampled
MCP-Universe	Agentic	17	22.08	2026-05-06
Tau2-Bench Telecom	Agentic	207	37.4%	2026-05-11
Tau2-Bench Telecom	Agentic	216	34.8%	2026-05-11
Terminal-Bench Hard	Agentic	118	25%	2026-05-11
Terminal-Bench Hard	Agentic	123	24.2%	2026-05-11
Codeforces	Coding	12	0.697	2026-05-28
SciCode	Coding	117	39.1%	2026-05-11
SciCode	Coding	159	36.7%	2026-05-11
FinChain	Finance	12	56.76 ChainEval	2026-05-28
BenchLM	General Knowledge	90	30	2026-05-06
MMLU-Redux	General Knowledge	18	0.92	2026-05-06
AIIQ Composite IQ	Intelligence	34	95	2026-05-12
Artificial Analysis Intelligence Index	Intelligence	163	28.13	2026-05-11
Artificial Analysis Intelligence Index	Intelligence	166	27.71	2026-05-11
Humanity's Last Exam	Intelligence	112	13%	2026-05-11
Humanity's Last Exam	Intelligence	231	6.3%	2026-05-11
MMLU-Pro	Intelligence	30	85.1%	2026-05-11
MMLU-Pro	Intelligence	56	83.3%	2026-05-11
AIME 2025	Math	26	89.7%	2026-05-11
AIME 2025	Math	141	49.7%	2026-05-11
IneqMath	Math	13	15.50	2026-05-06
IneqMath	Math	16	12	2026-05-06
HMMT 2025	Math	31	0.34	2026-05-06
LiveMedBench	Medical	24	0.0959	2026-05-27
GPQA Diamond	Reasoning	123	77.9%	2026-05-11
GPQA Diamond	Reasoning	172	73.5%	2026-05-11
CritPt	Science	58	2%	2026-05-11
CritPt	Science	171	0%	2026-05-11
BrowseComp-zh	Search	10	0.49	2026-05-06
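
Several benchmarks appear twice in the table with different ranks and scores, and the listing does not say why. As a minimal sketch for reading the data, the snippet below parses a few rows copied from the table and keeps the best-ranked (lowest rank number) row per benchmark. The function name `best_rank_per_benchmark` and the "lowest rank wins" rule are assumptions made for illustration, not something the listing defines.

```python
import csv
import io

# A few rows copied from the table above, tab-separated, same column order.
ROWS = (
    "Benchmark\tCategory\tRank\tScore\tSampled\n"
    "Tau2-Bench Telecom\tAgentic\t207\t37.4%\t2026-05-11\n"
    "Tau2-Bench Telecom\tAgentic\t216\t34.8%\t2026-05-11\n"
    "Codeforces\tCoding\t12\t0.697\t2026-05-28\n"
    "AIME 2025\tMath\t26\t89.7%\t2026-05-11\n"
    "AIME 2025\tMath\t141\t49.7%\t2026-05-11\n"
)


def best_rank_per_benchmark(tsv_text):
    """Return {benchmark: row}, keeping the row with the lowest rank number.

    Hypothetical helper: treating the lowest rank as "best" is an assumption.
    Scores are left as strings because the table mixes units (%, fractions,
    raw points).
    """
    best = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        row["Rank"] = int(row["Rank"])
        name = row["Benchmark"]
        if name not in best or row["Rank"] < best[name]["Rank"]:
            best[name] = row
    return best


best = best_rank_per_benchmark(ROWS)
for name, row in best.items():
    print(f"{name}: rank {row['Rank']}, score {row['Score']}")
```

Scores are deliberately not converted to numbers: the Score column mixes percentages, fractions, and raw points, so comparing values across benchmarks would not be meaningful.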