DeepSeek V3.1 Terminus

DeepSeek / DeepSeek

35 scores
25 benchmarks
$0.27 / $0.95 per 1M tokens (cost in/out)
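
The per-token pricing above can be turned into a rough per-request estimate. The sketch below assumes that "cost in/out" means $0.27 per 1M input tokens and $0.95 per 1M output tokens, and it ignores any caching discounts or provider-specific surcharges. The function name `estimate_cost` and the constant names are hypothetical, chosen only for this illustration.

```python
from decimal import Decimal

# Prices copied from the listing, in USD per 1M tokens.
# Assumption: the first figure is the input price, the second the output price.
INPUT_PRICE_PER_M = Decimal("0.27")
OUTPUT_PRICE_PER_M = Decimal("0.95")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return an estimated USD cost for one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Scale each token count by its per-million price, then sum.
    input_cost = Decimal(input_tokens) * INPUT_PRICE_PER_M / TOKENS_PER_M
    output_cost = Decimal(output_tokens) * OUTPUT_PRICE_PER_M / TOKENS_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # 200k input tokens and 50k output tokens.
    print(estimate_cost(200_000, 50_000))
```

Under these assumptions, a request with 200,000 input tokens and 50,000 output tokens costs 0.054 + 0.0475 = 0.1015 USD. `Decimal` is used rather than `float` so that small currency amounts do not pick up binary rounding noise.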

Metadata

DeepSeek Open source

Aliases: deepseek-deepseek-v3.1-terminus, deepseek-v3.1-terminus, deepseek/deepseek-v3.1-terminus

Benchmark Results

Benchmark                                Category           Rank  Score   Sampled
MCP-Universe                             Agentic            19    21.65   2026-05-06
MCPMark                                  Agentic            28    0.17    2026-05-06
MultiChallenge                           Agentic            24    46.10   2026-05-06
Tau2-Bench Telecom                       Agentic            208   37.1%   2026-05-11
Tau2-Bench Telecom                       Agentic            209   37.1%   2026-05-11
Terminal-Bench Hard                      Agentic            83    31.8%   2026-05-11
Terminal-Bench Hard                      Agentic            96    30.3%   2026-05-11
UAVBench                                 Agentic            19    72.70   2026-05-06
ALE-Bench                                Coding             47    745.17  2026-05-06
SciCode                                  Coding             86    40.6%   2026-05-11
SciCode                                  Coding             230   32.1%   2026-05-11
TuRTLe Code Completion (Icarus Verilog)  Coding             7     76.57   2026-05-06
TuRTLe Code Completion (Verilator)       Coding             6     75.31   2026-05-06
TuRTLe Spec-to-RTL (Icarus Verilog)      Coding             6     71.79   2026-05-06
TuRTLe Spec-to-RTL (Verilator)           Coding             6     71.35   2026-05-06
PRBench Finance                          Finance            23    35.09   2026-05-06
BenchLM                                  General Knowledge  96    26      2026-05-06
Artificial Analysis Intelligence Index   Intelligence       114   33.93   2026-05-11
Artificial Analysis Intelligence Index   Intelligence       158   28.52   2026-05-11
Humanity's Last Exam                     Intelligence       94    15.2%   2026-05-11
Humanity's Last Exam                     Intelligence       181   8.4%    2026-05-11
MMLU-Pro                                 Intelligence       31    85.1%   2026-05-11
MMLU-Pro                                 Intelligence       51    83.6%   2026-05-11
Professional Reasoning Bench - Legal     Legal              19    37.62   2026-05-06
AIME 2025                                Math               27    89.7%   2026-05-11
AIME 2025                                Math               134   53.7%   2026-05-11
Artificial Analysis Openness Index       Openness           112   38.89   2026-05-11
Artificial Analysis Openness Index       Openness           113   38.89   2026-05-11
GPQA Diamond                             Reasoning          108   79.2%   2026-05-11
GPQA Diamond                             Reasoning          157   75.1%   2026-05-11
Humanity's Last Exam (Text Only)         Reasoning          20    12.88   2026-05-06
LingOly-TOO                              Reasoning          5     0.42    2026-05-06
MultiNRC                                 Reasoning          23    23.60   2026-05-06
CritPt                                   Science            61    1.7%    2026-05-11
CritPt                                   Science            172   0%      2026-05-11