R1

DeepSeek / DeepSeek

94 scores
79 benchmarks
$0.7 / $2.5 per 1M tokens (cost in/out)

Metadata

DeepSeek Open source

Aliases: deepseek-deepseek-r1, deepseek-r1, deepseek/deepseek-r1

Benchmark Results

Benchmark | Category | Rank | Score | Sampled
AgentIF | Agentic | 5 | 57.9 | 2026-05-27
ARC-AGI-1 | Agentic | 119 | 15.80 | 2026-05-05
ARC-AGI-2 | Agentic | 107 | 1.30 | 2026-05-05
LLM-WikiRace | Agentic | 7 | 54.70 | 2026-05-06
t2-bench | Agentic | 11 | 0.80 | 2026-05-06
Tau2-Bench Telecom | Agentic | 210 | 36.5% | 2026-05-11
Tau2-Bench Telecom | Agentic | 366 | 11.4% | 2026-05-11
Terminal-Bench Hard | Agentic | 172 | 15.9% | 2026-05-11
Terminal-Bench Hard | Agentic | 250 | 6.1% | 2026-05-11
Toolathlon | Agentic | 15 | 0.35 | 2026-05-06
OpenUGI | Alignment | 91 | 51 | 2026-05-06
TextClass Benchmark | Classification | 16 | 1718.73 | 2026-05-06
BigCodeBench-Hard | Coding | 14 | 29.70 | 2026-05-05
LiveCodeBench | Coding | 62 | 70.221% | 2026-05-28
Long Code Arena | Coding | 3 | 0.80 | 2026-05-06
SciCode | Coding | 94 | 40.3% | 2026-05-11
SciCode | Coding | 185 | 35.7% | 2026-05-11
TuRTLe Code Completion (Icarus Verilog) | Coding | 6 | 77.00 | 2026-05-06
TuRTLe Code Completion (Verilator) | Coding | 5 | 75.99 | 2026-05-06
TuRTLe Spec-to-RTL (Icarus Verilog) | Coding | 5 | 75.53 | 2026-05-06
TuRTLe Spec-to-RTL (Verilator) | Coding | 5 | 75.78 | 2026-05-06
IslamicLegalBench | Domain | 8 | 54.21 | 2026-05-06
EduGuardBench | Education | 2 | 0.75 | 2026-05-27
K-12EduBench | Education | 10 | 69.13 | 2026-05-27
Vectara HHEM Hallucination Leaderboard | Factuality | 69 | 88.70 | 2026-05-06
BizFinBench | Finance | 2 | 73.05 | 2026-05-27
CorpFin v2 | Finance | 72 | 54.118% | 2026-05-28
Fin-RATE | Finance | 11 | 15.53% | 2026-05-28
FinChain | Finance | 17 | 53.75 ChainEval | 2026-05-28
TaxEval v2 | Finance | 43 | 72.281% | 2026-05-28
Xent Games | Game | 4 | 62.67 overall | 2026-05-28
ALL Bench LLM | General Knowledge | 12 | 36.98 | 2026-05-06
BenchLM | General Knowledge | 86 | 33 | 2026-05-06
Arena-Hard | Generalization | 10 | 58.0% | 2026-05-27
HELM AIR-Bench | Generalization | 66 | 0.529066 | 2026-05-28
HELM Safety | Generalization | 46 | 0.868314 | 2026-05-28
HELM Safety | Generalization | 47 | 0.865442 | 2026-05-28
LongBench v2 | Generalization | 4 | 58.3% | 2026-05-27
WeirdML | Generalization | 18 | 36.49 | 2026-05-06
HealthBench Hard | Healthcare | 10 | 0.49 | 2026-05-27
HELM MedQA | Healthcare | 9 | 0.856859 | 2026-05-28
MedQA | Healthcare | 44 | 90.8% | 2026-04-16
Artificial Analysis Intelligence Index | Intelligence | 174 | 27.07 | 2026-05-11
Artificial Analysis Intelligence Index | Intelligence | 253 | 18.84 | 2026-05-11
Humanity's Last Exam | Intelligence | 96 | 14.9% | 2026-05-11
Humanity's Last Exam | Intelligence | 167 | 9.3% | 2026-05-11
MMLU Pro | Intelligence | 47 | 83.184% | 2026-05-28
MMLU-Pro | Intelligence | 35 | 84.9% | 2026-05-11
MMLU-Pro | Intelligence | 37 | 84.4% | 2026-05-11
SuperGPQA | Intelligence | 1 | 61.82 | 2026-05-06
OpenHuEval | Language | 2 | 62.31 | 2026-05-06
J1-ENVS | Legal | 13 | 43.48 | 2026-05-26
LegalBench | Legal | 95 | 67.323% | 2026-05-28
LEXam | Legal | 11 | 55.91% open / 52.41% MCQ | 2026-05-28
ConStory-Bench | Long Context | 31 | CED 3.419 | 2026-05-28
Fiction.LiveBench | Long Context | 21 | 33.30 | 2026-05-06
AIME | Math | 52 | 73.958% | 2026-04-16
AIME 2025 | Math | 78 | 76% | 2026-05-11
AIME 2025 | Math | 101 | 68% | 2026-05-11
IneqMath | Math | 30 | 5 | 2026-05-06
IneqMath | Math | 31 | 5 | 2026-05-06
IneqMath | Math | 35 | 3.50 | 2026-05-06
IneqMath | Math | 51 | 0.50 | 2026-05-06
MATH 500 | Math | 18 | 92.2% | 2026-01-09
MGSM | Math | 20 | 92.254% | 2026-01-09
HMMT 2025 | Mathematics | 16 | 0.90 | 2026-05-06
OTIS Mock AIME 2024-2025 | Mathematics | 19 | 53.33 | 2026-05-06
BRIDGE Medical Leaderboard | Medical | 9 | 51.38 | 2026-05-27
BRIDGE Medical Leaderboard | Medical | 55 | 44.25 | 2026-05-27
BRIDGE Medical Leaderboard | Medical | 75 | 42.1 | 2026-05-27
LiveMedBench | Medical | 17 | 0.1329 | 2026-05-27
MedHELM | Medical | 1 | 0.6625 | 2026-05-27
MEDIC Benchmark | Medical | 92 | 35.5 average normalized public table score | 2026-05-27
LanguageBench | Multilingual | 28 | 0.17 | 2026-05-06
ALL Bench Multimodal | Multimodal | 13 | 35.21 | 2026-05-06
Math-VR | Multimodal | 12 | 49.5 | 2026-05-27
Artificial Analysis Openness Index | Openness | 44 | 50 | 2026-05-11
Balrog | Reasoning | 3 | 34.90 | 2026-05-06
CAIS Text Capabilities Index | Reasoning | 35 | 8.6 | 2026-05-27
GPQA Diamond | Reasoning | 90 | 81.3% | 2026-05-11
GPQA Diamond | Reasoning | 198 | 70.8% | 2026-05-11
Humanity's Last Exam (Text Only) | Reasoning | 31 | 8.54 | 2026-05-06
LingOly-TOO | Reasoning | 9 | 0.26 | 2026-05-06
MultiNRC | Reasoning | 22 | 24.27 | 2026-05-06
SimpleBench | Reasoning | 14 | 30.90 | 2026-05-06
ZebraLogic | Reasoning | 4 | 78.70 | 2026-05-06
CAIS Risk Index | Safety | 26 | 57.4 | 2026-05-27
CritPt | Science | 65 | 1.4% | 2026-05-11
CritPt | Science | 106 | 0.6% | 2026-05-11
BrowseComp-zh | Search | 6 | 0.65 | 2026-05-06
Defects4J | Software Engineering | 4 | 0.475 | 2026-05-27
RepairBench | Software Engineering | 3 | 0.452 | 2026-05-27
LiveSQLBench | Text to SQL | 18 | 26.90 | 2026-05-06
Lech Mazur Writing | Writing | 10 | 8.30 | 2026-05-06