R1
DeepSeek / DeepSeek
94 scores
79 benchmarks
Cost (input / output): $0.7 / $2.5 per 1M tokens
Metadata
DeepSeek (open source)
Aliases: deepseek-deepseek-r1, deepseek-r1, deepseek/deepseek-r1
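The listed prices give a simple per-request cost estimate: input tokens times the input price plus output tokens times the output price, each divided by one million. A minimal sketch, assuming the $0.7 / $2.5 figures above and assuming every generated token is billed at the output rate; the function name `estimate_cost` is hypothetical, and real provider billing may differ.

```python
# Listed R1 prices in USD per 1M tokens (assumption: flat rates, no caching discounts).
INPUT_PRICE_PER_M = 0.7
OUTPUT_PRICE_PER_M = 2.5


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: estimated USD cost of one request at the listed prices."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Example: 2,000 prompt tokens and 8,000 completion tokens.
print(f"${estimate_cost(2_000, 8_000):.4f}")  # prints "$0.0214"
```

Because output is priced about 3.6 times higher than input, long completions dominate the cost of a request.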
| Benchmark | Category | Rank | Score | Date sampled |
|---|---|---|---|---|
| AgentIF | Agentic | 5 | 57.9 | 2026-05-27 |
| ARC-AGI-1 | Agentic | 119 | 15.80 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 107 | 1.30 | 2026-05-05 |
| LLM-WikiRace | Agentic | 7 | 54.70 | 2026-05-06 |
| t2-bench | Agentic | 11 | 0.80 | 2026-05-06 |
| Tau2-Bench Telecom | Agentic | 210 | 36.5% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 366 | 11.4% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 172 | 15.9% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 250 | 6.1% | 2026-05-11 |
| Toolathlon | Agentic | 15 | 0.35 | 2026-05-06 |
| OpenUGI | Alignment | 91 | 51 | 2026-05-06 |
| TextClass Benchmark | Classification | 16 | 1718.73 | 2026-05-06 |
| BigCodeBench-Hard | Coding | 14 | 29.70 | 2026-05-05 |
| LiveCodeBench | Coding | 62 | 70.221% | 2026-05-28 |
| Long Code Arena | Coding | 3 | 0.80 | 2026-05-06 |
| SciCode | Coding | 94 | 40.3% | 2026-05-11 |
| SciCode | Coding | 185 | 35.7% | 2026-05-11 |
| TuRTLe Code Completion (Icarus Verilog) | Coding | 6 | 77.00 | 2026-05-06 |
| TuRTLe Code Completion (Verilator) | Coding | 5 | 75.99 | 2026-05-06 |
| TuRTLe Spec-to-RTL (Icarus Verilog) | Coding | 5 | 75.53 | 2026-05-06 |
| TuRTLe Spec-to-RTL (Verilator) | Coding | 5 | 75.78 | 2026-05-06 |
| IslamicLegalBench | Domain | 8 | 54.21 | 2026-05-06 |
| EduGuardBench | Education | 2 | 0.75 | 2026-05-27 |
| K-12EduBench | Education | 10 | 69.13 | 2026-05-27 |
| Vectara HHEM Hallucination Leaderboard | Factuality | 69 | 88.70 | 2026-05-06 |
| BizFinBench | Finance | 2 | 73.05 | 2026-05-27 |
| CorpFin v2 | Finance | 72 | 54.118% | 2026-05-28 |
| Fin-RATE | Finance | 11 | 15.53% | 2026-05-28 |
| FinChain | Finance | 17 | 53.75 ChainEval | 2026-05-28 |
| TaxEval v2 | Finance | 43 | 72.281% | 2026-05-28 |
| Xent Games | Game | 4 | 62.67 overall | 2026-05-28 |
| ALL Bench LLM | General Knowledge | 12 | 36.98 | 2026-05-06 |
| BenchLM | General Knowledge | 86 | 33 | 2026-05-06 |
| Arena-Hard | Generalization | 10 | 58.0% | 2026-05-27 |
| HELM AIR-Bench | Generalization | 66 | 0.529066 | 2026-05-28 |
| HELM Safety | Generalization | 46 | 0.868314 | 2026-05-28 |
| HELM Safety | Generalization | 47 | 0.865442 | 2026-05-28 |
| LongBench v2 | Generalization | 4 | 58.3% | 2026-05-27 |
| WeirdML | Generalization | 18 | 36.49 | 2026-05-06 |
| HealthBench Hard | Healthcare | 10 | 0.49 | 2026-05-27 |
| HELM MedQA | Healthcare | 9 | 0.856859 | 2026-05-28 |
| MedQA | Healthcare | 44 | 90.8% | 2026-04-16 |
| Artificial Analysis Intelligence Index | Intelligence | 174 | 27.07 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 253 | 18.84 | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 96 | 14.9% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 167 | 9.3% | 2026-05-11 |
| MMLU Pro | Intelligence | 47 | 83.184% | 2026-05-28 |
| MMLU-Pro | Intelligence | 35 | 84.9% | 2026-05-11 |
| MMLU-Pro | Intelligence | 37 | 84.4% | 2026-05-11 |
| SuperGPQA | Intelligence | 1 | 61.82 | 2026-05-06 |
| OpenHuEval | Language | 2 | 62.31 | 2026-05-06 |
| J1-ENVS | Legal | 13 | 43.48 | 2026-05-26 |
| LegalBench | Legal | 95 | 67.323% | 2026-05-28 |
| LEXam | Legal | 11 | 55.91% open / 52.41% MCQ | 2026-05-28 |
| ConStory-Bench | Long Context | 31 | CED 3.419 | 2026-05-28 |
| Fiction.LiveBench | Long Context | 21 | 33.30 | 2026-05-06 |
| AIME | Math | 52 | 73.958% | 2026-04-16 |
| AIME 2025 | Math | 78 | 76% | 2026-05-11 |
| AIME 2025 | Math | 101 | 68% | 2026-05-11 |
| IneqMath | Math | 30 | 5 | 2026-05-06 |
| IneqMath | Math | 31 | 5 | 2026-05-06 |
| IneqMath | Math | 35 | 3.50 | 2026-05-06 |
| IneqMath | Math | 51 | 0.50 | 2026-05-06 |
| MATH 500 | Math | 18 | 92.2% | 2026-01-09 |
| MGSM | Math | 20 | 92.254% | 2026-01-09 |
| HMMT 2025 | Mathematics | 16 | 0.90 | 2026-05-06 |
| OTIS Mock AIME 2024-2025 | Mathematics | 19 | 53.33 | 2026-05-06 |
| BRIDGE Medical Leaderboard | Medical | 9 | 51.38 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 55 | 44.25 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 75 | 42.1 | 2026-05-27 |
| LiveMedBench | Medical | 17 | 0.1329 | 2026-05-27 |
| MedHELM | Medical | 1 | 0.6625 | 2026-05-27 |
| MEDIC Benchmark | Medical | 92 | 35.5 average normalized public table score | 2026-05-27 |
| LanguageBench | Multilingual | 28 | 0.17 | 2026-05-06 |
| ALL Bench Multimodal | Multimodal | 13 | 35.21 | 2026-05-06 |
| Math-VR | Multimodal | 12 | 49.5 | 2026-05-27 |
| Artificial Analysis Openness Index | Openness | 44 | 50 | 2026-05-11 |
| Balrog | Reasoning | 3 | 34.90 | 2026-05-06 |
| CAIS Text Capabilities Index | Reasoning | 35 | 8.6 | 2026-05-27 |
| GPQA Diamond | Reasoning | 90 | 81.3% | 2026-05-11 |
| GPQA Diamond | Reasoning | 198 | 70.8% | 2026-05-11 |
| Humanity's Last Exam (Text Only) | Reasoning | 31 | 8.54 | 2026-05-06 |
| LingOly-TOO | Reasoning | 9 | 0.26 | 2026-05-06 |
| MultiNRC | Reasoning | 22 | 24.27 | 2026-05-06 |
| SimpleBench | Reasoning | 14 | 30.90 | 2026-05-06 |
| ZebraLogic | Reasoning | 4 | 78.70 | 2026-05-06 |
| CAIS Risk Index | Safety | 26 | 57.4 | 2026-05-27 |
| CritPt | Science | 65 | 1.4% | 2026-05-11 |
| CritPt | Science | 106 | 0.6% | 2026-05-11 |
| BrowseComp-zh | Search | 6 | 0.65 | 2026-05-06 |
| Defects4J | Software Engineering | 4 | 0.475 | 2026-05-27 |
| RepairBench | Software Engineering | 3 | 0.452 | 2026-05-27 |
| LiveSQLBench | Text to SQL | 18 | 26.90 | 2026-05-06 |
| Lech Mazur Writing | Writing | 10 | 8.30 | 2026-05-06 |