DeepSeek V3
DeepSeek / DeepSeek
74 scores
74 benchmarks
Cost (in/out): $0.32 / $0.89 per 1M tokens
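
As a rough illustration of how the per-token prices listed above translate into a per-request cost, here is a minimal sketch. It assumes that "in/out" refers to input and output tokens and that prices are in USD. The helper name `estimate_cost` and the token counts are hypothetical and are not part of the listing.

```python
# Prices taken from the listing above, assumed to be USD per 1M tokens.
INPUT_PRICE_PER_M = 0.32   # assumed: price for input (prompt) tokens
OUTPUT_PRICE_PER_M = 0.89  # assumed: price for output (completion) tokens


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper)."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Illustrative token counts only: 200,000 input and 50,000 output tokens.
cost = estimate_cost(200_000, 50_000)
print(f"${cost:.4f}")  # prints $0.1085
```

Actual billing may differ by provider, for example through caching discounts or minimum charges, so treat the result as an estimate only.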
Metadata
DeepSeek · Open source
Aliases: deepseek-chat, deepseek-chat-v3, deepseek-deepseek-chat, deepseek-deepseek-chat-v3, deepseek/deepseek-chat, deepseek/deepseek-chat-v3
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| ADBench | Agentic | 6 | 80 | 2026-05-06 |
| AgentIF | Agentic | 7 | 56.7 | 2026-05-27 |
| Galileo Agent Leaderboard | Agentic | 12 | 0.40 | 2026-05-06 |
| MCP-Universe | Agentic | 28 | 14.29 | 2026-05-06 |
| MCPMark | Agentic | 27 | 0.17 | 2026-05-06 |
| PinchBench | Agentic | 54 | 0.72 | 2026-05-06 |
| Tau2-Bench Telecom | Agentic | 295 | 22.8% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 233 | 6.8% | 2026-05-11 |
| AgentBench FC | Agents | 23 | 36.10 | 2026-05-06 |
| TextClass Benchmark | Classification | 13 | 1732.54 | 2026-05-06 |
| BigCodeBench | Coding | 2 | 50 | 2026-05-06 |
| BigCodeBench-Hard | Coding | 21 | 28.40 | 2026-05-05 |
| EvalPlus | Coding | 4 | 79.80 | 2026-05-05 |
| HumanEval-Mul | Coding | 1 | 0.83 | 2026-05-06 |
| HumanEval+ | Coding | 5 | 86.60 | 2026-05-05 |
| LiveCodeBench | Coding | 27 | 27.20 | 2026-05-06 |
| MBPP+ | Coding | 10 | 73 | 2026-05-05 |
| SciCode | Coding | 188 | 35.4% | 2026-05-11 |
| EduGuardBench | Education | 4 | 0.73 | 2026-05-27 |
| K-12EduBench | Education | 2 | 79.67 | 2026-05-27 |
| BizFinBench | Finance | 4 | 71.57 | 2026-05-27 |
| CorpFin v2 | Finance | 77 | 52.486% | 2026-05-28 |
| Fin-RATE | Finance | 14 | 9.81% | 2026-05-28 |
| Open FinLLM Leaderboard | Finance | 9 | 29.494986% | 2026-05-27 |
| TaxEval v2 | Finance | 76 | 67.907% | 2026-05-28 |
| Xent Games | Game | 11 | 35.48 overall | 2026-05-28 |
| BenchLM | General Knowledge | 82 | 36 | 2026-05-06 |
| CSimpleQA | General Knowledge | 7 | 0.65 | 2026-05-06 |
| MMLU-Redux | General Knowledge | 24 | 0.89 | 2026-05-06 |
| HELM AIR-Bench | Generalization | 80 | 0.407885 | 2026-05-28 |
| HELM Safety | Generalization | 45 | 0.871772 | 2026-05-28 |
| WeirdML | Generalization | 12 | 41.63 | 2026-05-06 |
| MedAgentBench | Healthcare | 3 | 62.67% | 2026-05-27 |
| MedQA | Healthcare | 71 | 80.9% | 2026-04-16 |
| Artificial Analysis Intelligence Index | Intelligence | 293 | 16.46 | 2026-05-11 |
| GPQA Diamond | Intelligence | 89 | 54.546% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 450 | 3.6% | 2026-05-11 |
| MMLU Pro | Intelligence | 87 | 73.82% | 2026-05-28 |
| MMLU-Pro | Intelligence | 173 | 75.2% | 2026-05-11 |
| HellaSwag | Language | 4 | 88.90 | 2026-05-06 |
| OpenHuEval | Language | 3 | 57.10 | 2026-05-06 |
| PIQA | Language | 6 | 84.70 | 2026-05-06 |
| WinoGrande | Language | 5 | 86.30 | 2026-05-06 |
| LegalBench | Legal | 51 | 80.762% | 2026-05-28 |
| LEXam | Legal | 14 | 52.53% open / 46.57% MCQ | 2026-05-28 |
| ConStory-Bench | Long Context | 28 | CED 2.422 | 2026-05-28 |
| Fiction.LiveBench | Long Context | 12 | 53.10 | 2026-05-06 |
| AIME | Math | 74 | 27.5% | 2026-04-16 |
| AIME 2025 | Math | 195 | 26% | 2026-05-11 |
| MATH 500 | Math | 38 | 80.4% | 2026-01-09 |
| MGSM | Math | 23 | 92.146% | 2026-01-09 |
| CNMO 2024 | Mathematics | 3 | 0.43 | 2026-05-06 |
| FrontierMath 2025-02-28 Private | Mathematics | 5 | 22.10 | 2026-05-06 |
| FrontierMath Tier 4 2025-07-01 Private | Mathematics | 7 | 2.10 | 2026-05-06 |
| MATH-500 | Mathematics | 27 | 0.90 | 2026-05-06 |
| OTIS Mock AIME 2024-2025 | Mathematics | 4 | 87.82 | 2026-05-06 |
| LanguageBench | Multilingual | 8 | 0.64 | 2026-05-06 |
| Design Arena | Multimodal | 77 | 1166 | 2026-05-06 |
| Balrog | Reasoning | 11 | 19.50 | 2026-05-06 |
| BBH | Reasoning | 1 | 87.50 | 2026-05-06 |
| CLUEWSC | Reasoning | 2 | 0.91 | 2026-05-06 |
| DROP | Reasoning | 1 | 0.92 | 2026-05-06 |
| GPQA Diamond | Reasoning | 310 | 55.7% | 2026-05-11 |
| Humanity's Last Exam (Text Only) | Reasoning | 47 | 4.55 | 2026-05-06 |
| SimpleBench | Reasoning | 9 | 40.80 | 2026-05-06 |
| ZebraLogic | Reasoning | 11 | 42.10 | 2026-05-06 |
| CritPt | Science | 169 | 0% | 2026-05-11 |
| SciPredict | Science | 6 | 19.18 | 2026-05-06 |
| FRAMES | Search | 2 | 0.73 | 2026-05-06 |
| Defects4J | Software Engineering | 11 | 0.399 | 2026-05-27 |
| RepairBench | Software Engineering | 11 | 0.371 | 2026-05-27 |
| SWE-PRBench | Software Engineering | 3 | 0.15 | 2026-05-27 |
| LiveSQLBench | Text to SQL | 24 | 23.68 | 2026-05-06 |
| Lech Mazur Writing | Writing | 7 | 8.52 | 2026-05-06 |