DeepSeek V3.2
DeepSeek / DeepSeek
90 scores
63 benchmarks
$0.252 / $0.378 per 1M tokens (cost in/out)
Metadata
DeepSeek Open source
Aliases: deepseek-deepseek-v3.2, deepseek-deepseek-v3.2-20251201, deepseek-v3.2, deepseek-v3.2-20251201, deepseek/deepseek-v3.2, deepseek/deepseek-v3.2-20251201
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| APEX-Agents | Agentic | 30 | 18.80 | 2026-05-06 |
| APEX-Agents-AA | Agentic | 10 | 14.5% | 2026-05-11 |
| ARC-AGI-1 | Agentic | 58 | 57 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 76 | 4.03 | 2026-05-05 |
| AutoBench | Agentic | 29 | 2.64 | 2026-05-06 |
| Claw-Eval-Live | Agentic | 9 | 51.4 | 2026-05-27 |
| EnterpriseOps-Gym | Agentic | 12 | 23.8% | 2026-05-05 |
| EnterpriseOps-Gym | Agentic | 16 | 21.8% | 2026-05-05 |
| Gert Labs Rankings | Agentic | 43 | 0.40 | 2026-05-11 |
| MCPMark | Agentic | 8 | 0.37 | 2026-05-06 |
| PinchBench | Agentic | 30 | 0.84 | 2026-05-06 |
| t2-bench | Agentic | 9 | 0.80 | 2026-05-06 |
| Tau2-Bench Telecom | Agentic | 53 | 90.6% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 112 | 78.9% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 223 | 33.9% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 55 | 35.6% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 78 | 32.6% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 119 | 25% | 2026-05-11 |
| Toolathlon | Agentic | 15 | 0.35 | 2026-05-06 |
| Vending-Bench 2 | Agentic | 27 | 1034.00 | 2026-05-28 |
| VitaBench | Agentic | 7 | 24 | 2026-05-06 |
| VitaBench | Agentic | 18 | 18.50 | 2026-05-06 |
| WildClawBench | Agentic | 7 | 34 | 2026-05-06 |
| YC-Bench | Agentic | 8 | 125263 | 2026-05-06 |
| OpenUGI | Alignment | 22 | 57.78 | 2026-05-06 |
| OpenUGI | Alignment | 37 | 55.75 | 2026-05-06 |
| ABC-Bench | Coding | 2 | 50.1% +/- 1.9 | 2026-05-27 |
| Arena AI Code | Coding | 43 | 1368 | 2026-05-06 |
| Arena AI Code | Coding | 51 | 1332 | 2026-05-06 |
| Codeforces | Coding | 8 | 0.795 | 2026-05-28 |
| Codeforces | Coding | 8 | 0.795 | 2026-05-28 |
| SciCode | Coding | 98 | 39.9% | 2026-05-11 |
| SciCode | Coding | 121 | 38.9% | 2026-05-11 |
| SciCode | Coding | 125 | 38.7% | 2026-05-11 |
| VibeCodingBench | Coding | 8 | 88.19 | 2026-05-06 |
| RP-Bench | Creative | 2 | 1638.30 | 2026-05-06 |
| RP-Bench | Creative | 14 | 1478.80 | 2026-05-06 |
| RP-Bench | Creative | 21 | 4.34 | 2026-05-06 |
| OrgForge-IT | Cybersecurity | 5 | 0.800 | 2026-05-28 |
| SecCodeBench | Cybersecurity | 15 | 55.24% | 2026-05-28 |
| SecCodeBench | Cybersecurity | 22 | 51.81% | 2026-05-28 |
| AA-Omniscience | Factuality | 18 | -20.88 | 2026-05-11 |
| Vectara HHEM Hallucination Leaderboard | Factuality | 27 | 93.70 | 2026-05-06 |
| Fin-RATE | Finance | 10 | 16.32% | 2026-05-28 |
| FinChain | Finance | 14 | 56.71 ChainEval | 2026-05-28 |
| QuantSightBench | Finance | 11 | 0.6148 coverage | 2026-05-28 |
| React Native Evals | Frontend Development | 15 | 71.4827% overall | 2026-05-28 |
| InfiniteBM Heads-Up No-Limit Hold'em | Game | 21 | 1114.99 Elo / 110 games | 2026-05-28 |
| InfiniteBM Liar's Dice | Game | 12 | 1292.95 Elo / 111 games | 2026-05-28 |
| MageBench Season 1 | Game | 5 | 1682 rating / 10 games | 2026-05-28 |
| ALL Bench LLM | General Knowledge | 16 | 35.43 | 2026-05-06 |
| BenchLM | General Knowledge | 47 | 62 | 2026-05-06 |
| BenchLM | General Knowledge | 50 | 58 | 2026-05-06 |
| HUMAINE | Human Preference | 34 | 3.39 | 2026-05-06 |
| AIIQ Composite IQ | Intelligence | 23 | 111 | 2026-05-12 |
| Artificial Analysis Intelligence Index | Intelligence | 64 | 41.71 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 131 | 32.09 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 159 | 28.44 | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 60 | 22.2% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 145 | 10.5% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 178 | 8.6% | 2026-05-11 |
| LiveBench | Intelligence | 48 | 63.13 | 2026-05-05 |
| MMLU-Pro | Intelligence | 18 | 86.2% | 2026-05-11 |
| MMLU-Pro | Intelligence | 48 | 83.7% | 2026-05-11 |
| MMLU-Pro | Intelligence | 52 | 83.6% | 2026-05-11 |
| AIME 2025 | Math | 17 | 92% | 2026-05-11 |
| AIME 2025 | Math | 118 | 59% | 2026-05-11 |
| AIME 2025 | Math | 122 | 57.7% | 2026-05-11 |
| AIME 2026 | Mathematics | 4 | 94.17 | 2026-05-06 |
| HMMT 2025 | Mathematics | 16 | 0.90 | 2026-05-06 |
| HMMT February 2026 | Mathematics | 6 | 84.09 | 2026-05-06 |
| IMO-AnswerBench | Mathematics | 14 | 0.78 | 2026-05-06 |
| LiveMedBench | Medical | 22 | 0.1028 | 2026-05-27 |
| ALL Bench Multimodal | Multimodal | 11 | 38.42 | 2026-05-06 |
| WebMainBench | Multimodal | 1 | 0.91 | 2026-05-06 |
| Artificial Analysis Openness Index | Openness | 77 | 44.44 | 2026-05-11 |
| CAIS Text Capabilities Index | Reasoning | 24 | 20.3 | 2026-05-27 |
| FINAL Bench Metacognitive | Reasoning | 8 | 73.08 | 2026-05-06 |
| GPQA Diamond | Reasoning | 67 | 84% | 2026-05-11 |
| GPQA Diamond | Reasoning | 158 | 75.1% | 2026-05-11 |
| GPQA Diamond | Reasoning | 168 | 73.8% | 2026-05-11 |
| CAIS Risk Index | Safety | 24 | 56.9 | 2026-05-27 |
| EvasionBench | Safety | 4 | 66.88 | 2026-05-06 |
| LiveSecBench | Safety | 19 | 56.2 | 2026-05-27 |
| CritPt | Science | 47 | 2.9% | 2026-05-11 |
| CritPt | Science | 66 | 1.4% | 2026-05-11 |
| CritPt | Science | 93 | 0.9% | 2026-05-11 |
| BrowseComp-zh | Search | 6 | 0.65 | 2026-05-06 |
| IDE-Bench | Software Engineering | 10 | 31.25 | 2026-05-27 |
| SWE-bench Pro | Software Engineering | 9 | 15.56 | 2026-05-06 |