R1 0528

DeepSeek / DeepSeek

32 scores
31 benchmarks
$0.5 / $2.15 per 1M tokens (cost in/out)
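
As a rough illustration of how the per-token pricing above translates into the cost of a single request, the sketch below assumes the first figure is the input price and the second the output price, both in USD per 1M tokens. The function name and the token counts in the usage note are hypothetical and not part of the listing.

```python
# Prices taken from the listing; interpreting them as USD per 1M tokens
# (input first, output second) is an assumption.
INPUT_USD_PER_M = 0.5
OUTPUT_USD_PER_M = 2.15


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD (hypothetical helper)."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000
```

For example, `estimate_cost_usd(1_000_000, 1_000_000)` adds one million tokens at each rate, which comes to about $2.65 under these assumptions.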

Metadata

DeepSeek Open source

Aliases: deepseek-deepseek-r1-0528, deepseek-r1-0528, deepseek/deepseek-r1-0528

Benchmark Results

Benchmark | Category | Rank | Score | Sampled
ARC-AGI-1 | Agentic | 109 | 21.21 | 2026-05-05
ARC-AGI-2 | Agentic | 115 | 1.12 | 2026-05-05
AgentBench FC | Agents | 14 | 49.30 | 2026-05-06
ALE-Bench | Coding | 39 | 804.13 | 2026-05-06
Codeforces | Coding | 14 | 0.6433 | 2026-05-28
LiveCodeBench | Coding | 5 | 73.10 | 2026-05-06
TuRTLe Code Completion (Icarus Verilog) | Coding | 4 | 78.86 | 2026-05-06
TuRTLe Code Completion (Verilator) | Coding | 4 | 78.08 | 2026-05-06
TuRTLe Module Completion (NotSoTiny) | Coding | 5 | 20.73 | 2026-05-06
TuRTLe Spec-to-RTL (Icarus Verilog) | Coding | 4 | 76.79 | 2026-05-06
TuRTLe Spec-to-RTL (Verilator) | Coding | 4 | 75.83 | 2026-05-06
SecCodeBench | Cybersecurity | 18 | 54.06% | 2026-05-28
MMTU | Data | 8 | 0.58 | 2026-05-06
GSMA Open Telco Leaderboard | Domain | 62 | 43.37 | 2026-05-06
kluster.ai LLM Hallucination Detection Leaderboard | Factuality | 3 | 98.48 | 2026-05-06
FinanceArena | Finance | 10 | 42.9 | 2026-05-27
PRBench Finance | Finance | 26 | 32.67 | 2026-05-06
MMLU-Redux | General Knowledge | 8 | 0.93 | 2026-05-06
HELM Safety | Generalization | 39 | 0.894417 | 2026-05-28
LongBench v2 | Generalization | 7 | 56.7% | 2026-05-27
HUMAINE | Human Preference | 1 | 3.79 | 2026-05-06
Professional Reasoning Bench - Legal | Legal | 22 | 36.61 | 2026-05-06
AIME 2024 | Math | 3 | 91.4 | 2026-05-27
IneqMath | Math | 18 | 9.50 | 2026-05-06
IneqMath | Math | 32 | 4.50 | 2026-05-06
HMMT 2025 | Mathematics | 24 | 0.79 | 2026-05-06
LanguageBench | Multilingual | 31 | 0.12 | 2026-05-06
Humanity's Last Exam (Text Only) | Reasoning | 20 | 14.04 | 2026-05-06
MultiNRC | Reasoning | 21 | 27.58 | 2026-05-06
LiveSecBench | Safety | 20 | 55.22 | 2026-05-27
BrowseComp-zh | Search | 13 | 0.36 | 2026-05-06
IDE-Bench | Software Engineering | 11 | 20 | 2026-05-27