MoonshotAI: Kimi K2.5

Kimi / Moonshot AI

120 scores
104 benchmarks
$0.44 / $2 per 1M tokens (cost in/out)

Metadata

Kimi Closed/API

Aliases: kimi-k2.5, kimi-k2.5-0127, moonshotai-kimi-k2.5, moonshotai-kimi-k2.5-0127, moonshotai/kimi-k2.5, moonshotai/kimi-k2.5-0127

Benchmark Results

Benchmark | Category | Rank | Score | Sampled
APEX-Agents | Agentic | 23 | 29.20 | 2026-05-06
APEX-Agents-AA | Agentic | 14 | 11.5% | 2026-05-11
ARC-AGI-1 | Agentic | 46 | 65.33 | 2026-05-05
ARC-AGI-2 | Agentic | 49 | 11.81 | 2026-05-05
AutoBench | Agentic | 9 | 3.02 | 2026-05-06
AutoLab | Agentic | 6 | 0.55 | 2026-05-06
Claw-Eval-Live | Agentic | 7 | 53.3 | 2026-05-27
EnterpriseOps-Gym | Agentic | 11 | 26.2% | 2026-05-05
Gert Labs Rankings | Agentic | 37 | 0.44 | 2026-05-11
MultiChallenge | Agentic | 5 | 61.39 | 2026-05-06
OSWorld | Agentic | 18 | 63.3% | 2026-05-27
PinchBench | Agentic | 28 | 0.85 | 2026-05-06
RuneBench | Agentic | 14 | 2.10 | 2026-05-05
Tau2-Bench Telecom | Agentic | 13 | 95.9% | 2026-05-11
Tau2-Bench Telecom | Agentic | 103 | 81.3% | 2026-05-11
Terminal-Bench Hard | Agentic | 63 | 34.8% | 2026-05-11
Terminal-Bench Hard | Agentic | 151 | 18.9% | 2026-05-11
Vending-Bench 2 | Agentic | 25 | 1198.46 | 2026-05-28
WildClawBench | Agentic | 10 | 30.80 | 2026-05-06
YC-Bench | Agentic | 5 | 408822 | 2026-05-06
OpenUGI | Alignment | 38 | 55.23 | 2026-05-06
OpenUGI | Alignment | 253 | 44 | 2026-05-06
ALE-Bench | Coding | 37 | 821.65 | 2026-05-06
Arena AI Code | Coding | 24 | 1430 | 2026-05-06
Arena AI Code | Coding | 27 | 1408 | 2026-05-06
IOI | Coding | 18 | 17.667% | 2026-05-26
LiveCodeBench | Coding | 27 | 83.868% | 2026-05-28
LMArena WebDev Arena | Coding | 24 | 1429.73 | 2026-05-06
SciCode | Coding | 24 | 49% | 2026-05-11
SciCode | Coding | 106 | 39.6% | 2026-05-11
SWE Atlas - Codebase QnA | Coding | 8 | 13.10 | 2026-05-06
SWE Atlas - Refactoring | Coding | 9 | 20.95 | 2026-05-06
SWE Atlas - Test Writing | Coding | 5 | 25.77 | 2026-05-06
SWE-bench Verified | Coding | 31 | 70% | 2026-05-28
Terminal-Bench 2.0 | Coding | 32 | 40.449% | 2026-05-28
TuRTLe Code Completion (Icarus Verilog) | Coding | 2 | 83.38 | 2026-05-06
TuRTLe Code Completion (Verilator) | Coding | 2 | 81.65 | 2026-05-06
TuRTLe Module Completion (NotSoTiny) | Coding | 1 | 31.57 | 2026-05-06
TuRTLe Spec-to-RTL (Icarus Verilog) | Coding | 2 | 81.47 | 2026-05-06
TuRTLe Spec-to-RTL (Verilator) | Coding | 2 | 79.73 | 2026-05-06
Vibe Code Bench v1.1 | Coding | 28 | 17.536% | 2026-05-28
CyberGym | Cybersecurity | 6 | 0.41 | 2026-05-06
SecCodeBench | Cybersecurity | 7 | 61.25% | 2026-05-28
SecCodeBench | Cybersecurity | 16 | 55.22% | 2026-05-28
OmniDocBench 1.5 | Document Understanding | 7 | 0.89 | 2026-05-06
Arena AI Document | Document AI | 14 | 1444 | 2026-05-06
GSMA Open Telco Leaderboard | Domain | 9 | 69.42 | 2026-05-06
SAGE | Education | 11 | 49.865% | 2026-05-28
TutorBench | Education | 1 | 54.56 | 2026-05-06
From Perception to Action | Embodied AI | 5 | 13.8% | 2026-05-28
Vectara HHEM Hallucination Leaderboard | Factuality | 84 | 85.80 | 2026-05-06
CorpFin v2 | Finance | 3 | 68.259% | 2026-05-28
Finance Agent v1.1 | Finance | 28 | 50.622% | 2026-05-04
MortgageTax | Finance | 23 | 66.534% | 2026-05-28
PRBench Finance | Finance | 8 | 46.51 | 2026-05-06
TaxEval v2 | Finance | 23 | 74.202% | 2026-05-28
React Native Evals | Frontend Development | 10 | 77.1795% overall | 2026-05-28
MageBench Season 1 | Game | 8 | 1652 rating / 10 games | 2026-05-28
ALL Bench LLM | General Knowledge | 2 | 60.81 | 2026-05-06
BenchLM | General Knowledge | 25 | 76 | 2026-05-06
BenchLM | General Knowledge | 43 | 64 | 2026-05-06
MedCode | Healthcare | 32 | 39.316% | 2026-05-28
MedQA | Healthcare | 17 | 94.367% | 2026-04-16
MedScribe | Healthcare | 35 | 76.442% | 2026-05-28
HUMAINE | Human Preference | 23 | 3.55 | 2026-05-06
AIIQ Composite IQ | Intelligence | 14 | 118 | 2026-05-12
Artificial Analysis Intelligence Index | Intelligence | 34 | 46.81 | 2026-05-11
Artificial Analysis Intelligence Index | Intelligence | 93 | 37.27 | 2026-05-11
GPQA Diamond | Intelligence | 29 | 84.091% | 2026-05-28
Humanity's Last Exam | Intelligence | 26 | 29.4% | 2026-05-11
Humanity's Last Exam | Intelligence | 121 | 12.3% | 2026-05-11
LiveBench | Intelligence | 32 | 69.16 | 2026-05-05
MathVision | Intelligence | 12 | 85 | 2026-05-06
MathVision | Intelligence | 14 | 84.20 | 2026-05-06
MMLU Pro | Intelligence | 30 | 85.914% | 2026-05-28
MMMU Pro | Intelligence | 12 | 84.335% | 2026-05-28
CaseLaw v2 | Legal | 28 | 58.735% | 2026-05-04
Professional Reasoning Bench - Legal | Legal | 10 | 43.83 | 2026-05-06
AA-LCR | Long Context | 2 | 0.70 | 2026-05-06
LongVideoBench | Long Context | 1 | 0.80 | 2026-05-06
AIME | Math | 10 | 95.625% | 2026-04-16
LiveMathematicianBench | Math | 5 | 35.0% | 2026-05-28
AIME 2026 | Mathematics | 2 | 95.83 | 2026-05-06
HMMT 2025 | Mathematics | 6 | 0.95 | 2026-05-06
HMMT February 2026 | Mathematics | 2 | 87.12 | 2026-05-06
IMO-AnswerBench | Mathematics | 8 | 0.82 | 2026-05-06
ALL Bench Multimodal | Multimodal | 4 | 57.79 | 2026-05-06
CharXiv-R | Multimodal | 14 | 0.78 | 2026-05-06
Design Arena | Multimodal | 13 | 1302 | 2026-05-06
InfoVQAtest | Multimodal | 1 | 0.93 | 2026-05-06
LMArena Vision Arena | Multimodal | 15 | 1265.49 | 2026-05-06
LMArena Vision Arena | Multimodal | 20 | 1255.33 | 2026-05-06
LVBench | Multimodal | 1 | 0.76 | 2026-05-06
MMVU | Multimodal | 1 | 0.80 | 2026-05-06
SimpleVQA | Multimodal | 3 | 0.71 | 2026-05-06
Video-MME v2 | Multimodal | 1 | 61.10 | 2026-05-06
Video-MME v2 | Multimodal | 3 | 54.40 | 2026-05-06
VideoMMMU | Multimodal | 3 | 0.87 | 2026-05-06
Visual-Language Understanding | Multimodal | 20 | 41.86 | 2026-05-06
ZEROBench | Multimodal | 2 | 0.11 | 2026-05-06
Artificial Analysis Openness Index | Openness | 162 | 33.33 | 2026-05-11
Altered Riddles | Reasoning | 13 | 0.4319 | 2026-05-27
Altered Riddles | Reasoning | 23 | 0.5374 | 2026-05-27
CAIS Text Capabilities Index | Reasoning | 17 | 26.1 | 2026-05-27
Context Arena | Reasoning | 17 | 59.22 | 2026-05-06
Context Arena | Reasoning | 20 | 53.33 | 2026-05-06
EnigmaEval | Reasoning | 20 | 3.38 | 2026-05-06
FINAL Bench Metacognitive | Reasoning | 1 | 78.54 | 2026-05-06
GPQA Diamond | Reasoning | 27 | 87.9% | 2026-05-11
GPQA Diamond | Reasoning | 113 | 78.9% | 2026-05-11
MultiNRC | Reasoning | 17 | 35.17 | 2026-05-06
CAIS Risk Index | Safety | 33 | 61.5 | 2026-05-27
InvisibleBench | Safety | 7 | 0.05 | 2026-05-06
LiveSecBench | Safety | 7 | 74.79 | 2026-05-27
CritPt | Science | 45 | 3.1% | 2026-05-11
CritPt | Science | 112 | 0.6% | 2026-05-11
DeepSearchQA | Search | 4 | 0.77 | 2026-05-06
Seal-0 | Search | 1 | 0.57 | 2026-05-06
WideSearch | Search | 2 | 0.79 | 2026-05-06
SWE-bench Pro | Software Engineering | 2 | 50.70 | 2026-05-06