MoonshotAI: Kimi K2.5
Kimi / Moonshot AI
120 scores
104 benchmarks
$0.44 / $2 per 1M tokens (cost in/out)
Metadata
Kimi Closed/API
Aliases: kimi-k2.5, kimi-k2.5-0127, moonshotai-kimi-k2.5, moonshotai-kimi-k2.5-0127, moonshotai/kimi-k2.5, moonshotai/kimi-k2.5-0127
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| APEX-Agents | Agentic | 23 | 29.20 | 2026-05-06 |
| APEX-Agents-AA | Agentic | 14 | 11.5% | 2026-05-11 |
| ARC-AGI-1 | Agentic | 46 | 65.33 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 49 | 11.81 | 2026-05-05 |
| AutoBench | Agentic | 9 | 3.02 | 2026-05-06 |
| AutoLab | Agentic | 6 | 0.55 | 2026-05-06 |
| Claw-Eval-Live | Agentic | 7 | 53.3 | 2026-05-27 |
| EnterpriseOps-Gym | Agentic | 11 | 26.2% | 2026-05-05 |
| Gert Labs Rankings | Agentic | 37 | 0.44 | 2026-05-11 |
| MultiChallenge | Agentic | 5 | 61.39 | 2026-05-06 |
| OSWorld | Agentic | 18 | 63.3% | 2026-05-27 |
| PinchBench | Agentic | 28 | 0.85 | 2026-05-06 |
| RuneBench | Agentic | 14 | 2.10 | 2026-05-05 |
| Tau2-Bench Telecom | Agentic | 13 | 95.9% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 103 | 81.3% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 63 | 34.8% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 151 | 18.9% | 2026-05-11 |
| Vending-Bench 2 | Agentic | 25 | 1198.46 | 2026-05-28 |
| WildClawBench | Agentic | 10 | 30.80 | 2026-05-06 |
| YC-Bench | Agentic | 5 | 408822 | 2026-05-06 |
| OpenUGI | Alignment | 38 | 55.23 | 2026-05-06 |
| OpenUGI | Alignment | 253 | 44 | 2026-05-06 |
| ALE-Bench | Coding | 37 | 821.65 | 2026-05-06 |
| Arena AI Code | Coding | 24 | 1430 | 2026-05-06 |
| Arena AI Code | Coding | 27 | 1408 | 2026-05-06 |
| IOI | Coding | 18 | 17.667% | 2026-05-26 |
| LiveCodeBench | Coding | 27 | 83.868% | 2026-05-28 |
| LMArena WebDev Arena | Coding | 24 | 1429.73 | 2026-05-06 |
| SciCode | Coding | 24 | 49% | 2026-05-11 |
| SciCode | Coding | 106 | 39.6% | 2026-05-11 |
| SWE Atlas - Codebase QnA | Coding | 8 | 13.10 | 2026-05-06 |
| SWE Atlas - Refactoring | Coding | 9 | 20.95 | 2026-05-06 |
| SWE Atlas - Test Writing | Coding | 5 | 25.77 | 2026-05-06 |
| SWE-bench Verified | Coding | 31 | 70% | 2026-05-28 |
| Terminal-Bench 2.0 | Coding | 32 | 40.449% | 2026-05-28 |
| TuRTLe Code Completion (Icarus Verilog) | Coding | 2 | 83.38 | 2026-05-06 |
| TuRTLe Code Completion (Verilator) | Coding | 2 | 81.65 | 2026-05-06 |
| TuRTLe Module Completion (NotSoTiny) | Coding | 1 | 31.57 | 2026-05-06 |
| TuRTLe Spec-to-RTL (Icarus Verilog) | Coding | 2 | 81.47 | 2026-05-06 |
| TuRTLe Spec-to-RTL (Verilator) | Coding | 2 | 79.73 | 2026-05-06 |
| Vibe Code Bench v1.1 | Coding | 28 | 17.536% | 2026-05-28 |
| CyberGym | Cybersecurity | 6 | 0.41 | 2026-05-06 |
| SecCodeBench | Cybersecurity | 7 | 61.25% | 2026-05-28 |
| SecCodeBench | Cybersecurity | 16 | 55.22% | 2026-05-28 |
| OmniDocBench 1.5 | Document Understanding | 7 | 0.89 | 2026-05-06 |
| Arena AI Document | Document AI | 14 | 1444 | 2026-05-06 |
| GSMA Open Telco Leaderboard | Domain | 9 | 69.42 | 2026-05-06 |
| SAGE | Education | 11 | 49.865% | 2026-05-28 |
| TutorBench | Education | 1 | 54.56 | 2026-05-06 |
| From Perception to Action | Embodied AI | 5 | 13.8% | 2026-05-28 |
| Vectara HHEM Hallucination Leaderboard | Factuality | 84 | 85.80 | 2026-05-06 |
| CorpFin v2 | Finance | 3 | 68.259% | 2026-05-28 |
| Finance Agent v1.1 | Finance | 28 | 50.622% | 2026-05-04 |
| MortgageTax | Finance | 23 | 66.534% | 2026-05-28 |
| PRBench Finance | Finance | 8 | 46.51 | 2026-05-06 |
| TaxEval v2 | Finance | 23 | 74.202% | 2026-05-28 |
| React Native Evals | Frontend Development | 10 | 77.1795% overall | 2026-05-28 |
| MageBench Season 1 | Game | 8 | 1652 rating / 10 games | 2026-05-28 |
| ALL Bench LLM | General Knowledge | 2 | 60.81 | 2026-05-06 |
| BenchLM | General Knowledge | 25 | 76 | 2026-05-06 |
| BenchLM | General Knowledge | 43 | 64 | 2026-05-06 |
| MedCode | Healthcare | 32 | 39.316% | 2026-05-28 |
| MedQA | Healthcare | 17 | 94.367% | 2026-04-16 |
| MedScribe | Healthcare | 35 | 76.442% | 2026-05-28 |
| HUMAINE | Human Preference | 23 | 3.55 | 2026-05-06 |
| AIIQ Composite IQ | Intelligence | 14 | 118 | 2026-05-12 |
| Artificial Analysis Intelligence Index | Intelligence | 34 | 46.81 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 93 | 37.27 | 2026-05-11 |
| GPQA Diamond | Intelligence | 29 | 84.091% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 26 | 29.4% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 121 | 12.3% | 2026-05-11 |
| LiveBench | Intelligence | 32 | 69.16 | 2026-05-05 |
| MathVision | Intelligence | 12 | 85 | 2026-05-06 |
| MathVision | Intelligence | 14 | 84.20 | 2026-05-06 |
| MMLU Pro | Intelligence | 30 | 85.914% | 2026-05-28 |
| MMMU Pro | Intelligence | 12 | 84.335% | 2026-05-28 |
| CaseLaw v2 | Legal | 28 | 58.735% | 2026-05-04 |
| Professional Reasoning Bench - Legal | Legal | 10 | 43.83 | 2026-05-06 |
| AA-LCR | Long Context | 2 | 0.70 | 2026-05-06 |
| LongVideoBench | Long Context | 1 | 0.80 | 2026-05-06 |
| AIME | Math | 10 | 95.625% | 2026-04-16 |
| LiveMathematicianBench | Math | 5 | 35.0% | 2026-05-28 |
| AIME 2026 | Mathematics | 2 | 95.83 | 2026-05-06 |
| HMMT 2025 | Mathematics | 6 | 0.95 | 2026-05-06 |
| HMMT February 2026 | Mathematics | 2 | 87.12 | 2026-05-06 |
| IMO-AnswerBench | Mathematics | 8 | 0.82 | 2026-05-06 |
| ALL Bench Multimodal | Multimodal | 4 | 57.79 | 2026-05-06 |
| CharXiv-R | Multimodal | 14 | 0.78 | 2026-05-06 |
| Design Arena | Multimodal | 13 | 1302 | 2026-05-06 |
| InfoVQAtest | Multimodal | 1 | 0.93 | 2026-05-06 |
| LMArena Vision Arena | Multimodal | 15 | 1265.49 | 2026-05-06 |
| LMArena Vision Arena | Multimodal | 20 | 1255.33 | 2026-05-06 |
| LVBench | Multimodal | 1 | 0.76 | 2026-05-06 |
| MMVU | Multimodal | 1 | 0.80 | 2026-05-06 |
| SimpleVQA | Multimodal | 3 | 0.71 | 2026-05-06 |
| Video-MME v2 | Multimodal | 1 | 61.10 | 2026-05-06 |
| Video-MME v2 | Multimodal | 3 | 54.40 | 2026-05-06 |
| VideoMMMU | Multimodal | 3 | 0.87 | 2026-05-06 |
| Visual-Language Understanding | Multimodal | 20 | 41.86 | 2026-05-06 |
| ZEROBench | Multimodal | 2 | 0.11 | 2026-05-06 |
| Artificial Analysis Openness Index | Openness | 162 | 33.33 | 2026-05-11 |
| Altered Riddles | Reasoning | 13 | 0.4319 | 2026-05-27 |
| Altered Riddles | Reasoning | 23 | 0.5374 | 2026-05-27 |
| CAIS Text Capabilities Index | Reasoning | 17 | 26.1 | 2026-05-27 |
| Context Arena | Reasoning | 17 | 59.22 | 2026-05-06 |
| Context Arena | Reasoning | 20 | 53.33 | 2026-05-06 |
| EnigmaEval | Reasoning | 20 | 3.38 | 2026-05-06 |
| FINAL Bench Metacognitive | Reasoning | 1 | 78.54 | 2026-05-06 |
| GPQA Diamond | Reasoning | 27 | 87.9% | 2026-05-11 |
| GPQA Diamond | Reasoning | 113 | 78.9% | 2026-05-11 |
| MultiNRC | Reasoning | 17 | 35.17 | 2026-05-06 |
| CAIS Risk Index | Safety | 33 | 61.5 | 2026-05-27 |
| InvisibleBench | Safety | 7 | 0.05 | 2026-05-06 |
| LiveSecBench | Safety | 7 | 74.79 | 2026-05-27 |
| CritPt | Science | 45 | 3.1% | 2026-05-11 |
| CritPt | Science | 112 | 0.6% | 2026-05-11 |
| DeepSearchQA | Search | 4 | 0.77 | 2026-05-06 |
| Seal-0 | Search | 1 | 0.57 | 2026-05-06 |
| WideSearch | Search | 2 | 0.79 | 2026-05-06 |
| SWE-bench Pro | Software Engineering | 2 | 50.70 | 2026-05-06 |