MoonshotAI: Kimi K2 0711
Kimi / Moonshot AI
53 scores
53 benchmarks
Cost (in/out): $0.57 / $2.3 per 1M tokens
Metadata
Kimi Closed/API
Aliases: kimi-k2, moonshotai-kimi-k2, moonshotai/kimi-k2
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| ADBench | Agentic | 7 | 79 | 2026-05-06 |
| Berkeley Function-Calling Leaderboard | Agentic | 11 | 59.06% | 2026-05-27 |
| Galileo Agent Leaderboard | Agentic | 5 | 0.53 | 2026-05-06 |
| LLM-WikiRace | Agentic | 11 | 45.30 | 2026-05-06 |
| Tau2 Airline | Agentic | 13 | 0.56 | 2026-05-06 |
| Tau2-Bench Telecom | Agentic | 159 | 61.1% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 173 | 15.9% | 2026-05-11 |
| OpenUGI | Alignment | 30 | 56.55 | 2026-05-06 |
| IOI | Coding | 51 | 1.25% | 2026-05-26 |
| LiveCodeBench | Coding | 61 | 70.449% | 2026-05-28 |
| MultiPL-E | Coding | 4 | 0.857 | 2026-05-27 |
| SciCode | Coding | 200 | 34.5% | 2026-05-11 |
| Terminal-Bench 2.0 | Coding | 48 | 25.843% | 2026-05-28 |
| NeoEvalPlusN | Creative | 64 | 15.50 | 2026-05-06 |
| kluster.ai LLM Hallucination Detection Leaderboard | Factuality | 8 | 97.03 | 2026-05-06 |
| CorpFin v2 | Finance | 84 | 50.388% | 2026-05-28 |
| FinanceArena | Finance | 15 | 33.8 | 2026-05-27 |
| PRBench Finance | Finance | 16 | 38.34 | 2026-05-06 |
| TaxEval v2 | Finance | 65 | 70.196% | 2026-05-28 |
| BenchLM | General Knowledge | 75 | 42 | 2026-05-06 |
| CSimpleQA | General Knowledge | 5 | 0.78 | 2026-05-06 |
| MMLU-Redux | General Knowledge | 13 | 0.93 | 2026-05-06 |
| HELM AIR-Bench | Generalization | 31 | 0.741131 | 2026-05-28 |
| MedQA | Healthcare | 62 | 83.975% | 2026-04-16 |
| HUMAINE | Human Preference | 6 | 3.71 | 2026-05-06 |
| AIIQ Composite IQ | Intelligence | 31 | 101 | 2026-05-12 |
| Artificial Analysis Intelligence Index | Intelligence | 179 | 26.32 | 2026-05-11 |
| GPQA Diamond | Intelligence | 65 | 71.464% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 213 | 7% | 2026-05-11 |
| MMLU Pro | Intelligence | 69 | 79.394% | 2026-05-28 |
| MMLU-Pro | Intelligence | 68 | 82.4% | 2026-05-11 |
| LegalBench | Legal | 49 | 81.454% | 2026-05-28 |
| Professional Reasoning Bench - Legal | Legal | 23 | 36.38 | 2026-05-06 |
| Fiction.LiveBench | Long Context | 16 | 40.60 | 2026-05-06 |
| AIME | Math | 56 | 62.708% | 2026-04-16 |
| AIME 2025 | Math | 124 | 57% | 2026-05-11 |
| IneqMath | Math | 20 | 9 | 2026-05-06 |
| MATH 500 | Math | 13 | 94.2% | 2026-01-09 |
| MGSM | Math | 31 | 90.946% | 2026-01-09 |
| CNMO 2024 | Mathematics | 1 | 0.74 | 2026-05-06 |
| HMMT 2025 | Mathematics | 28 | 0.39 | 2026-05-06 |
| MATH-500 | Mathematics | 6 | 0.97 | 2026-05-06 |
| PolyMath-en | Mathematics | 1 | 0.65 | 2026-05-06 |
| LiveMedBench | Medical | 29 | 0.0585 | 2026-05-27 |
| Artificial Analysis Openness Index | Openness | 89 | 44.44 | 2026-05-11 |
| AutoLogi | Reasoning | 1 | 0.90 | 2026-05-06 |
| GPQA Diamond | Reasoning | 141 | 76.6% | 2026-05-11 |
| Humanity's Last Exam (Text Only) | Reasoning | 45 | 4.68 | 2026-05-06 |
| MultiNRC | Reasoning | 31 | 18.48 | 2026-05-06 |
| OJBench | Reasoning | 8 | 0.27 | 2026-05-06 |
| CritPt | Science | 262 | 0% | 2026-05-11 |
| SWE-bench Pro | Software Engineering | 6 | 27.67 | 2026-05-06 |
| ACEBench | Tool Use | 1 | 0.77 | 2026-05-06 |