MoonshotAI: Kimi K2.6
Kimi / Moonshot AI
122 scores
104 benchmarks
$0.74 / $3.49 per 1M tokens (cost in/out)
Metadata
Kimi Closed/API
Aliases: kimi-k2.6, kimi-k2.6-20260420, moonshotai-kimi-k2.6, moonshotai-kimi-k2.6-20260420, moonshotai/kimi-k2.6, moonshotai/kimi-k2.6-20260420, K2.6 Thinking, Kimi K2.6 Thinking, kimi-k2.6-thinking, moonshotai/kimi-k2.6-thinking
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| CoWorkBench | Agentic | 6 | 58.2% | 2026-05-28 |
| Gert Labs Rankings | Agentic | 5 | 0.66 | 2026-05-11 |
| HiL-Bench | Agentic | 6 | 14.67% | 2026-05-05 |
| ITBench-AA | Agentic | 16 | 31.2% | 2026-05-28 |
| MCP Atlas | Agentic | 6 | 66.6% | 2026-05-28 |
| MCPMark | Agentic | 5 | 55.9% | 2026-05-28 |
| OSWorld | Agentic | 8 | 73.06% | 2026-05-27 |
| OSWorld-Verified | Agentic | 5 | 0.73 | 2026-05-06 |
| QwenClawBench | Agentic | 6 | 54.7% | 2026-05-28 |
| QwenWorldBench | Agentic | 4 | 50.9% | 2026-05-28 |
| Tau2-Bench Telecom | Agentic | 14 | 95.9% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 32 | 93.9% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 22 | 43.9% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 49 | 37.9% | 2026-05-11 |
| TERMS-Bench | Agentic | 10 | 59.7% SE+ | 2026-05-28 |
| Toolathlon | Agentic | 4 | 0.50 | 2026-05-06 |
| Vending-Bench 2 | Agentic | 5 | 6204.57 | 2026-05-28 |
| VitaBench | Agentic | 5 | 39.1% | 2026-05-28 |
| YC-Bench | Agentic | 4 | 511137 | 2026-05-06 |
| OpenUGI | Alignment | 90 | 51.08 | 2026-05-06 |
| OpenUGI | Alignment | 529 | 36.07 | 2026-05-06 |
| ALE-Bench | Coding | 22 | 1092.67 | 2026-05-06 |
| Arena AI Code | Coding | 7 | 1525 | 2026-05-06 |
| BLXBench | Coding | 20 | 15.40 | 2026-05-06 |
| Claw-Eval | Coding | 4 | 61.5% | 2026-05-28 |
| Claw-Eval | Coding | 1 | 0.81 | 2026-05-06 |
| DeepSWE | Coding | 8 | 23.89 | 2026-05-26 |
| Kernel Bench L3 | Coding | 4 | 1.41/80% | 2026-05-28 |
| LiveCodeBench | Coding | 3 | 89.6% | 2026-05-28 |
| LiveCodeBench | Coding | 8 | 86.771% | 2026-05-28 |
| LMArena WebDev Arena | Coding | 7 | 1524.58 | 2026-05-06 |
| NL2Repo | Coding | 3 | 42.8% | 2026-05-28 |
| QwenSVG | Coding | 6 | 1325 | 2026-05-28 |
| SciCode | Coding | 2 | 52.2% | 2026-05-28 |
| SciCode | Coding | 9 | 53.5% | 2026-05-11 |
| SciCode | Coding | 109 | 39.5% | 2026-05-11 |
| SkillsBench | Coding | 2 | 56.2% | 2026-05-28 |
| SWE-bench Verified | Coding | 14 | 76.2% | 2026-05-28 |
| Terminal-Bench 2.0 | Coding | 13 | 57.303% | 2026-05-28 |
| Terminal-Bench 2.0 | Coding | 3 | 66.7% | 2026-05-28 |
| Terminal-Bench 2.1 | Coding | 9 | 53.558% | 2026-05-28 |
| Vibe Code Bench v1.1 | Coding | 14 | 37.891% | 2026-05-28 |
| ExploitBench v8-bench | Cybersecurity | 12 | 2.63 points | 2026-05-15 |
| ExploitBench v8-bench | Cybersecurity | 14 | 2.44 points | 2026-05-15 |
| Arena AI Document | Document AI | 10 | 1457 | 2026-05-06 |
| SAGE | Education | 9 | 50.224% | 2026-05-28 |
| AA-Omniscience | Factuality | 8 | 6.42 | 2026-05-11 |
| CorpFin v2 | Finance | 7 | 66.744% | 2026-05-28 |
| Finance Agent v1.1 | Finance | 12 | 57.056% | 2026-05-04 |
| Finance Agent v2 | Finance | 8 | 44.866% | 2026-05-28 |
| MortgageTax | Finance | 25 | 65.818% | 2026-05-28 |
| Rogo Big Finance Bench | Finance | 6 | 45% rubric / 27% final | 2026-05-28 |
| TaxEval v2 | Finance | 17 | 74.652% | 2026-05-28 |
| InfiniteBM Heads-Up No-Limit Hold'em | Game | 28 | 1013.57 Elo / 116 games | 2026-05-28 |
| InfiniteBM Liar's Dice | Game | 30 | 1036.29 Elo / 1715 games | 2026-05-28 |
| BenchLM | General Knowledge | 12 | 85 | 2026-05-06 |
| MAXIFE | General Knowledge | 5 | 87.7% | 2026-05-28 |
| MMLU-ProX | General Knowledge | 6 | 83.7% | 2026-05-28 |
| MMLU-Redux | General Knowledge | 1 | 95.3% | 2026-05-28 |
| NOVA-63 | General Knowledge | 4 | 56.7% | 2026-05-28 |
| LMArena Text Arena | Generalization | 18 | 1454.64 | 2026-05-06 |
| MedCode | Healthcare | 31 | 40.142% | 2026-05-28 |
| MedScribe | Healthcare | 26 | 78.149% | 2026-05-28 |
| PhysicianBench | Healthcare | 7 | 17.0 +/- 2.6 | 2026-05-27 |
| IFBench | Instruction Following | 4 | 76% | 2026-05-28 |
| IFEval | Instruction Following | 2 | 94.5% | 2026-05-28 |
| AIIQ Composite IQ | Intelligence | 11 | 122 | 2026-05-12 |
| Artificial Analysis Intelligence Index | Intelligence | 7 | 53.9 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 55 | 42.95 | 2026-05-11 |
| GPQA Diamond | Intelligence | 14 | 89.142% | 2026-05-28 |
| HLE w/ tools | Intelligence | 1 | 54% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 4 | 36.4% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 12 | 35.9% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 81 | 18.2% | 2026-05-11 |
| LiveBench | Intelligence | 23 | 72.39 | 2026-05-05 |
| MathVision | Intelligence | 3 | 93.20 | 2026-05-06 |
| MathVision | Intelligence | 7 | 87.40 | 2026-05-06 |
| MMLU Pro | Intelligence | 12 | 87.572% | 2026-05-28 |
| MMLU-Pro | Intelligence | 5 | 87.1% | 2026-05-28 |
| MMMU Pro | Intelligence | 10 | 86.301% | 2026-05-28 |
| SuperGPQA | Intelligence | 4 | 71.3% | 2026-05-28 |
| Vals Index | Intelligence | 8 | 55.551% | 2026-05-28 |
| Vals Multimodal Index | Intelligence | 6 | 56.788% | 2026-05-28 |
| CaseLaw v2 | Legal | 22 | 61.201% | 2026-05-04 |
| LegalBench | Legal | 12 | 84.738% | 2026-05-28 |
| MRCR-v2 128k | Long Context | 5 | 63.1% | 2026-05-28 |
| ProofBench | Math | 18 | 16% | 2026-05-28 |
| HMMT February 2026 | Mathematics | 4 | 92.7% | 2026-05-28 |
| IMO-AnswerBench | Mathematics | 3 | 86% | 2026-05-28 |
| IMO-AnswerBench | Mathematics | 3 | 0.86 | 2026-05-06 |
| MathArena Apex | Mathematics | 4 | 24% | 2026-05-28 |
| INCLUDE | Multilingual | 6 | 84.2% | 2026-05-28 |
| MMMLU | Multilingual | 5 | 87.5% | 2026-05-28 |
| BabyVision | Multimodal | 1 | 0.69 | 2026-05-06 |
| Blueprint-Bench 2 | Multimodal | 9 | 0.557 +/- 0.015 | 2026-05-28 |
| CharXiv-R | Multimodal | 3 | 0.87 | 2026-05-06 |
| Design Arena | Multimodal | 4 | 1342 | 2026-05-06 |
| LMArena Vision Arena | Multimodal | 11 | 1278.42 | 2026-05-06 |
| Artificial Analysis Openness Index | Openness | 163 | 33.33 | 2026-05-11 |
| Altered Riddles | Reasoning | 14 | 0.4413 | 2026-05-27 |
| CAIS Text Capabilities Index | Reasoning | 14 | 31.4 | 2026-05-27 |
| Context Arena | Reasoning | 13 | 64.63 | 2026-05-06 |
| Context Arena | Reasoning | 21 | 51.88 | 2026-05-06 |
| Global PIQA | Reasoning | 6 | 89.2% | 2026-05-28 |
| GPQA Diamond | Reasoning | 3 | 90.5% | 2026-05-28 |
| GPQA Diamond | Reasoning | 9 | 91.1% | 2026-05-11 |
| GPQA Diamond | Reasoning | 114 | 78.8% | 2026-05-11 |
| OJBench | Reasoning | 1 | 0.61 | 2026-05-06 |
| CAIS Risk Index | Safety | 35 | 63.0 | 2026-05-27 |
| CritPt | Science | 4 | 8% | 2026-05-28 |
| CritPt | Science | 23 | 8% | 2026-05-11 |
| CritPt | Science | 75 | 1.4% | 2026-05-11 |
| DeepSearchQA | Search | 3 | 0.83 | 2026-05-06 |
| WideSearch | Search | 1 | 0.81 | 2026-05-06 |
| SWE-bench Multilingual | Software Engineering | 3 | 76.7% | 2026-05-28 |
| SWE-bench Pro | Software Engineering | 2 | 59.5% | 2026-05-28 |
| SWE-bench Verified | Software Engineering | 4 | 80.2% | 2026-05-28 |
| SpreadsheetBench | Spreadsheets | 5 | 84.5% | 2026-05-28 |
| LiveSQLBench | Text to SQL | 5 | 36.43 | 2026-05-06 |
| BFCL-V4 | Tool Use | 3 | 71.3% | 2026-05-28 |
| WMT24++ | Translation | 6 | 81.6% | 2026-05-28 |
| CAIS Vision Capabilities Index | Vision | 4 | 61.2 | 2026-05-27 |