MoonshotAI: Kimi K2 0711

Metadata

Kimi Closed/API

Aliases: kimi-k2, moonshotai-kimi-k2, moonshotai/kimi-k2

Benchmark	Category	Rank	Score	Sampled
ADBench	Agentic	7	79	2026-05-06
Berkeley Function-Calling Leaderboard	Agentic	11	59.06%	2026-05-27
Galileo Agent Leaderboard	Agentic	5	0.53	2026-05-06
LLM-WikiRace	Agentic	11	45.30	2026-05-06
Tau2 Airline	Agentic	13	0.56	2026-05-06
Tau2-Bench Telecom	Agentic	159	61.1%	2026-05-11
Terminal-Bench Hard	Agentic	173	15.9%	2026-05-11
OpenUGI	Alignment	30	56.55	2026-05-06
IOI	Coding	51	1.25%	2026-05-26
LiveCodeBench	Coding	61	70.449%	2026-05-28
MultiPL-E	Coding	4	0.857	2026-05-27
SciCode	Coding	200	34.5%	2026-05-11
Terminal-Bench 2.0	Coding	48	25.843%	2026-05-28
NeoEvalPlusN	Creative	64	15.50	2026-05-06
kluster.ai LLM Hallucination Detection Leaderboard	Factuality	8	97.03	2026-05-06
CorpFin v2	Finance	84	50.388%	2026-05-28
FinanceArena	Finance	15	33.8	2026-05-27
PRBench Finance	Finance	16	38.34	2026-05-06
TaxEval v2	Finance	65	70.196%	2026-05-28
BenchLM	General Knowledge	75	42	2026-05-06
CSimpleQA	General Knowledge	5	0.78	2026-05-06
MMLU-Redux	General Knowledge	13	0.93	2026-05-06
HELM AIR-Bench	Generalization	31	0.741131	2026-05-28
MedQA	Healthcare	62	83.975%	2026-04-16
HUMAINE	Human Preference	6	3.71	2026-05-06
AIIQ Composite IQ	Intelligence	31	101	2026-05-12
Artificial Analysis Intelligence Index	Intelligence	179	26.32	2026-05-11
GPQA Diamond	Intelligence	65	71.464%	2026-05-28
Humanity's Last Exam	Intelligence	213	7%	2026-05-11
MMLU Pro	Intelligence	69	79.394%	2026-05-28
MMLU-Pro	Intelligence	68	82.4%	2026-05-11
LegalBench	Legal	49	81.454%	2026-05-28
Professional Reasoning Bench - Legal	Legal	23	36.38	2026-05-06
Fiction.LiveBench	Long Context	16	40.60	2026-05-06
AIME	Math	56	62.708%	2026-04-16
AIME 2025	Math	124	57%	2026-05-11
IneqMath	Math	20	9	2026-05-06
MATH 500	Math	13	94.2%	2026-01-09
MGSM	Math	31	90.946%	2026-01-09
CNMO 2024	Mathematics	1	0.74	2026-05-06
HMMT 2025	Mathematics	28	0.39	2026-05-06
MATH-500	Mathematics	6	0.97	2026-05-06
PolyMath-en	Mathematics	1	0.65	2026-05-06
LiveMedBench	Medical	29	0.0585	2026-05-27
Artificial Analysis Openness Index	Openness	89	44.44	2026-05-11
AutoLogi	Reasoning	1	0.90	2026-05-06
GPQA Diamond	Reasoning	141	76.6%	2026-05-11
Humanity's Last Exam (Text Only)	Reasoning	45	4.68	2026-05-06
MultiNRC	Reasoning	31	18.48	2026-05-06
OJBench	Reasoning	8	0.27	2026-05-06
CritPt	Science	262	0%	2026-05-11
SWE-bench Pro	Software Engineering	6	27.67	2026-05-06
ACEBench	Tool Use	1	0.77	2026-05-06
