GLM 5.1 | BenchmarkList

Metadata

GLM Open source

Aliases: glm-5.1, glm-5.1-20260406, z-ai-glm-5.1, z-ai-glm-5.1-20260406, z-ai/glm-5.1, z-ai/glm-5.1-20260406, GLM-5.1 Thinking, GLM 5.1 Thinking, glm-5.1-thinking, z-ai/glm-5.1-thinking

Benchmark	Category	Rank	Score	Sampled
AutoBench	Agentic	5	3.15	2026-05-06
CoWorkBench	Agentic	4	66%	2026-05-28
Gert Labs Rankings	Agentic	11	0.57	2026-05-11
HiL-Bench	Agentic	4	21%	2026-05-05
ITBench-AA	Agentic	5	40.3%	2026-05-28
MCP Atlas	Agentic	5	71.8%	2026-05-28
MCP Atlas	Agentic	2	75.60	2026-05-06
MCPMark	Agentic	2	57.5%	2026-05-28
PinchBench	Agentic	29	0.85	2026-05-06
QwenClawBench	Agentic	4	58.7%	2026-05-28
QwenWorldBench	Agentic	5	50.2%	2026-05-28
Tau2-Bench Telecom	Agentic	5	97.7%	2026-05-11
Tau2-Bench Telecom	Agentic	9	97.1%	2026-05-11
TAU3-Bench	Agentic	2	0.71	2026-05-06
Terminal-Bench Hard	Agentic	26	43.2%	2026-05-11
Terminal-Bench Hard	Agentic	57	35.6%	2026-05-11
TERMS-Bench	Agentic	2	68.6% SE+	2026-05-28
Toolathlon	Agentic	11	0.41	2026-05-06
Vending-Bench 2	Agentic	9	5634.41	2026-05-28
VitaBench	Agentic	3	45.1%	2026-05-28
YC-Bench	Agentic	1	1510772	2026-05-06
OpenUGI	Alignment	60	52.88	2026-05-06
OpenUGI	Alignment	383	39.99	2026-05-06
ALE-Bench	Coding	35	887.10	2026-05-06
Arena AI Code	Coding	5	1532	2026-05-06
BLXBench	Coding	22	13.90	2026-05-06
Claw-Eval	Coding	3	62.7%	2026-05-28
DeepSWE	Coding	10	17.48	2026-05-26
Kernel Bench L3	Coding	2	2.00/78%	2026-05-28
LiveCodeBench	Coding	39	81.38%	2026-05-28
LMArena WebDev Arena	Coding	5	1531.70	2026-05-06
NL2Repo	Coding	4	41%	2026-05-28
NL2Repo	Coding	1	0.43	2026-05-06
QwenSVG	Coding	2	1605	2026-05-28
QwenWebDev	Coding	4	1564	2026-05-28
SciCode	Coding	4	45.1%	2026-05-28
SciCode	Coding	50	43.8%	2026-05-11
SciCode	Coding	172	36.1%	2026-05-11
SkillsBench	Coding	3	53.1%	2026-05-28
SWE-bench Verified	Coding	13	76.4%	2026-05-28
Terminal-Bench 2.0	Coding	17	53.933%	2026-05-28
Terminal-Bench 2.0	Coding	5	63.5%	2026-05-28
Terminal-Bench 2.1	Coding	7	56.929%	2026-05-28
Vibe Code Bench v1.1	Coding	16	31.456%	2026-05-28
CyberGym	Cybersecurity	5	0.69	2026-05-06
ExploitBench v8-bench	Cybersecurity	13	2.62 points	2026-05-15
ExploitBench v8-bench	Cybersecurity	15	2.56 points	2026-05-15
AA-Omniscience	Factuality	12	1.93	2026-05-11
CorpFin v2	Finance	22	64.452%	2026-05-28
Finance Agent v1.1	Finance	10	57.655%	2026-05-04
Finance Agent v2	Finance	9	44.792%	2026-05-28
Rogo Big Finance Bench	Finance	4	55% rubric / 36% final	2026-05-28
TaxEval v2	Finance	56	71.194%	2026-05-28
InfiniteBM Heads-Up No-Limit Hold'em	Game	20	1136.32 Elo / 118 games	2026-05-28
InfiniteBM Liar's Dice	Game	15	1237.4 Elo / 1717 games	2026-05-28
BenchLM	General Knowledge	14	83	2026-05-06
MAXIFE	General Knowledge	4	87.7%	2026-05-28
MMLU-ProX	General Knowledge	5	83.9%	2026-05-28
MMLU-Redux	General Knowledge	6	94.3%	2026-05-28
NOVA-63	General Knowledge	5	54.6%	2026-05-28
LMArena Text Arena	Generalization	12	1467.75	2026-05-06
MedCode	Healthcare	22	41.604%	2026-05-28
MedScribe	Healthcare	46	72.27%	2026-05-28
IFBench	Instruction Following	3	76%	2026-05-28
IFEval	Instruction Following	1	94.5%	2026-05-28
AIIQ Composite IQ	Intelligence	17	115	2026-05-12
Artificial Analysis Intelligence Index	Intelligence	17	51.41	2026-05-11
Artificial Analysis Intelligence Index	Intelligence	49	43.82	2026-05-11
GPQA Diamond	Intelligence	27	84.518%	2026-05-28
HLE w/ tools	Intelligence	4	52.3%	2026-05-28
Humanity's Last Exam	Intelligence	5	34.7%	2026-05-28
Humanity's Last Exam	Intelligence	32	28%	2026-05-11
Humanity's Last Exam	Intelligence	44	25.6%	2026-05-11
LiveBench	Intelligence	29	70.62	2026-05-05
MMLU Pro	Intelligence	22	86.9%	2026-05-28
MMLU-Pro	Intelligence	6	86.3%	2026-05-28
SuperGPQA	Intelligence	6	68%	2026-05-28
Vals Index	Intelligence	10	52.144%	2026-05-28
CaseLaw v2	Legal	48	51.554%	2026-05-04
LegalBench	Legal	15	84.394%	2026-05-28
MRCR-v2 128k	Long Context	6	62%	2026-05-28
AIME	Math	22	91.875%	2026-04-16
ProofBench	Math	12	22.222%	2026-05-28
HMMT 2025	Mathematics	9	0.94	2026-05-06
HMMT February 2026	Mathematics	5	89.4%	2026-05-28
IMO-AnswerBench	Mathematics	4	83.8%	2026-05-28
IMO-AnswerBench	Mathematics	5	0.84	2026-05-06
MathArena Apex	Mathematics	5	11.5%	2026-05-28
INCLUDE	Multilingual	5	84.3%	2026-05-28
MMMLU	Multilingual	6	87.2%	2026-05-28
Artificial Analysis Openness Index	Openness	88	44.44	2026-05-11
Altered Riddles	Reasoning	5	0.3239	2026-05-27
CAIS Text Capabilities Index	Reasoning	15	29.8	2026-05-27
Context Arena	Reasoning	15	62.05	2026-05-06
Context Arena	Reasoning	44	30.29	2026-05-06
Global PIQA	Reasoning	5	89.5%	2026-05-28
GPQA Diamond	Reasoning	6	86.2%	2026-05-28
GPQA Diamond	Reasoning	36	86.8%	2026-05-11
GPQA Diamond	Reasoning	68	83.9%	2026-05-11
CAIS Risk Index	Safety	17	50.3	2026-05-27
CritPt	Science	5	4.6%	2026-05-28
CritPt	Science	37	4.6%	2026-05-11
CritPt	Science	212	0%	2026-05-11
SWE-bench Pro	Software Engineering	4	58.8%	2026-05-28
SpreadsheetBench	Spreadsheets	3	85.2%	2026-05-28
Structured Output Benchmark	Structured Output	3	86.60	2026-05-06
LiveSQLBench	Text to SQL	6	35.29	2026-05-06
BFCL-V4	Tool Use	4	70.9%	2026-05-28
WMT24++	Translation	5	81.8%	2026-05-28
