GLM 5.1
GLM / Z.ai
109 scores
91 benchmarks
Cost in/out: $1.05 / $3.5 per 1M tokens
Metadata
GLM Open source
Aliases: glm-5.1, glm-5.1-20260406, z-ai-glm-5.1, z-ai-glm-5.1-20260406, z-ai/glm-5.1, z-ai/glm-5.1-20260406, GLM-5.1 Thinking, GLM 5.1 Thinking, glm-5.1-thinking, z-ai/glm-5.1-thinking
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| AutoBench | Agentic | 5 | 3.15 | 2026-05-06 |
| CoWorkBench | Agentic | 4 | 66% | 2026-05-28 |
| Gert Labs Rankings | Agentic | 11 | 0.57 | 2026-05-11 |
| HiL-Bench | Agentic | 4 | 21% | 2026-05-05 |
| ITBench-AA | Agentic | 5 | 40.3% | 2026-05-28 |
| MCP Atlas | Agentic | 5 | 71.8% | 2026-05-28 |
| MCP Atlas | Agentic | 2 | 75.60 | 2026-05-06 |
| MCPMark | Agentic | 2 | 57.5% | 2026-05-28 |
| PinchBench | Agentic | 29 | 0.85 | 2026-05-06 |
| QwenClawBench | Agentic | 4 | 58.7% | 2026-05-28 |
| QwenWorldBench | Agentic | 5 | 50.2% | 2026-05-28 |
| Tau2-Bench Telecom | Agentic | 5 | 97.7% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 9 | 97.1% | 2026-05-11 |
| TAU3-Bench | Agentic | 2 | 0.71 | 2026-05-06 |
| Terminal-Bench Hard | Agentic | 26 | 43.2% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 57 | 35.6% | 2026-05-11 |
| TERMS-Bench | Agentic | 2 | 68.6% SE+ | 2026-05-28 |
| Toolathlon | Agentic | 11 | 0.41 | 2026-05-06 |
| Vending-Bench 2 | Agentic | 9 | 5634.41 | 2026-05-28 |
| VitaBench | Agentic | 3 | 45.1% | 2026-05-28 |
| YC-Bench | Agentic | 1 | 1510772 | 2026-05-06 |
| OpenUGI | Alignment | 60 | 52.88 | 2026-05-06 |
| OpenUGI | Alignment | 383 | 39.99 | 2026-05-06 |
| ALE-Bench | Coding | 35 | 887.10 | 2026-05-06 |
| Arena AI Code | Coding | 5 | 1532 | 2026-05-06 |
| BLXBench | Coding | 22 | 13.90 | 2026-05-06 |
| Claw-Eval | Coding | 3 | 62.7% | 2026-05-28 |
| DeepSWE | Coding | 10 | 17.48 | 2026-05-26 |
| Kernel Bench L3 | Coding | 2 | 2.00/78% | 2026-05-28 |
| LiveCodeBench | Coding | 39 | 81.38% | 2026-05-28 |
| LMArena WebDev Arena | Coding | 5 | 1531.70 | 2026-05-06 |
| NL2Repo | Coding | 4 | 41% | 2026-05-28 |
| NL2Repo | Coding | 1 | 0.43 | 2026-05-06 |
| QwenSVG | Coding | 2 | 1605 | 2026-05-28 |
| QwenWebDev | Coding | 4 | 1564 | 2026-05-28 |
| SciCode | Coding | 4 | 45.1% | 2026-05-28 |
| SciCode | Coding | 50 | 43.8% | 2026-05-11 |
| SciCode | Coding | 172 | 36.1% | 2026-05-11 |
| SkillsBench | Coding | 3 | 53.1% | 2026-05-28 |
| SWE-bench Verified | Coding | 13 | 76.4% | 2026-05-28 |
| Terminal-Bench 2.0 | Coding | 17 | 53.933% | 2026-05-28 |
| Terminal-Bench 2.0 | Coding | 5 | 63.5% | 2026-05-28 |
| Terminal-Bench 2.1 | Coding | 7 | 56.929% | 2026-05-28 |
| Vibe Code Bench v1.1 | Coding | 16 | 31.456% | 2026-05-28 |
| CyberGym | Cybersecurity | 5 | 0.69 | 2026-05-06 |
| ExploitBench v8-bench | Cybersecurity | 13 | 2.62 points | 2026-05-15 |
| ExploitBench v8-bench | Cybersecurity | 15 | 2.56 points | 2026-05-15 |
| AA-Omniscience | Factuality | 12 | 1.93 | 2026-05-11 |
| CorpFin v2 | Finance | 22 | 64.452% | 2026-05-28 |
| Finance Agent v1.1 | Finance | 10 | 57.655% | 2026-05-04 |
| Finance Agent v2 | Finance | 9 | 44.792% | 2026-05-28 |
| Rogo Big Finance Bench | Finance | 4 | 55% rubric / 36% final | 2026-05-28 |
| TaxEval v2 | Finance | 56 | 71.194% | 2026-05-28 |
| InfiniteBM Heads-Up No-Limit Hold'em | Game | 20 | 1136.32 Elo / 118 games | 2026-05-28 |
| InfiniteBM Liar's Dice | Game | 15 | 1237.4 Elo / 1717 games | 2026-05-28 |
| BenchLM | General Knowledge | 14 | 83 | 2026-05-06 |
| MAXIFE | General Knowledge | 4 | 87.7% | 2026-05-28 |
| MMLU-ProX | General Knowledge | 5 | 83.9% | 2026-05-28 |
| MMLU-Redux | General Knowledge | 6 | 94.3% | 2026-05-28 |
| NOVA-63 | General Knowledge | 5 | 54.6% | 2026-05-28 |
| LMArena Text Arena | Generalization | 12 | 1467.75 | 2026-05-06 |
| MedCode | Healthcare | 22 | 41.604% | 2026-05-28 |
| MedScribe | Healthcare | 46 | 72.27% | 2026-05-28 |
| IFBench | Instruction Following | 3 | 76% | 2026-05-28 |
| IFEval | Instruction Following | 1 | 94.5% | 2026-05-28 |
| AIIQ Composite IQ | Intelligence | 17 | 115 | 2026-05-12 |
| Artificial Analysis Intelligence Index | Intelligence | 17 | 51.41 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 49 | 43.82 | 2026-05-11 |
| GPQA Diamond | Intelligence | 27 | 84.518% | 2026-05-28 |
| HLE w/ tools | Intelligence | 4 | 52.3% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 5 | 34.7% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 32 | 28% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 44 | 25.6% | 2026-05-11 |
| LiveBench | Intelligence | 29 | 70.62 | 2026-05-05 |
| MMLU Pro | Intelligence | 22 | 86.9% | 2026-05-28 |
| MMLU-Pro | Intelligence | 6 | 86.3% | 2026-05-28 |
| SuperGPQA | Intelligence | 6 | 68% | 2026-05-28 |
| Vals Index | Intelligence | 10 | 52.144% | 2026-05-28 |
| CaseLaw v2 | Legal | 48 | 51.554% | 2026-05-04 |
| LegalBench | Legal | 15 | 84.394% | 2026-05-28 |
| MRCR-v2 128k | Long Context | 6 | 62% | 2026-05-28 |
| AIME | Math | 22 | 91.875% | 2026-04-16 |
| ProofBench | Math | 12 | 22.222% | 2026-05-28 |
| HMMT 2025 | Mathematics | 9 | 0.94 | 2026-05-06 |
| HMMT February 2026 | Mathematics | 5 | 89.4% | 2026-05-28 |
| IMO-AnswerBench | Mathematics | 4 | 83.8% | 2026-05-28 |
| IMO-AnswerBench | Mathematics | 5 | 0.84 | 2026-05-06 |
| MathArena Apex | Mathematics | 5 | 11.5% | 2026-05-28 |
| INCLUDE | Multilingual | 5 | 84.3% | 2026-05-28 |
| MMMLU | Multilingual | 6 | 87.2% | 2026-05-28 |
| Artificial Analysis Openness Index | Openness | 88 | 44.44 | 2026-05-11 |
| Altered Riddles | Reasoning | 5 | 0.3239 | 2026-05-27 |
| CAIS Text Capabilities Index | Reasoning | 15 | 29.8 | 2026-05-27 |
| Context Arena | Reasoning | 15 | 62.05 | 2026-05-06 |
| Context Arena | Reasoning | 44 | 30.29 | 2026-05-06 |
| Global PIQA | Reasoning | 5 | 89.5% | 2026-05-28 |
| GPQA Diamond | Reasoning | 6 | 86.2% | 2026-05-28 |
| GPQA Diamond | Reasoning | 36 | 86.8% | 2026-05-11 |
| GPQA Diamond | Reasoning | 68 | 83.9% | 2026-05-11 |
| CAIS Risk Index | Safety | 17 | 50.3 | 2026-05-27 |
| CritPt | Science | 5 | 4.6% | 2026-05-28 |
| CritPt | Science | 37 | 4.6% | 2026-05-11 |
| CritPt | Science | 212 | 0% | 2026-05-11 |
| SWE-bench Pro | Software Engineering | 4 | 58.8% | 2026-05-28 |
| SpreadsheetBench | Spreadsheets | 3 | 85.2% | 2026-05-28 |
| Structured Output Benchmark | Structured Output | 3 | 86.60 | 2026-05-06 |
| LiveSQLBench | Text to SQL | 6 | 35.29 | 2026-05-06 |
| BFCL-V4 | Tool Use | 4 | 70.9% | 2026-05-28 |
| WMT24++ | Translation | 5 | 81.8% | 2026-05-28 |