ClawProBench
OpenClaw agent benchmark measuring model performance on reasoning, planning, tool use, reliability, efficiency, and safety across repeated runs.
61 rows
Primary metric: final_score
Sampled: 2026-05-06
Metadata
Metrics
Final Score, Pass^3, Pass@3, Avg Score, Capability, Efficiency, Planning, Safety, Tool Use, Constraints, Error Recovery, Synthesis, Avg Runtime (lower is better), Total Tokens (lower is better), Cost (lower is better)
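The page lists Pass^3 and Pass@3 without defining them. As a minimal sketch, assuming the common convention for repeated-run benchmarks (Pass@3 counts a task as solved if at least one of three runs succeeds, Pass^3 only if all three succeed), the two could be computed as below. The function name `summarize`, the task names, and the per-run outcomes are all hypothetical and are not taken from ClawProBench, whose scoring may differ.

```python
from typing import Dict, List


def summarize(results: Dict[str, List[bool]]) -> Dict[str, float]:
    """Return pass@k and pass^k as percentages over all tasks.

    `results` maps a task name to the success flag of each repeated run.
    Assumed definitions (not confirmed by the source):
      pass@k: a task counts if ANY of its k runs succeeded
      pass^k: a task counts only if ALL of its k runs succeeded
    """
    n_tasks = len(results)
    solved_any = sum(any(runs) for runs in results.values())
    solved_all = sum(all(runs) for runs in results.values())
    return {
        "pass@3": 100 * solved_any / n_tasks,
        "pass^3": 100 * solved_all / n_tasks,
    }


# Hypothetical outcomes for four tasks, three runs each.
example = {
    "task_a": [True, True, True],     # counts for both metrics
    "task_b": [True, False, True],    # counts for pass@3 only
    "task_c": [False, False, False],  # counts for neither
    "task_d": [True, True, False],    # counts for pass@3 only
}

print(summarize(example))  # {'pass@3': 75.0, 'pass^3': 25.0}
```

Under these assumed definitions Pass^3 can never exceed Pass@3, so the gap between the two is a rough indicator of how consistent a model is across repeated runs.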
| Rank | Subject | Final Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | gpt-5.5-xhigh | 67.90 | — | Imported | 2026-05-06 |
| 2 | deepseek-v4-pro | 64.38 | — | Imported | 2026-05-06 |
| 3 | qwen3.5-plus | 64.19 | — | Imported | 2026-05-06 |
| 4 | qwen3.5-397b-a17b | 64.18 | — | Imported | 2026-05-06 |
| 5 | mimo-v2.5-pro | 63.30 | — | Imported | 2026-05-06 |
| 6 | GLM-5.1 | 62.93 | — | Imported | 2026-05-06 |
| 7 | doubao-seed-2.0-code | 62.36 | — | Imported | 2026-05-06 |
| 8 | GLM-5-Turbo | 61.92 | — | Imported | 2026-05-06 |
| 9 | deepseek-v4-flash | 61.47 | — | Imported | 2026-05-06 |
| 10 | doubao-seed-2.0-pro | 61.07 | — | Imported | 2026-05-06 |
| 11 | Claude Sonnet 4.6 | 60.50 | — | Imported | 2026-05-06 |
| 12 | doubao-seed-2.0-lite | 60.40 | — | Imported | 2026-05-06 |
| 13 | mimo-v2.5 | 60.39 | — | Imported | 2026-05-06 |
| 14 | qwen3.6-plus | 60.20 | — | Imported | 2026-05-06 |
| 15 | DeepSeek-V3.2 | 60.13 | — | Imported | 2026-05-06 |
| 16 | DeepSeek-V3.2 | 60.12 | — | Imported | 2026-05-06 |
| 17 | huanyuan-3.0-preview | 59.39 | — | Imported | 2026-05-06 |
| 18 | kimi-k2.6 | 59.31 | — | Imported | 2026-05-06 |
| 19 | doubao-seed-code | 59.22 | — | Imported | 2026-05-06 |
| 20 | qwen3.6-plus | 59.05 | — | Imported | 2026-05-06 |
| 21 | LongCat-2.0-Preview | 58.80 | — | Imported | 2026-05-06 |
| 22 | qwen3.6-27b | 58.74 | — | Imported | 2026-05-06 |
| 23 | kimi-k2.5 | 58.49 | — | Imported | 2026-05-06 |
| 24 | DeepSeekV3.2 | 57.94 | — | Imported | 2026-05-06 |
| 25 | mimo-v2-pro | 57.92 | — | Imported | 2026-05-06 |
| 26 | mimo-v2-omni | 57.65 | — | Imported | 2026-05-06 |
| 27 | LongCat-Flash-Thinking-2601 | 57.48 | — | Imported | 2026-05-06 |
| 28 | Ling-2.6-1T | 57.40 | — | Imported | 2026-05-06 |
| 29 | qwen3.6-max-preview | 57.40 | — | Imported | 2026-05-06 |
| 30 | kimi-k2.6-code-preview | 57.14 | — | Imported | 2026-05-06 |
| 31 | GLM-5 | 57.05 | — | Imported | 2026-05-06 |
| 32 | qwen3.6-35b-a3b | 56.94 | — | Imported | 2026-05-06 |
| 33 | gpt-5.4 | 56.73 | — | Imported | 2026-05-06 |
| 34 | qwen3.6-flash | 56.55 | — | Imported | 2026-05-06 |
| 35 | GLM-4.6 | 56.29 | — | Imported | 2026-05-06 |
| 36 | qwen3-max-2026-01-23 | 55.76 | — | Imported | 2026-05-06 |
| 37 | kat-coder-pro-v2 | 54.74 | — | Imported | 2026-05-06 |
| 38 | GLM-4.7 | 54.58 | — | Imported | 2026-05-06 |
| 39 | gemini-3.1-pro-preview | 53.95 | — | Imported | 2026-05-06 |
| 40 | hunyuan-2.0-thinking | 52.69 | — | Imported | 2026-05-06 |
| 41 | MiniMax-M2.5 | 51.79 | — | Imported | 2026-05-06 |
| 42 | gemma-4-31b-it | 51.59 | — | Imported | 2026-05-06 |
| 43 | Ling-2.5-1T | 51.19 | — | Imported | 2026-05-06 |
| 44 | DeepSeek-R1 | 50.23 | — | Imported | 2026-05-06 |
| 45 | MiniMax-M2.7 | 49.53 | — | Imported | 2026-05-06 |
| 46 | kimi-for-coding-k2.6 | 49.04 | — | Imported | 2026-05-06 |
| 47 | gemini-3-flash-preview | 48.99 | — | Imported | 2026-05-06 |
| 48 | MiniMax-M2.1 | 48.08 | — | Imported | 2026-05-06 |
| 49 | Kimi-K2-Thinking | 47.83 | — | Imported | 2026-05-06 |
| 50 | hunyuan-2.0-instruct | 46.94 | — | Imported | 2026-05-06 |
| 51 | qwen3-coder-next | 46.84 | — | Imported | 2026-05-06 |
| 52 | mistral-small-2603 | 45.26 | — | Imported | 2026-05-06 |
| 53 | grok-4.20 | 43.04 | — | Imported | 2026-05-06 |
| 54 | kimi-for-coding-k2.5 | 42.72 | — | Imported | 2026-05-06 |
| 55 | step-3.5-flash-2603 | 42.59 | — | Imported | 2026-05-06 |
| 56 | step-3.5-flash | 41.75 | — | Imported | 2026-05-06 |
| 57 | Spark X2 | 41.44 | — | Imported | 2026-05-06 |
| 58 | step-3.5-flash | 38.74 | — | Imported | 2026-05-06 |
| 59 | hunyuan-t1 | 34.74 | — | Imported | 2026-05-06 |
| 60 | ERNIE-4.5-Turbo | 33.68 | — | Imported | 2026-05-06 |
| 61 | Ling-2.6-Flash | 27.04 | — | Imported | 2026-05-06 |