ZClawBench

ZClawBench evaluates the quality of Claw-style agent task execution: how well a model autonomously completes complex, multi-step coding tasks in real-world environments.

Rows: 3
Primary metric: Score
Sampled: 2026-05-06

Metadata

Metrics

Score, Normalized Score

Latest Results

| Rank | Subject | Score | Model match | Provenance | Sampled |
|------|---------|-------|-------------|------------|---------|
| 1 | GLM-5V-Turbo | 0.58 | GLM 5V Turbo (z-ai-glm-5v-turbo) | Self-reported | 2026-05-06 |
| 2 | Qwen3.6-27B | 0.53 | Qwen3.6 27B (qwen-qwen3.6-27b) | Self-reported | 2026-05-06 |
| 3 | Qwen3.6-35B-A3B | 0.53 | Qwen3.6 35B A3B (qwen-qwen3.6-35b-a3b) | Self-reported | 2026-05-06 |
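The two Qwen3.6 entries have the same score (0.53), yet they are listed at ranks 2 and 3. The page does not say how ties are broken. For comparison, the sketch below re-ranks the rows using standard competition ranking, in which tied scores share a rank. This is a minimal illustration built on an assumption: competition ranking is not necessarily how this leaderboard assigns ranks, and `competition_ranks` is a hypothetical helper.

```python
# Rows copied from the leaderboard above: (subject, score).
rows = [
    ("GLM-5V-Turbo", 0.58),
    ("Qwen3.6-27B", 0.53),
    ("Qwen3.6-35B-A3B", 0.53),
]


def competition_ranks(entries):
    """Rank entries by score, highest first; tied scores share a rank ("1, 2, 2").

    Hypothetical helper: assumes standard competition ranking, which may differ
    from the leaderboard's own (unstated) tiebreak rule.
    """
    # sorted() is stable, so tied entries keep their input order.
    ordered = sorted(entries, key=lambda e: e[1], reverse=True)
    ranked = []
    for i, (subject, score) in enumerate(ordered):
        if i > 0 and score == ordered[i - 1][1]:
            rank = ranked[-1][0]  # same score as the previous entry: reuse its rank
        else:
            rank = i + 1  # otherwise the rank is the 1-based position
        ranked.append((rank, subject, score))
    return ranked


for rank, subject, score in competition_ranks(rows):
    print(rank, subject, score)
```

Under this scheme both Qwen3.6 models would share rank 2. The leaderboard's 2/3 split therefore reflects an ordering choice, not a score difference.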