WolfBench
Five-metric agent benchmark based on Terminal-Bench 2.0, comparing model and coding-agent combinations by consistency, average score, best run, ceiling, and worst run.
Rows: 57
Primary metric: Average
Sampled: 2026-05-06
Metadata
Metrics: Average, Solid, Best, Ceiling, Worst
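The page names the five metrics but does not define them. The sketch below shows one way they could be computed, assuming each agent/model combination is run several times over the same task set. The definitions of Solid (tasks solved in every run) and Ceiling (tasks solved in at least one run) are assumptions made for illustration, as are the function name `five_metrics` and the input layout; none of these are taken from WolfBench itself.

```python
from statistics import mean


def five_metrics(runs: list[set[str]], total_tasks: int) -> dict[str, float]:
    """Summarize repeated benchmark runs as percentages of total_tasks.

    runs: one set of solved task IDs per run (hypothetical layout).
    The metric definitions are assumptions, not WolfBench's documented ones.
    """
    if not runs or total_tasks <= 0:
        raise ValueError("need at least one run and a positive task count")

    def pct(count: float) -> float:
        return 100.0 * count / total_tasks

    solved_counts = [len(run) for run in runs]
    return {
        "average": pct(mean(solved_counts)),          # mean tasks solved per run
        "solid": pct(len(set.intersection(*runs))),   # assumed: solved in every run
        "best": pct(max(solved_counts)),              # strongest single run
        "ceiling": pct(len(set.union(*runs))),        # assumed: solved in any run
        "worst": pct(min(solved_counts)),             # weakest single run
    }


# Three hypothetical runs over a four-task suite.
example = five_metrics([{"a", "b", "c"}, {"a", "b"}, {"a", "c", "d"}], total_tasks=4)
```

Under these assumed definitions, Solid can never exceed Worst and Ceiling can never fall below Best, which gives a quick sanity check on any imported row.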
| Rank | Subject | Average | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | terminus-2 / GPT-5.5 | 77 | — | Imported | 2026-05-06 |
| 2 | cursor-cli / GPT-5.5 | 77 | — | Imported | 2026-05-06 |
| 3 | openclaw / Claude Opus 4.7 | 75 | — | Imported | 2026-05-06 |
| 4 | hermes / GPT-5.5 | 74 | — | Imported | 2026-05-06 |
| 5 | claude-code / Claude Opus 4.7 | 73 | — | Imported | 2026-05-06 |
| 6 | terminus-2 / Claude Opus 4.6 | 71 | — | Imported | 2026-05-06 |
| 7 | terminus-2 / Claude Opus 4.7 | 71 | — | Imported | 2026-05-06 |
| 8 | openclaw / GPT-5.4 | 71 | — | Imported | 2026-05-06 |
| 9 | openclaw / GPT-5.5 | 70 | — | Imported | 2026-05-06 |
| 10 | terminus-2 / GPT-5.4 | 69 | — | Imported | 2026-05-06 |
| 11 | hermes / Claude Opus 4.7 | 66 | — | Imported | 2026-05-06 |
| 12 | hermes / GPT-5.4 | 66 | — | Imported | 2026-05-06 |
| 13 | hermes / Claude Opus 4.6 | 64 | — | Imported | 2026-05-06 |
| 14 | claude-code / Claude Opus 4.6 | 63 | — | Imported | 2026-05-06 |
| 15 | cursor-cli / Claude Opus 4.6 | 63 | — | Imported | 2026-05-06 |
| 16 | terminus-2 / Claude Sonnet 4.6 | 62 | — | Imported | 2026-05-06 |
| 17 | openclaw / Kimi K2.6 [W&B] | 60 | — | Imported | 2026-05-06 |
| 18 | terminus-2 / Kimi K2.6 [Moonshot AI] | 59 | — | Imported | 2026-05-06 |
| 19 | openclaw / Kimi K2.6 [Moonshot AI] | 59 | — | Imported | 2026-05-06 |
| 20 | openclaw / Gemini 3.1 Pro Preview | 59 | — | Imported | 2026-05-06 |
| 21 | claude-code / Claude Sonnet 4.6 | 58 | — | Imported | 2026-05-06 |
| 22 | openclaw / Claude Opus 4.6 | 58 | — | Imported | 2026-05-06 |
| 23 | terminus-2 / Kimi K2.6 [W&B] | 57 | — | Imported | 2026-05-06 |
| 24 | openclaw / GPT-5.3-Codex | 55 | — | Imported | 2026-05-06 |
| 25 | openclaw / Claude Sonnet 4.6 | 53 | — | Imported | 2026-05-06 |
| 26 | terminus-2 / Gemini 3.1 Pro Preview | 52 | — | Imported | 2026-05-06 |
| 27 | terminus-2 / MiniMax M2.7 | 52 | — | Imported | 2026-05-06 |
| 28 | terminus-2 / Kimi K2.5 (int4) [W&B] | 48 | — | Imported | 2026-05-06 |
| 29 | terminus-2 / GLM-5-Turbo | 48 | — | Imported | 2026-05-06 |
| 30 | hermes / Kimi K2.6 [Moonshot AI] | 47 | — | Imported | 2026-05-06 |
| 31 | terminus-2 / GLM-5-FP8 [W&B] | 47 | — | Imported | 2026-05-06 |
| 32 | terminus-2 / MiniMax M2.5 [W&B] | 47 | — | Imported | 2026-05-06 |
| 33 | openclaw / GLM-5-Turbo | 47 | — | Imported | 2026-05-06 |
| 34 | terminus-2 / Kimi K2.5 (nvfp4) [W&B] | 47 | — | Imported | 2026-05-06 |
| 35 | openclaw / MiniMax M2.7 | 46 | — | Imported | 2026-05-06 |
| 36 | terminus-2 / Gemini 3 Flash Preview | 44 | — | Imported | 2026-05-06 |
| 37 | terminus-2 / GLM-5.1 [W&B] | 42 | — | Imported | 2026-05-06 |
| 38 | openclaw / Gemini 3 Flash Preview | 41 | — | Imported | 2026-05-06 |
| 39 | hermes / Kimi K2.5 (nvfp4) [W&B] | 41 | — | Imported | 2026-05-06 |
| 40 | openclaw / Kimi K2.5 (int4) [W&B] | 39 | — | Imported | 2026-05-06 |
| 41 | terminus-2 / GPT-5.3-Codex | 39 | — | Imported | 2026-05-06 |
| 42 | openclaw / MiniMax M2.5 [W&B] | 37 | — | Imported | 2026-05-06 |
| 43 | openclaw / GLM-5-FP8 [W&B] | 37 | — | Imported | 2026-05-06 |
| 44 | openclaw / Kimi K2.5 (nvfp4) [W&B] | 37 | — | Imported | 2026-05-06 |
| 45 | terminus-2 / NVIDIA-Nemotron-3-Super-120B-A12B-FP8 [W&B] | 36 | — | Imported | 2026-05-06 |
| 46 | openclaw / GLM-5.1 [W&B] | 33 | — | Imported | 2026-05-06 |
| 47 | terminus-2 / Gemma 4 31B [W&B] | 31 | — | Imported | 2026-05-06 |
| 48 | terminus-2 / GPT-5.4 mini | 26 | — | Imported | 2026-05-06 |
| 49 | terminus-2 / Gemini 3.1 Flash Lite Preview | 25 | — | Imported | 2026-05-06 |
| 50 | terminus-2 / Mistral Small 4 119B A6B | 24 | — | Imported | 2026-05-06 |
| 51 | openclaw / Gemini 3.1 Flash Lite Preview | 23 | — | Imported | 2026-05-06 |
| 52 | terminus-2 / GPT-5.4 nano | 22 | — | Imported | 2026-05-06 |
| 53 | openclaw / NVIDIA-Nemotron-3-Super-120B-A12B-FP8 [W&B] | 20 | — | Imported | 2026-05-06 |
| 54 | openclaw / Gemma 4 31B [W&B] | 18 | — | Imported | 2026-05-06 |
| 55 | openclaw / Mistral Small 4 119B A6B | 17 | — | Imported | 2026-05-06 |
| 56 | openclaw / GPT-5.4 mini | 14 | — | Imported | 2026-05-06 |
| 57 | openclaw / GPT-5.4 nano | 14 | — | Imported | 2026-05-06 |
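Because each Subject cell combines an agent and a model as `agent / model`, the table can be regrouped to see how much the choice of agent changes a given model's Average. The following is a minimal sketch using a handful of rows copied from the table above; the helper names `parse_row` and `scores_by_model` are illustrative and not part of WolfBench.

```python
from collections import defaultdict

# A few rows copied verbatim from the leaderboard table (not the full data set).
ROWS = """\
| 1 | terminus-2 / GPT-5.5 | 77 | — | Imported | 2026-05-06 |
| 3 | openclaw / Claude Opus 4.7 | 75 | — | Imported | 2026-05-06 |
| 5 | claude-code / Claude Opus 4.7 | 73 | — | Imported | 2026-05-06 |
| 9 | openclaw / GPT-5.5 | 70 | — | Imported | 2026-05-06 |
| 17 | openclaw / Kimi K2.6 [W&B] | 60 | — | Imported | 2026-05-06 |
"""


def parse_row(line: str) -> tuple[str, str, int]:
    """Split one pipe-delimited data row into (agent, model, average)."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    subject, average = cells[1], cells[2]
    # Split only on the first " / " so model names keep any later slashes.
    agent, model = subject.split(" / ", 1)
    return agent, model, int(average)


def scores_by_model(text: str) -> dict[str, dict[str, int]]:
    """Map each model to a dict of {agent: average score}."""
    by_model: dict[str, dict[str, int]] = defaultdict(dict)
    for line in text.splitlines():
        if line.startswith("|"):
            agent, model, average = parse_row(line)
            by_model[model][agent] = average
    return dict(by_model)


table = scores_by_model(ROWS)
# Gap between the best and worst agent for each model, in Average points.
spread = {model: max(a.values()) - min(a.values()) for model, a in table.items()}
```

On this sample, GPT-5.5 differs by 7 points between terminus-2 and openclaw, while Claude Opus 4.7 differs by 2 points between openclaw and claude-code. The header and separator rows are deliberately left out of `ROWS`, since `parse_row` expects a numeric Average cell.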