WolfBench

Five-metric agent benchmark based on Terminal-Bench 2.0, comparing model and coding-agent combinations by consistency, average score, best run, ceiling, and worst run.

Rows: 57
Primary metric: average
Sampled: 2026-05-06

Metadata

Metrics

Average, Solid, Best, Ceiling, Worst

Latest Results

Rows ranked by highest average score.

Rank | Subject | Average | Model Match | Provenance | Sampled
1 | terminus-2 / GPT-5.5 | 77 | | Imported | 2026-05-06
2 | cursor-cli / GPT-5.5 | 77 | | Imported | 2026-05-06
3 | openclaw / Claude Opus 4.7 | 75 | | Imported | 2026-05-06
4 | hermes / GPT-5.5 | 74 | | Imported | 2026-05-06
5 | claude-code / Claude Opus 4.7 | 73 | | Imported | 2026-05-06
6 | terminus-2 / Claude Opus 4.6 | 71 | | Imported | 2026-05-06
7 | terminus-2 / Claude Opus 4.7 | 71 | | Imported | 2026-05-06
8 | openclaw / GPT-5.4 | 71 | | Imported | 2026-05-06
9 | openclaw / GPT-5.5 | 70 | | Imported | 2026-05-06
10 | terminus-2 / GPT-5.4 | 69 | | Imported | 2026-05-06
11 | hermes / Claude Opus 4.7 | 66 | | Imported | 2026-05-06
12 | hermes / GPT-5.4 | 66 | | Imported | 2026-05-06
13 | hermes / Claude Opus 4.6 | 64 | | Imported | 2026-05-06
14 | claude-code / Claude Opus 4.6 | 63 | | Imported | 2026-05-06
15 | cursor-cli / Claude Opus 4.6 | 63 | | Imported | 2026-05-06
16 | terminus-2 / Claude Sonnet 4.6 | 62 | | Imported | 2026-05-06
17 | openclaw / Kimi K2.6 [W&B] | 60 | | Imported | 2026-05-06
18 | terminus-2 / Kimi K2.6 [Moonshot AI] | 59 | | Imported | 2026-05-06
19 | openclaw / Kimi K2.6 [Moonshot AI] | 59 | | Imported | 2026-05-06
20 | openclaw / Gemini 3.1 Pro Preview | 59 | | Imported | 2026-05-06
21 | claude-code / Claude Sonnet 4.6 | 58 | | Imported | 2026-05-06
22 | openclaw / Claude Opus 4.6 | 58 | | Imported | 2026-05-06
23 | terminus-2 / Kimi K2.6 [W&B] | 57 | | Imported | 2026-05-06
24 | openclaw / GPT-5.3-Codex | 55 | | Imported | 2026-05-06
25 | openclaw / Claude Sonnet 4.6 | 53 | | Imported | 2026-05-06
26 | terminus-2 / Gemini 3.1 Pro Preview | 52 | | Imported | 2026-05-06
27 | terminus-2 / MiniMax M2.7 | 52 | | Imported | 2026-05-06
28 | terminus-2 / Kimi K2.5 (int4) [W&B] | 48 | | Imported | 2026-05-06
29 | terminus-2 / GLM-5-Turbo | 48 | | Imported | 2026-05-06
30 | hermes / Kimi K2.6 [Moonshot AI] | 47 | | Imported | 2026-05-06
31 | terminus-2 / GLM-5-FP8 [W&B] | 47 | | Imported | 2026-05-06
32 | terminus-2 / MiniMax M2.5 [W&B] | 47 | | Imported | 2026-05-06
33 | openclaw / GLM-5-Turbo | 47 | | Imported | 2026-05-06
34 | terminus-2 / Kimi K2.5 (nvfp4) [W&B] | 47 | | Imported | 2026-05-06
35 | openclaw / MiniMax M2.7 | 46 | | Imported | 2026-05-06
36 | terminus-2 / Gemini 3 Flash Preview | 44 | | Imported | 2026-05-06
37 | terminus-2 / GLM-5.1 [W&B] | 42 | | Imported | 2026-05-06
38 | openclaw / Gemini 3 Flash Preview | 41 | | Imported | 2026-05-06
39 | hermes / Kimi K2.5 (nvfp4) [W&B] | 41 | | Imported | 2026-05-06
40 | openclaw / Kimi K2.5 (int4) [W&B] | 39 | | Imported | 2026-05-06
41 | terminus-2 / GPT-5.3-Codex | 39 | | Imported | 2026-05-06
42 | openclaw / MiniMax M2.5 [W&B] | 37 | | Imported | 2026-05-06
43 | openclaw / GLM-5-FP8 [W&B] | 37 | | Imported | 2026-05-06
44 | openclaw / Kimi K2.5 (nvfp4) [W&B] | 37 | | Imported | 2026-05-06
45 | terminus-2 / NVIDIA-Nemotron-3-Super-120B-A12B-FP8 [W&B] | 36 | | Imported | 2026-05-06
46 | openclaw / GLM-5.1 [W&B] | 33 | | Imported | 2026-05-06
47 | terminus-2 / Gemma 4 31B [W&B] | 31 | | Imported | 2026-05-06
48 | terminus-2 / GPT-5.4 mini | 26 | | Imported | 2026-05-06
49 | terminus-2 / Gemini 3.1 Flash Lite Preview | 25 | | Imported | 2026-05-06
50 | terminus-2 / Mistral Small 4 119B A6B | 24 | | Imported | 2026-05-06
51 | openclaw / Gemini 3.1 Flash Lite Preview | 23 | | Imported | 2026-05-06
52 | terminus-2 / GPT-5.4 nano | 22 | | Imported | 2026-05-06
53 | openclaw / NVIDIA-Nemotron-3-Super-120B-A12B-FP8 [W&B] | 20 | | Imported | 2026-05-06
54 | openclaw / Gemma 4 31B [W&B] | 18 | | Imported | 2026-05-06
55 | openclaw / Mistral Small 4 119B A6B | 17 | | Imported | 2026-05-06
56 | openclaw / GPT-5.4 mini | 14 | | Imported | 2026-05-06
57 | openclaw / GPT-5.4 nano | 14 | | Imported | 2026-05-06
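
Because each Subject pairs a coding agent with a model, the table can be regrouped to see how much the average score for one model shifts across agents. The following is a minimal Python sketch of that regrouping, not part of WolfBench itself. The helper names (`split_subject`, `by_model`, `agent_spread`) are hypothetical, and `ROWS` holds only a handful of (subject, average) pairs copied from the table above.

```python
from collections import defaultdict

# A small sample of (subject, average) pairs copied from the results table.
ROWS = [
    ("terminus-2 / GPT-5.5", 77),
    ("cursor-cli / GPT-5.5", 77),
    ("openclaw / Claude Opus 4.7", 75),
    ("hermes / GPT-5.5", 74),
    ("claude-code / Claude Opus 4.7", 73),
    ("terminus-2 / Claude Opus 4.7", 71),
    ("openclaw / GPT-5.5", 70),
    ("hermes / Claude Opus 4.7", 66),
]


def split_subject(subject):
    """Split 'agent / model' into (agent, model); hypothetical helper."""
    agent, model = subject.split(" / ", 1)
    return agent.strip(), model.strip()


def by_model(rows):
    """Group rows by model, listing (agent, average) from highest to lowest."""
    grouped = defaultdict(list)
    for subject, average in rows:
        agent, model = split_subject(subject)
        grouped[model].append((agent, average))
    for entries in grouped.values():
        # sort() is stable, so tied scores keep their order from the table.
        entries.sort(key=lambda entry: entry[1], reverse=True)
    return dict(grouped)


def agent_spread(rows):
    """Points between the best and worst agent average for each model."""
    return {
        model: entries[0][1] - entries[-1][1]
        for model, entries in by_model(rows).items()
    }


if __name__ == "__main__":
    for model, spread in agent_spread(ROWS).items():
        best_agent, best_score = by_model(ROWS)[model][0]
        print(f"{model}: best {best_agent} ({best_score}), spread {spread}")
```

On this sample, the GPT-5.5 rows span 7 points across agents and the Claude Opus 4.7 rows span 9. The averages shown in the table appear to be rounded, so small differences between rows should be read with some caution.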