PinchBench

Real-world OpenClaw agent benchmark evaluating how LLMs perform as the model inside an agent across practical coding, scheduling, research, email, and file-management workflows.

68 rows
Primary metric: best_score_percentage
Sampled: 2026-05-06

Metadata

Metrics

Best Score, Average Score, Average Execution Time (lower is better), Best Execution Time (lower is better), Average Cost (lower is better), Best Cost (lower is better), Submissions
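Because several of these metrics are marked "lower is better", any ordering by metric needs a per-metric sort direction. The following is a minimal sketch of that idea, not PinchBench's implementation. Apart from `best_score_percentage`, the metric keys are hypothetical names chosen for illustration, and the rows are made-up values rather than benchmark results.

```python
# Hypothetical metric keys: only best_score_percentage is named in the source.
LOWER_IS_BETTER = {
    "average_execution_time",
    "best_execution_time",
    "average_cost",
    "best_cost",
}


def sort_best_first(rows, metric):
    """Return rows ordered best-first for the given metric."""
    # Scores sort descending; time and cost metrics sort ascending.
    reverse = metric not in LOWER_IS_BETTER
    return sorted(rows, key=lambda row: row[metric], reverse=reverse)


# Made-up rows for illustration only; these are not PinchBench results.
rows = [
    {"subject": "example/model-a", "best_score_percentage": 0.90, "average_cost": 1.50},
    {"subject": "example/model-b", "best_score_percentage": 0.85, "average_cost": 0.40},
    {"subject": "example/model-c", "best_score_percentage": 0.95, "average_cost": 2.10},
]

by_score = [r["subject"] for r in sort_best_first(rows, "best_score_percentage")]
by_cost = [r["subject"] for r in sort_best_first(rows, "average_cost")]
print(by_score)
print(by_cost)
```

With these sample rows, the highest score comes first in `by_score` and the cheapest model comes first in `by_cost`.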

Latest Results

Rows are imported from PinchBench's public API with official=true. Source model and provider names are preserved.
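The import rule above can be sketched as a filter over submission records. This is an illustration under stated assumptions. The record layout (`model`, `score`, `official`) is hypothetical and not taken from the PinchBench API. Treating Best Score as the maximum over a model's official submissions is also an assumption. The sample records are invented.

```python
def select_official(submissions):
    """Keep only records flagged official, leaving all fields untouched."""
    return [s for s in submissions if s.get("official") is True]


def best_score_per_model(submissions):
    """Map each source model name, exactly as given, to its highest score."""
    best = {}
    for s in submissions:
        name = s["model"]  # preserved verbatim: no lowercasing or renaming
        if name not in best or s["score"] > best[name]:
            best[name] = s["score"]
    return best


# Invented records for illustration only; field names are assumptions.
submissions = [
    {"model": "example/model-a", "score": 0.81, "official": True},
    {"model": "example/model-a", "score": 0.88, "official": True},
    {"model": "example/model-a", "score": 0.99, "official": False},
    {"model": "Example/Model-B", "score": 0.70, "official": True},
]

official = select_official(submissions)
best = best_score_per_model(official)
print(best)
```

The non-official 0.99 submission is excluded before the best score is computed, and `Example/Model-B` keeps its original casing.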

Rank | Subject | Best Score | Model Match | Provenance | Sampled
1 | anthropic/claude-opus-4.6 | 0.93 | Claude Opus 4.6 (anthropic-claude-opus-4.6) | Imported | 2026-05-06
2 | arcee-ai/trinity-large-thinking | 0.92 | Trinity Large Thinking (arcee-ai-trinity-large-thinking) | Imported | 2026-05-06
3 | openai/gpt-5.4 | 0.90 | GPT-5.4 (openai-gpt-5.4) | Imported | 2026-05-06
4 | qwen/qwen3.5-27b | 0.90 | Qwen3.5-27B (qwen-qwen3.5-27b) | Imported | 2026-05-06
5 | minimax/minimax-m2.7 | 0.90 | MiniMax M2.7 (minimax-minimax-m2.7) | Imported | 2026-05-06
6 | anthropic/claude-haiku-4.5 | 0.89 | Claude Haiku 4.5 (anthropic-claude-haiku-4.5) | Imported | 2026-05-06
7 | qwen/qwen3.5-397b-a17b | 0.89 | Qwen3.5 397B A17B (qwen-qwen3.5-397b-a17b) | Imported | 2026-05-06
8 | xiaomi/mimo-v2-flash | 0.89 | MiMo-V2-Flash (xiaomi-mimo-v2-flash) | Imported | 2026-05-06
9 | qwen/qwen3.6-plus-preview | 0.89 | - | Imported | 2026-05-06
10 | nvidia/nemotron-3-super-120b-a12b | 0.89 | Nemotron 3 Super (nvidia-nemotron-3-super-120b-a12b) | Imported | 2026-05-06
11 | anthropic/claude-sonnet-4.5 | 0.89 | Claude Sonnet 4.5 (anthropic-claude-sonnet-4.5) | Imported | 2026-05-06
12 | minimax/minimax-m2.1 | 0.88 | MiniMax M2.1 (minimax-minimax-m2.1) | Imported | 2026-05-06
13 | anthropic/claude-sonnet-4.6 | 0.88 | Claude Sonnet 4.6 (anthropic-claude-sonnet-4.6) | Imported | 2026-05-06
14 | minimax/minimax-m2.5 | 0.88 | MiniMax M2.5 (minimax-minimax-m2.5) | Imported | 2026-05-06
15 | xiaomi/mimo-v2-pro | 0.87 | MiMo-V2-Pro (xiaomi-mimo-v2-pro) | Imported | 2026-05-06
16 | anthropic/claude-opus-4.5 | 0.87 | Claude Opus 4.5 (anthropic-claude-opus-4.5) | Imported | 2026-05-06
17 | google/gemini-3-flash-preview | 0.87 | Gemini 3 Flash Preview (google-gemini-3-flash-preview) | Imported | 2026-05-06
18 | google/gemini-3.1-pro-preview | 0.87 | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Imported | 2026-05-06
19 | z-ai/glm-5-turbo | 0.87 | GLM 5 Turbo (z-ai-glm-5-turbo) | Imported | 2026-05-06
20 | z-ai/glm-5 | 0.86 | GLM 5 (z-ai-glm-5) | Imported | 2026-05-06
21 | qwen/qwen3.5-plus-02-15 | 0.86 | Qwen3.5 Plus 2026-02-15 (qwen-qwen3.5-plus-02-15) | Imported | 2026-05-06
22 | z-ai/glm-4.5-air | 0.86 | GLM 4.5 Air (z-ai-glm-4.5-air) | Imported | 2026-05-06
23 | xiaomi/mimo-v2-omni | 0.86 | MiMo-V2-Omni (xiaomi-mimo-v2-omni) | Imported | 2026-05-06
24 | z-ai/glm-5v-turbo | 0.86 | GLM 5V Turbo (z-ai-glm-5v-turbo) | Imported | 2026-05-06
25 | qwen/qwen3.5-122b-a10b | 0.85 | Qwen3.5-122B-A10B (qwen-qwen3.5-122b-a10b) | Imported | 2026-05-06
26 | stepfun/step-3.5-flash | 0.85 | Step 3.5 Flash (stepfun-step-3.5-flash) | Imported | 2026-05-06
27 | bytedance-seed/seed-2.0-lite | 0.85 | Seed-2.0-Lite (bytedance-seed-seed-2.0-lite) | Imported | 2026-05-06
28 | moonshotai/kimi-k2.5 | 0.85 | MoonshotAI: Kimi K2.5 (moonshotai-kimi-k2.5) | Imported | 2026-05-06
29 | z-ai/glm-5.1 | 0.85 | GLM 5.1 (z-ai-glm-5.1) | Imported | 2026-05-06
30 | deepseek/deepseek-v3.2 | 0.84 | DeepSeek V3.2 (deepseek-deepseek-v3.2) | Imported | 2026-05-06
31 | google/gemma-4-26b-a4b-it | 0.84 | Gemma 4 26B A4B (google-gemma-4-26b-a4b-it) | Imported | 2026-05-06
32 | x-ai/grok-4.20 | 0.83 | Grok 4.20 (x-ai-grok-4.20) | Imported | 2026-05-06
33 | openrouter/hunter-alpha | 0.83 | - | Imported | 2026-05-06
34 | x-ai/grok-4.1-fast | 0.82 | Grok 4.1 Fast (x-ai-grok-4.1-fast) | Imported | 2026-05-06
35 | mistralai/devstral-2512 | 0.82 | Mistral: Devstral 2 2512 (mistralai-devstral-2512) | Imported | 2026-05-06
36 | openrouter/healer-alpha | 0.81 | - | Imported | 2026-05-06
37 | arcee-ai/trinity-large-preview | 0.81 | Trinity Large Preview (arcee-ai-trinity-large-preview) | Imported | 2026-05-06
38 | anthropic/claude-sonnet-4 | 0.80 | Claude Sonnet 4 (anthropic-claude-sonnet-4) | Imported | 2026-05-06
39 | qwen/qwen3-max-thinking | 0.80 | Qwen3 Max Thinking (qwen-qwen3-max-thinking) | Imported | 2026-05-06
40 | openai/gpt-5-mini | 0.80 | GPT-5 Mini (openai-gpt-5-mini) | Imported | 2026-05-06
41 | qwen/qwen3-coder-next | 0.79 | Qwen3 Coder Next (qwen-qwen3-coder-next) | Imported | 2026-05-06
42 | openai/gpt-5.4-nano | 0.79 | GPT-5.4 Nano (openai-gpt-5.4-nano) | Imported | 2026-05-06
43 | qwen/qwen3.5-35b-a3b | 0.78 | Qwen3.5-35B-A3B (qwen-qwen3.5-35b-a3b) | Imported | 2026-05-06
44 | inception/mercury-2 | 0.78 | Mercury 2 (inception-mercury-2) | Imported | 2026-05-06
45 | arcee-ai/trinity-large-preview:free | 0.78 | - | Imported | 2026-05-06
46 | mistralai/mistral-small-2603 | 0.77 | Mistral: Mistral Small 4 (mistralai-mistral-small-2603) | Imported | 2026-05-06
47 | google/gemma-4-31b-it | 0.76 | Gemma 4 31B (google-gemma-4-31b-it) | Imported | 2026-05-06
48 | openai/gpt-5.4-mini | 0.76 | GPT-5.4 Mini (openai-gpt-5.4-mini) | Imported | 2026-05-06
49 | amazon/nova-2-lite-v1 | 0.75 | Nova 2 Lite (amazon-nova-2-lite-v1) | Imported | 2026-05-06
50 | openai/gpt-4o-mini | 0.75 | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-06
51 | nvidia/nemotron-3-super-120b-a12b:free | 0.75 | - | Imported | 2026-05-06
52 | mistralai/mistral-large-2512 | 0.72 | Mistral: Mistral Large 3 2512 (mistralai-mistral-large-2512) | Imported | 2026-05-06
53 | google/gemini-2.5-pro | 0.72 | Gemini 2.5 Pro (google-gemini-2.5-pro) | Imported | 2026-05-06
54 | deepseek/deepseek-chat | 0.72 | DeepSeek V3 (deepseek-deepseek-chat) | Imported | 2026-05-06
55 | openai/gpt-4o | 0.71 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-06
56 | google/gemini-3-pro-preview | 0.71 | Gemini 3 (google-gemini-3) | Imported | 2026-05-06
57 | google/gemini-2.5-flash | 0.71 | Gemini 2.5 Flash (google-gemini-2.5-flash) | Imported | 2026-05-06
58 | openai/gpt-5-nano | 0.69 | GPT-5 Nano (openai-gpt-5-nano) | Imported | 2026-05-06
59 | openai/gpt-oss-120b | 0.67 | gpt-oss-120b (openai-gpt-oss-120b) | Imported | 2026-05-06
60 | openai/gpt-oss-20b | 0.66 | gpt-oss-20b (openai-gpt-oss-20b) | Imported | 2026-05-06
61 | qwen/qwen3.6-plus | 0.64 | Qwen3.6 Plus (qwen-qwen3.6-plus) | Imported | 2026-05-06
62 | meta-llama/llama-4-maverick | 0.46 | Llama 4 Maverick (meta-llama-4-maverick) | Imported | 2026-05-06
63 | qwen/qwen3.5-9b | 0.45 | Qwen3.5-9B (qwen-qwen3.5-9b) | Imported | 2026-05-06
64 | qwen/qwen-2.5-7b-instruct | 0.40 | Qwen2.5 7B Instruct (qwen-qwen-2.5-7b-instruct) | Imported | 2026-05-06
65 | meta-llama/llama-3.1-70b-instruct | 0.32 | Llama 3.1 70B Instruct (meta-llama-llama-3.1-70b-instruct) | Imported | 2026-05-06
66 | google/gemini-2.5-flash-lite | 0.22 | Gemini 2.5 Flash Lite (google-gemini-2.5-flash-lite) | Imported | 2026-05-06
67 | openai/gpt-5.4-pro | 0.19 | GPT-5.4 Pro (openai-gpt-5.4-pro) | Imported | 2026-05-06
68 | meta-llama/llama-4-scout | 0.08 | Llama 4 Scout (meta-llama-llama-4-scout) | Imported | 2026-05-06