PinchBench
A real-world OpenClaw agent benchmark that evaluates how well LLMs perform as the model inside an agent across practical coding, scheduling, research, email, and file-management workflows.
68 rows
Primary metric: best_score_percentage
Sampled: 2026-05-06
Metadata
Metrics: Best Score, Average Score, Average Execution Time (lower is better), Best Execution Time (lower is better), Average Cost (lower is better), Best Cost (lower is better), Submissions
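The metrics above use two sort directions. The scores are higher-is-better, while execution time and cost are marked lower-is-better. Here is a minimal sketch of best-first ordering that flips the sort for the lower-is-better metrics. It assumes each subject's metrics are available as a Python dict keyed by the metric names listed above. The `rank` helper and the `HIGHER_IS_BETTER` table are hypothetical and are not part of PinchBench.

```python
from typing import Any

# Sort direction for each metric listed on this page (hypothetical table):
# True = higher is better, False = lower is better.
# "Submissions" is a count rather than a quality measure, so it is left out.
HIGHER_IS_BETTER: dict[str, bool] = {
    "Best Score": True,
    "Average Score": True,
    "Average Execution Time": False,
    "Best Execution Time": False,
    "Average Cost": False,
    "Best Cost": False,
}


def rank(rows: list[dict[str, Any]], metric: str) -> list[dict[str, Any]]:
    """Return rows ordered best-first for `metric` (hypothetical helper).

    sorted() is stable, so rows with equal values keep their input order.
    """
    return sorted(rows, key=lambda row: row[metric], reverse=HIGHER_IS_BETTER[metric])


if __name__ == "__main__":
    # Best Score values copied from the leaderboard table.
    scores = [
        {"subject": "openai/gpt-5.4", "Best Score": 0.90},
        {"subject": "meta-llama/llama-4-scout", "Best Score": 0.08},
        {"subject": "anthropic/claude-opus-4.6", "Best Score": 0.93},
    ]
    print([row["subject"] for row in rank(scores, "Best Score")])

    # Made-up cost values, for illustration only (not leaderboard data).
    costs = [
        {"subject": "model-a", "Average Cost": 2.0},
        {"subject": "model-b", "Average Cost": 0.5},
    ]
    print([row["subject"] for row in rank(costs, "Average Cost")])
```

The sort is stable, so rows with equal values (such as the several 0.89 entries in the table) keep the order in which they are passed in. The page does not say how its own ties are broken.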
| Rank | Subject | Best Score (0–1) | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | anthropic/claude-opus-4.6 | 0.93 | Claude Opus 4.6 (`anthropic-claude-opus-4.6`) | Imported | 2026-05-06 |
| 2 | arcee-ai/trinity-large-thinking | 0.92 | Trinity Large Thinking (`arcee-ai-trinity-large-thinking`) | Imported | 2026-05-06 |
| 3 | openai/gpt-5.4 | 0.90 | GPT-5.4 (`openai-gpt-5.4`) | Imported | 2026-05-06 |
| 4 | qwen/qwen3.5-27b | 0.90 | Qwen3.5-27B (`qwen-qwen3.5-27b`) | Imported | 2026-05-06 |
| 5 | minimax/minimax-m2.7 | 0.90 | MiniMax M2.7 (`minimax-minimax-m2.7`) | Imported | 2026-05-06 |
| 6 | anthropic/claude-haiku-4.5 | 0.89 | Claude Haiku 4.5 (`anthropic-claude-haiku-4.5`) | Imported | 2026-05-06 |
| 7 | qwen/qwen3.5-397b-a17b | 0.89 | Qwen3.5 397B A17B (`qwen-qwen3.5-397b-a17b`) | Imported | 2026-05-06 |
| 8 | xiaomi/mimo-v2-flash | 0.89 | MiMo-V2-Flash (`xiaomi-mimo-v2-flash`) | Imported | 2026-05-06 |
| 9 | qwen/qwen3.6-plus-preview | 0.89 | — | Imported | 2026-05-06 |
| 10 | nvidia/nemotron-3-super-120b-a12b | 0.89 | Nemotron 3 Super (`nvidia-nemotron-3-super-120b-a12b`) | Imported | 2026-05-06 |
| 11 | anthropic/claude-sonnet-4.5 | 0.89 | Claude Sonnet 4.5 (`anthropic-claude-sonnet-4.5`) | Imported | 2026-05-06 |
| 12 | minimax/minimax-m2.1 | 0.88 | MiniMax M2.1 (`minimax-minimax-m2.1`) | Imported | 2026-05-06 |
| 13 | anthropic/claude-sonnet-4.6 | 0.88 | Claude Sonnet 4.6 (`anthropic-claude-sonnet-4.6`) | Imported | 2026-05-06 |
| 14 | minimax/minimax-m2.5 | 0.88 | MiniMax M2.5 (`minimax-minimax-m2.5`) | Imported | 2026-05-06 |
| 15 | xiaomi/mimo-v2-pro | 0.87 | MiMo-V2-Pro (`xiaomi-mimo-v2-pro`) | Imported | 2026-05-06 |
| 16 | anthropic/claude-opus-4.5 | 0.87 | Claude Opus 4.5 (`anthropic-claude-opus-4.5`) | Imported | 2026-05-06 |
| 17 | google/gemini-3-flash-preview | 0.87 | Gemini 3 Flash Preview (`google-gemini-3-flash-preview`) | Imported | 2026-05-06 |
| 18 | google/gemini-3.1-pro-preview | 0.87 | Gemini 3.1 Pro Preview (`google-gemini-3.1-pro-preview`) | Imported | 2026-05-06 |
| 19 | z-ai/glm-5-turbo | 0.87 | GLM 5 Turbo (`z-ai-glm-5-turbo`) | Imported | 2026-05-06 |
| 20 | z-ai/glm-5 | 0.86 | GLM 5 (`z-ai-glm-5`) | Imported | 2026-05-06 |
| 21 | qwen/qwen3.5-plus-02-15 | 0.86 | Qwen3.5 Plus 2026-02-15 (`qwen-qwen3.5-plus-02-15`) | Imported | 2026-05-06 |
| 22 | z-ai/glm-4.5-air | 0.86 | GLM 4.5 Air (`z-ai-glm-4.5-air`) | Imported | 2026-05-06 |
| 23 | xiaomi/mimo-v2-omni | 0.86 | MiMo-V2-Omni (`xiaomi-mimo-v2-omni`) | Imported | 2026-05-06 |
| 24 | z-ai/glm-5v-turbo | 0.86 | GLM 5V Turbo (`z-ai-glm-5v-turbo`) | Imported | 2026-05-06 |
| 25 | qwen/qwen3.5-122b-a10b | 0.85 | Qwen3.5-122B-A10B (`qwen-qwen3.5-122b-a10b`) | Imported | 2026-05-06 |
| 26 | stepfun/step-3.5-flash | 0.85 | Step 3.5 Flash (`stepfun-step-3.5-flash`) | Imported | 2026-05-06 |
| 27 | bytedance-seed/seed-2.0-lite | 0.85 | Seed-2.0-Lite (`bytedance-seed-seed-2.0-lite`) | Imported | 2026-05-06 |
| 28 | moonshotai/kimi-k2.5 | 0.85 | MoonshotAI: Kimi K2.5 (`moonshotai-kimi-k2.5`) | Imported | 2026-05-06 |
| 29 | z-ai/glm-5.1 | 0.85 | GLM 5.1 (`z-ai-glm-5.1`) | Imported | 2026-05-06 |
| 30 | deepseek/deepseek-v3.2 | 0.84 | DeepSeek V3.2 (`deepseek-deepseek-v3.2`) | Imported | 2026-05-06 |
| 31 | google/gemma-4-26b-a4b-it | 0.84 | Gemma 4 26B A4B (`google-gemma-4-26b-a4b-it`) | Imported | 2026-05-06 |
| 32 | x-ai/grok-4.20 | 0.83 | Grok 4.20 (`x-ai-grok-4.20`) | Imported | 2026-05-06 |
| 33 | openrouter/hunter-alpha | 0.83 | — | Imported | 2026-05-06 |
| 34 | x-ai/grok-4.1-fast | 0.82 | Grok 4.1 Fast (`x-ai-grok-4.1-fast`) | Imported | 2026-05-06 |
| 35 | mistralai/devstral-2512 | 0.82 | Mistral: Devstral 2 2512 (`mistralai-devstral-2512`) | Imported | 2026-05-06 |
| 36 | openrouter/healer-alpha | 0.81 | — | Imported | 2026-05-06 |
| 37 | arcee-ai/trinity-large-preview | 0.81 | Trinity Large Preview (`arcee-ai-trinity-large-preview`) | Imported | 2026-05-06 |
| 38 | anthropic/claude-sonnet-4 | 0.80 | Claude Sonnet 4 (`anthropic-claude-sonnet-4`) | Imported | 2026-05-06 |
| 39 | qwen/qwen3-max-thinking | 0.80 | Qwen3 Max Thinking (`qwen-qwen3-max-thinking`) | Imported | 2026-05-06 |
| 40 | openai/gpt-5-mini | 0.80 | GPT-5 Mini (`openai-gpt-5-mini`) | Imported | 2026-05-06 |
| 41 | qwen/qwen3-coder-next | 0.79 | Qwen3 Coder Next (`qwen-qwen3-coder-next`) | Imported | 2026-05-06 |
| 42 | openai/gpt-5.4-nano | 0.79 | GPT-5.4 Nano (`openai-gpt-5.4-nano`) | Imported | 2026-05-06 |
| 43 | qwen/qwen3.5-35b-a3b | 0.78 | Qwen3.5-35B-A3B (`qwen-qwen3.5-35b-a3b`) | Imported | 2026-05-06 |
| 44 | inception/mercury-2 | 0.78 | Mercury 2 (`inception-mercury-2`) | Imported | 2026-05-06 |
| 45 | arcee-ai/trinity-large-preview:free | 0.78 | — | Imported | 2026-05-06 |
| 46 | mistralai/mistral-small-2603 | 0.77 | Mistral: Mistral Small 4 (`mistralai-mistral-small-2603`) | Imported | 2026-05-06 |
| 47 | google/gemma-4-31b-it | 0.76 | Gemma 4 31B (`google-gemma-4-31b-it`) | Imported | 2026-05-06 |
| 48 | openai/gpt-5.4-mini | 0.76 | GPT-5.4 Mini (`openai-gpt-5.4-mini`) | Imported | 2026-05-06 |
| 49 | amazon/nova-2-lite-v1 | 0.75 | Nova 2 Lite (`amazon-nova-2-lite-v1`) | Imported | 2026-05-06 |
| 50 | openai/gpt-4o-mini | 0.75 | GPT-4o-mini (`openai-gpt-4o-mini`) | Imported | 2026-05-06 |
| 51 | nvidia/nemotron-3-super-120b-a12b:free | 0.75 | — | Imported | 2026-05-06 |
| 52 | mistralai/mistral-large-2512 | 0.72 | Mistral: Mistral Large 3 2512 (`mistralai-mistral-large-2512`) | Imported | 2026-05-06 |
| 53 | google/gemini-2.5-pro | 0.72 | Gemini 2.5 Pro (`google-gemini-2.5-pro`) | Imported | 2026-05-06 |
| 54 | deepseek/deepseek-chat | 0.72 | DeepSeek V3 (`deepseek-deepseek-chat`) | Imported | 2026-05-06 |
| 55 | openai/gpt-4o | 0.71 | GPT-4o (`openai-gpt-4o`) | Imported | 2026-05-06 |
| 56 | google/gemini-3-pro-preview | 0.71 | Gemini 3 (`google-gemini-3`) | Imported | 2026-05-06 |
| 57 | google/gemini-2.5-flash | 0.71 | Gemini 2.5 Flash (`google-gemini-2.5-flash`) | Imported | 2026-05-06 |
| 58 | openai/gpt-5-nano | 0.69 | GPT-5 Nano (`openai-gpt-5-nano`) | Imported | 2026-05-06 |
| 59 | openai/gpt-oss-120b | 0.67 | gpt-oss-120b (`openai-gpt-oss-120b`) | Imported | 2026-05-06 |
| 60 | openai/gpt-oss-20b | 0.66 | gpt-oss-20b (`openai-gpt-oss-20b`) | Imported | 2026-05-06 |
| 61 | qwen/qwen3.6-plus | 0.64 | Qwen3.6 Plus (`qwen-qwen3.6-plus`) | Imported | 2026-05-06 |
| 62 | meta-llama/llama-4-maverick | 0.46 | Llama 4 Maverick (`meta-llama-4-maverick`) | Imported | 2026-05-06 |
| 63 | qwen/qwen3.5-9b | 0.45 | Qwen3.5-9B (`qwen-qwen3.5-9b`) | Imported | 2026-05-06 |
| 64 | qwen/qwen-2.5-7b-instruct | 0.40 | Qwen2.5 7B Instruct (`qwen-qwen-2.5-7b-instruct`) | Imported | 2026-05-06 |
| 65 | meta-llama/llama-3.1-70b-instruct | 0.32 | Llama 3.1 70B Instruct (`meta-llama-llama-3.1-70b-instruct`) | Imported | 2026-05-06 |
| 66 | google/gemini-2.5-flash-lite | 0.22 | Gemini 2.5 Flash Lite (`google-gemini-2.5-flash-lite`) | Imported | 2026-05-06 |
| 67 | openai/gpt-5.4-pro | 0.19 | GPT-5.4 Pro (`openai-gpt-5.4-pro`) | Imported | 2026-05-06 |
| 68 | meta-llama/llama-4-scout | 0.08 | Llama 4 Scout (`meta-llama-llama-4-scout`) | Imported | 2026-05-06 |