SpreadsheetBench

Spreadsheet-agent benchmark for real Excel tasks and business spreadsheet workflows, including financial modeling, debugging, and visualization.

Rows: 33
Primary metric: score
Sampled: 2026-05-28

Metadata

Metrics

Score, Template, Financial Modeling, Debug, Visualization

Showing 4 latest source slices.

Latest Results

Provider-published Qwen3.7-Max comparison scores. Rows are marked self-reported and should be interpreted as source claims unless independently reproduced.

Rank  Subject              Score  Model Match                                   Provenance     Sampled
1     Claude Opus 4.6 Max  89.3%  Claude Opus 4.6 (anthropic-claude-opus-4.6)   Self-reported  2026-05-28
2     Qwen3.7 Max          87%    Qwen3.7 Max (qwen-qwen3.7-max)                Self-reported  2026-05-28
3     GLM-5.1 Thinking     85.2%  GLM 5.1 (z-ai-glm-5.1)                        Self-reported  2026-05-28
4     DeepSeek V4 Pro Max  84.9%  DeepSeek V4 Pro (deepseek-deepseek-v4-pro)    Self-reported  2026-05-28
5     Kimi K2.6 Thinking   84.5%  MoonshotAI: Kimi K2.6 (moonshotai-kimi-k2.6)  Self-reported  2026-05-28
6     Qwen3.6 Plus         80.2%  Qwen3.6 Plus (qwen-qwen3.6-plus)              Self-reported  2026-05-28

Rank  Subject                         Score   Provenance  Sampled
1     Gemini in Google Sheets         70.48%  Verified    2026-05-27
2     Qingqiu Agent                   69.96%  Verified    2026-05-27
3     Univer                          68.86%  Verified    2026-05-27
4     灵犀                            66.89%  Verified    2026-05-27
5     Bluebox                         62.9%   Verified    2026-05-27
6     Shortcut.ai                     59.25%  Verified    2026-05-27
7     Copilot in Excel (Agent Mode)   57.2%   Imported    2026-05-27
8     ChatGPT Agent w/ .xlsx          45.5%   Imported    2026-05-27
9     Claude Files Opus 4.1           42.9%   Imported    2026-05-27
10    ChatGPT Agent                   35.3%   Imported    2026-05-27
11    OpenAI o3                       23.3%   Imported    2026-05-27

Rank  Subject                         Score   Provenance  Sampled
1     Qingqiu Agent                   94.75%  Verified    2026-05-27
2     Tetra-Beta-2                    94.25%  Verified    2026-05-27
3     GPT for Excel                   92.5%   Verified    2026-05-27
4     WPS AI (Seed 2.0)               91.25%  Verified    2026-05-27
5     Nobie Agent                     91%     Verified    2026-05-27
6     Shortcut.ai                     86%     Verified    2026-05-27
7     Kyra                            84.25%  Verified    2026-05-27
8     Decide Agent                    82.5%   Verified    2026-05-27

Rank  Subject                         Score   Provenance  Sampled
1     Claude Opus 4.6 (Bash Agent)    34.89%  Verified    2026-05-27
2     GPT-5.2 (Bash Agent)            26.79%  Verified    2026-05-27
3     Gemini 3.1 Pro (Bash Agent)     23.68%  Verified    2026-05-27
4     GLM-5.0 (Bash Agent)            17.14%  Verified    2026-05-27
5     Deepseek-V3.2 (Bash Agent)      15.58%  Verified    2026-05-27
6     Kimi K2.5 (Bash Agent)          14.64%  Verified    2026-05-27
7     Qwen3.5-397B-A17B (Bash Agent)  11.22%  Verified    2026-05-27
8     MiniMax M2.5 (Bash Agent)       7.17%   Verified    2026-05-27
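
The listing does not label its four source slices, and ranks restart at 1 several times, so scores from different slices are probably not directly comparable. The sketch below is one way to handle that when working with the rows programmatically. It assumes a rank of 1 marks the start of a new slice (an inference from the listing, not something the page states) and shows how self-reported rows could be set aside. The `Row` type and both helper names are hypothetical; the sample rows are copied from the results above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One leaderboard row (hypothetical structure, not the site's schema)."""
    rank: int
    subject: str
    score: float      # percentage as listed, e.g. 89.3 for "89.3%"
    provenance: str   # "Self-reported", "Verified", or "Imported"


def split_slices(rows):
    """Group rows into slices, assuming each rank of 1 opens a new slice."""
    slices = []
    for row in rows:
        if row.rank == 1 or not slices:
            slices.append([])
        slices[-1].append(row)
    return slices


def exclude_self_reported(rows):
    """Drop rows that are only source claims (provenance 'Self-reported')."""
    return [row for row in rows if row.provenance != "Self-reported"]


# A few rows taken from the listing, in their original order.
ROWS = [
    Row(1, "Claude Opus 4.6 Max", 89.3, "Self-reported"),
    Row(2, "Qwen3.7 Max", 87.0, "Self-reported"),
    Row(1, "Gemini in Google Sheets", 70.48, "Verified"),
    Row(2, "Qingqiu Agent", 69.96, "Verified"),
    Row(7, "Copilot in Excel (Agent Mode)", 57.2, "Imported"),
]

if __name__ == "__main__":
    for index, rows in enumerate(split_slices(ROWS), start=1):
        kept = exclude_self_reported(rows)
        print(f"slice {index}: {len(rows)} rows, {len(kept)} not self-reported")
```

Filtering on provenance only removes source claims; it does not make "Imported" rows equivalent to "Verified" ones, so a stricter filter may be appropriate depending on the use.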