SheetCopilot Benchmark
Spreadsheet control benchmark for agents that operate spreadsheet software through actions, rather than only answering questions about tables.
Rows: 5
Primary metric: exec_at_1
Sampled: 2026-05-27
Metadata
Metrics
Exec@1, Pass@1, A50 (lower is better), A90 (lower is better)
| Rank | Subject | Exec@1 | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-3.5-Turbo (100% data) | 87.3% | GPT-3.5 Turbo (`openai-gpt-3.5-turbo`) | Imported | 2026-05-27 |
| 2 | GPT-3.5-Turbo (10% data) | 85.0% | GPT-3.5 Turbo (`openai-gpt-3.5-turbo`) | Imported | 2026-05-27 |
| 3 | Claude (10% data) | 80.0% | — | Imported | 2026-05-27 |
| 4 | VBA (100% data) | 77.8% | — | Imported | 2026-05-27 |
| 5 | GPT-4 (10% data) | 65.0% | GPT-4 (`openai-gpt-4`) | Imported | 2026-05-27 |
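
The page lists four metrics (Exec@1, Pass@1, A50, A90) without defining them. The sketch below shows one plausible way to compute them from per-task results. It assumes that Exec@1 is the share of tasks whose generated solution runs without error, that Pass@1 is the share whose result is judged correct, and that A50 and A90 are the 50th and 90th percentiles of the ratio of generated actions to reference actions, taken over passing tasks. The `TaskResult` type, its field names, and the `summarize` helper are hypothetical and do not come from the benchmark's own code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TaskResult:
    # All field names are hypothetical, chosen for this illustration only.
    executed: bool       # the generated solution ran without raising an error
    passed: bool         # the resulting sheet was judged correct
    n_actions: int       # atomic actions in the generated solution
    n_ref_actions: int   # atomic actions in the reference solution


def percentile(values: List[float], q: float) -> float:
    """Percentile with linear interpolation between ranks; q is in [0, 100]."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def summarize(results: List[TaskResult]) -> Dict[str, Optional[float]]:
    """Aggregate per-task results into the four leaderboard metrics."""
    if not results:
        raise ValueError("summarize() needs at least one result")
    n = len(results)
    # Assumption: efficiency ratios are only meaningful for passing solutions.
    ratios = [r.n_actions / r.n_ref_actions for r in results if r.passed]
    return {
        "exec_at_1": sum(r.executed for r in results) / n,
        "pass_at_1": sum(r.passed for r in results) / n,
        "a50": percentile(ratios, 50) if ratios else None,
        "a90": percentile(ratios, 90) if ratios else None,
    }


if __name__ == "__main__":
    demo = [
        TaskResult(executed=True, passed=True, n_actions=4, n_ref_actions=4),
        TaskResult(executed=True, passed=True, n_actions=6, n_ref_actions=4),
        TaskResult(executed=True, passed=False, n_actions=5, n_ref_actions=5),
        TaskResult(executed=False, passed=False, n_actions=0, n_ref_actions=3),
    ]
    print(summarize(demo))
```

Under these assumptions an A50 or A90 value of 1.0 means the generated solution used as many actions as the reference, which is why lower values are better for both.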