OfficeBench
Office workflow agent benchmark spanning Word, Excel, PDF, email, calendar, and multi-application task completion.
8 rows
Primary metric: overall_score
Sampled: 2026-05-27
Metadata
Metrics: Overall Score, Single-App Success, Two-App Success, Three-App Success
| # | Subject | Overall Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Gemini-1.0 Pro (Feb 2024) | 12.33 | — | Imported | 2026-05-27 |
| 2 | Gemini-1.5 Flash (May 2024) | 18.67 | — | Imported | 2026-05-27 |
| 3 | Gemini-1.5 Pro (May 2024) | 26.00 | — | Imported | 2026-05-27 |
| 4 | GPT-3.5 Turbo (0125) | 5.35 | — | Imported | 2026-05-27 |
| 5 | GPT-4 Turbo (2024-04-09) | 38.00 | — | Imported | 2026-05-27 |
| 6 | GPT-4 Omni (2024-05-13) | 47.00 | — | Imported | 2026-05-27 |
| 7 | Llama 3 (70B-Instruct) | 27.33 | — | Imported | 2026-05-27 |
| 8 | Qwen 2 (72B-Instruct) | 21.16 | — | Imported | 2026-05-27 |
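The table's row order does not follow the primary metric. The following is a minimal sketch, assuming the Overall Score values are copied by hand from the table (no data-loading API for this leaderboard is assumed), that orders the subjects by overall_score, highest first. The helper name `rank_by_score` is hypothetical.

```python
# Overall Score values copied by hand from the OfficeBench table (assumption:
# no programmatic export of this leaderboard is used here).
scores = {
    "Gemini-1.0 Pro (Feb 2024)": 12.33,
    "Gemini-1.5 Flash (May 2024)": 18.67,
    "Gemini-1.5 Pro (May 2024)": 26.00,
    "GPT-3.5 Turbo (0125)": 5.35,
    "GPT-4 Turbo (2024-04-09)": 38.00,
    "GPT-4 Omni (2024-05-13)": 47.00,
    "Llama 3 (70B-Instruct)": 27.33,
    "Qwen 2 (72B-Instruct)": 21.16,
}


def rank_by_score(table):
    """Hypothetical helper: return (rank, subject, score) tuples, highest score first."""
    ordered = sorted(table.items(), key=lambda kv: kv[1], reverse=True)
    return [(i, name, score) for i, (name, score) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    for rank, name, score in rank_by_score(scores):
        print(f"{rank}. {name}: {score:.2f}")
```

Sorted this way, GPT-4 Omni (47.00) comes first and GPT-3.5 Turbo (5.35) last; the values themselves are unchanged, only the ordering differs from the table rows.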