The Agent Company

Multi-step workplace automation benchmark for autonomous agents.

8 rows
Primary metric: score
Sampled: 2026-05-06

Metadata

Metrics

Score (higher is better), Standard error (lower is better)

Latest Results

Rows parsed from the public leaderboard table.

Rank | Subject                   | Score | Model Match | Provenance | Sampled
1    | Claude 3.7 Sonnet         | 52.73 |             | Imported   | 2026-05-06
2    | Claude Opus 4.5           | 46.45 |             | Imported   | 2026-05-06
3    | Gemini 2.5 Pro (Jun 2025) | 39.85 |             | Imported   | 2026-05-06
4    | DeepSeek V3               | 29.91 |             | Imported   | 2026-05-06
5    | Qwen2.5-Max               | 23.99 |             | Imported   | 2026-05-06
6    | Llama 3.1 405B            | 22.90 |             | Imported   | 2026-05-06
7    | Gemini 1.5 Flash          | 22.10 |             | Imported   | 2026-05-06
8    | GPT-4o                    | 14.55 |             | Imported   | 2026-05-06
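
Since the rows are ranked by the primary metric, one quick sanity check is that sorting by descending score reproduces ranks 1 through 8. The following is a minimal sketch in Python with the rank, subject, and score values copied by hand from the rows above. The names `ROWS` and `ranks_follow_scores` are hypothetical and are not part of any leaderboard tooling.

```python
# Rows copied by hand from the results above: (rank, subject, score).
# ROWS and ranks_follow_scores are illustrative names, not a real API.
ROWS = [
    (1, "Claude 3.7 Sonnet", 52.73),
    (2, "Claude Opus 4.5", 46.45),
    (3, "Gemini 2.5 Pro (Jun 2025)", 39.85),
    (4, "DeepSeek V3", 29.91),
    (5, "Qwen2.5-Max", 23.99),
    (6, "Llama 3.1 405B", 22.90),
    (7, "Gemini 1.5 Flash", 22.10),
    (8, "GPT-4o", 14.55),
]


def ranks_follow_scores(rows):
    """Return True if sorting by descending score yields ranks 1..n in order."""
    ordered = sorted(rows, key=lambda row: row[2], reverse=True)
    return [row[0] for row in ordered] == list(range(1, len(rows) + 1))


print(ranks_follow_scores(ROWS))  # prints True for the rows listed here
```

The check only confirms internal consistency of the listed rows. It says nothing about how the scores were produced or imported.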