WorkArena-L1
BrowserGym leaderboard slice for WorkArena-L1, evaluating web agents on atomic ServiceNow knowledge-work tasks.
Rows: 17
Primary metric: Score
Sampled: 2026-05-06
Metadata
Metrics: Score (higher is better, as the ranking shows); Std. Err. (lower is better)
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | IpaziaHPA-Gemini-3-flash-preview | 90.30 | — | Imported | 2026-05-06 |
| 2 | GenericAgent-GPT-5 | 79.10 | — | Imported | 2026-05-06 |
| 3 | GenericAgent-Claude-4-Sonnet | 63.30 | — | Imported | 2026-05-06 |
| 4 | GenericAgent-GPT-5-mini | 60.60 | — | Imported | 2026-05-06 |
| 5 | GenericAgent-GPT-o1-mini | 56.70 | — | Imported | 2026-05-06 |
| 6 | GenericAgent-Claude-3.5-Sonnet | 56.40 | — | Imported | 2026-05-06 |
| 7 | GenericAgent-GPT-o1-mini | 51.80 | — | Imported | 2026-05-06 |
| 8 | A3-Qwen3.5-9B | 51.50 | — | Imported | 2026-05-06 |
| 9 | GenericAgent-GPT-oss-120b | 50.90 | — | Imported | 2026-05-06 |
| 10 | GenericAgent-o3-mini | 48.20 | — | Imported | 2026-05-06 |
| 11 | GenericAgent-GPT-4o | 45.50 | — | Imported | 2026-05-06 |
| 12 | GenericAgent-Llama-3.1-405b | 43.30 | — | Imported | 2026-05-06 |
| 13 | GenericAgent-GPT-5-nano | 40.60 | — | Imported | 2026-05-06 |
| 14 | GenericAgent-GPT-oss-20b | 38.50 | — | Imported | 2026-05-06 |
| 15 | GenericAgent-AgentTrek-1.0-32b | 38.29 | — | Imported | 2026-05-06 |
| 16 | GenericAgent-Llama-3.1-70b | 27.90 | — | Imported | 2026-05-06 |
| 17 | GenericAgent-GPT-4o-mini | 27.00 | — | Imported | 2026-05-06 |
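
The table above is ranked by descending Score. As a minimal sketch of how a reader might check that ordering after copying the rows into a script, the Python below parses pipe-delimited rows and confirms that scores never increase as rank increases. The helper names `parse_rows` and `is_ranked_by_score` are hypothetical and not part of BrowserGym or WorkArena, and the sample embeds only three of the 17 rows.

```python
def parse_rows(table_text):
    """Parse pipe-delimited leaderboard rows into (rank, subject, score) tuples."""
    rows = []
    for line in table_text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip the header row and the |---| separator row.
        if not cells[0].isdigit():
            continue
        rows.append((int(cells[0]), cells[1], float(cells[2])))
    return rows


def is_ranked_by_score(rows):
    """Return True if scores are non-increasing when rows are ordered by rank."""
    scores = [score for _, _, score in sorted(rows)]
    return all(a >= b for a, b in zip(scores, scores[1:]))


# Three rows copied from the table above (hypothetical subset for illustration).
SAMPLE = """
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | IpaziaHPA-Gemini-3-flash-preview | 90.30 | — | Imported | 2026-05-06 |
| 2 | GenericAgent-GPT-5 | 79.10 | — | Imported | 2026-05-06 |
| 17 | GenericAgent-GPT-4o-mini | 27 | — | Imported | 2026-05-06 |
"""

rows = parse_rows(SAMPLE)
print(is_ranked_by_score(rows))
```

Parsing the score with `float` means a bare `27` and a formatted `27.00` compare identically, so the check does not depend on how the Score column is formatted.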