WebLINX (BrowserGym)
BrowserGym leaderboard slice for WebLINX, evaluating web agents under the BrowserGym result submission protocol.
Metadata
Rows: 6
Primary metric: Score
Sampled: 2026-05-06
Metrics
Score (primary metric, higher is better; determines rank) and Std. Err. (lower is better; not shown in the table)
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GenericAgent-Claude-3.5-Sonnet | 13.70 | — | Imported | 2026-05-06 |
| 2 | GenericAgent-GPT-4o | 12.50 | — | Imported | 2026-05-06 |
| 3 | GenericAgent-GPT-o1-mini | 12.50 | — | Imported | 2026-05-06 |
| 4 | GenericAgent-GPT-4o-mini | 11.60 | — | Imported | 2026-05-06 |
| 5 | GenericAgent-Llama-3.1-70b | 8.90 | — | Imported | 2026-05-06 |
| 6 | GenericAgent-Llama-3.1-405b | 7.90 | — | Imported | 2026-05-06 |
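The Rank column orders agents by Score, highest first. The following is a minimal sketch of that ordering. It assumes ordinal ranks and an alphabetical tie-break on the subject name. Both are hypothetical, since the page does not state its tie-breaking rule. The `Row` type and `rank_rows` helper are illustrative names, not part of BrowserGym. Only the scores come from the table.

```python
# Minimal sketch: order leaderboard rows by the primary metric (Score),
# highest first, and assign ordinal ranks 1..N.
# Assumption (hypothetical): ties are broken alphabetically by subject name;
# the leaderboard does not document its tie-breaking rule.
from typing import NamedTuple


class Row(NamedTuple):
    subject: str
    score: float


def rank_rows(rows: list[Row]) -> list[tuple[int, Row]]:
    """Return (rank, row) pairs sorted by descending score, then by name."""
    ordered = sorted(rows, key=lambda r: (-r.score, r.subject))
    # Ordinal ranking: tied scores still receive distinct consecutive ranks.
    return [(i, r) for i, r in enumerate(ordered, start=1)]


# Scores copied from the WebLINX (BrowserGym) table, listed in arbitrary order.
rows = [
    Row("GenericAgent-Llama-3.1-405b", 7.90),
    Row("GenericAgent-GPT-o1-mini", 12.50),
    Row("GenericAgent-Claude-3.5-Sonnet", 13.70),
    Row("GenericAgent-GPT-4o-mini", 11.60),
    Row("GenericAgent-Llama-3.1-70b", 8.90),
    Row("GenericAgent-GPT-4o", 12.50),
]

for rank, row in rank_rows(rows):
    print(f"{rank} | {row.subject} | {row.score:.2f}")
```

With this tie-break, GenericAgent-GPT-4o lands at rank 2 and GenericAgent-GPT-o1-mini at rank 3, matching the table. A competition-style ranking would instead give both agents rank 2.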