WebLINX (BrowserGym)

BrowserGym leaderboard slice for WebLINX, evaluating web agents under the BrowserGym result submission protocol.

Rows: 6
Primary metric: score
Sampled: 2026-05-06

Metrics

Score
Std. Err. (lower is better)

Latest Results

Rank  Subject                         Score  Model Match  Provenance  Sampled
1     GenericAgent-Claude-3.5-Sonnet  13.70               Imported    2026-05-06
2     GenericAgent-GPT-4o             12.50               Imported    2026-05-06
3     GenericAgent-GPT-o1-mini        12.50               Imported    2026-05-06
4     GenericAgent-GPT-4o-mini        11.60               Imported    2026-05-06
5     GenericAgent-Llama-3.1-70b       8.90               Imported    2026-05-06
6     GenericAgent-Llama-3.1-405b      7.90               Imported    2026-05-06
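
The ordering in the table above can be reproduced with a short Python sketch. It assumes that a higher score ranks first (inferred from the table, since the page does not state the sort rule) and that tied scores keep their listed order. That tie-breaking rule is an assumption, and `rank_by_score` is a hypothetical helper, not part of BrowserGym.

```python
# (subject, score) pairs copied from the results table.
rows = [
    ("GenericAgent-Claude-3.5-Sonnet", 13.70),
    ("GenericAgent-GPT-4o", 12.50),
    ("GenericAgent-GPT-o1-mini", 12.50),
    ("GenericAgent-GPT-4o-mini", 11.60),
    ("GenericAgent-Llama-3.1-70b", 8.90),
    ("GenericAgent-Llama-3.1-405b", 7.90),
]


def rank_by_score(rows):
    """Return (rank, subject, score) tuples, highest score first.

    Assumption: ties keep their input order. sorted() is stable,
    including with reverse=True, so equal scores are not reordered.
    """
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return [
        (rank, subject, score)
        for rank, (subject, score) in enumerate(ordered, start=1)
    ]


ranked = rank_by_score(rows)
for rank, subject, score in ranked:
    print(f"{rank} {subject} {score:.2f}")
```

The two agents tied at 12.50 receive consecutive ranks 2 and 3 here, matching the table. A leaderboard that wanted shared ranks for ties would need a different numbering scheme.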