Online Mind2Web (HAL)
HAL's standardized, cost-aware agent leaderboard for Online Mind2Web web navigation tasks.
22 rows
Primary metric: accuracy
Sampled: 2026-05-27
Metadata
Metrics: Accuracy, Cost (USD, lower is better), Runs
| Rank | Subject | Accuracy (%) | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | SeeAct / GPT-5 Medium (August 2025) | 42.33 | — | Verified | 2026-05-27 |
| 2 | Browser-Use / Claude Sonnet 4 (May 2025) | 40 | — | Verified | 2026-05-27 |
| 3 | Browser-Use / Claude-3.7 Sonnet High (February 2025) | 39.33 | — | Verified | 2026-05-27 |
| 4 | Browser-Use / Claude Sonnet 4 High (May 2025) | 39.33 | — | Verified | 2026-05-27 |
| 5 | SeeAct / o3 Medium (April 2025) | 39 | — | Verified | 2026-05-27 |
| 6 | Browser-Use / Claude-3.7 Sonnet (February 2025) | 38.33 | — | Verified | 2026-05-27 |
| 7 | SeeAct / Claude Sonnet 4 (May 2025) | 36.67 | — | Verified | 2026-05-27 |
| 8 | SeeAct / Claude Sonnet 4 High (May 2025) | 36.67 | — | Verified | 2026-05-27 |
| 9 | Browser-Use / GPT-4.1 (April 2025) | 36.33 | — | Verified | 2026-05-27 |
| 10 | Browser-Use / DeepSeek V3 (March 2025) | 32.33 | — | Verified | 2026-05-27 |
| 11 | SeeAct / o4-mini High (April 2025) | 32 | — | Verified | 2026-05-27 |
| 12 | Browser-Use / GPT-5 Medium (August 2025) | 32 | — | Verified | 2026-05-27 |
| 13 | SeeAct / o4-mini Low (April 2025) | 31.67 | — | Verified | 2026-05-27 |
| 14 | SeeAct / GPT-4.1 (April 2025) | 30.33 | — | Verified | 2026-05-27 |
| 15 | SeeAct / Claude-3.7 Sonnet High (February 2025) | 30.33 | — | Verified | 2026-05-27 |
| 16 | Browser-Use / Gemini 2.0 Flash (February 2025) | 29 | — | Verified | 2026-05-27 |
| 17 | Browser-Use / o3 Medium (April 2025) | 29 | — | Verified | 2026-05-27 |
| 18 | SeeAct / Claude-3.7 Sonnet (February 2025) | 28.33 | — | Verified | 2026-05-27 |
| 19 | SeeAct / Gemini 2.0 Flash (February 2025) | 26.67 | — | Verified | 2026-05-27 |
| 20 | Browser-Use / DeepSeek R1 (January 2025) | 25.33 | — | Verified | 2026-05-27 |
| 21 | Browser-Use / o4-mini High (April 2025) | 20 | — | Verified | 2026-05-27 |
| 22 | Browser-Use / o4-mini Low (April 2025) | 18.33 | — | Verified | 2026-05-27 |