WebBench

Web task benchmark from Halluminate with public browser-agent result CSVs covering navigation and information-gathering tasks.

8 rows
success_rate primary metric
2026-05-27 sampled

Metadata

Metrics

Success Rate, Successful Tasks, Evaluated Tasks

Latest Results

Each row aggregates a public WebBench result CSV file from the Halluminate/WebBench GitHub repository. The primary score is the task success rate for each browser-agent or system result file.
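As a rough illustration of how a per-file score could be derived, the sketch below counts evaluated and successful tasks in one result CSV and reports the success rate as a percentage. The column name `outcome`, the value `success`, and the function `success_rate` are hypothetical; the actual WebBench CSV schema may differ, so treat this as a sketch under those assumptions rather than the aggregation this page uses.

```python
import csv
import io


def success_rate(csv_text, outcome_column="outcome", success_value="success"):
    """Summarize one result file.

    `outcome_column` and `success_value` are assumed names for illustration,
    not the documented WebBench schema.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    evaluated = 0
    successful = 0
    for row in reader:
        evaluated += 1
        # Treat a missing cell as a non-success instead of raising.
        if (row.get(outcome_column) or "").strip().lower() == success_value:
            successful += 1
    # Report a percentage; an empty file yields 0.0 instead of dividing by zero.
    rate = 100.0 * successful / evaluated if evaluated else 0.0
    return {
        "successful_tasks": successful,
        "evaluated_tasks": evaluated,
        "success_rate": round(rate, 4),
    }


# Tiny made-up file: 3 of 4 tasks succeeded.
sample = "task_id,outcome\n1,success\n2,failure\n3,success\n4,success\n"
result = success_rate(sample)
print(result)
```

The three returned keys mirror the metrics listed above (Success Rate, Successful Tasks, Evaluated Tasks), so each result file maps to one leaderboard row.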

| Rank | Subject | Success Rate (%) | Model Match | Provenance | Sampled |
|------|---------|------------------|-------------|------------|---------|
| 1 | rtrvr | 79.8762 | | Imported | 2026-05-27 |
| 2 | Operator HITL | 76.4706 | | Imported | 2026-05-27 |
| 3 | Anthropic Computer Use | 65.9869 | | Imported | 2026-05-27 |
| 4 | Skyvern 2.0 | 64.3556 | | Imported | 2026-05-27 |
| 5 | Skyvern 2.0 Browserbase | 60.6852 | | Imported | 2026-05-27 |
| 6 | OpenAI CUA | 59.8287 | | Imported | 2026-05-27 |
| 7 | browser-use | 43.921 | | Imported | 2026-05-27 |
| 8 | Convergence HITL | 39.9381 | | Imported | 2026-05-27 |