VisualWebArena
VisualWebArena measures browser, desktop, mobile, or GUI agents operating in interactive environments.
Rows: 24
Primary metric: success_rate
Sampled: 2026-05-05
Success Rate
| Rank | Subject | Success Rate (%) | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Human Performance | 88.70 | — | Imported | 2026-05-05 |
| 2 | GPT-4o | 19.78 | — | Imported | 2026-05-05 |
| 3 | GPT-4V | 16.37 | — | Imported | 2026-05-05 |
| 4 | GPT-4V | 15.05 | — | Imported | 2026-05-05 |
| 5 | GPT-4 | 12.75 | — | Imported | 2026-05-05 |
| 6 | Gemini-Pro-1.5 | 11.98 | — | Imported | 2026-05-05 |
| 7 | LLaMA-3-70B-Instruct | 9.78 | — | Imported | 2026-05-05 |
| 8 | GPT-4 | 7.25 | — | Imported | 2026-05-05 |
| 9 | Gemini-Flash-1.5 | 6.59 | — | Imported | 2026-05-05 |
| 10 | Gemini-Pro | 6.04 | — | Imported | 2026-05-05 |
| 11 | Gemini-Pro | 5.71 | — | Imported | 2026-05-05 |
| 12 | Gemini-Pro | 3.85 | — | Imported | 2026-05-05 |
| 13 | GPT-3.5 | 2.97 | — | Imported | 2026-05-05 |
| 14 | GPT-3.5 | 2.75 | — | Imported | 2026-05-05 |
| 15 | Gemini-Pro | 2.20 | — | Imported | 2026-05-05 |
| 16 | GPT-3.5 | 2.20 | — | Imported | 2026-05-05 |
| 17 | Mixtral-8x7B | 1.87 | — | Imported | 2026-05-05 |
| 18 | Mixtral-8x7B | 1.76 | — | Imported | 2026-05-05 |
| 19 | Text-only + LLaMA-2-70B | 1.10 | — | Imported | 2026-05-05 |
| 20 | Multimodal (SoM) + IDEFICS-80B-Instruct | 0.99 | — | Imported | 2026-05-05 |
| 21 | Multimodal + IDEFICS-80B-Instruct | 0.77 | — | Imported | 2026-05-05 |
| 22 | Caption-augmented + LLaMA-2-70B + BLIP-2-T5XL | 0.66 | — | Imported | 2026-05-05 |
| 23 | CogVLM | 0.33 | — | Imported | 2026-05-05 |
| 24 | CogVLM | 0.33 | — | Imported | 2026-05-05 |