VisualWebBench
A multimodal benchmark designed to assess how well multimodal large language models (MLLMs) understand and ground web pages. It comprises 7 tasks (captioning, webpage QA, heading OCR, element OCR, element grounding, action prediction, and action grounding) and 1.5K human-curated instances drawn from 139 real websites across 87 sub-domains.
2 rows
Primary metric: Score
Sampled: 2026-05-06

Metadata
Metrics: Score, Normalized Score
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Nova Pro | 0.80 | Nova Pro 1.0 (`amazon-nova-pro-v1`) | Self-reported | 2026-05-06 |
| 2 | Nova Lite | 0.78 | Nova Lite 1.0 (`amazon-nova-lite-v1`) | Self-reported | 2026-05-06 |
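The Rank column appears to follow the primary metric (Score), highest first. The sketch below reproduces that ordering from the table's rows. It is a minimal illustration, not the leaderboard's own code: the `Row` type and `rank_rows` helper are hypothetical, and breaking ties by input order is an assumption, since the page does not state a tie-break rule.

```python
from dataclasses import dataclass


@dataclass
class Row:
    subject: str
    score: float  # primary metric, as listed in the Score column


def rank_rows(rows: list[Row]) -> list[tuple[int, Row]]:
    # Hypothetical helper: sort by score, highest first.
    # sorted() is stable, so equal scores keep their input order (assumed tie-break).
    ordered = sorted(rows, key=lambda r: r.score, reverse=True)
    # Assign 1-based ranks in sorted order.
    return [(i + 1, r) for i, r in enumerate(ordered)]


rows = [Row("Nova Lite", 0.78), Row("Nova Pro", 0.80)]
for rank, row in rank_rows(rows):
    print(rank, row.subject, f"{row.score:.2f}")
```

With the two self-reported rows above, this prints Nova Pro at rank 1 and Nova Lite at rank 2, matching the table.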