VisIT-Bench Multiple Images
VisIT-Bench Multiple Images ranks vision-language models with human-preference Elo scores on instruction-following tasks over multiple images.
Rows: 4
Primary metric: Elo
Sampled: 2026-05-06
Metadata
Metrics: Elo, Matches, Win vs. Reference, Win vs. Reference Ratings
| Rank | Subject | Elo | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Human Verified GPT-4 Reference | 1192 | — | Imported | 2026-05-06 |
| 2 | mPLUG-Owl | 995 | — | Imported | 2026-05-06 |
| 3 | Otter | 911 | — | Imported | 2026-05-06 |
| 4 | OpenFlamingo | 902 | — | Imported | 2026-05-06 |
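To help read the Elo gaps in the table, the sketch below converts a pair of ratings into an expected head-to-head score using the conventional Elo logistic formula. It assumes base 10 and a scale of 400, which is the common convention; this page does not state which parameters the benchmark uses, so the results are illustrative only. The names `ratings` and `expected_score` are hypothetical helpers, not part of any benchmark tooling.

```python
# Elo ratings copied from the table above.
ratings = {
    "Human Verified GPT-4 Reference": 1192,
    "mPLUG-Owl": 995,
    "Otter": 911,
    "OpenFlamingo": 902,
}


def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Conventional Elo expected score of A against B.

    Assumption: base 10 and scale 400, which this page does not confirm.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


if __name__ == "__main__":
    reference = "Human Verified GPT-4 Reference"
    for name, elo in ratings.items():
        if name == reference:
            continue
        # Expected score of each model when matched against the reference.
        p = expected_score(elo, ratings[reference])
        print(f"{name}: expected score vs. reference = {p:.3f}")
```

Under these assumptions, equal ratings give an expected score of 0.5, and a larger rating gap pushes the higher-rated subject's expected score toward 1.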