VisIT-Bench Single Image
VisIT-Bench Single Image ranks vision-language models with human-preference Elo scores on instruction-following tasks over single images.
11 rows
Primary metric: Elo
Sampled: 2026-05-06
Metadata
Metrics
Elo, Matches, Win vs. Reference, Win vs. Reference Ratings
| Rank | Subject | Elo | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Human Verified GPT-4 Reference | 1370 | — | Imported | 2026-05-06 |
| 2 | LLaVA (13B) | 1106 | — | Imported | 2026-05-06 |
| 3 | LlamaAdapter-v2 (7B) | 1082 | — | Imported | 2026-05-06 |
| 4 | mPLUG-Owl (7B) | 1081 | — | Imported | 2026-05-06 |
| 5 | InstructBLIP (13B) | 1011 | — | Imported | 2026-05-06 |
| 6 | Otter (9B) | 991 | — | Imported | 2026-05-06 |
| 7 | VisualGPT (Da Vinci 003) | 972 | — | Imported | 2026-05-06 |
| 8 | MiniGPT-4 (7B) | 921 | — | Imported | 2026-05-06 |
| 9 | OpenFlamingo (9B) | 877 | — | Imported | 2026-05-06 |
| 10 | PandaGPT (13B) | 826 | — | Imported | 2026-05-06 |
| 11 | Multimodal GPT | 763 | — | Imported | 2026-05-06 |
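To give a rough sense of what the Elo gaps in the table mean, the sketch below converts a rating difference into an expected head-to-head win probability. It assumes the conventional Elo logistic curve (base 10, scale 400); this page does not state which parameters the benchmark uses, so treat the numbers as illustrative only. The function name `expected_win_probability` and the `ratings` dictionary are hypothetical helpers, with ratings copied from the table above.

```python
def expected_win_probability(elo_a: float, elo_b: float, scale: float = 400.0) -> float:
    """Expected probability that A is preferred over B under the standard Elo model.

    Assumption: base-10 logistic curve with a 400-point scale (the conventional
    Elo setting), which the leaderboard page does not confirm.
    """
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / scale))


# Ratings copied from the leaderboard table (subset, for illustration).
ratings = {
    "Human Verified GPT-4 Reference": 1370,
    "LLaVA (13B)": 1106,
    "LlamaAdapter-v2 (7B)": 1082,
    "Multimodal GPT": 763,
}

reference = "Human Verified GPT-4 Reference"
for name, elo in ratings.items():
    if name == reference:
        continue
    p = expected_win_probability(elo, ratings[reference])
    print(f"{name}: expected win rate vs. reference = {p:.2f}")
```

Under these assumptions, a gap of a few hundred points (as between the reference and every listed model) corresponds to the reference being preferred in a clear majority of pairwise comparisons, while neighbours such as LlamaAdapter-v2 (7B) and mPLUG-Owl (7B), one point apart, are effectively tied.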