VisIT-Bench Single Image

VisIT-Bench Single Image ranks vision-language models with human-preference Elo scores on instruction-following tasks over single images.

Rows: 11
Primary metric: Elo
Sampled: 2026-05-06

Metadata

Metrics

Elo, Matches, Win vs. Reference, Win vs. Reference Ratings

Latest Results

Rank  Subject                         Elo   Model Match  Provenance  Sampled
1     Human Verified GPT-4 Reference  1370               Imported    2026-05-06
2     LLaVA (13B)                     1106               Imported    2026-05-06
3     LlamaAdapter-v2 (7B)            1082               Imported    2026-05-06
4     mPLUG-Owl (7B)                  1081               Imported    2026-05-06
5     InstructBLIP (13B)              1011               Imported    2026-05-06
6     Otter (9B)                      991                Imported    2026-05-06
7     VisualGPT (Da Vinci 003)        972                Imported    2026-05-06
8     MiniGPT-4 (7B)                  921                Imported    2026-05-06
9     OpenFlamingo (9B)               877                Imported    2026-05-06
10    PandaGPT (13B)                  826                Imported    2026-05-06
11    Multimodal GPT                  763                Imported    2026-05-06
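
To make the Elo column easier to interpret, the sketch below converts a rating gap into an expected head-to-head preference rate. It assumes the conventional logistic Elo model with a 400-point scale; this page does not state which Elo variant or scale the benchmark uses, so the resulting probabilities are illustrative only. The names `ELO` and `expected_score` are hypothetical helpers, not part of any benchmark tooling.

```python
# Elo ratings copied from the table above.
ELO = {
    "Human Verified GPT-4 Reference": 1370,
    "LLaVA (13B)": 1106,
    "LlamaAdapter-v2 (7B)": 1082,
    "mPLUG-Owl (7B)": 1081,
    "InstructBLIP (13B)": 1011,
    "Otter (9B)": 991,
    "VisualGPT (Da Vinci 003)": 972,
    "MiniGPT-4 (7B)": 921,
    "OpenFlamingo (9B)": 877,
    "PandaGPT (13B)": 826,
    "Multimodal GPT": 763,
}


def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected share of pairwise preferences won by A over B.

    Assumption: the standard logistic Elo formula with a 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


if __name__ == "__main__":
    reference = ELO["Human Verified GPT-4 Reference"]
    for name, rating in ELO.items():
        if rating == reference:
            continue
        # Expected preference rate of each model against the reference entry.
        print(f"{name:<28} {expected_score(rating, reference):.2f}")
```

Under this assumed model, equal ratings give an expected score of 0.5, and a larger gap pushes the expectation toward 0 or 1. These derived values are not the same thing as the leaderboard's own Win vs. Reference metric, which is measured rather than computed from ratings.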