VisIT-Bench Multiple Images

VisIT-Bench Multiple Images ranks vision-language models with human-preference Elo scores on instruction-following tasks over multiple images.

Rows: 4
Primary metric: Elo
Sampled: 2026-05-06

Metadata

Metrics

Elo, Matches, Win vs. Reference, Win vs. Reference Ratings

Latest Results

Rank  Subject                         Elo   Model Match  Provenance  Sampled
1     Human Verified GPT-4 Reference  1192  -            Imported    2026-05-06
2     mPLUG-Owl                       995   -            Imported    2026-05-06
3     Otter                           911   -            Imported    2026-05-06
4     OpenFlamingo                    902   -            Imported    2026-05-06
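
To give a sense of what the Elo gaps in this table mean, the sketch below converts a rating difference into an expected head-to-head score. It assumes the standard logistic Elo model with a scale of 400, which this page does not confirm as the exact method used for VisIT-Bench. The function name `expected_score` and the `ratings` dictionary are illustrative and not part of any benchmark tooling.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard logistic Elo model.

    Assumption: scale=400 is the conventional Elo constant; the benchmark
    may use a different parameterization.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Elo values copied from the Latest Results table.
ratings = {
    "Human Verified GPT-4 Reference": 1192,
    "mPLUG-Owl": 995,
    "Otter": 911,
    "OpenFlamingo": 902,
}

reference = "Human Verified GPT-4 Reference"

for name, elo in ratings.items():
    if name == reference:
        continue
    # Probability-like score that `name` is preferred over the reference.
    p = expected_score(elo, ratings[reference])
    print(f"{name}: expected score vs. reference = {p:.3f}")
```

Under this model, two entries with equal ratings each get an expected score of 0.5, and the two expected scores of any pair always sum to 1. These are model-implied values only; they need not equal the "Win vs. Reference" metric listed above.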