SEED-Bench-2

SEED-Bench-2 evaluates multimodal understanding across image, text, chart, diagram, and cross-modal reasoning tasks.

Rows: 32
Primary metric: avg_single
Sampled: 2026-05-06

Metadata

Metrics

Avg. Single, Avg. Multi, Avg. Video, Avg. P1, Avg. P2, Avg. P3, Scene Understanding, Instance Identity, Instance Attribute, Instance Location, Instance Counting, Spatial Relation, Instance Interaction, Visual Reasoning, Text Recognition, Celebrity Recognition, Landmark Recognition, Chart Understanding, Visual Referring Expression, Science Knowledge, Emotion Recognition, Visual Mathematics, Difference Spotting, Meme Comprehension, Global Video Understanding, Action Recognition, Action Prediction, Procedure Understanding, In-Context Captioning, Interleaved Image-Text Analysis, Text-to-Image Generation, Next Image Prediction, Text-Image Creation

Latest Results

Rank  Subject                Avg. Single  Model Match                       Provenance  Sampled
1     MindAD                 72.40        -                                 Imported    2026-05-06
2     SPHINXv2-1k            72.10        -                                 Imported    2026-05-06
3     Qwen-VL-plus           71.90        Qwen VL Plus (qwen-qwen-vl-plus)  Imported    2026-05-06
4     GPT-4V                 69.80        GPT-4 (openai-gpt-4)              Imported    2026-05-06
5     SPHINXv1-1k            68.50        -                                 Imported    2026-05-06
6     InternLM-XComposer-VL  66.50        -                                 Imported    2026-05-06
7     VisionLLaMA            66.10        -                                 Imported    2026-05-06
8     InternLM-XComposer-VL  65.40        -                                 Imported    2026-05-06
9     Gemini-Pro-Vision      62.90        -                                 Imported    2026-05-06
10    LLaVA-1.5              58.30        -                                 Imported    2026-05-06
11    InstructBLIP           52.40        -                                 Imported    2026-05-06
12    Kosmos-2               52.40        -                                 Imported    2026-05-06
13    Qwen-VL-Chat           50.30        -                                 Imported    2026-05-06
14    SEED-LLaMA             49.90        -                                 Imported    2026-05-06
15    InstructBLIP-Vicuna    47.50        -                                 Imported    2026-05-06
16    BLIP-2                 46.80        -                                 Imported    2026-05-06
17    Emu                    46.40        -                                 Imported    2026-05-06
18    MiniGPT-4              45.00        -                                 Imported    2026-05-06
19    LLaVA                  42.40        -                                 Imported    2026-05-06
20    Claude-3-Opus          42.00        -                                 Imported    2026-05-06
21    IDEFICS-9b-instruct    38.80        -                                 Imported    2026-05-06
22    mPLUG-Owl              38.60        -                                 Imported    2026-05-06
23    Video-ChatGPT          38.30        -                                 Imported    2026-05-06
24    MultiModal-GPT         36.70        -                                 Imported    2026-05-06
25    VideoChat              36.70        -                                 Imported    2026-05-06
26    OpenFlamingo           36.60        -                                 Imported    2026-05-06
27    VPGTrans               36.60        -                                 Imported    2026-05-06
28    LLaMA-AdapterV2        36.00        -                                 Imported    2026-05-06
29    Otter                  36.00        -                                 Imported    2026-05-06
30    Valley                 35.30        -                                 Imported    2026-05-06
31    GVT                    34.90        -                                 Imported    2026-05-06
32    Next-GPT               31.00        -                                 Imported    2026-05-06