SEED-Bench

SEED-Bench evaluates multimodal understanding across image, text, chart, diagram, and cross-modal reasoning tasks.

63 rows
Primary metric: avg_all (Avg. All)
Sampled: 2026-05-06

Metrics

Avg. All, Avg. Img, Avg. Video, Scene Understanding, Instance Identity, Instance Attribute, Instance Location, Instance Counting, Spatial Relation, Instance Interaction, Visual Reasoning, Text Recognition, Action Recognition, Action Prediction, Procedure Understanding

Latest Results

Rank  Subject                                     Avg. All  Model Match                       Provenance  Sampled
1     InternVL-Chat-V1.2-Plus                     70.40     -                                 Imported    2026-05-06
2     Weitu-VL-1.0                                69.20     -                                 Imported    2026-05-06
3     SPHINXv2-1k                                 67.50     -                                 Imported    2026-05-06
4     GPT-4V                                      67.30     GPT-4 (openai-gpt-4)              Imported    2026-05-06
5     Qwen-VL-plus                                66.80     Qwen VL Plus (qwen-qwen-vl-plus)  Imported    2026-05-06
6     SPHINXv1-1k                                 63.90     -                                 Imported    2026-05-06
7     llava-v1.5-7b-finetune                      62.80     -                                 Imported    2026-05-06
8     LLaVA-v1.5-LoRA                             62.80     -                                 Imported    2026-05-06
9     LLaVA-v1.5-13B-LoRA                         62.40     -                                 Imported    2026-05-06
10    InfMLLM-13B                                 62.30     -                                 Imported    2026-05-06
11    LLaVA-1.5                                   61.60     -                                 Imported    2026-05-06
12    llava-v1.5-7b-dsn-ft                        61.50     -                                 Imported    2026-05-06
13    llava-v1.5-7b-910b                          61        -                                 Imported    2026-05-06
14    Unified-IO-2 7B (2.5M)                      60.50     -                                 Imported    2026-05-06
15    Unified-IO-2 7B                             60.40     -                                 Imported    2026-05-06
16    Unified-IO-2 3B (3M)                        60.20     -                                 Imported    2026-05-06
17    LLaMA-VID-7B                                59.90     -                                 Imported    2026-05-06
18    Unified-IO-2 3B                             58.70     -                                 Imported    2026-05-06
19    Qwen-VL-Chat                                58.20     -                                 Imported    2026-05-06
20    mPLUG-Owl2                                  57.80     -                                 Imported    2026-05-06
21    hh_resampler_v2-dsn                         57.30     -                                 Imported    2026-05-06
22    hh_v3_dsn                                   57        -                                 Imported    2026-05-06
23    hh_resampler_llava                          56.60     -                                 Imported    2026-05-06
24    Qwen-VL                                     56.30     -                                 Imported    2026-05-06
25    InstructBLIP-Vicuna                         53.40     -                                 Imported    2026-05-06
26    InstructBLIP                                52.70     -                                 Imported    2026-05-06
27    Kosmos-2                                    50        -                                 Imported    2026-05-06
28    Unified-IO-2 1B                             49.60     -                                 Imported    2026-05-06
29    SEED-LLaMA                                  48.90     -                                 Imported    2026-05-06
30    BLIP-2                                      46.40     -                                 Imported    2026-05-06
31    MiniGPT-4                                   42.80     -                                 Imported    2026-05-06
32    Claude-3-Opus                               40.90     -                                 Imported    2026-05-06
33    OpenFlamingo                                40.90     -                                 Imported    2026-05-06
34    Otter                                       39.70     -                                 Imported    2026-05-06
35    VPGTrans                                    39.10     -                                 Imported    2026-05-06
36    VideoChat                                   37.60     -                                 Imported    2026-05-06
37    mPLUG-Owl                                   34        -                                 Imported    2026-05-06
38    Otter                                       33.90     -                                 Imported    2026-05-06
39    GVT                                         33.50     -                                 Imported    2026-05-06
40    MultiModal-GPT                              33.20     -                                 Imported    2026-05-06
41    OpenFlamingo                                33.10     -                                 Imported    2026-05-06
42    LLaMA-AdapterV2                             32.70     -                                 Imported    2026-05-06
43    Video-ChatGPT                               31.20     -                                 Imported    2026-05-06
44    Valley                                      30.30     -                                 Imported    2026-05-06
45    Vicuna                                      28.50     -                                 Imported    2026-05-06
46    Flan-T5                                     27.70     -                                 Imported    2026-05-06
47    LLaMA                                       26.80     -                                 Imported    2026-05-06
48    ALIP_llava                                  0         -                                 Imported    2026-05-06
49    DreamLIP                                    0         -                                 Imported    2026-05-06
50    DreamLIP_30m                                0         -                                 Imported    2026-05-06
51    Gemini-Pro-Vision                           0         -                                 Imported    2026-05-06
52    Honeybee-13B                                0         -                                 Imported    2026-05-06
53    IDEFICS-80b-instruct                        0         -                                 Imported    2026-05-06
54    IDEFICS-9b-instruct                         0         -                                 Imported    2026-05-06
55    InternLM-XComposer-VL                       0         -                                 Imported    2026-05-06
56    InternLM-XComposer2-VL-7B                   0         -                                 Imported    2026-05-06
57    LaCLIP_llava                                0         -                                 Imported    2026-05-06
58    LLaVA-7B + detection and grounding trained  0         -                                 Imported    2026-05-06
59    MiniCPM-Llama3-V2.5                         0         -                                 Imported    2026-05-06
60    MiniCPM-V-2                                 0         -                                 Imported    2026-05-06
61    Pink-LLaMA2                                 0         -                                 Imported    2026-05-06
62    ShareGPT4V-13B                              0         -                                 Imported    2026-05-06
63    ShareGPT4V-7B                               0         -                                 Imported    2026-05-06