MVBench | BenchmarkList

Average accuracy

Rank	Subject	Average accuracy (%)	Model Match	Provenance	Sampled
1	VideoChat2 (Mistral-7B)	60.4	—	Imported	2026-05-27
2	VideoChat2 (Vicuna-7B)	51.1	—	Imported	2026-05-27
3	GPT-4V	43.5	GPT-4 openai-gpt-4	Imported	2026-05-27
4	LLaVA	36.0	—	Imported	2026-05-27
5	VideoChat	35.5	—	Imported	2026-05-27
6	VideoChat2_text	34.7	—	Imported	2026-05-27
7	VideoLLaMA	34.1	—	Imported	2026-05-27
8	Otter-I	33.5	—	Imported	2026-05-27
9	VideoChatGPT	32.7	—	Imported	2026-05-27
10	InstructBLIP	32.5	—	Imported	2026-05-27
11	LLaMA-Adapter	31.7	—	Imported	2026-05-27
12	BLIP2	31.4	—	Imported	2026-05-27
13	mPLUG-Owl-V	29.7	—	Imported	2026-05-27
14	mPLUG-Owl-I	29.4	—	Imported	2026-05-27
15	Random	27.3	—	Imported	2026-05-27
16	Otter-V	26.8	—	Imported	2026-05-27
17	MiniGPT-4	18.8	—	Imported	2026-05-27