MVBench

MVBench: A multiple-choice video question-answering benchmark that evaluates temporal understanding across 20 video tasks, going beyond what static image inputs can test.

Rows: 17
Primary metric: average_accuracy
Sampled: 2026-05-27

Metadata

Metrics

Average accuracy

Latest Results

Rows are parsed from the MVBench paper arXiv LaTeX full-results table.

Rank  Subject                  Average accuracy  Model Match           Provenance  Sampled
1     VideoChat2 (Mistral-7B)  60.4              -                     Imported    2026-05-27
2     VideoChat2 (Vicuna-7B)   51.1              -                     Imported    2026-05-27
3     GPT-4V                   43.5              GPT-4 (openai-gpt-4)  Imported    2026-05-27
4     LLaVA                    36.0              -                     Imported    2026-05-27
5     VideoChat                35.5              -                     Imported    2026-05-27
6     VideoChat2_text          34.7              -                     Imported    2026-05-27
7     VideoLLaMA               34.1              -                     Imported    2026-05-27
8     Otter-I                  33.5              -                     Imported    2026-05-27
9     VideoChatGPT             32.7              -                     Imported    2026-05-27
10    InstructBLIP             32.5              -                     Imported    2026-05-27
11    LLaMA-Adapter            31.7              -                     Imported    2026-05-27
12    BLIP2                    31.4              -                     Imported    2026-05-27
13    mPLUG-Owl-V              29.7              -                     Imported    2026-05-27
14    mPLUG-Owl-I              29.4              -                     Imported    2026-05-27
15    Random                   27.3              -                     Imported    2026-05-27
16    Otter-V                  26.8              -                     Imported    2026-05-27
17    MiniGPT-4                18.8              -                     Imported    2026-05-27
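
The import step noted above (rows parsed from a LaTeX results table, then ranked by average accuracy) could be sketched roughly as follows. This is a minimal illustration, not the pipeline actually used for this page. The sample LaTeX layout (model name in the first cell, average in the last) and the function names `parse_rows` and `rank` are assumptions. The three scores are taken from the table above.

```python
import re

# Hypothetical LaTeX rows: the layout (model name first, average last) is an
# assumption for illustration, not the paper's actual table source.
SAMPLE_ROWS = r"""
\hline
Random & 27.3 \\
GPT-4V & 43.5 \\
\textbf{VideoChat2 (Vicuna-7B)} & \textbf{51.1} \\
"""


def parse_rows(latex: str) -> list[tuple[str, float]]:
    """Return (subject, average_accuracy) pairs for each numeric table row."""
    rows = []
    for line in latex.splitlines():
        line = line.strip()
        if not line.endswith(r"\\"):
            continue  # skip blank lines and rules such as \hline
        cells = [c.strip() for c in line[:-2].split("&")]
        # Unwrap simple one-argument formatting macros like \textbf{...}.
        cells = [re.sub(r"\\[a-zA-Z]+\{([^}]*)\}", r"\1", c) for c in cells]
        try:
            rows.append((cells[0], float(cells[-1])))
        except ValueError:
            continue  # header rows have no numeric last cell
    return rows


def rank(rows: list[tuple[str, float]]) -> list[tuple[int, str, float]]:
    """Sort by average accuracy, highest first, and attach 1-based ranks."""
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    return [(i, name, score) for i, (name, score) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    for position, subject, score in rank(parse_rows(SAMPLE_ROWS)):
        print(position, subject, score)
```

A real results table would likely need more handling than this (multi-column headers, nested macros, footnote markers), so the regular expression here should be read as a starting point only.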