MVBench
MVBench: Evaluates temporal, video, speech, or audio understanding beyond static text and image inputs.
17 rows
Primary metric: average_accuracy
Sampled: 2026-05-27
Average accuracy
| Rank | Subject | Average accuracy (%) | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | VideoChat2 (Mistral-7B) | 60.4 | — | Imported | 2026-05-27 |
| 2 | VideoChat2 (Vicuna-7B) | 51.1 | — | Imported | 2026-05-27 |
| 3 | GPT-4V | 43.5 | GPT-4 (openai-gpt-4) | Imported | 2026-05-27 |
| 4 | LLaVA | 36.0 | — | Imported | 2026-05-27 |
| 5 | VideoChat | 35.5 | — | Imported | 2026-05-27 |
| 6 | VideoChat2_text | 34.7 | — | Imported | 2026-05-27 |
| 7 | VideoLLaMA | 34.1 | — | Imported | 2026-05-27 |
| 8 | Otter-I | 33.5 | — | Imported | 2026-05-27 |
| 9 | VideoChatGPT | 32.7 | — | Imported | 2026-05-27 |
| 10 | InstructBLIP | 32.5 | — | Imported | 2026-05-27 |
| 11 | LLaMA-Adapter | 31.7 | — | Imported | 2026-05-27 |
| 12 | BLIP2 | 31.4 | — | Imported | 2026-05-27 |
| 13 | mPLUG-Owl-V | 29.7 | — | Imported | 2026-05-27 |
| 14 | mPLUG-Owl-I | 29.4 | — | Imported | 2026-05-27 |
| 15 | Random | 27.3 | — | Imported | 2026-05-27 |
| 16 | Otter-V | 26.8 | — | Imported | 2026-05-27 |
| 17 | MiniGPT-4 | 18.8 | — | Imported | 2026-05-27 |
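
Because the table includes a Random row, one way to read it is as a margin over chance. The following is a minimal sketch, assuming the accuracies are percentages and using values copied from the rows above; the names `scores`, `RANDOM_BASELINE`, and `margin_over_random` are hypothetical and not part of the leaderboard.

```python
# Average accuracy values copied from the table above (Random row kept separate).
scores = {
    "VideoChat2 (Mistral-7B)": 60.4,
    "VideoChat2 (Vicuna-7B)": 51.1,
    "GPT-4V": 43.5,
    "LLaVA": 36.0,
    "VideoChat": 35.5,
    "VideoChat2_text": 34.7,
    "VideoLLaMA": 34.1,
    "Otter-I": 33.5,
    "VideoChatGPT": 32.7,
    "InstructBLIP": 32.5,
    "LLaMA-Adapter": 31.7,
    "BLIP2": 31.4,
    "mPLUG-Owl-V": 29.7,
    "mPLUG-Owl-I": 29.4,
    "Otter-V": 26.8,
    "MiniGPT-4": 18.8,
}

# Assumption: the "Random" row is the chance-level baseline for comparison.
RANDOM_BASELINE = 27.3


def margin_over_random(score, baseline=RANDOM_BASELINE):
    """Return the gap between a score and the baseline, rounded to one decimal."""
    return round(score - baseline, 1)


# Margin for every listed model, in the same order as the table.
margins = {name: margin_over_random(s) for name, s in scores.items()}

# Models whose average accuracy falls below the Random row.
below_random = [name for name, m in margins.items() if m < 0]

if __name__ == "__main__":
    for name, m in margins.items():
        print(f"{name}: {m:+.1f}")
    print("Below random:", below_random)
```

On these numbers, only the two entries ranked after Random (Otter-V and MiniGPT-4) have a negative margin.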