AutoEval-Video
AutoEval-Video evaluates large vision-language models on open-ended video question answering across nine video perception and reasoning dimensions.
8 rows
Primary metric: overall_acc
Sampled: 2026-05-06
Metadata
Metrics
Overall Acc., Dynamic Perception, State Transitions Perception, Comparison Reasoning, Reasoning with External Knowledge, Explanatory Reasoning, Predictive Reasoning, Description, Counterfactual Reasoning, Camera Movement Perception
| Rank | Subject | Overall Acc. | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-4V | 22.20 | GPT-4 (openai-gpt-4) | Imported | 2026-05-06 |
| 2 | VideoChat | 13.40 | — | Imported | 2026-05-06 |
| 3 | Video-LLaMA | 11.20 | — | Imported | 2026-05-06 |
| 4 | LLaVA-1.5 | 8.50 | — | Imported | 2026-05-06 |
| 5 | InstructBLIP | 7.90 | — | Imported | 2026-05-06 |
| 6 | Qwen-VL | 7.00 | — | Imported | 2026-05-06 |
| 7 | Video-ChatGPT | 6.70 | — | Imported | 2026-05-06 |
| 8 | BLIP-2 | 0.30 | — | Imported | 2026-05-06 |