AutoEval-Video

AutoEval-Video evaluates large vision-language models on open-ended video question answering across nine video perception and reasoning dimensions.

Rows: 8
Primary metric: overall_acc
Sampled: 2026-05-06

Metadata

Metrics

Overall Acc., Dynamic Perception, State Transitions Perception, Comparison Reasoning, Reasoning with External Knowledge, Explanatory Reasoning, Predictive Reasoning, Description, Counterfactual Reasoning, Camera Movement Perception

Latest Results

Rank  Subject        Overall Acc.  Model Match           Provenance  Sampled
1     GPT-4V         22.20         GPT-4 (openai-gpt-4)  Imported    2026-05-06
2     VideoChat      13.40         -                     Imported    2026-05-06
3     Video-LLaMA    11.20         -                     Imported    2026-05-06
4     LLaVA-1.5      8.50          -                     Imported    2026-05-06
5     InstructBLIP   7.90          -                     Imported    2026-05-06
6     Qwen-VL        7.00          -                     Imported    2026-05-06
7     Video-ChatGPT  6.70          -                     Imported    2026-05-06
8     BLIP-2         0.30          -                     Imported    2026-05-06
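
The ranking in Latest Results follows from sorting the subjects by the primary metric in descending order. The snippet below is a minimal sketch of that step, using the scores listed above. The names `results` and `rank_by_overall_acc` are illustrative assumptions and not part of any AutoEval-Video tooling.

```python
# Scores transcribed from the Latest Results listing: (subject, overall_acc).
# The variable and function names here are hypothetical, for illustration only.
results = [
    ("LLaVA-1.5", 8.50),
    ("GPT-4V", 22.20),
    ("BLIP-2", 0.30),
    ("Video-LLaMA", 11.20),
    ("Qwen-VL", 7.0),
    ("VideoChat", 13.40),
    ("Video-ChatGPT", 6.70),
    ("InstructBLIP", 7.90),
]


def rank_by_overall_acc(rows):
    """Return (rank, subject, score) tuples, highest overall accuracy first."""
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return [
        (rank, subject, score)
        for rank, (subject, score) in enumerate(ordered, start=1)
    ]


ranking = rank_by_overall_acc(results)
for rank, subject, score in ranking:
    print(f"{rank:>2}  {subject:<14} {score:5.2f}")
```

Sorting on a single key keeps the sketch simple. It does not handle ties, which do not occur in the listed scores.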