TempCompass

TempCompass is a comprehensive benchmark for evaluating the temporal perception capabilities of Video Large Language Models (Video LLMs). It constructs conflicting videos that share identical static content but differ in a specific temporal aspect, so that models cannot answer correctly by exploiting single-frame bias. The benchmark covers several temporal aspects (action, motion, speed, temporal order, and attribute change) across four task formats: multi-choice QA, yes/no QA, caption matching, and caption generation.
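To see why conflicting videos defeat single-frame bias, here is a toy sketch. It is an illustration under stated assumptions, not TempCompass's actual construction pipeline. Frames are hypothetical string labels, the conflicting clip is made by reversing frame order (one way to change temporal order while keeping static content identical), and both "models" are hypothetical stand-ins. A model that ignores frame order must give the same answer for both clips, so it can be right on at most one of them.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical stand-in: each frame is a label for a cup's fill level.
LEVEL = {"cup_full": 2, "cup_half": 1, "cup_empty": 0}

def make_conflicting(frames: Sequence[str]) -> List[str]:
    """Reverse the clip: identical static content, opposite temporal order."""
    return list(reversed(frames))

def same_static_content(a: Sequence[str], b: Sequence[str]) -> bool:
    """True when both clips contain exactly the same frames, ignoring order."""
    return Counter(a) == Counter(b)

def ground_truth(frames: Sequence[str]) -> str:
    """Answer to 'Does the cup get fuller or emptier?' for this clip."""
    return "emptier" if LEVEL[frames[-1]] < LEVEL[frames[0]] else "fuller"

def single_frame_model(frames: Sequence[str]) -> str:
    """Toy model that only sees which frames occur, so order cannot affect it."""
    seen = set(frames)  # any permutation of the clip yields the same set
    return "emptier" if "cup_empty" in seen else "fuller"

def temporal_model(frames: Sequence[str]) -> str:
    """Toy model that compares the fill level of the first and last frames."""
    start, end = LEVEL[frames[0]], LEVEL[frames[-1]]
    return "emptier" if end < start else "fuller"

def pair_accuracy(model: Callable[[Sequence[str]], str],
                  clips: List[List[str]]) -> float:
    """Fraction of clips on which the model matches the ground truth."""
    return sum(model(c) == ground_truth(c) for c in clips) / len(clips)

original = ["cup_full", "cup_half", "cup_empty"]
conflicting = make_conflicting(original)
clips = [original, conflicting]

print(same_static_content(original, conflicting))  # True
print(pair_accuracy(single_frame_model, clips))    # 0.5
print(pair_accuracy(temporal_model, clips))        # 1.0
```

The order-blind model answers "emptier" for both clips and is wrong on the reversed one, while the order-aware model gets both right. Scoring over conflicting pairs like this is what exposes reliance on static content alone.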

Rows: 2
Primary metric: Score
Sampled: 2026-05-06

Metadata

Metrics

Score, Normalized Score

Latest Results

Rank  Subject                  Score  Model Match                                             Provenance     Sampled
1     Qwen2.5 VL 72B Instruct  0.75   Qwen2.5 VL 72B Instruct (qwen-qwen2.5-vl-72b-instruct)  Self-reported  2026-05-06
2     Qwen2.5 VL 7B Instruct   0.72   (none)                                                  Self-reported  2026-05-06