TempCompass

TempCompass is a benchmark for evaluating the temporal perception capabilities of Video Large Language Models (Video LLMs). To prevent models from exploiting single-frame bias, it constructs conflicting videos that share identical static content but differ in a specific temporal aspect. It evaluates temporal aspects including action, motion, speed, temporal order, and attribute changes, using task formats such as multi-choice QA, yes/no QA, caption matching, and caption generation.

Rows: 2
Primary metric: score
Sampled: 2026-05-06

Metadata

ID: tempcompass
Category: Multimodal
Release: 2024-03-01
Source: Source page
Snapshot: Snapshot source
Post: Announcement post

Metrics

Score, Normalized Score

Rank	Subject	Score	Model Match	Provenance	Sampled
1	Qwen2.5 VL 72B Instruct	0.75	Qwen2.5 VL 72B Instruct qwen-qwen2.5-vl-72b-instruct	Self-reported	2026-05-06
2	Qwen2.5 VL 7B Instruct	0.72	—	Self-reported	2026-05-06
