SLVMEval

Synthetic meta-evaluation benchmark for systems that evaluate text-to-long-video generation, covering ten quality and consistency aspects of long videos.

Rows: 9
Primary metric: macro_accuracy
Sampled: 2026-05-28

Metadata

Metrics

Macro Accuracy, Aesthetics Accuracy, Technical Quality Accuracy, Appearance/Style Accuracy, Background Consistency Accuracy, Object Integrity Accuracy, Color Accuracy, Dynamics Degree Accuracy, Comprehensiveness Accuracy, Spatial Relationship Accuracy, Temporal Flow Accuracy
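The page names Macro Accuracy as the primary metric alongside ten per-aspect accuracies, but does not spell out how they combine. A common convention, assumed here and not confirmed by this page, is that macro accuracy is the unweighted mean of the per-aspect accuracies, so every aspect counts equally regardless of how many test items it contains. The helper `macro_accuracy` and the `ASPECTS` list below are hypothetical names for illustration.

```python
# Hypothetical sketch: macro accuracy as the unweighted mean of the ten
# per-aspect accuracies (an assumed definition, not taken from the source).
ASPECTS = [
    "Aesthetics", "Technical Quality", "Appearance/Style",
    "Background Consistency", "Object Integrity", "Color",
    "Dynamics Degree", "Comprehensiveness", "Spatial Relationship",
    "Temporal Flow",
]


def macro_accuracy(per_aspect: dict[str, float]) -> float:
    """Average the per-aspect accuracies, giving each aspect equal weight."""
    missing = [a for a in ASPECTS if a not in per_aspect]
    if missing:
        raise ValueError(f"missing aspects: {missing}")
    return sum(per_aspect[a] for a in ASPECTS) / len(ASPECTS)
```

Equal weighting keeps a large aspect from dominating the score; a pooled (micro) accuracy over all items would instead weight aspects by their item counts.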

Latest Results

Rows are imported from the public arXiv LaTeX source. The benchmark evaluates long-video evaluation systems across ten synthetic meta-evaluation aspects.
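The page does not describe the test protocol. One plausible reading of "synthetic meta-evaluation", assumed here only for illustration, is that each item pairs an unmodified long video with a copy synthetically degraded along one aspect, and an evaluation system counts as correct when it scores the unmodified video strictly higher. The `SyntheticPair` type, the `aspect_accuracy` function, and the tie rule are all hypothetical. Under this assumed binary setup, a scorer that guesses at random would land near 50%.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SyntheticPair:
    """Hypothetical test item: one prompt, an unmodified video, and a degraded copy."""
    prompt: str
    original: str  # path or ID of the unmodified video
    degraded: str  # path or ID of the synthetically degraded video
    aspect: str    # which of the ten aspects the degradation targets


# An evaluation system maps (prompt, video) to a score; higher means better.
Scorer = Callable[[str, str], float]


def aspect_accuracy(pairs: list[SyntheticPair], score: Scorer) -> float:
    """Fraction of pairs where the original outscores the degraded copy.

    Ties count as errors here; this is an assumption, not a documented rule.
    """
    if not pairs:
        raise ValueError("no pairs to evaluate")
    correct = sum(
        score(p.prompt, p.original) > score(p.prompt, p.degraded) for p in pairs
    )
    return correct / len(pairs)
```

Any evaluator, from a learned metric to a prompted multimodal model, can be plugged in as `score`, which is what lets one protocol compare very different systems.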

Rank  Subject                 Macro Accuracy  Model Match  Provenance  Sampled
1     Human                   91.73%                       Imported    2026-05-28
2     Video-based GPT-5       71.66%                       Imported    2026-05-28
3     Text-based GPT-5-mini   61.35%                       Imported    2026-05-28
4     Video-based GPT-5-mini  61.32%                       Imported    2026-05-28
5     CLIPScore               60.74%                       Imported    2026-05-28
6     Text-based GPT-5        60.70%                       Imported    2026-05-28
7     Text-based Qwen3        56.92%                       Imported    2026-05-28
8     Video-based Qwen3       50.44%                       Imported    2026-05-28
9     VideoScore              50.20%                       Imported    2026-05-28
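The ranking and the gap between human agreement and each automatic system follow directly from the Macro Accuracy column. The sketch below re-derives both from the values in the table above; the function names `leaderboard` and `gap_to_reference` are illustrative, not part of any published tooling.

```python
# Macro accuracy (%) per subject, copied from the table above.
RESULTS: dict[str, float] = {
    "Human": 91.73,
    "Video-based GPT-5": 71.66,
    "Text-based GPT-5-mini": 61.35,
    "Video-based GPT-5-mini": 61.32,
    "CLIPScore": 60.74,
    "Text-based GPT-5": 60.70,
    "Text-based Qwen3": 56.92,
    "Video-based Qwen3": 50.44,
    "VideoScore": 50.20,
}


def leaderboard(results: dict[str, float]) -> list[tuple[int, str, float]]:
    """Sort subjects by macro accuracy, highest first, and assign 1-based ranks."""
    ordered = sorted(results.items(), key=lambda kv: kv[1], reverse=True)
    return [(rank, name, acc) for rank, (name, acc) in enumerate(ordered, start=1)]


def gap_to_reference(results: dict[str, float], reference: str = "Human") -> dict[str, float]:
    """Percentage-point gap between the reference subject and every other subject."""
    ref = results[reference]
    return {
        name: round(ref - acc, 2)
        for name, acc in results.items()
        if name != reference
    }


for rank, name, acc in leaderboard(RESULTS):
    print(f"{rank}  {name:<24}{acc:.2f}%")
```

Under this calculation the best automatic system, Video-based GPT-5, trails the human row by 20.07 points, while the text-based and video-based GPT-5-mini rows are separated by only 0.03 points.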