MSRVTT-QA

MSRVTT-QA is a video question-answering benchmark built on MSR-VTT. It evaluates temporal, video, speech, or audio understanding beyond static text and image inputs.

Rows: 30
Primary metric: accuracy
Sampled: 2026-05-27

Metadata

Metrics

Accuracy

Latest Results

Rows are imported from the public Papers With Code 2 MSRVTT-QA leaderboard static HTML payload. Primary score is accuracy.
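Since the primary score is accuracy, the following is a minimal sketch of how such a percentage could be computed from model answers. The source does not describe the answer-matching protocol behind the imported scores, so the case-insensitive exact match used here, along with the helper names `normalize` and `accuracy_percent`, are assumptions for illustration only.

```python
def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization, not from the source)."""
    return " ".join(answer.lower().split())


def accuracy_percent(predictions: list[str], references: list[str]) -> float:
    """Return the share of predictions that match their reference, as a percentage."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


# Hypothetical toy answers, not taken from MSRVTT-QA.
preds = ["Dog", "guitar ", "two", "kitchen"]
refs = ["dog", "guitar", "three", "kitchen"]
print(f"{accuracy_percent(preds, refs):.1f}%")  # 3 of the 4 answers match
```

Published results on this benchmark may rely on a different matching rule, so scores from this sketch would not necessarily be comparable with the table.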

Rank  Subject                  Accuracy  Model Match  Provenance  Sampled
1     Flash-VStream            72.4%                  Imported    2026-05-27
2     PLLaVA (34B)             68.7%                  Imported    2026-05-27
3     Elysium                  67.5%                  Imported    2026-05-27
4     SlowFast-LLaVA-34B       67.4%                  Imported    2026-05-27
5     Tarsier (34B)            66.4%                  Imported    2026-05-27
6     LinVT-Qwen2-VL (7B)      66.2%                  Imported    2026-05-27
7     TS-LLaVA-34B             66.2%                  Imported    2026-05-27
8     PPLLaVA-7B               64.3%                  Imported    2026-05-27
9     IG-VLM                   63.8%                  Imported    2026-05-27
10    ST-LLM                   63.2%                  Imported    2026-05-27
11    CAT-7B                   62.1%                  Imported    2026-05-27
12    VideoGPT+                60.6%                  Imported    2026-05-27
13    Vista-LLaMA-7B           60.5%                  Imported    2026-05-27
14    MiniGPT4-video-7B        59.73%                 Imported    2026-05-27
15    LLaVA-Mini               59.5%                  Imported    2026-05-27
16    Video-LaVIT              59.3%                  Imported    2026-05-27
17    Video-LLaVA-7B           59.2%                  Imported    2026-05-27
18    LLaMA-VID-13B (2 Token)  58.9%                  Imported    2026-05-27
19    LLaMA-VID-7B (2 Token)   57.7%                  Imported    2026-05-27
20    SUM-shot+Vicuna          56.8%                  Imported    2026-05-27
21    Omni-VideoAssistant      55.3%                  Imported    2026-05-27
22    Chat-UniVi-7B            55%                    Imported    2026-05-27
23    VideoChat2               54.1%                  Imported    2026-05-27
24    MovieChat                52.7%                  Imported    2026-05-27
25    BT-Adapter (zero-shot)   51.2%                  Imported    2026-05-27
26    BT-Adapter (zero-shot)   51.2%                  Imported    2026-05-27
27    Video-ChatGPT-7B         49.3%                  Imported    2026-05-27
28    Video Chat-7B            45%                    Imported    2026-05-27
29    LLaMA Adapter-7B         43.8%                  Imported    2026-05-27
30    Video LLaMA-7B           29.6%                  Imported    2026-05-27
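
To work with these rows programmatically, each line can be split into a small record. The sketch below assumes every row consists of whitespace-separated rank, subject, accuracy, provenance, and sampling date, with the Model Match column left empty as it is throughout this table. The `Row` type, `ROW_PATTERN`, and `parse_row` are hypothetical names introduced here, not part of any importer described by the source.

```python
import re
from dataclasses import dataclass

# Assumed row layout: rank, subject (may contain spaces), accuracy with a
# trailing '%', a one-word provenance label, and an ISO date.
ROW_PATTERN = re.compile(
    r"^(?P<rank>\d+)\s+(?P<subject>.+?)\s+(?P<accuracy>\d+(?:\.\d+)?)%\s+"
    r"(?P<provenance>\w+)\s+(?P<sampled>\d{4}-\d{2}-\d{2})$"
)


@dataclass(frozen=True)
class Row:
    rank: int
    subject: str
    accuracy: float  # percentage points, e.g. 58.9
    provenance: str
    sampled: str


def parse_row(line: str) -> Row:
    """Parse one leaderboard line into a Row, or raise ValueError if it does not fit."""
    match = ROW_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized row: {line!r}")
    return Row(
        rank=int(match["rank"]),
        subject=match["subject"],
        accuracy=float(match["accuracy"]),
        provenance=match["provenance"],
        sampled=match["sampled"],
    )


# Three rows copied from the table above.
sample = [
    "1 Flash-VStream 72.4% Imported 2026-05-27",
    "18 LLaMA-VID-13B (2 Token) 58.9% Imported 2026-05-27",
    "22 Chat-UniVi-7B 55% Imported 2026-05-27",
]
rows = [parse_row(line) for line in sample]
for row in rows:
    print(row.rank, row.subject, row.accuracy)
```

Storing accuracy as a float makes it straightforward to spot ties, such as ranks 6 and 7 at 66.2%, or to re-sort the rows after filtering.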