VideoDR

Video deep research benchmark for video-conditioned open-domain QA requiring cross-frame visual anchors, web retrieval, and multi-hop reasoning.

Rows: 12
Primary metric: accuracy
Sampled: 2026-05-27

Metadata

Metrics

Accuracy, Think Calls (lower is better), Search Calls (lower is better), Total Time (lower is better)

Latest Results

Rows are parsed from leaderboardData in the VideoDR public app bundle. The benchmark evaluates video-conditioned open-domain deep research under two paradigms, workflow and agentic.

Rank  Subject                          Accuracy  Model Match  Provenance  Sampled
1     Agentic + Gemini-3-pro-preview   76        -            Imported    2026-05-27
2     Workflow + Gemini-3-pro-preview  69        -            Imported    2026-05-27
3     Workflow + GPT-5.2               69        -            Imported    2026-05-27
4     Agentic + GPT-5.2                69        -            Imported    2026-05-27
5     Agentic + GPT-4o                 43        -            Imported    2026-05-27
6     Workflow + GPT-4o                42        -            Imported    2026-05-27
7     Workflow + Qwen3-Omni-30B-A3B    37        -            Imported    2026-05-27
8     Agentic + Qwen3-Omni-30B-A3B     37        -            Imported    2026-05-27
9     Agentic + InternVL3.5-14B        30        -            Imported    2026-05-27
10    Workflow + InternVL3.5-14B       27        -            Imported    2026-05-27
11    Workflow + MiniCPM-V 4.5         25        -            Imported    2026-05-27
12    Agentic + MiniCPM-V 4.5          16        -            Imported    2026-05-27
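
Because each model appears once under each paradigm, the table can be read as six paired comparisons. The sketch below is illustrative only: the names ROWS and paradigm_gaps are hypothetical and are not part of VideoDR or its app bundle, and the accuracy values are simply copied from the rows above. It computes the agentic-minus-workflow accuracy difference per model.

```python
from collections import defaultdict

# (paradigm, model, accuracy) copied by hand from the leaderboard rows above.
# ROWS is an illustrative name, not a VideoDR data structure.
ROWS = [
    ("Agentic", "Gemini-3-pro-preview", 76),
    ("Workflow", "Gemini-3-pro-preview", 69),
    ("Workflow", "GPT-5.2", 69),
    ("Agentic", "GPT-5.2", 69),
    ("Agentic", "GPT-4o", 43),
    ("Workflow", "GPT-4o", 42),
    ("Workflow", "Qwen3-Omni-30B-A3B", 37),
    ("Agentic", "Qwen3-Omni-30B-A3B", 37),
    ("Agentic", "InternVL3.5-14B", 30),
    ("Workflow", "InternVL3.5-14B", 27),
    ("Workflow", "MiniCPM-V 4.5", 25),
    ("Agentic", "MiniCPM-V 4.5", 16),
]


def paradigm_gaps(rows):
    """Return {model: agentic accuracy minus workflow accuracy}."""
    by_model = defaultdict(dict)
    for paradigm, model, accuracy in rows:
        by_model[model][paradigm] = accuracy
    # Keep only models that have a score under both paradigms.
    return {
        model: scores["Agentic"] - scores["Workflow"]
        for model, scores in by_model.items()
        if "Agentic" in scores and "Workflow" in scores
    }


gaps = paradigm_gaps(ROWS)
for model, gap in gaps.items():
    print(f"{model}: {gap:+d}")
```

On these rows the difference ranges from +7 for Gemini-3-pro-preview to -9 for MiniCPM-V 4.5, with GPT-5.2 and Qwen3-Omni-30B-A3B at 0. The calculation uses accuracy only; the Think Calls, Search Calls, and Total Time metrics are not shown in this table.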