VideoDR
Video deep research benchmark for video-conditioned open-domain QA requiring cross-frame visual anchors, web retrieval, and multi-hop reasoning.
Rows: 12
Primary metric: accuracy
Sampled: 2026-05-27
Metadata
Metrics
Accuracy, Think Calls (lower is better), Search Calls (lower is better), Total Time (lower is better)
| Rank | Subject | Accuracy | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Agentic + Gemini-3-pro-preview | 76 | — | Imported | 2026-05-27 |
| 2 | Workflow + Gemini-3-pro-preview | 69 | — | Imported | 2026-05-27 |
| 3 | Workflow + GPT-5.2 | 69 | — | Imported | 2026-05-27 |
| 4 | Agentic + GPT-5.2 | 69 | — | Imported | 2026-05-27 |
| 5 | Agentic + GPT-4o | 43 | — | Imported | 2026-05-27 |
| 6 | Workflow + GPT-4o | 42 | — | Imported | 2026-05-27 |
| 7 | Workflow + Qwen3-Omni-30B-A3B | 37 | — | Imported | 2026-05-27 |
| 8 | Agentic + Qwen3-Omni-30B-A3B | 37 | — | Imported | 2026-05-27 |
| 9 | Agentic + InternVL3.5-14B | 30 | — | Imported | 2026-05-27 |
| 10 | Workflow + InternVL3.5-14B | 27 | — | Imported | 2026-05-27 |
| 11 | Workflow + MiniCPM-V 4.5 | 25 | — | Imported | 2026-05-27 |
| 12 | Agentic + MiniCPM-V 4.5 | 16 | — | Imported | 2026-05-27 |
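
The table pairs each model with two setups, "Agentic" and "Workflow". As a minimal sketch for reading those pairs side by side, the snippet below computes the accuracy gap (Agentic minus Workflow) per model. The values are copied from the table; the names `scores` and `paradigm_gaps` are hypothetical helpers introduced for illustration and are not part of the benchmark.

```python
# Accuracy values copied from the leaderboard table above.
# Keys are (setup, model); both names are taken from the "Subject" column.
scores = {
    ("Agentic", "Gemini-3-pro-preview"): 76,
    ("Workflow", "Gemini-3-pro-preview"): 69,
    ("Workflow", "GPT-5.2"): 69,
    ("Agentic", "GPT-5.2"): 69,
    ("Agentic", "GPT-4o"): 43,
    ("Workflow", "GPT-4o"): 42,
    ("Workflow", "Qwen3-Omni-30B-A3B"): 37,
    ("Agentic", "Qwen3-Omni-30B-A3B"): 37,
    ("Agentic", "InternVL3.5-14B"): 30,
    ("Workflow", "InternVL3.5-14B"): 27,
    ("Workflow", "MiniCPM-V 4.5"): 25,
    ("Agentic", "MiniCPM-V 4.5"): 16,
}


def paradigm_gaps(scores):
    """Return {model: Agentic accuracy - Workflow accuracy} (hypothetical helper)."""
    models = sorted({model for _, model in scores})
    return {m: scores[("Agentic", m)] - scores[("Workflow", m)] for m in models}


gaps = paradigm_gaps(scores)

# Print models from the largest Agentic advantage to the largest Workflow advantage.
for model, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model:24s} {gap:+d}")
```

On these numbers the gap runs from +7 for Gemini-3-pro-preview to -9 for MiniCPM-V 4.5, with GPT-5.2 and Qwen3-Omni-30B-A3B tied across the two setups.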