LongDocURL
Long-document understanding benchmark for locating and reasoning over evidence across very large document collections.
7 rows. Primary metric: Total. Sampled: 2026-05-27.
Metrics: Understanding, Reasoning, Locating, Text, Layout, Figure, Table, Single Page, Multi Page, Cross Element, Total
| Rank | Subject | Total | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-4o-2024-05-13 --> GPT-4o-24-05-13 | 64.5% | GPT-4o openai-gpt-4o | Imported | 2026-05-27 |
| 2 | GPT-4o-2024-05-13 --> Gemini-1.5-Pro | 50.9% | GPT-4o openai-gpt-4o | Imported | 2026-05-27 |
| 3 | GPT-4o-2024-05-13 --> Qwen-VL-Max | 49.5% | GPT-4o openai-gpt-4o | Imported | 2026-05-27 |
| 4 | GPT-4o-2024-05-13 --> Qwen2-VL | 30.6% | GPT-4o openai-gpt-4o | Imported | 2026-05-27 |
| 5 | GPT-4o-2024-05-13 --> LLaVA-OneVision-Chat | 25.0% | GPT-4o openai-gpt-4o | Imported | 2026-05-27 |
| 6 | GPT-4o-2024-05-13 --> LLaVA-Next-Interleave-DPO | 16.2% | GPT-4o openai-gpt-4o | Imported | 2026-05-27 |
| 7 | GPT-4o-2024-05-13 --> Llama-3.2 | 9.2% | GPT-4o openai-gpt-4o | Imported | 2026-05-27 |