LongDocURL | BenchmarkList

Metadata

Understanding, Reasoning, Locating, Text, Layout, Figure, Table, Single Page, Multi Page, Cross Element, Total

Rank	Subject	Total	Model Match	Provenance	Sampled
1	GPT-4o-2024-05-13 --> GPT-4o-24-05-13	64.5%	GPT-4o openai-gpt-4o	Imported	2026-05-27
2	GPT-4o-2024-05-13 --> Gemini-1.5-Pro	50.9%	GPT-4o openai-gpt-4o	Imported	2026-05-27
3	GPT-4o-2024-05-13 --> Qwen-VL-Max	49.5%	GPT-4o openai-gpt-4o	Imported	2026-05-27
4	GPT-4o-2024-05-13 --> Qwen2-VL	30.6%	GPT-4o openai-gpt-4o	Imported	2026-05-27
5	GPT-4o-2024-05-13 --> LLaVA-OneVision-Chat	25.0%	GPT-4o openai-gpt-4o	Imported	2026-05-27
6	GPT-4o-2024-05-13 --> LLaVA-Next-Interleave-DPO	16.2%	GPT-4o openai-gpt-4o	Imported	2026-05-27
7	GPT-4o-2024-05-13 --> Llama-3.2	9.2%	GPT-4o openai-gpt-4o	Imported	2026-05-27