MMLongBench-Doc
A long-context multimodal document-understanding benchmark that evaluates vision-language and omni models on document comprehension accuracy.
Rows: 19
Primary metric: accuracy
Sampled: 2026-05-06
Accuracy
| Rank | Subject | Accuracy | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Claude 4.5 Opus | 61.90 | — | Imported | 2026-05-06 |
| 2 | Qwen3.5-397B-A17B | 61.50 | Qwen3.5 397B A17B (`qwen-qwen3.5-397b-a17b`) | Imported | 2026-05-06 |
| 3 | Gemini-3 Pro | 60.50 | Gemini 3 (`google-gemini-3`) | Imported | 2026-05-06 |
| 4 | OriOn-Qwen-SR1 | 58.30 | — | Imported | 2026-05-06 |
| 5 | NVIDIA Nemotron 3 Nano Omni 30B A3B | 57.60 | Nemotron 3 Nano Omni (`nvidia-nemotron-3-nano-omni-30b-a3b-reasoning`) | Imported | 2026-05-06 |
| 6 | Qwen3-VL-235B-A22B-Instruct | 57.00 | Qwen3 VL 235B A22B Instruct (`qwen-qwen3-vl-235b-a22b-instruct`) | Imported | 2026-05-06 |
| 7 | Qwen3-VL-235B-A22B-Thinking | 56.20 | Qwen3 VL 235B A22B Thinking (`qwen-qwen3-vl-235b-a22b-thinking`) | Imported | 2026-05-06 |
| 8 | TeleMM-2.0 | 56.10 | — | Imported | 2026-05-06 |
| 9 | GLM-4.6V | 54.90 | GLM 4.6V (`z-ai-glm-4.6v`) | Imported | 2026-05-06 |
| 10 | GPT-4.1 2025-04-14 detail high | 49.70 | GPT-4.1 (`openai-gpt-4.1`) | Imported | 2026-05-06 |
| 11 | GPT-4o 2024-11-20 detail high | 46.30 | GPT-4o (`openai-gpt-4o`) | Imported | 2026-05-06 |
| 12 | GLM-4.5V | 44.70 | GLM 4.5V (`z-ai-glm-4.5v`) | Imported | 2026-05-06 |
| 13 | GLM-4.1V-Thinking | 42.40 | — | Imported | 2026-05-06 |
| 14 | Kimi-VL-Thinking-2506 | 42.10 | — | Imported | 2026-05-06 |
| 15 | Qwen2.5-VL-72B | 35.20 | Qwen2.5 VL 72B Instruct (`qwen-qwen2.5-vl-72b-instruct`) | Imported | 2026-05-06 |
| 16 | Kimi-VL-A3B-Instruct | 35.10 | — | Imported | 2026-05-06 |
| 17 | MiniMax-VL-01 | 32.50 | — | Imported | 2026-05-06 |
| 18 | Aria | 28.30 | — | Imported | 2026-05-06 |
| 19 | Qwen2.5-VL-7B | 25.10 | — | Imported | 2026-05-06 |