MMLongBench-Doc

Long-context multimodal document understanding benchmark evaluating vision-language and omni models on document comprehension accuracy.

Rows: 19
Primary metric: accuracy
Sampled: 2026-05-06

Metadata

Metrics

Accuracy

Latest Results

Rows are parsed from the public Space's leaderboard_data.json file. The source's display names and links are preserved, along with its modality, parameter, open-source, and release metadata.
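The import step described above can be sketched roughly as follows. This is a minimal illustration, not the site's actual importer: the schema of leaderboard_data.json is not shown on this page, so the `name` and `accuracy` keys, the `parse_rows` helper, and the inline sample payload are all assumptions.

```python
import json

# Hypothetical payload: the real leaderboard_data.json schema is not shown
# here, so these key names are assumptions made for illustration only.
SAMPLE_JSON = """
[
  {"name": "Qwen3-VL-235B-A22B-Instruct", "accuracy": 57},
  {"name": "Claude 4.5 Opus", "accuracy": 61.9}
]
"""


def parse_rows(raw, sampled):
    """Turn raw JSON text into row dicts, copying source fields unchanged."""
    rows = []
    for entry in json.loads(raw):
        row = dict(entry)                            # keep every source field as-is
        row["accuracy"] = float(entry["accuracy"])   # normalise 57 to 57.0
        row["provenance"] = "Imported"               # label used in the table
        row["sampled"] = sampled                     # date the snapshot was taken
        rows.append(row)
    return rows


rows = parse_rows(SAMPLE_JSON, "2026-05-06")
for row in rows:
    print(f'{row["name"]}: {row["accuracy"]:.2f} ({row["provenance"]}, {row["sampled"]})')
```

Copying each entry with `dict(entry)` before adding fields is one simple way to keep whatever extra metadata the source carries without having to know its keys in advance.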

| Rank | Subject | Accuracy | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Claude 4.5 Opus | 61.90 | | Imported | 2026-05-06 |
| 2 | Qwen3.5-397B-A17B | 61.50 | Qwen3.5 397B A17B (qwen-qwen3.5-397b-a17b) | Imported | 2026-05-06 |
| 3 | Gemini-3 Pro | 60.50 | Gemini 3 (google-gemini-3) | Imported | 2026-05-06 |
| 4 | OriOn-Qwen-SR1 | 58.30 | | Imported | 2026-05-06 |
| 5 | NVIDIA Nemotron 3 Nano Omni 30B A3B | 57.60 | Nemotron 3 Nano Omni (nvidia-nemotron-3-nano-omni-30b-a3b-reasoning) | Imported | 2026-05-06 |
| 6 | Qwen3-VL-235B-A22B-Instruct | 57.00 | Qwen3 VL 235B A22B Instruct (qwen-qwen3-vl-235b-a22b-instruct) | Imported | 2026-05-06 |
| 7 | Qwen3-VL-235B-A22B-Thinking | 56.20 | Qwen3 VL 235B A22B Thinking (qwen-qwen3-vl-235b-a22b-thinking) | Imported | 2026-05-06 |
| 8 | TeleMM-2.0 | 56.10 | | Imported | 2026-05-06 |
| 9 | GLM-4.6V | 54.90 | GLM GLM 4.6V (z-ai-glm-4.6v) | Imported | 2026-05-06 |
| 10 | GPT-4.1 2025-04-14 detail high | 49.70 | GPT-4.1 (openai-gpt-4.1) | Imported | 2026-05-06 |
| 11 | GPT-4o 2024-11-20 detail high | 46.30 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-06 |
| 12 | GLM-4.5V | 44.70 | GLM GLM 4.5V (z-ai-glm-4.5v) | Imported | 2026-05-06 |
| 13 | GLM-4.1V-Thinking | 42.40 | | Imported | 2026-05-06 |
| 14 | Kimi-VL-Thinking-2506 | 42.10 | | Imported | 2026-05-06 |
| 15 | Qwen2.5-VL-72B | 35.20 | Qwen2.5 VL 72B Instruct (qwen-qwen2.5-vl-72b-instruct) | Imported | 2026-05-06 |
| 16 | Kimi-VL-A3B-Instruct | 35.10 | | Imported | 2026-05-06 |
| 17 | MiniMax-VL-01 | 32.50 | | Imported | 2026-05-06 |
| 18 | Aria | 28.30 | | Imported | 2026-05-06 |
| 19 | Qwen2.5-VL-7B | 25.10 | | Imported | 2026-05-06 |
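The Rank column above is consistent with sorting rows by accuracy in descending order. The sketch below illustrates that ordering on four (subject, accuracy) pairs copied from the table. The `rank_rows` helper is a hypothetical name, and the page does not say how ties would be broken, so the positional ranking used here is an assumption.

```python
# (subject, accuracy) pairs copied from the results table, deliberately unsorted.
results = [
    ("Aria", 28.30),
    ("Claude 4.5 Opus", 61.90),
    ("Gemini-3 Pro", 60.50),
    ("Qwen3.5-397B-A17B", 61.50),
]


def rank_rows(pairs):
    """Return (rank, subject, accuracy) tuples, highest accuracy first.

    Ranks are purely positional (1, 2, 3, ...); tie handling is an
    assumption, since the source does not describe it.
    """
    ordered = sorted(pairs, key=lambda pair: pair[1], reverse=True)
    return [(i, subject, acc) for i, (subject, acc) in enumerate(ordered, start=1)]


ranked = rank_rows(results)
for rank, subject, acc in ranked:
    print(f"{rank} {subject} {acc:.2f}")
```

Because only a subset of rows is used, the ranks printed here are relative to that subset: Aria comes out fourth rather than eighteenth.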