MMDocBench

Fine-grained multimodal document understanding benchmark with OCR-free VQA, grounding, and document reasoning tasks.

21 rows
em_all primary metric
2026-05-27 sampled

Metadata

Metrics

Exact Match All, F1 All, ANLS All, IOU All, IOU@0.5 All, Exact Match VP, F1 VP, ANLS VP, IOU VP, IOU@0.5 VP, Exact Match VR, F1 VR, ANLS VR, IOU VR, IOU@0.5 VR
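As a rough illustration of what these metric families measure, the sketch below implements the textbook definitions of exact match, ANLS (normalized Levenshtein similarity with a cutoff), and box IoU with an IoU@0.5 hit test. It is not taken from the MMDocBench evaluation code. The lowercase/strip normalization, the 0.5 ANLS threshold, and the (x1, y1, x2, y2) box format are assumptions, and the function names are hypothetical. F1 is omitted because the page does not say how it is computed.

```python
from typing import Sequence


def exact_match(pred: str, gold: str) -> float:
    """1.0 if the strings match after strip/lowercase (assumed normalization)."""
    return float(pred.strip().lower() == gold.strip().lower())


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def anls(pred: str, gold: str, threshold: float = 0.5) -> float:
    """1 - normalized edit distance, or 0.0 once that distance reaches the threshold."""
    p, g = pred.strip().lower(), gold.strip().lower()
    if not p and not g:
        return 1.0
    nl = levenshtein(p, g) / max(len(p), len(g))
    return 1.0 - nl if nl < threshold else 0.0


def iou(box_a: Sequence[float], box_b: Sequence[float]) -> float:
    """Intersection over union of two (x1, y1, x2, y2) boxes (assumed format)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def iou_at(box_a: Sequence[float], box_b: Sequence[float], threshold: float = 0.5) -> float:
    """1.0 if the IoU meets the threshold, else 0.0 (the IOU@0.5 style of scoring)."""
    return float(iou(box_a, box_b) >= threshold)
```

Averaging any of these per-example scores over a set of questions gives a percentage comparable in spirit to the columns listed above. The benchmark's own scripts may normalize answers or aggregate regions differently.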

Latest Results

Rows are parsed from the static JavaScript score table on the public MMDocBench project page. VP denotes visual perception and VR denotes visual reasoning.

Rank  Subject                 Exact Match All  Model Match                                      Provenance  Sampled
1     GPT-4o                  71.99%           GPT-4o (openai-gpt-4o)                           Imported    2026-05-27
2     Gemini-1.5-Pro          71.61%           -                                                Imported    2026-05-27
3     Qwen-VL-Max             70.49%           Qwen VL Max (qwen-qwen-vl-max)                   Imported    2026-05-27
4     Claude-3.5-Sonnet       69.25%           Claude 3.5 Sonnet (anthropic-claude-3.5-sonnet)  Imported    2026-05-27
5     InternVL2-Llama3-76B    66.62%           -                                                Imported    2026-05-27
6     Qwen2.5-VL-7B-Instruct  65.79%           -                                                Imported    2026-05-27
7     LLava-OV-Chat-72b       62.18%           -                                                Imported    2026-05-27
8     GPT-4V                  61.93%           GPT-4 (openai-gpt-4)                             Imported    2026-05-27
9     Deepseek-VL2            59.82%           -                                                Imported    2026-05-27
10    MiniCPM-V2.6            51.32%           -                                                Imported    2026-05-27
11    Qwen2-VL-7B-Instruct    45.77%           -                                                Imported    2026-05-27
12    InternVL2-8B            43.22%           -                                                Imported    2026-05-27
13    MiniCPM-Llama3-V2.5     34.11%           -                                                Imported    2026-05-27
14    LLaVA-V1.6-34B          31.06%           -                                                Imported    2026-05-27
15    CogVLM2-Chat-19B        24.86%           -                                                Imported    2026-05-27
16    TextMonkey              23.77%           -                                                Imported    2026-05-27
17    Janus-Pro-7B            22.67%           -                                                Imported    2026-05-27
18    mPLUG-Owl3              16.17%           -                                                Imported    2026-05-27
19    mPLUG-DocOwl1.5-Omni    15.34%           -                                                Imported    2026-05-27
20    Yi-VL-34B               10.58%           -                                                Imported    2026-05-27
21    Ferret                  4.2%             -                                                Imported    2026-05-27