MMDocBench
Fine-grained multimodal document understanding benchmark with OCR-free VQA, grounding, and document reasoning tasks.
Rows: 21
Primary metric: em_all
Sampled: 2026-05-27
Metadata
Metrics
Exact Match All, F1 All, ANLS All, IOU All, IOU@0.5 All, Exact Match VP, F1 VP, ANLS VP, IOU VP, IOU@0.5 VP, Exact Match VR, F1 VR, ANLS VR, IOU VR, IOU@0.5 VR
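The metric names above follow common document-VQA and grounding conventions. As a rough illustration only, the sketch below shows how Exact Match, ANLS, IoU, and IoU@0.5 are typically computed for a single prediction. The function names, the lowercase/strip normalization, the 0.5 ANLS threshold, and the `(x1, y1, x2, y2)` box format are all assumptions for this example and are not taken from the benchmark's own evaluation code.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def exact_match(pred: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0 (assumed normalization)."""
    return float(pred.strip().lower() == gold.strip().lower())


def anls(pred: str, gold: str, threshold: float = 0.5) -> float:
    """Normalized Levenshtein similarity, zeroed when the normalized distance reaches the threshold."""
    p, g = pred.strip().lower(), gold.strip().lower()
    if not p and not g:
        return 1.0
    nl = levenshtein(p, g) / max(len(p), len(g))
    return 1.0 - nl if nl < threshold else 0.0


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def iou_at(a: Box, b: Box, threshold: float = 0.5) -> float:
    """1.0 if the IoU meets the threshold, else 0.0 (the usual reading of IoU@0.5)."""
    return float(iou(a, b) >= threshold)
```

A leaderboard percentage such as Exact Match All would then be the mean of the per-sample score over the evaluated samples, multiplied by 100. How the benchmark handles multiple gold answers or multiple boxes per sample is not specified here and is left out of the sketch.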
| Rank | Subject | Exact Match All | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-4o | 71.99% | GPT-4o (`openai-gpt-4o`) | Imported | 2026-05-27 |
| 2 | Gemini-1.5-Pro | 71.61% | — | Imported | 2026-05-27 |
| 3 | Qwen-VL-Max | 70.49% | Qwen VL Max (`qwen-qwen-vl-max`) | Imported | 2026-05-27 |
| 4 | Claude-3.5-Sonnet | 69.25% | Claude 3.5 Sonnet (`anthropic-claude-3.5-sonnet`) | Imported | 2026-05-27 |
| 5 | InternVL2-Llama3-76B | 66.62% | — | Imported | 2026-05-27 |
| 6 | Qwen2.5-VL-7B-Instruct | 65.79% | — | Imported | 2026-05-27 |
| 7 | LLava-OV-Chat-72b | 62.18% | — | Imported | 2026-05-27 |
| 8 | GPT-4V | 61.93% | GPT-4 (`openai-gpt-4`) | Imported | 2026-05-27 |
| 9 | Deepseek-VL2 | 59.82% | — | Imported | 2026-05-27 |
| 10 | MiniCPM-V2.6 | 51.32% | — | Imported | 2026-05-27 |
| 11 | Qwen2-VL-7B-Instruct | 45.77% | — | Imported | 2026-05-27 |
| 12 | InternVL2-8B | 43.22% | — | Imported | 2026-05-27 |
| 13 | MiniCPM-Llama3-V2.5 | 34.11% | — | Imported | 2026-05-27 |
| 14 | LLaVA-V1.6-34B | 31.06% | — | Imported | 2026-05-27 |
| 15 | CogVLM2-Chat-19B | 24.86% | — | Imported | 2026-05-27 |
| 16 | TextMonkey | 23.77% | — | Imported | 2026-05-27 |
| 17 | Janus-Pro-7B | 22.67% | — | Imported | 2026-05-27 |
| 18 | mPLUG-Owl3 | 16.17% | — | Imported | 2026-05-27 |
| 19 | mPLUG-DocOwl1.5-Omni | 15.34% | — | Imported | 2026-05-27 |
| 20 | Yi-VL-34B | 10.58% | — | Imported | 2026-05-27 |
| 21 | Ferret | 4.20% | — | Imported | 2026-05-27 |