OmniDocBench 1.5
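OmniDocBench 1.5 is a comprehensive benchmark for evaluating multimodal large language models on document understanding tasks, including OCR, document parsing, information extraction, and visual question answering across diverse document types. Lower is better for the Overall Edit Distance metric. The table below is sorted by reported score in descending order, and all entries are self-reported, so scores may not be directly comparable across rows.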
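To make the edit-distance idea concrete, here is a minimal sketch of a normalized Levenshtein distance between a predicted string and a reference string. This is a generic illustration, not the benchmark's actual scoring code: the function names `levenshtein` and `normalized_edit_distance` are hypothetical, and dividing by the longer string's length is an assumed normalization that this page does not confirm.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def normalized_edit_distance(prediction: str, reference: str) -> float:
    """Edit distance divided by the longer length (assumed normalization).

    0.0 means the strings are identical; 1.0 means every position differs.
    """
    longest = max(len(prediction), len(reference))
    if longest == 0:
        return 0.0  # two empty strings are treated as a perfect match
    return levenshtein(prediction, reference) / longest


if __name__ == "__main__":
    # Hypothetical OCR output compared against its ground-truth text.
    print(normalized_edit_distance("Tota1: 42", "Total: 42"))
```

Under this convention a perfect transcription scores 0.0, which is why a lower value indicates a better result for an edit-distance metric.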
11 rows
Primary metric: score
Sampled: 2026-05-06
Metadata
Metrics: Score, Normalized Score
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Qwen3.6 Plus | 0.91 | Qwen3.6 Plus (`qwen-qwen3.6-plus`) | Self-reported | 2026-05-06 |
| 2 | Qwen3.6-35B-A3B | 0.90 | Qwen3.6 35B A3B (`qwen-qwen3.6-35b-a3b`) | Self-reported | 2026-05-06 |
| 3 | Qwen3.5-122B-A10B | 0.90 | Qwen3.5-122B-A10B (`qwen-qwen3.5-122b-a10b`) | Self-reported | 2026-05-06 |
| 4 | Qwen3.5-35B-A3B | 0.89 | Qwen3.5-35B-A3B (`qwen-qwen3.5-35b-a3b`) | Self-reported | 2026-05-06 |
| 5 | GPT-5.4 | 0.89 | GPT-5.4 (`openai-gpt-5.4`) | Self-reported | 2026-05-06 |
| 6 | Qwen3.5-27B | 0.89 | Qwen3.5-27B (`qwen-qwen3.5-27b`) | Self-reported | 2026-05-06 |
| 7 | Kimi K2.5 | 0.89 | MoonshotAI: Kimi K2.5 (`moonshotai-kimi-k2.5`) | Self-reported | 2026-05-06 |
| 8 | GPT-5.4 mini | 0.87 | GPT-5.4 Mini (`openai-gpt-5.4-mini`) | Self-reported | 2026-05-06 |
| 9 | GPT-5.4 nano | 0.76 | GPT-5.4 Nano (`openai-gpt-5.4-nano`) | Self-reported | 2026-05-06 |
| 10 | Gemini 3 Flash | 0.12 | Gemini 3 Flash Preview (`google-gemini-3-flash-preview`) | Self-reported | 2026-05-06 |
| 11 | Gemini 3 Pro | 0.12 | Gemini 3 (`google-gemini-3`) | Self-reported | 2026-05-06 |