ParseBench
Document parsing benchmark for AI agents over enterprise documents, evaluating tables, charts, content faithfulness, semantic formatting, and visual grounding.
19 rows · primary metric: score · sampled 2026-05-06
ParseBench score
| Rank | Subject | ParseBench score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | datalab-to/chandra-ocr-2 | 70.10 | — | Imported | 2026-05-06 |
| 2 | google/gemma-4-31B-it | 62.40 | Gemma 4 31B (google-gemma-4-31b-it) | Imported | 2026-05-06 |
| 3 | google/gemma-4-26B-A4B-it | 58.50 | Gemma 4 26B A4B (google-gemma-4-26b-a4b-it) | Imported | 2026-05-06 |
| 4 | rednote-hilab/dots.mocr | 55.80 | — | Imported | 2026-05-06 |
| 5 | docling-project/docling-models | 50.60 | — | Imported | 2026-05-06 |
| 6 | lightonai/LightOnOCR-2-1B | 48.00 | — | Imported | 2026-05-06 |
| 7 | Qwen/Qwen3-VL-8B-Instruct | 46.80 | Qwen3 VL 8B Instruct (qwen-qwen3-vl-8b-instruct) | Imported | 2026-05-06 |
| 8 | baidu/Qianfan-OCR | 46.20 | — | Imported | 2026-05-06 |
| 9 | opendatalab/MinerU2.5-2509-1.2B | 45.90 | — | Imported | 2026-05-06 |
| 10 | Qwen/Qwen3.6-35B-A3B | 44.10 | Qwen3.6 35B A3B (qwen-qwen3.6-35b-a3b) | Imported | 2026-05-06 |
| 11 | deepseek-ai/DeepSeek-OCR-2 | 41.20 | — | Imported | 2026-05-06 |
| 12 | PaddlePaddle/PaddleOCR-VL | 40.90 | — | Imported | 2026-05-06 |
| 13 | google/gemma-4-E4B-it | 40.50 | — | Imported | 2026-05-06 |
| 14 | ibm-granite/granite-vision-4.1-4b | 39.45 | — | Imported | 2026-05-06 |
| 15 | Qwen/Qwen3.5-4B | 35.40 | — | Imported | 2026-05-06 |
| 16 | Qwen/Qwen3.5-9B | 31.90 | Qwen3.5-9B (qwen-qwen3.5-9b) | Imported | 2026-05-06 |
| 17 | zai-org/GLM-OCR | 29.60 | — | Imported | 2026-05-06 |
| 18 | Qwen/Qwen3.5-0.8B | 28.40 | — | Imported | 2026-05-06 |
| 19 | Qwen/Qwen3.5-2B | 27.30 | — | Imported | 2026-05-06 |
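The Rank column appears to follow the ParseBench score in descending order. Below is a minimal Python sketch that reproduces that ordering from a handful of rows copied from the table. It assumes rank is assigned by sorting on score alone. The page does not state a tie-breaking rule, so the shared-rank handling for equal scores (standard "1, 2, 2, 4" competition ranking) and the helper name `rank_by_score` are illustrative assumptions.

```python
# A few rows copied from the table: (subject, ParseBench score), deliberately unordered.
ROWS = [
    ("Qwen/Qwen3.5-2B", 27.30),
    ("google/gemma-4-31B-it", 62.40),
    ("lightonai/LightOnOCR-2-1B", 48.00),
    ("datalab-to/chandra-ocr-2", 70.10),
    ("zai-org/GLM-OCR", 29.60),
]


def rank_by_score(rows):
    """Return (rank, subject, score) tuples, highest score first.

    Hypothetical helper. Tied scores share a rank (competition ranking);
    this tie rule is an assumption, since the table above contains no ties.
    """
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    ranked = []
    for position, (subject, score) in enumerate(ordered, start=1):
        if ranked and ranked[-1][2] == score:
            rank = ranked[-1][0]  # same score as the previous row: reuse its rank
        else:
            rank = position
        ranked.append((rank, subject, score))
    return ranked


for rank, subject, score in rank_by_score(ROWS):
    print(f"{rank:>2}  {subject:<28} {score:6.2f}")
```

Because the sketch ranks only the sampled rows, its ranks run from 1 to 5 rather than matching the full-table positions; the relative order is the same.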