DocVQA
DocVQA: Measures visual question answering on document images, covering OCR, document understanding, chart comprehension, and layout-aware reasoning.
164 rows
Primary metric: score
Sampled: 2026-05-27
Metadata
Metrics
Score, Figure/Diagram, Form, Table/List, Layout, Free_text, Image/Photo, Handwritten, Yes/No, Others
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Human Performance | 0.9811 | — | Imported | 2026-05-27 |
| 2 | ORCA | 0.9729 | — | Imported | 2026-05-27 |
| 3 | NEXT-8B | 0.9728 | — | Imported | 2026-05-27 |
| 4 | qwen3vl | 0.9725 | — | Imported | 2026-05-27 |
| 5 | Seed-VL-1.5 | 0.9691 | — | Imported | 2026-05-27 |
| 6 | Star_LLM | 0.9671 | — | Imported | 2026-05-27 |
| 7 | qwen2-vl | 0.967 | — | Imported | 2026-05-27 |
| 8 | 1 | 0.9607 | — | Imported | 2026-05-27 |
| 9 | 3 | 0.959 | — | Imported | 2026-05-27 |
| 10 | InternVL2-Pro (generalist) | 0.9506 | — | Imported | 2026-05-27 |
| 11 | MiMo-VL-7B-RL | 0.9501 | — | Imported | 2026-05-27 |
| 12 | VideoLLaMA3-7B | 0.9494 | — | Imported | 2026-05-27 |
| 13 | LLaVA-One-Vision-1.5-8B-Instruct | 0.9484 | — | Imported | 2026-05-27 |
| 14 | Snowflake Arctic-Extract 7B | 0.947 | — | Imported | 2026-05-27 |
| 15 | CATI-VLM-IoT | 0.9448 | — | Imported | 2026-05-27 |
| 16 | 0 | 0.9435 | — | Imported | 2026-05-27 |
| 17 | test | 0.9406 | — | Imported | 2026-05-27 |
| 18 | Molmo-72B | 0.9351 | — | Imported | 2026-05-27 |
| 19 | CCK-KVQwen | 0.9348 | — | Imported | 2026-05-27 |
| 20 | Qwen2.5-3B-lite | 0.9342 | — | Imported | 2026-05-27 |
| 21 | DeepSeek-VL2 | 0.933 | — | Imported | 2026-05-27 |
| 22 | qwenvl-max (single generalist model) | 0.9307 | — | Imported | 2026-05-27 |
| 23 | Master Thesis | 0.9298 | — | Imported | 2026-05-27 |
| 24 | Zamba2-VL-7B | 0.9287 | — | Imported | 2026-05-27 |
| 25 | ZAYA1-VL-8B | 0.9251 | — | Imported | 2026-05-27 |
| 26 | CATI-VLM | 0.9242 | — | Imported | 2026-05-27 |
| 27 | Vary (using multi crop) | 0.9241 | — | Imported | 2026-05-27 |
| 28 | InternVL-1.5-Plus (generalist) | 0.9234 | — | Imported | 2026-05-27 |
| 29 | MLCD-Embodied-7B: Multi-label Cluster Discrimination for Visual Representation Learning | 0.9158 | — | Imported | 2026-05-27 |
| 30 | qwenvl-plus (single generalist model) | 0.9141 | — | Imported | 2026-05-27 |
| 31 | Zamba2-VL-2.7B | 0.9092 | — | Imported | 2026-05-27 |
| 32 | granite-vision-3.3-2b | 0.9087 | — | Imported | 2026-05-27 |
| 33 | SMoLA-PaLI-X Specialist Model | 0.9084 | — | Imported | 2026-05-27 |
| 34 | PP-DocBee-2B | 0.9056 | — | Imported | 2026-05-27 |
| 35 | SMoLA-PaLI-X Generalist Model | 0.9055 | — | Imported | 2026-05-27 |
| 36 | Snowflake Arctic-TILT 0.8B (fine-tuned) | 0.902 | — | Imported | 2026-05-27 |
| 37 | BAIDU-DI | 0.9016 | — | Imported | 2026-05-27 |
| 38 | InternLM-XComposer2-4KHD-7B | 0.9002 | — | Imported | 2026-05-27 |
| 39 | ScreenAI 5B | 0.8988 | — | Imported | 2026-05-27 |
| 40 | Snowflake Arctic-TILT 0.8B (zero-shot) | 0.8881 | — | Imported | 2026-05-27 |
| 41 | Tencent Youtu | 0.8866 | — | Imported | 2026-05-27 |
| 42 | ERNIE-Layout 2.0 | 0.8841 | — | Imported | 2026-05-27 |
| 43 | DocFormerv2 (Single Model with 750M Parameters) | 0.8784 | — | Imported | 2026-05-27 |
| 44 | BlueLM-V-3B | 0.8775 | — | Imported | 2026-05-27 |
| 45 | neetolab-sota-v1 | 0.8759 | — | Imported | 2026-05-27 |
| 46 | Mybank-DocReader | 0.8755 | — | Imported | 2026-05-27 |
| 47 | ERNIE-Layout 1.0 | 0.8753 | — | Imported | 2026-05-27 |
| 48 | Zamba2-VL-1.2B | 0.8743 | — | Imported | 2026-05-27 |
| 49 | Mini-Monkey | 0.8738 | — | Imported | 2026-05-27 |
| 50 | GPT-4 Vision Turbo + Amazon Textract OCR | 0.8736 | — | Imported | 2026-05-27 |
| 51 | Applica.ai TILT | 0.8705 | — | Imported | 2026-05-27 |
| 52 | PaLI-X (Google Research; Single Generative Model) | 0.8679 | — | Imported | 2026-05-27 |
| 53 | LayoutLM 2.0 (single model) | 0.8672 | — | Imported | 2026-05-27 |
| 54 | table-r1_qx | 0.8672 | — | Imported | 2026-05-27 |
| 55 | Qwen3.5_0.8B_test | 0.8662 | — | Imported | 2026-05-27 |
| 56 | 54_nnrc_zephyr | 0.856 | — | Imported | 2026-05-27 |
| 57 | Alibaba DAMO NLP | 0.8506 | — | Imported | 2026-05-27 |
| 58 | PingAn-OneConnect-Gammalab-DQA | 0.8484 | — | Imported | 2026-05-27 |
| 59 | PaliGemma-3B (finetune, 896px) | 0.8477 | — | Imported | 2026-05-27 |
| 60 | Spatial LLM v1.2 | 0.8443 | — | Imported | 2026-05-27 |
| 61 | LayoutLMv2_star_seg_large | 0.843 | — | Imported | 2026-05-27 |
| 62 | Vlm(qwen) | 0.8411 | — | Imported | 2026-05-27 |
| 63 | MoVA-8B (generalist) | 0.8341 | — | Imported | 2026-05-27 |
| 64 | LATIN-Prompt + Claude (Zero shot) | 0.8336 | — | Imported | 2026-05-27 |
| 65 | llama3-qwenvit | 0.8318 | — | Imported | 2026-05-27 |
| 66 | gemma+ocr | 0.8282 | — | Imported | 2026-05-27 |
| 67 | DIVE-Doc (FRD) | 0.8267 | — | Imported | 2026-05-27 |
| 68 | 36_nnrc_llama2 | 0.8239 | — | Imported | 2026-05-27 |
| 69 | Qwen2.5-VL_DocVQA_2409 | 0.823 | — | Imported | 2026-05-27 |
| 70 | nnrc_udop_224_6ds | 0.8227 | — | Imported | 2026-05-27 |
| 71 | loixc-onestage | 0.8221 | — | Imported | 2026-05-27 |
| 72 | loixc-vqa | 0.8127 | — | Imported | 2026-05-27 |
| 73 | Vis(qwen) | 0.8093 | — | Imported | 2026-05-27 |
| 74 | Docugami-Layout | 0.8031 | — | Imported | 2026-05-27 |
| 75 | Vary | 0.7916 | — | Imported | 2026-05-27 |
| 76 | llama | 0.7902 | — | Imported | 2026-05-27 |
| 77 | LayoutLMV2-large on Textract | 0.7873 | — | Imported | 2026-05-27 |
| 78 | LayoutLMv2_star_seg | 0.7859 | — | Imported | 2026-05-27 |
| 79 | PaliGemma-3B (finetune, 448px) | 0.7802 | — | Imported | 2026-05-27 |
| 80 | YoBerDaV2 Single-page | 0.7749 | — | Imported | 2026-05-27 |
| 81 | Structural LM-v2 | 0.7674 | — | Imported | 2026-05-27 |
| 82 | llama3-intern6b | 0.767 | — | Imported | 2026-05-27 |
| 83 | pix2struct-large | 0.7656 | — | Imported | 2026-05-27 |
| 84 | Submission_ErnieLayout_base_finetuned_on_DocVQA_en_train_dev_textract_word_segments_ck-14000 | 0.7599 | — | Imported | 2026-05-27 |
| 85 | Gemma 2b + OCR | 0.7517 | — | Imported | 2026-05-27 |
| 86 | DOLMA_multifinetuning | 0.7458 | — | Imported | 2026-05-27 |
| 87 | instructblip | 0.7429 | — | Imported | 2026-05-27 |
| 88 | Ivy-VL | 0.7417 | — | Imported | 2026-05-27 |
| 89 | Ivy-VL-01 | 0.7417 | — | Imported | 2026-05-27 |
| 90 | QA_Base_MRC_2 | 0.7415 | — | Imported | 2026-05-27 |
| 91 | tixc-vqa | 0.7413 | — | Imported | 2026-05-27 |
| 92 | QA_Base_MRC_1 | 0.7407 | — | Imported | 2026-05-27 |
| 93 | QA_Base_MRC_4 | 0.7348 | — | Imported | 2026-05-27 |
| 94 | QA_Base_MRC_3 | 0.7322 | — | Imported | 2026-05-27 |
| 95 | 0713ap +gpt4o(no v) | 0.7309 | — | Imported | 2026-05-27 |
| 96 | VisFocus-Base | 0.7285 | — | Imported | 2026-05-27 |
| 97 | QA_Base_MRC_5 | 0.7274 | — | Imported | 2026-05-27 |
| 98 | Dolma multifinetuning 7 | 0.7219 | — | Imported | 2026-05-27 |
| 99 | pix2struct-base | 0.7213 | — | Imported | 2026-05-27 |
| 100 | 1010ap +gpt4o(no v) | 0.7201 | — | Imported | 2026-05-27 |
| 101 | MiniCPM-V-2 | 0.7187 | — | Imported | 2026-05-27 |
| 102 | LayoutLM-base+GNN | 0.6984 | — | Imported | 2026-05-27 |
| 103 | Electra Large Squad | 0.6961 | — | Imported | 2026-05-27 |
| 104 | YoBerDaV1 Multi-page | 0.6904 | — | Imported | 2026-05-27 |
| 105 | HyperDQA_V4 | 0.6893 | — | Imported | 2026-05-27 |
| 106 | HyperDQA_V3 | 0.6769 | — | Imported | 2026-05-27 |
| 107 | GPT3.5 | 0.6759 | — | Imported | 2026-05-27 |
| 108 | HyperDQA_V2 | 0.6734 | — | Imported | 2026-05-27 |
| 109 | HyperDQA_V1 | 0.6717 | — | Imported | 2026-05-27 |
| 110 | LATIN-Tuning-Prompt + Alpaca (Zero-shot) | 0.6687 | — | Imported | 2026-05-27 |
| 111 | donut_base | 0.659 | — | Imported | 2026-05-27 |
| 112 | ViTLP | 0.6588 | — | Imported | 2026-05-27 |
| 113 | DocVQA: A Dataset for VQA on Document Images | 0.6566 | — | Imported | 2026-05-27 |
| 114 | BROS_BASE (WebViCoB 6.4M) | 0.6563 | — | Imported | 2026-05-27 |
| 115 | Layoutlm_DocVQA+Token_v2 | 0.6562 | — | Imported | 2026-05-27 |
| 116 | donut_half_input_imageSize | 0.6536 | — | Imported | 2026-05-27 |
| 117 | Bert Large | 0.6447 | — | Imported | 2026-05-27 |
| 118 | Dessurt | 0.6322 | — | Imported | 2026-05-27 |
| 119 | dolma | 0.6196 | — | Imported | 2026-05-27 |
| 120 | Vlm(llama) | 0.5914 | — | Imported | 2026-05-27 |
| 121 | bert fulldata fintuned | 0.59 | — | Imported | 2026-05-27 |
| 122 | bert finetuned | 0.5872 | — | Imported | 2026-05-27 |
| 123 | HyperDQA_V0 | 0.5715 | — | Imported | 2026-05-27 |
| 124 | LayoutLM_Docvqa+Token_v0 | 0.498 | — | Imported | 2026-05-27 |
| 125 | LayoutLMv2, Tesseract OCR eval (dataset OCR trained) | 0.4961 | — | Imported | 2026-05-27 |
| 126 | Vis(llama) | 0.4919 | — | Imported | 2026-05-27 |
| 127 | LayoutLMv2, Tesseract OCR eval (Tesseract OCR trained) | 0.4815 | — | Imported | 2026-05-27 |
| 128 | donut_large_encoderSize_finetuned_20_epoch | 0.4673 | — | Imported | 2026-05-27 |
| 129 | bert | 0.4557 | — | Imported | 2026-05-27 |
| 130 | UGLIFT v0.1 (Clova OCR) | 0.4417 | — | Imported | 2026-05-27 |
| 131 | PaliGemma-3B (finetune, 224px) | 0.4374 | — | Imported | 2026-05-27 |
| 132 | HocrEN(Technique 2) - qwen7b | 0.4282 | — | Imported | 2026-05-27 |
| 133 | HocrEN(Technique 2) - qwen14b | 0.3794 | — | Imported | 2026-05-27 |
| 134 | Finetuning LayoutLMv3_Base | 0.3596 | — | Imported | 2026-05-27 |
| 135 | testtest | 0.3569 | — | Imported | 2026-05-27 |
| 136 | Plain BERT QA | 0.3524 | — | Imported | 2026-05-27 |
| 137 | Clova OCR V0 | 0.3489 | — | Imported | 2026-05-27 |
| 138 | HDNet | 0.3401 | — | Imported | 2026-05-27 |
| 139 | CLOVA OCR | 0.3296 | — | Imported | 2026-05-27 |
| 140 | donut_small_encoderSize_finetuned_20_epoch | 0.3157 | — | Imported | 2026-05-27 |
| 141 | docVQAQV_V0.1 | 0.3016 | — | Imported | 2026-05-27 |
| 142 | HocrEN(Technique 2) - qwen32b | 0.2931 | — | Imported | 2026-05-27 |
| 143 | m-rope2 | 0.2676 | — | Imported | 2026-05-27 |
| 144 | HocrEN(Technique 2) - llama | 0.2488 | — | Imported | 2026-05-27 |
| 145 | dsf | 0.2431 | — | Imported | 2026-05-27 |
| 146 | docVQAQV_V0 | 0.2342 | — | Imported | 2026-05-27 |
| 147 | HocrEN(Technique 2) - mistral | 0.183 | — | Imported | 2026-05-27 |
| 148 | gmini25 | 0.1714 | — | Imported | 2026-05-27 |
| 149 | doubao15 | 0.1585 | — | Imported | 2026-05-27 |
| 150 | claude37 | 0.1584 | — | Imported | 2026-05-27 |
| 151 | gpt4o | 0.1541 | — | Imported | 2026-05-27 |
| 152 | wenxin45 | 0.1477 | — | Imported | 2026-05-27 |
| 153 | seq2seq | 0.1081 | — | Imported | 2026-05-27 |
| 154 | lixiang-vlm-7b-handled | 0.099 | — | Imported | 2026-05-27 |
| 155 | lixiang-vlm-7b | 0.0631 | — | Imported | 2026-05-27 |
| 156 | sg | 0.0603 | — | Imported | 2026-05-27 |
| 157 | dfnb | 0.0595 | — | Imported | 2026-05-27 |
| 158 | clipb | 0.0588 | — | Imported | 2026-05-27 |
| 159 | dfnl | 0.0585 | — | Imported | 2026-05-27 |
| 160 | lixiang-vlm handled | 0.0536 | — | Imported | 2026-05-27 |
| 161 | lixiang-vlm | 0.0264 | — | Imported | 2026-05-27 |
| 162 | table-r1 | 0 | — | Imported | 2026-05-27 |
| 163 | Test Submission | 0 | — | Imported | 2026-05-27 |
| 164 | zs | 0 | — | Imported | 2026-05-27 |
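
Ranks in the table are assigned sequentially even when two entries share a score (for example, ranks 53 and 54 both show 0.8672). The following is a minimal sketch of how the rows could be parsed and checked for ties, assuming the table is available as pipe-delimited text. The names `Row`, `parse_leaderboard`, `tied_groups`, and `SAMPLE` are hypothetical helpers introduced for illustration and are not part of the leaderboard itself.

```python
from typing import NamedTuple


class Row(NamedTuple):
    rank: int
    subject: str
    score: float


def parse_leaderboard(markdown: str) -> list[Row]:
    """Parse a pipe-delimited leaderboard table into Row tuples."""
    rows = []
    for line in markdown.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip the header and the |---| separator: their first cell is not an integer.
        if len(cells) < 3 or not cells[0].isdigit():
            continue
        rows.append(Row(int(cells[0]), cells[1], float(cells[2])))
    return rows


def tied_groups(rows: list[Row]) -> dict[float, list[str]]:
    """Return only the scores shared by more than one subject."""
    by_score: dict[float, list[str]] = {}
    for r in rows:
        by_score.setdefault(r.score, []).append(r.subject)
    return {s: names for s, names in by_score.items() if len(names) > 1}


# Three rows copied from the table above (assumed input format).
SAMPLE = """
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 53 | LayoutLM 2.0 (single model) | 0.8672 | — | Imported | 2026-05-27 |
| 54 | table-r1_qx | 0.8672 | — | Imported | 2026-05-27 |
| 55 | Qwen3.5_0.8B_test | 0.8662 | — | Imported | 2026-05-27 |
"""

rows = parse_leaderboard(SAMPLE)
print(tied_groups(rows))
```

The sketch splits on `|`, so it assumes no subject name contains a pipe character, which holds for the rows listed here.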