VCR
VCR (Visual Commonsense Reasoning): Evaluates multimodal understanding across image, text, chart, diagram, or cross-modal reasoning tasks.
123 rows
Primary metric: q_to_ar_accuracy
Sampled: 2026-05-27
Metadata
Metrics: Q->A accuracy, QA->R accuracy, Q->AR accuracy
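The three metrics are related: Q->A scores the chosen answer, QA->R scores the chosen rationale, and Q->AR is usually taken to count an item as correct only when both the answer and the rationale are right. The sketch below illustrates that relationship under stated assumptions: the `Example` record, its field names, and the `vcr_accuracies` helper are hypothetical and not part of any official evaluation script. With four answer choices and four rationale choices per item, uniform guessing would give 1/4 × 1/4 = 1/16 ≈ 6.2% on Q->AR.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One evaluated item (hypothetical record layout, for illustration only)."""
    answer_pred: int      # index of the predicted answer choice
    answer_gold: int      # index of the correct answer choice
    rationale_pred: int   # index of the predicted rationale choice
    rationale_gold: int   # index of the correct rationale choice


def vcr_accuracies(examples):
    """Return Q->A, QA->R, and Q->AR accuracy as fractions in [0, 1]."""
    n = len(examples)
    if n == 0:
        raise ValueError("need at least one example")

    answer_hits = [e.answer_pred == e.answer_gold for e in examples]
    rationale_hits = [e.rationale_pred == e.rationale_gold for e in examples]
    # Joint metric: an item counts only if answer AND rationale are both correct.
    joint_hits = [a and r for a, r in zip(answer_hits, rationale_hits)]

    return {
        "q_to_a": sum(answer_hits) / n,
        "qa_to_r": sum(rationale_hits) / n,
        "q_to_ar": sum(joint_hits) / n,
    }


if __name__ == "__main__":
    demo = [
        Example(0, 0, 1, 1),  # answer right, rationale right
        Example(2, 2, 3, 0),  # answer right, rationale wrong
        Example(1, 3, 2, 2),  # answer wrong, rationale right
        Example(0, 1, 0, 2),  # both wrong
    ]
    print(vcr_accuracies(demo))
    # {'q_to_a': 0.5, 'qa_to_r': 0.5, 'q_to_ar': 0.25}
```

Because the joint metric requires both parts to be correct, Q->AR can never exceed the lower of the other two scores, which is why it is the strictest of the three and a natural choice for ranking.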
| Rank | Subject | Q->AR accuracy | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Human Performance | 85% | — | Imported | 2026-05-27 |
| 2 | GPT4RoI | 81.6% | GPT-4 openai-gpt-4 | Imported | 2026-05-27 |
| 3 | ViP-LLaVa | 81.3% | — | Imported | 2026-05-27 |
| 4 | LLME-VCR | 75.7% | — | Imported | 2026-05-27 |
| 5 | HunYuan_vcr | 75.6% | — | Imported | 2026-05-27 |
| 6 | SP-VCR (ensemble of 4 models) | 74.4% | — | Imported | 2026-05-27 |
| 7 | KS-MGSR | 74.3% | — | Imported | 2026-05-27 |
| 8 | VLUA+ | 74% | — | Imported | 2026-05-27 |
| 9 | VQA-GNN + MerlotReserve-Large (ensemble of 2 models) | 74% | — | Imported | 2026-05-27 |
| 10 | VLMT-VCR | 72.9% | — | Imported | 2026-05-27 |
| 11 | VL-RoBERTa | 72.8% | — | Imported | 2026-05-27 |
| 12 | SP-VCR (single model) | 72.2% | — | Imported | 2026-05-27 |
| 13 | VLUA (single model) | 72% | — | Imported | 2026-05-27 |
| 14 | 🍷MerlotReserve-Large | 71.5% | — | Imported | 2026-05-27 |
| 15 | UNIMO+ERNIE(ensemble of 7 models) | 71.4% | — | Imported | 2026-05-27 |
| 16 | BLENDER (single model) | 70.8% | — | Imported | 2026-05-27 |
| 17 | ERNIE-ViL-large(ensemble of 15 models) | 70.5% | — | Imported | 2026-05-27 |
| 18 | EventLens-large | 68.5% | — | Imported | 2026-05-27 |
| 19 | MMCNet (ensemble of 4 models) | 66.9% | — | Imported | 2026-05-27 |
| 20 | UNITER-large (ensemble of 10 models) | 66.8% | — | Imported | 2026-05-27 |
| 21 | RobustNet | 66.5% | — | Imported | 2026-05-27 |
| 22 | ERNIE-ViL-large(single model) | 66.3% | — | Imported | 2026-05-27 |
| 23 | ADVL (single model, formerly CLIP-TD) | 66.2% | — | Imported | 2026-05-27 |
| 24 | DELV | 65.8% | — | Imported | 2026-05-27 |
| 25 | VILLA-large (single model) | 65.7% | — | Imported | 2026-05-27 |
| 26 | EventLens-large | 65.5% | — | Imported | 2026-05-27 |
| 27 | MERLOT (single model) | 65.1% | — | Imported | 2026-05-27 |
| 28 | VitsNet | 64.6% | — | Imported | 2026-05-27 |
| 29 | gnimix | 64.1% | — | Imported | 2026-05-27 |
| 30 | SEITU | 63% | — | Imported | 2026-05-27 |
| 31 | UNITER-large (single model) | 62.8% | — | Imported | 2026-05-27 |
| 32 | VQA-GNN | 62.8% | — | Imported | 2026-05-27 |
| 33 | 🍷MerlotReserve-Base | 62.6% | — | Imported | 2026-05-27 |
| 34 | VCR-test | 62.4% | — | Imported | 2026-05-27 |
| 35 | ERNIE-ViL-base(single model) | 62.1% | — | Imported | 2026-05-27 |
| 36 | Kam-net | 61.8% | — | Imported | 2026-05-27 |
| 37 | ICAR : Image Compression and Attentional Redundancy for Visual Commonsense Reasoning | 61.3% | — | Imported | 2026-05-27 |
| 38 | MMCNet | 60.6% | — | Imported | 2026-05-27 |
| 39 | Test_VILLA | 60.6% | — | Imported | 2026-05-27 |
| 40 | VILLA_TEST | 60.6% | — | Imported | 2026-05-27 |
| 41 | VILLA-base (single model) | 60.6% | — | Imported | 2026-05-27 |
| 42 | VILLA_GIST | 60.4% | — | Imported | 2026-05-27 |
| 43 | KVL-BERT | 60.3% | — | Imported | 2026-05-27 |
| 44 | ViLBERT (ensemble of 10 models) | 59.8% | — | Imported | 2026-05-27 |
| 45 | PVL (single model) | 59.7% | — | Imported | 2026-05-27 |
| 46 | VL-BERT (single model) | 59.7% | — | Imported | 2026-05-27 |
| 47 | VVT | 59.7% | — | Imported | 2026-05-27 |
| 48 | SGEITL | 59.6% | — | Imported | 2026-05-27 |
| 49 | UNITER_kd | 59% | — | Imported | 2026-05-27 |
| 50 | vlt (single model) | 58.9% | — | Imported | 2026-05-27 |
| 51 | UNITER_joint | 58.8% | — | Imported | 2026-05-27 |
| 52 | ViLBERT (ensemble of 5 models) | 58.8% | — | Imported | 2026-05-27 |
| 53 | GITRL | 58.7% | — | Imported | 2026-05-27 |
| 54 | PEVL | 58.6% | — | Imported | 2026-05-27 |
| 55 | Unicoder-VL (ensemble of 2 models) | 58.6% | — | Imported | 2026-05-27 |
| 56 | UNITER_independent | 58.6% | — | Imported | 2026-05-27 |
| 57 | UNITER-base (single model) | 58.2% | — | Imported | 2026-05-27 |
| 58 | TDN | 58% | — | Imported | 2026-05-27 |
| 59 | B2T2 (ensemble of 5 models) | 57.1% | — | Imported | 2026-05-27 |
| 60 | BLIP-VCR | 56.8% | — | Imported | 2026-05-27 |
| 61 | GPT4 4-shot | 56.2% | — | Imported | 2026-05-27 |
| 62 | VL-BERT_kd | 55.4% | — | Imported | 2026-05-27 |
| 63 | YTX | 55.4% | — | Imported | 2026-05-27 |
| 64 | B2T2 (single model) | 55% | — | Imported | 2026-05-27 |
| 65 | Unicoder-VL (single model) | 54.9% | — | Imported | 2026-05-27 |
| 66 | ViLBERT (single model) | 54.8% | — | Imported | 2026-05-27 |
| 67 | VL-BERT_prec | 54.8% | — | Imported | 2026-05-27 |
| 68 | ALBEF | 54.7% | — | Imported | 2026-05-27 |
| 69 | ViLBERT_GCN | 54.6% | — | Imported | 2026-05-27 |
| 70 | DCGR | 54.3% | — | Imported | 2026-05-27 |
| 71 | CARC | 54.1% | — | Imported | 2026-05-27 |
| 72 | HGL | 53.2% | — | Imported | 2026-05-27 |
| 73 | TNet (ensemble of 5) | 53% | — | Imported | 2026-05-27 |
| 74 | CMR | 52.8% | — | Imported | 2026-05-27 |
| 75 | TAB-KD | 52.4% | — | Imported | 2026-05-27 |
| 76 | VisualBERT | 52.4% | — | Imported | 2026-05-27 |
| 77 | SAC | 52.2% | — | Imported | 2026-05-27 |
| 78 | GTEHG | 51.7% | — | Imported | 2026-05-27 |
| 79 | A3 Net | 51.4% | — | Imported | 2026-05-27 |
| 80 | JCL | 50.9% | — | Imported | 2026-05-27 |
| 81 | TAB-VCR_attribute | 50.8% | — | Imported | 2026-05-27 |
| 82 | MKDN | 50.5% | — | Imported | 2026-05-27 |
| 83 | RobustCL | 50.5% | — | Imported | 2026-05-27 |
| 84 | TAB-VCR | 50.5% | — | Imported | 2026-05-27 |
| 85 | YXY | 50.5% | — | Imported | 2026-05-27 |
| 86 | TNet (single model) | 50.4% | — | Imported | 2026-05-27 |
| 87 | UABE | 50.4% | — | Imported | 2026-05-27 |
| 88 | WWR-Net | 50.2% | — | Imported | 2026-05-27 |
| 89 | transformer-r2c | 50% | — | Imported | 2026-05-27 |
| 90 | CAR | 49.8% | — | Imported | 2026-05-27 |
| 91 | HGL | 49.8% | — | Imported | 2026-05-27 |
| 92 | SIA V1 | 49.8% | — | Imported | 2026-05-27 |
| 93 | CCN-KD | 49.7% | — | Imported | 2026-05-27 |
| 94 | SFW1 | 49.7% | — | Imported | 2026-05-27 |
| 95 | SFW2 | 49.6% | — | Imported | 2026-05-27 |
| 96 | PUV (Pretrain UNITER by VC feature) | 49.3% | — | Imported | 2026-05-27 |
| 97 | RKB | 49.3% | — | Imported | 2026-05-27 |
| 98 | BLU | 48.9% | — | Imported | 2026-05-27 |
| 99 | vlb (single model) | 48.9% | — | Imported | 2026-05-27 |
| 100 | CCD | 48.4% | — | Imported | 2026-05-27 |
| 101 | MRCNet | 48.4% | — | Imported | 2026-05-27 |
| 102 | MUGRN | 47.5% | — | Imported | 2026-05-27 |
| 103 | SGRE | 46.9% | — | Imported | 2026-05-27 |
| 104 | R2C-KD | 46.6% | — | Imported | 2026-05-27 |
| 105 | FAIR | 46.3% | — | Imported | 2026-05-27 |
| 106 | CCN-NKD | 46.1% | — | Imported | 2026-05-27 |
| 107 | DAF | 46% | — | Imported | 2026-05-27 |
| 108 | CKRE | 45.9% | — | Imported | 2026-05-27 |
| 109 | ATGAN | 45.5% | — | Imported | 2026-05-27 |
| 110 | MIE | 45.5% | — | Imported | 2026-05-27 |
| 111 | emnet | 45.4% | — | Imported | 2026-05-27 |
| 112 | Recognition to Cognition Networks | 44% | — | Imported | 2026-05-27 |
| 113 | DVD | 43.3% | — | Imported | 2026-05-27 |
| 114 | GS Reasoning | 41.1% | — | Imported | 2026-05-27 |
| 115 | R2R (text only) | 40.5% | — | Imported | 2026-05-27 |
| 116 | R2CC | 37.6% | — | Imported | 2026-05-27 |
| 117 | BERT-Base | 35% | — | Imported | 2026-05-27 |
| 118 | MUR | 25.6% | — | Imported | 2026-05-27 |
| 119 | SG-QA-model | 17.7% | — | Imported | 2026-05-27 |
| 120 | MLB | 17.2% | — | Imported | 2026-05-27 |
| 121 | BERT-base-vc-ft | 14.5% | — | Imported | 2026-05-27 |
| 122 | Visual-Lang-base (ensemble) | 13.3% | — | Imported | 2026-05-27 |
| 123 | Random Performance | 6.2% | — | Imported | 2026-05-27 |