VCR

VCR (Visual Commonsense Reasoning): Evaluates multimodal understanding by asking a model to answer a multiple-choice question about an image and then select the rationale that justifies that answer.

Rows: 123
Primary metric: q_to_ar_accuracy
Sampled: 2026-05-27

Metadata

Metrics

Q->A accuracy, QA->R accuracy, Q->AR accuracy
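
The three metrics are related. In the VCR task, Q->A scores answer selection, QA->R scores rationale selection, and Q->AR counts an item as correct only when both the answer and the rationale are right for the same question. The following is a minimal sketch of that relationship, not the official evaluation script. The `Item` record layout and the name `vcr_accuracies` are hypothetical and chosen only for illustration.

```python
from typing import Iterable, NamedTuple


class Item(NamedTuple):
    # Hypothetical record layout, chosen for illustration only.
    answer_pred: int
    answer_gold: int
    rationale_pred: int
    rationale_gold: int


def vcr_accuracies(items: Iterable[Item]) -> dict:
    """Return Q->A, QA->R and Q->AR accuracy as fractions in [0, 1]."""
    items = list(items)
    if not items:
        raise ValueError("need at least one item")
    n = len(items)
    q_a = sum(i.answer_pred == i.answer_gold for i in items)
    qa_r = sum(i.rationale_pred == i.rationale_gold for i in items)
    # Q->AR is stricter: both choices must be correct for the same item.
    q_ar = sum(
        i.answer_pred == i.answer_gold and i.rationale_pred == i.rationale_gold
        for i in items
    )
    return {"q_to_a": q_a / n, "qa_to_r": qa_r / n, "q_to_ar": q_ar / n}
```

Because Q->AR requires both choices to be right, it can never exceed either of the other two scores. With four answer options and four rationale options per question, uniform guessing gives 1/4 × 1/4 = 6.25%, which is consistent with the Random Performance entry of 6.2%.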

Latest Results

Rows are parsed from the official Visual Commonsense Reasoning leaderboard static HTML table. Score is Q->AR accuracy.
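
Parsing a static HTML table can be sketched with the Python standard library alone. This is an illustration of the general approach, not the importer actually used here: the class name `TableRows`, the helper `parse_table`, and the sample markup in the usage note are assumptions, and the real leaderboard page may use different columns or nesting.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_table(html: str) -> list:
    """Return all table rows found in `html` as lists of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows
```

For example, `parse_table("<table><tr><td>1</td><td>Human Performance</td></tr></table>")` yields one row with the two cell strings. Text inside nested tags such as links is kept, because the parser records all character data seen while a cell is open.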

Rank Subject Q->AR accuracy Model Match Provenance Sampled
1 Human Performance 85% Imported 2026-05-27
2 GPT4RoI 81.6% GPT-4 (openai-gpt-4) Imported 2026-05-27
3 ViP-LLaVa 81.3% Imported 2026-05-27
4 LLME-VCR 75.7% Imported 2026-05-27
5 HunYuan_vcr 75.6% Imported 2026-05-27
6 SP-VCR (ensemble of 4 models) 74.4% Imported 2026-05-27
7 KS-MGSR 74.3% Imported 2026-05-27
8 VLUA+ 74% Imported 2026-05-27
9 VQA-GNN + MerlotReserve-Large (ensemble of 2 models) 74% Imported 2026-05-27
10 VLMT-VCR 72.9% Imported 2026-05-27
11 VL-RoBERTa 72.8% Imported 2026-05-27
12 SP-VCR (single model) 72.2% Imported 2026-05-27
13 VLUA (single model) 72% Imported 2026-05-27
14 🍷MerlotReserve-Large 71.5% Imported 2026-05-27
15 UNIMO+ERNIE(ensemble of 7 models) 71.4% Imported 2026-05-27
16 BLENDER (single model) 70.8% Imported 2026-05-27
17 ERNIE-ViL-large(ensemble of 15 models) 70.5% Imported 2026-05-27
18 EventLens-large 68.5% Imported 2026-05-27
19 MMCNet (ensemble of 4 models) 66.9% Imported 2026-05-27
20 UNITER-large (ensemble of 10 models) 66.8% Imported 2026-05-27
21 RobustNet 66.5% Imported 2026-05-27
22 ERNIE-ViL-large(single model) 66.3% Imported 2026-05-27
23 ADVL (single model, formerly CLIP-TD) 66.2% Imported 2026-05-27
24 DELV 65.8% Imported 2026-05-27
25 VILLA-large (single model) 65.7% Imported 2026-05-27
26 EventLens-large 65.5% Imported 2026-05-27
27 MERLOT (single model) 65.1% Imported 2026-05-27
28 VitsNet 64.6% Imported 2026-05-27
29 gnimix 64.1% Imported 2026-05-27
30 SEITU 63% Imported 2026-05-27
31 UNITER-large (single model) 62.8% Imported 2026-05-27
32 VQA-GNN 62.8% Imported 2026-05-27
33 🍷MerlotReserve-Base 62.6% Imported 2026-05-27
34 VCR-test 62.4% Imported 2026-05-27
35 ERNIE-ViL-base(single model) 62.1% Imported 2026-05-27
36 Kam-net 61.8% Imported 2026-05-27
37 ICAR : Image Compression and Attentional Redundancy for Visual Commonsense Reasoning 61.3% Imported 2026-05-27
38 MMCNet 60.6% Imported 2026-05-27
39 Test_VILLA 60.6% Imported 2026-05-27
40 VILLA_TEST 60.6% Imported 2026-05-27
41 VILLA-base (single model) 60.6% Imported 2026-05-27
42 VILLA_GIST 60.4% Imported 2026-05-27
43 KVL-BERT 60.3% Imported 2026-05-27
44 ViLBERT (ensemble of 10 models) 59.8% Imported 2026-05-27
45 PVL (single model) 59.7% Imported 2026-05-27
46 VL-BERT (single model) 59.7% Imported 2026-05-27
47 VVT 59.7% Imported 2026-05-27
48 SGEITL 59.6% Imported 2026-05-27
49 UNITER_kd 59% Imported 2026-05-27
50 vlt (single model) 58.9% Imported 2026-05-27
51 UNITER_joint 58.8% Imported 2026-05-27
52 ViLBERT (ensemble of 5 models) 58.8% Imported 2026-05-27
53 GITRL 58.7% Imported 2026-05-27
54 PEVL 58.6% Imported 2026-05-27
55 Unicoder-VL (ensemble of 2 models) 58.6% Imported 2026-05-27
56 UNITER_independent 58.6% Imported 2026-05-27
57 UNITER-base (single model) 58.2% Imported 2026-05-27
58 TDN 58% Imported 2026-05-27
59 B2T2 (ensemble of 5 models) 57.1% Imported 2026-05-27
60 BLIP-VCR 56.8% Imported 2026-05-27
61 GPT4 4-shot 56.2% Imported 2026-05-27
62 VL-BERT_kd 55.4% Imported 2026-05-27
63 YTX 55.4% Imported 2026-05-27
64 B2T2 (single model) 55% Imported 2026-05-27
65 Unicoder-VL (single model) 54.9% Imported 2026-05-27
66 ViLBERT (single model) 54.8% Imported 2026-05-27
67 VL-BERT_prec 54.8% Imported 2026-05-27
68 ALBEF 54.7% Imported 2026-05-27
69 ViLBERT_GCN 54.6% Imported 2026-05-27
70 DCGR 54.3% Imported 2026-05-27
71 CARC 54.1% Imported 2026-05-27
72 HGL 53.2% Imported 2026-05-27
73 TNet (ensemble of 5) 53% Imported 2026-05-27
74 CMR 52.8% Imported 2026-05-27
75 TAB-KD 52.4% Imported 2026-05-27
76 VisualBERT 52.4% Imported 2026-05-27
77 SAC 52.2% Imported 2026-05-27
78 GTEHG 51.7% Imported 2026-05-27
79 A3 Net 51.4% Imported 2026-05-27
80 JCL 50.9% Imported 2026-05-27
81 TAB-VCR_attribute 50.8% Imported 2026-05-27
82 MKDN 50.5% Imported 2026-05-27
83 RobustCL 50.5% Imported 2026-05-27
84 TAB-VCR 50.5% Imported 2026-05-27
85 YXY 50.5% Imported 2026-05-27
86 TNet (single model) 50.4% Imported 2026-05-27
87 UABE 50.4% Imported 2026-05-27
88 WWR-Net 50.2% Imported 2026-05-27
89 transformer-r2c 50% Imported 2026-05-27
90 CAR 49.8% Imported 2026-05-27
91 HGL 49.8% Imported 2026-05-27
92 SIA V1 49.8% Imported 2026-05-27
93 CCN-KD 49.7% Imported 2026-05-27
94 SFW1 49.7% Imported 2026-05-27
95 SFW2 49.6% Imported 2026-05-27
96 PUV (Pretrain UNITER by VC feature) 49.3% Imported 2026-05-27
97 RKB 49.3% Imported 2026-05-27
98 BLU 48.9% Imported 2026-05-27
99 vlb (single model) 48.9% Imported 2026-05-27
100 CCD 48.4% Imported 2026-05-27
101 MRCNet 48.4% Imported 2026-05-27
102 MUGRN 47.5% Imported 2026-05-27
103 SGRE 46.9% Imported 2026-05-27
104 R2C-KD 46.6% Imported 2026-05-27
105 FAIR 46.3% Imported 2026-05-27
106 CCN-NKD 46.1% Imported 2026-05-27
107 DAF 46% Imported 2026-05-27
108 CKRE 45.9% Imported 2026-05-27
109 ATGAN 45.5% Imported 2026-05-27
110 MIE 45.5% Imported 2026-05-27
111 emnet 45.4% Imported 2026-05-27
112 Recognition to Cognition Networks 44% Imported 2026-05-27
113 DVD 43.3% Imported 2026-05-27
114 GS Reasoning 41.1% Imported 2026-05-27
115 R2R (text only) 40.5% Imported 2026-05-27
116 R2CC 37.6% Imported 2026-05-27
117 BERT-Base 35% Imported 2026-05-27
118 MUR 25.6% Imported 2026-05-27
119 SG-QA-model 17.7% Imported 2026-05-27
120 MLB 17.2% Imported 2026-05-27
121 BERT-base-vc-ft 14.5% Imported 2026-05-27
122 Visual-Lang-base (ensemble) 13.3% Imported 2026-05-27
123 Random Performance 6.2% Imported 2026-05-27
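
To work with these rows programmatically, each line can be split into the six columns named in the header. The sketch below assumes one row per line in the form rank, subject, percentage, optional model match, provenance, and sample date. The `Row` type, the pattern `ROW_RE`, and the function `parse_row` are hypothetical names introduced for illustration.

```python
import re
from typing import NamedTuple, Optional


class Row(NamedTuple):
    rank: int
    subject: str
    q_to_ar: float      # percentage, e.g. 81.6
    model_match: str    # empty when no match is listed
    provenance: str
    sampled: str        # ISO date string


# Assumed layout: "<rank> <subject> <score>% [<model match>] Imported <date>"
ROW_RE = re.compile(
    r"^(\d+)\s+(.+?)\s+(\d+(?:\.\d+)?)%\s+(.*?)\s*(Imported)\s+(\d{4}-\d{2}-\d{2})$"
)


def parse_row(line: str) -> Optional[Row]:
    """Parse one leaderboard line; return None if it does not fit the layout."""
    m = ROW_RE.match(line.strip())
    if m is None:
        return None
    rank, subject, score, match, prov, date = m.groups()
    return Row(int(rank), subject, float(score), match, prov, date)
```

The subject is matched lazily so that names containing digits or parentheses, such as "GPT4 4-shot" or "UNIMO+ERNIE(ensemble of 7 models)", stay intact, and the score is taken to be the first number followed by a percent sign. Lines that do not fit the assumed layout, such as the header, return `None` rather than raising.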