ScienceQA

ScienceQA evaluates multimodal science question answering across natural, social, and language science topics with text and image context splits.

Rows: 84
Primary metric: avg
Sampled: 2026-05-06

Metadata

Metrics

Natural Science, Social Science, Language Science, Text Context, Image Context, No Context, Grades 1-6, Grades 7-12, Average

Latest Results

Rows are ranked by Average, highest first. Source display names and method/learning labels are preserved as imported, without mapping them to canonical model names.

Rank Subject Average Model Match Provenance Sampled
1 Mutimodal-T-SciQ_Large 🥇 96.18 — Imported 2026-05-06
2 MC-CoT_F-Large 🥈 94.88 — Imported 2026-05-06
3 Honeybee (Vicuna-13B) 🥉 94.39 — Imported 2026-05-06
4 Enigma-COT_Large 94.11 — Imported 2026-05-06
5 KAM-CoT 93.87 — Imported 2026-05-06
6 MC-CoT_Large 93.37 — Imported 2026-05-06
7 DPMM-CoT_Large 93.35 — Imported 2026-05-06
8 LLaVA (GPT-4 judge) 92.53 GPT-4 (openai-gpt-4) Imported 2026-05-06
9 CoMD (Vicuna-7B) 91.94 — Imported 2026-05-06
10 Mutimodal-T-SciQ_Base 91.75 — Imported 2026-05-06
11 Multimodal-CoT_Large 91.68 — Imported 2026-05-06
12 PILL (LLaMA-7B) 91.23 — Imported 2026-05-06
13 LLaVA (ViT-L/16-224) 91.20 — Imported 2026-05-06
14 DPMM-CoT_Base 90.97 — Imported 2026-05-06
15 LLaVA 90.92 — Imported 2026-05-06
16 LaVIN-13B 90.83 — Imported 2026-05-06
17 MC-CoT_F-Base 90.73 — Imported 2026-05-06
18 MC-CoT_Base 90.64 — Imported 2026-05-06
19 LLaMA-SciTune 90.03 — Imported 2026-05-06
20 LaVIN-7B 89.41 — Imported 2026-05-06
21 Flan-T5-XL (LoRA) 89.29 — Imported 2026-05-06
22 Chat-UniVi 88.78 — Imported 2026-05-06
23 Human Performance 88.40 — Imported 2026-05-06
24 DDCoT (T5) 87.34 — Imported 2026-05-06
25 LG-VQA (CLIP) 87.22 — Imported 2026-05-06
26 Chameleon (GPT-4) 86.54 GPT-4 (openai-gpt-4) Imported 2026-05-06
27 LG-VQA (BLIP-2) 86.32 — Imported 2026-05-06
28 LLaMA-SciTune 86.11 — Imported 2026-05-06
29 Enigma-COT_Base 85.59 — Imported 2026-05-06
30 LLaMA-Adapter 85.19 — Imported 2026-05-06
31 Multimodal-CoT_Base 84.91 — Imported 2026-05-06
32 IMMO SL+RL 84.80 — Imported 2026-05-06
33 CoT GPT-4 83.99 GPT-4 (openai-gpt-4) Imported 2026-05-06
34 HoT-T5_Large 83.38 — Imported 2026-05-06
35 HoT-T5_Base 81.42 — Imported 2026-05-06
36 DDCoT (ChatGPT) 80.15 — Imported 2026-05-06
37 Chameleon (ChatGPT) 79.93 — Imported 2026-05-06
38 CoT GPT-3 + Doc 79.91 — Imported 2026-05-06
39 UnifiedQA-T-SciQ_Base 79.41 — Imported 2026-05-06
40 CoT ChatGPT 78.31 — Imported 2026-05-06
41 DDCoT (GPT-3) 78.09 — Imported 2026-05-06
42 LaVIN-13B 77.54 — Imported 2026-05-06
43 CoT GPT-3 (ALE) 75.17 — Imported 2026-05-06
44 LaVIN-7B 75.11 — Imported 2026-05-06
45 CoT GPT-3 (AE) 74.61 — Imported 2026-05-06
46 BLIP-2 74.17 — Imported 2026-05-06
47 CoT UnifiedQA 74.11 — Imported 2026-05-06
48 GPT-3 (0-shot) 74.04 — Imported 2026-05-06
49 GPT-3 (2-shot) 73.97 — Imported 2026-05-06
50 InstructBLIP 73.33 — Imported 2026-05-06
51 UnifiedQA 70.12 — Imported 2026-05-06
52 ChatGPT 69.41 — Imported 2026-05-06
53 MetaCLIP 68.77 — Imported 2026-05-06
54 OpenCLIP 67.53 — Imported 2026-05-06
55 Flan-T5-XXL 67.43 — Imported 2026-05-06
56 SAM 67.08 — Imported 2026-05-06
57 DINOv2 64.60 — Imported 2026-05-06
58 VisualBERT 61.87 — Imported 2026-05-06
59 Patch-TRM 61.42 — Imported 2026-05-06
60 ViLT 61.14 — Imported 2026-05-06
61 DFAF 60.72 — Imported 2026-05-06
62 Chat-UniVi 59.96 — Imported 2026-05-06
63 BAN 59.37 — Imported 2026-05-06
64 Top-Down 59.02 — Imported 2026-05-06
65 MiniGPT4 58.70 — Imported 2026-05-06
66 LLaMA2-13B 55.78 — Imported 2026-05-06
67 DDCoT (MiniGPT-4) 55.67 — Imported 2026-05-06
68 QVix 55 — Imported 2026-05-06
69 MCAN 54.54 — Imported 2026-05-06
70 LLaMA-Adapter-V2 54.44 — Imported 2026-05-06
71 VLIS 50.20 — Imported 2026-05-06
72 LLaVA-13B 47.74 — Imported 2026-05-06
73 VPGTrans 47 — Imported 2026-05-06
74 MiniGPT-4 44.71 — Imported 2026-05-06
75 LLaMA1-13B 43.33 — Imported 2026-05-06
76 LLaMA2-7B 43.08 — Imported 2026-05-06
77 LLaVA-7B 41.10 — Imported 2026-05-06
78 Random Chance 39.83 — Imported 2026-05-06
79 OpenFlamingo 39.27 — Imported 2026-05-06
80 Lynx 38.28 — Imported 2026-05-06
81 mPLUG-Owl 37.93 — Imported 2026-05-06
82 MultiGPT 36.29 — Imported 2026-05-06
83 LLaMA1-7B 36.19 — Imported 2026-05-06
84 Fromage 34.51 — Imported 2026-05-06
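
The ranking rule used for this table (rows ordered by the Average column, highest first) can be reproduced with a short sketch. The `Row` type and `rank_rows` helper are hypothetical names introduced only for illustration, not part of any published tooling for this leaderboard. The sample values are copied from rows in the table.

```python
from typing import NamedTuple


class Row(NamedTuple):
    subject: str    # source display name, kept verbatim
    average: float  # the "Average" column, in percent


def rank_rows(rows: list[Row]) -> list[tuple[int, Row]]:
    """Return (rank, row) pairs ordered by Average, highest first.

    Hypothetical helper: ties keep their input order because
    Python's sort is stable.
    """
    ordered = sorted(rows, key=lambda r: r.average, reverse=True)
    return list(enumerate(ordered, start=1))


# A few rows copied from the table, deliberately out of order.
sample = [
    Row("Human Performance", 88.40),
    Row("KAM-CoT", 93.87),
    Row("Random Chance", 39.83),
    Row("Honeybee (Vicuna-13B)", 94.39),
]

for rank, row in rank_rows(sample):
    print(f"{rank} {row.subject} {row.average:.2f}")
```

Because display names are not unique in this table (for example, LLaMA-SciTune and LaVIN-13B each appear twice), the sketch keeps rows as separate records rather than keying them by name.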