ScienceQA
ScienceQA evaluates multimodal science question answering across natural, social, and language science topics with text and image context splits.
Rows: 84
Primary metric: avg
Sampled: 2026-05-06
Metrics: Natural Science, Social Science, Language Science, Text Context, Image Context, No Context, Grades 1-6, Grades 7-12, Average
| Rank | Subject | Average | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Multimodal-T-SciQ_Large 🥇 | 96.18 | — | Imported | 2026-05-06 |
| 2 | MC-CoT_F-Large 🥈 | 94.88 | — | Imported | 2026-05-06 |
| 3 | Honeybee (Vicuna-13B) 🥉 | 94.39 | — | Imported | 2026-05-06 |
| 4 | Enigma-COT_Large | 94.11 | — | Imported | 2026-05-06 |
| 5 | KAM-CoT | 93.87 | — | Imported | 2026-05-06 |
| 6 | MC-CoT_Large | 93.37 | — | Imported | 2026-05-06 |
| 7 | DPMM-CoT_Large | 93.35 | — | Imported | 2026-05-06 |
| 8 | LLaVA (GPT-4 judge) | 92.53 | GPT-4 (`openai-gpt-4`) | Imported | 2026-05-06 |
| 9 | CoMD (Vicuna-7B) | 91.94 | — | Imported | 2026-05-06 |
| 10 | Multimodal-T-SciQ_Base | 91.75 | — | Imported | 2026-05-06 |
| 11 | Multimodal-CoT_Large | 91.68 | — | Imported | 2026-05-06 |
| 12 | PILL (LLaMA-7B) | 91.23 | — | Imported | 2026-05-06 |
| 13 | LLaVA (ViT-L/16-224) | 91.20 | — | Imported | 2026-05-06 |
| 14 | DPMM-CoT_Base | 90.97 | — | Imported | 2026-05-06 |
| 15 | LLaVA | 90.92 | — | Imported | 2026-05-06 |
| 16 | LaVIN-13B | 90.83 | — | Imported | 2026-05-06 |
| 17 | MC-CoT_F-Base | 90.73 | — | Imported | 2026-05-06 |
| 18 | MC-CoT_Base | 90.64 | — | Imported | 2026-05-06 |
| 19 | LLaMA-SciTune | 90.03 | — | Imported | 2026-05-06 |
| 20 | LaVIN-7B | 89.41 | — | Imported | 2026-05-06 |
| 21 | Flan-T5-XL (LoRA) | 89.29 | — | Imported | 2026-05-06 |
| 22 | Chat-UniVi | 88.78 | — | Imported | 2026-05-06 |
| 23 | Human Performance | 88.40 | — | Imported | 2026-05-06 |
| 24 | DDCoT (T5) | 87.34 | — | Imported | 2026-05-06 |
| 25 | LG-VQA (CLIP) | 87.22 | — | Imported | 2026-05-06 |
| 26 | Chameleon (GPT-4) | 86.54 | GPT-4 (`openai-gpt-4`) | Imported | 2026-05-06 |
| 27 | LG-VQA (BLIP-2) | 86.32 | — | Imported | 2026-05-06 |
| 28 | LLaMA-SciTune | 86.11 | — | Imported | 2026-05-06 |
| 29 | Enigma-COT_Base | 85.59 | — | Imported | 2026-05-06 |
| 30 | LLaMA-Adapter | 85.19 | — | Imported | 2026-05-06 |
| 31 | Multimodal-CoT_Base | 84.91 | — | Imported | 2026-05-06 |
| 32 | IMMO SL+RL | 84.80 | — | Imported | 2026-05-06 |
| 33 | CoT GPT-4 | 83.99 | GPT-4 (`openai-gpt-4`) | Imported | 2026-05-06 |
| 34 | HoT-T5_Large | 83.38 | — | Imported | 2026-05-06 |
| 35 | HoT-T5_Base | 81.42 | — | Imported | 2026-05-06 |
| 36 | DDCoT (ChatGPT) | 80.15 | — | Imported | 2026-05-06 |
| 37 | Chameleon (ChatGPT) | 79.93 | — | Imported | 2026-05-06 |
| 38 | CoT GPT-3 + Doc | 79.91 | — | Imported | 2026-05-06 |
| 39 | UnifiedQA-T-SciQ_Base | 79.41 | — | Imported | 2026-05-06 |
| 40 | CoT ChatGPT | 78.31 | — | Imported | 2026-05-06 |
| 41 | DDCoT (GPT-3) | 78.09 | — | Imported | 2026-05-06 |
| 42 | LaVIN-13B | 77.54 | — | Imported | 2026-05-06 |
| 43 | CoT GPT-3 (ALE) | 75.17 | — | Imported | 2026-05-06 |
| 44 | LaVIN-7B | 75.11 | — | Imported | 2026-05-06 |
| 45 | CoT GPT-3 (AE) | 74.61 | — | Imported | 2026-05-06 |
| 46 | BLIP-2 | 74.17 | — | Imported | 2026-05-06 |
| 47 | CoT UnifiedQA | 74.11 | — | Imported | 2026-05-06 |
| 48 | GPT-3 (0-shot) | 74.04 | — | Imported | 2026-05-06 |
| 49 | GPT-3 (2-shot) | 73.97 | — | Imported | 2026-05-06 |
| 50 | InstructBLIP | 73.33 | — | Imported | 2026-05-06 |
| 51 | UnifiedQA | 70.12 | — | Imported | 2026-05-06 |
| 52 | ChatGPT | 69.41 | — | Imported | 2026-05-06 |
| 53 | MetaCLIP | 68.77 | — | Imported | 2026-05-06 |
| 54 | OpenCLIP | 67.53 | — | Imported | 2026-05-06 |
| 55 | Flan-T5-XXL | 67.43 | — | Imported | 2026-05-06 |
| 56 | SAM | 67.08 | — | Imported | 2026-05-06 |
| 57 | DINOv2 | 64.60 | — | Imported | 2026-05-06 |
| 58 | VisualBERT | 61.87 | — | Imported | 2026-05-06 |
| 59 | Patch-TRM | 61.42 | — | Imported | 2026-05-06 |
| 60 | ViLT | 61.14 | — | Imported | 2026-05-06 |
| 61 | DFAF | 60.72 | — | Imported | 2026-05-06 |
| 62 | Chat-UniVi | 59.96 | — | Imported | 2026-05-06 |
| 63 | BAN | 59.37 | — | Imported | 2026-05-06 |
| 64 | Top-Down | 59.02 | — | Imported | 2026-05-06 |
| 65 | MiniGPT4 | 58.70 | — | Imported | 2026-05-06 |
| 66 | LLaMA2-13B | 55.78 | — | Imported | 2026-05-06 |
| 67 | DDCoT (MiniGPT-4) | 55.67 | — | Imported | 2026-05-06 |
| 68 | QVix | 55 | — | Imported | 2026-05-06 |
| 69 | MCAN | 54.54 | — | Imported | 2026-05-06 |
| 70 | LLaMA-Adapter-V2 | 54.44 | — | Imported | 2026-05-06 |
| 71 | VLIS | 50.20 | — | Imported | 2026-05-06 |
| 72 | LLaVA-13B | 47.74 | — | Imported | 2026-05-06 |
| 73 | VPGTrans | 47 | — | Imported | 2026-05-06 |
| 74 | MiniGPT-4 | 44.71 | — | Imported | 2026-05-06 |
| 75 | LLaMA1-13B | 43.33 | — | Imported | 2026-05-06 |
| 76 | LLaMA2-7B | 43.08 | — | Imported | 2026-05-06 |
| 77 | LLaVA-7B | 41.10 | — | Imported | 2026-05-06 |
| 78 | Random Chance | 39.83 | — | Imported | 2026-05-06 |
| 79 | OpenFlamingo | 39.27 | — | Imported | 2026-05-06 |
| 80 | Lynx | 38.28 | — | Imported | 2026-05-06 |
| 81 | mPLUG-Owl | 37.93 | — | Imported | 2026-05-06 |
| 82 | MultiGPT | 36.29 | — | Imported | 2026-05-06 |
| 83 | LLaMA1-7B | 36.19 | — | Imported | 2026-05-06 |
| 84 | Fromage | 34.51 | — | Imported | 2026-05-06 |
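
The table includes two reference rows, Human Performance (88.40) and Random Chance (39.83), which are the natural points of comparison for every other entry. As a rough sketch of how the rows could be read programmatically, the Python below parses a handful of rows copied from the table and splits them around a chosen baseline. The helper names `parse_rows` and `split_by_baseline` are hypothetical and are not part of any leaderboard tooling; the full table should parse the same way, assuming every row keeps the six-column layout.

```python
# A few rows copied verbatim from the table above (illustrative subset only).
ROWS = """\
| 2 | MC-CoT_F-Large 🥈 | 94.88 | — | Imported | 2026-05-06 |
| 22 | Chat-UniVi | 88.78 | — | Imported | 2026-05-06 |
| 23 | Human Performance | 88.40 | — | Imported | 2026-05-06 |
| 24 | DDCoT (T5) | 87.34 | — | Imported | 2026-05-06 |
| 78 | Random Chance | 39.83 | — | Imported | 2026-05-06 |
| 84 | Fromage | 34.51 | — | Imported | 2026-05-06 |
"""


def parse_rows(text):
    """Parse markdown table rows into (rank, name, average) tuples."""
    parsed = []
    for line in text.strip().splitlines():
        # Drop the outer pipes, then split the remaining cells.
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        rank = int(cells[0])
        # Remove a trailing medal emoji (and the space before it), if present.
        name = cells[1].rstrip(" 🥇🥈🥉")
        average = float(cells[2])
        parsed.append((rank, name, average))
    return parsed


def split_by_baseline(rows, baseline_name):
    """Return the baseline score and the names scoring above and below it."""
    baseline = next(avg for _, name, avg in rows if name == baseline_name)
    above = [name for _, name, avg in rows if avg > baseline]
    below = [name for _, name, avg in rows if avg < baseline]
    return baseline, above, below


rows = parse_rows(ROWS)
human, above_human, below_human = split_by_baseline(rows, "Human Performance")
chance, _, below_chance = split_by_baseline(rows, "Random Chance")

print(f"Human baseline: {human:.2f}; above it: {above_human}")
print(f"Chance baseline: {chance:.2f}; below it: {below_chance}")
```

Applied to the full table, the same comparison shows that the 22 entries ranked above Human Performance exceed 88.40 on Average, and the six entries ranked below Random Chance fall under 39.83.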