MultiMedQA
MultiMedQA: Evaluates clinical, biomedical, medical-exam, coding, or healthcare-document reasoning.
Rows: 5
Primary metric: mean_accuracy
Sampled: 2026-05-27
Metrics
Mean accuracy across the reported MultiMedQA components: MedQA Mainland China, MedQA Taiwan, MedQA United States (5-option), MedQA United States (4-option), PubMedQA Reasoning Required, MedMCQA Dev, MMLU Clinical Knowledge, MMLU Medical Genetics, MMLU Anatomy, MMLU Professional Medicine, MMLU College Biology, and MMLU College Medicine.
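As a rough illustration of how a mean-accuracy score like this can be computed for one subject, the sketch below averages per-component accuracies. The page does not say how components are weighted or how missing scores are handled, so the unweighted mean and the rule of skipping unreported components are assumptions, as are the function name `mean_accuracy` and the sample scores.

```python
from statistics import mean
from typing import Mapping, Optional

# The twelve MultiMedQA components listed in the metric definition.
COMPONENTS = (
    "MedQA Mainland China",
    "MedQA Taiwan",
    "MedQA United States (5-option)",
    "MedQA United States (4-option)",
    "PubMedQA Reasoning Required",
    "MedMCQA Dev",
    "MMLU Clinical Knowledge",
    "MMLU Medical Genetics",
    "MMLU Anatomy",
    "MMLU Professional Medicine",
    "MMLU College Biology",
    "MMLU College Medicine",
)


def mean_accuracy(scores: Mapping[str, Optional[float]]) -> float:
    """Hypothetical helper: unweighted mean over components with a reported score.

    Assumption: each reported component counts equally, and components with
    no score (missing key or None) are skipped rather than counted as 0.
    """
    reported = [scores[c] for c in COMPONENTS if scores.get(c) is not None]
    if not reported:
        raise ValueError("no MultiMedQA components reported")
    return mean(reported)


# Made-up example scores (percent), not taken from the leaderboard.
example = {"MedQA Taiwan": 80.0, "MMLU Anatomy": 70.0, "PubMedQA Reasoning Required": None}
print(mean_accuracy(example))  # → 75.0
```

Under this assumption, two subjects that report different subsets of components are averaged over different sets, so their scores are not strictly like-for-like.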
| Rank | Subject | Mean accuracy | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-4 (5-shot) | 82.405833% | GPT-4 openai-gpt-4 | Imported | 2026-05-27 |
| 2 | GPT-4 (zero-shot) | 81.134167% | GPT-4 openai-gpt-4 | Imported | 2026-05-27 |
| 3 | Flan-PaLM 540B (few-shot) | 72.133333% | — | Imported | 2026-05-27 |
| 4 | GPT-3.5 (5-shot) | 59.518333% | — | Imported | 2026-05-27 |
| 5 | GPT-3.5 (zero-shot) | 58.9875% | — | Imported | 2026-05-27 |
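The Rank column appears to follow the primary metric in descending order. Assuming that is how ranks are assigned, the sketch below reproduces the ordering from the table's own values; the helper name `rank_by_metric` is hypothetical, and tie handling (here, stable input order) is not specified by the page.

```python
# Mean-accuracy values (percent) copied from the leaderboard table, deliberately unsorted.
rows = [
    ("GPT-3.5 (zero-shot)", 58.9875),
    ("GPT-4 (5-shot)", 82.405833),
    ("Flan-PaLM 540B (few-shot)", 72.133333),
    ("GPT-4 (zero-shot)", 81.134167),
    ("GPT-3.5 (5-shot)", 59.518333),
]


def rank_by_metric(rows):
    """Hypothetical helper: return (rank, subject, score), highest score ranked 1.

    sorted() is stable, so equal scores keep their input order (an assumption
    about tie handling, not something the leaderboard states).
    """
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    return [(i, subject, score) for i, (subject, score) in enumerate(ordered, start=1)]


for rank, subject, score in rank_by_metric(rows):
    # Two decimals for display only; the stored values keep full precision.
    print(f"{rank} | {subject} | {score:.2f}%")
```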