MedMCQA
MedMCQA evaluates medical multiple-choice question answering, including settings with and without retrieved/context passages.
Rows: 11
Primary metric: test_accuracy
Sampled: 2026-05-06
Metadata
Metrics
Test Accuracy, Dev Accuracy
| Rank | Subject | Test Accuracy | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Codex 5-shot CoT (Liévin et al., 2022) | 0.60 | — | Imported | 2026-05-06 |
| 2 | VOD BioLinkBERT (Liévin et al., 2022) | 0.58 | — | Imported | 2026-05-06 |
| 3 | InstructGPT zero-shot CoT (Liévin et al., 2022) | 0.49 | — | Imported | 2026-05-06 |
| 4 | PubmedBERT (Gu et al., 2022) | 0.47 | — | Imported | 2026-05-06 |
| 5 | SciBERT (Beltagy et al., 2019) | 0.43 | — | Imported | 2026-05-06 |
| 6 | BioBERT (Lee et al., 2020) | 0.42 | — | Imported | 2026-05-06 |
| 7 | PubmedBERT (Gu et al., 2022) | 0.41 | — | Imported | 2026-05-06 |
| 8 | SciBERT (Beltagy et al., 2019) | 0.39 | — | Imported | 2026-05-06 |
| 9 | BERT (Devlin et al., 2019) Base | 0.37 | — | Imported | 2026-05-06 |
| 10 | BioBERT (Lee et al., 2020) | 0.37 | — | Imported | 2026-05-06 |
| 11 | BERT (Devlin et al., 2019) Base | 0.33 | — | Imported | 2026-05-06 |
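
The Test Accuracy values are reported as fractions between 0 and 1. As a minimal sketch of how such a score can be computed for multiple-choice answers, assuming each prediction and each gold answer is a single option label: the function name `accuracy` and the toy labels below are hypothetical, for illustration only, and are not taken from MedMCQA or from any of the listed papers.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return the fraction of questions whose predicted option equals the gold option."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count exact matches between predicted and gold option labels.
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


# Hypothetical toy data: option labels for five questions (not real MedMCQA items).
gold = ["a", "c", "b", "d", "a"]
pred = ["a", "c", "d", "d", "b"]

score = accuracy(pred, gold)
print(f"{score:.2f}")  # 3 of 5 correct, prints 0.60
```

Reporting the score as a fraction rather than a percentage matches the scale used in the table.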