MedMCQA

MedMCQA evaluates medical multiple-choice question answering, in settings both with and without retrieved context passages.

Rows: 11
Primary metric: test_accuracy
Sampled: 2026-05-06

Metadata

Metrics

Test Accuracy, Dev Accuracy
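For multiple-choice QA, accuracy is usually the fraction of questions whose predicted option matches the answer key. The sketch below computes it under that assumption, with predictions and gold answers given as option labels. The function name `accuracy` and the toy data are hypothetical and are not taken from any MedMCQA tooling.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of questions whose predicted option label equals the gold label."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count exact label matches (True counts as 1 in sum()).
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


# Hypothetical predictions for five questions with options labelled A-D.
preds = ["A", "C", "B", "D", "A"]
gold = ["A", "B", "B", "D", "C"]
print(round(accuracy(preds, gold), 2))  # 0.6 (3 of 5 correct)
```

Test Accuracy and Dev Accuracy would both be this same computation, applied to the test split and the dev split respectively.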

Latest Results

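Rows preserve both source settings, with and without context, so PubMedBERT, SciBERT, BioBERT, and BERT Base each appear twice. The table does not indicate which row belongs to which setting. Scores are accuracies as reported in the source, expressed as fractions (0.60 = 60%).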

Rank | Subject | Test Accuracy | Model Match | Provenance | Sampled
-----|---------|---------------|-------------|------------|--------
1 | Codex 5-shot CoT (Liévin et al., 2022) | 0.60 |  | Imported | 2026-05-06
2 | VOD BioLinkBERT (Liévin et al., 2022) | 0.58 |  | Imported | 2026-05-06
3 | InstructGPT zero-shot CoT (Liévin et al., 2022) | 0.49 |  | Imported | 2026-05-06
4 | PubMedBERT (Gu et al., 2022) | 0.47 |  | Imported | 2026-05-06
5 | SciBERT (Beltagy et al., 2019) | 0.43 |  | Imported | 2026-05-06
6 | BioBERT (Lee et al., 2020) | 0.42 |  | Imported | 2026-05-06
7 | PubMedBERT (Gu et al., 2022) | 0.41 |  | Imported | 2026-05-06
8 | SciBERT (Beltagy et al., 2019) | 0.39 |  | Imported | 2026-05-06
9 | BERT Base (Devlin et al., 2019) | 0.37 |  | Imported | 2026-05-06
10 | BioBERT (Lee et al., 2020) | 0.37 |  | Imported | 2026-05-06
11 | BERT Base (Devlin et al., 2019) | 0.33 |  | Imported | 2026-05-06
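The Rank column appears to follow descending Test Accuracy. Tied scores, such as ranks 9 and 10 (both 0.37), get consecutive ranks in listed order. The sketch below reproduces that ordering under this assumption, since the page does not state a tie-breaking rule. `ROWS` and `rank_rows` are hypothetical names, and the data is copied from the table above.

```python
# Rows copied from the table above as (subject, test accuracy). Duplicate names
# are the two source settings (with/without context), which the table does not label.
ROWS = [
    ("Codex 5-shot CoT", 0.60),
    ("VOD BioLinkBERT", 0.58),
    ("InstructGPT zero-shot CoT", 0.49),
    ("PubMedBERT", 0.47),
    ("SciBERT", 0.43),
    ("BioBERT", 0.42),
    ("PubMedBERT", 0.41),
    ("SciBERT", 0.39),
    ("BERT Base", 0.37),
    ("BioBERT", 0.37),
    ("BERT Base", 0.33),
]


def rank_rows(rows):
    """Return (rank, subject, accuracy) tuples, highest accuracy first.

    Hypothetical helper: sorted() stays stable with reverse=True, so rows with
    equal scores keep their input order and receive consecutive ranks.
    """
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return [(i, name, acc) for i, (name, acc) in enumerate(ordered, start=1)]


for rank, name, acc in rank_rows(ROWS):
    print(f"{rank:>2}  {acc:.2f}  {name}")
```

Because the sort is stable, reordering the input changes only the order among tied rows. Distinct scores always land in the same positions.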