MedQA

MedQA: Evaluates clinical, biomedical, medical-exam, coding, or healthcare-document reasoning.

Rows: 11
Primary metric: usmle_test_accuracy
Sampled: 2026-05-27

Metadata

Metrics

USMLE test accuracy, USMLE dev accuracy, TWMLE dev accuracy, TWMLE test accuracy

Latest Results

Rows are parsed from the USMLE/TWMLE baseline table in the LaTeX source of the MedQA paper on arXiv.

Rank  Subject            USMLE test accuracy (%)  Model Match  Provenance  Sampled
1     BioBERT-Large      36.7                     -            Imported    2026-05-27
2     BioRoBERTa-Base    36.1                     -            Imported    2026-05-27
3     IR-Custom          36.1                     -            Imported    2026-05-27
4     IR-ES              35.5                     -            Imported    2026-05-27
5     RoBERTa-Large      35.0                     -            Imported    2026-05-27
6     BERT-Base-En       34.3                     -            Imported    2026-05-27
7     BioBERT-Base       34.1                     -            Imported    2026-05-27
8     clinicalBERT-Base  32.4                     -            Imported    2026-05-27
9     PMI                31.1                     -            Imported    2026-05-27
10    Max-out            28.6                     -            Imported    2026-05-27
11    Chance             25.0                     -            Imported    2026-05-27
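
The ranking in the table above can be reproduced with a short sketch. This is an illustration only, not the leaderboard's actual import pipeline: the function names `rank_rows` and `margin_over_chance` are hypothetical, and the values are copied by hand from the table. It assumes rows are ordered by the primary metric, highest first, with ties kept in their listed order.

```python
# Rows copied from the table above: (subject, USMLE test accuracy in %).
ROWS = [
    ("BioBERT-Large", 36.7),
    ("BioRoBERTa-Base", 36.1),
    ("IR-Custom", 36.1),
    ("IR-ES", 35.5),
    ("RoBERTa-Large", 35.0),
    ("BERT-Base-En", 34.3),
    ("BioBERT-Base", 34.1),
    ("clinicalBERT-Base", 32.4),
    ("PMI", 31.1),
    ("Max-out", 28.6),
    ("Chance", 25.0),
]


def rank_rows(rows):
    """Return (rank, subject, accuracy) tuples, highest accuracy first.

    Python's sort is stable, also with reverse=True, so tied rows
    keep the order in which they were listed.
    """
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return [(i + 1, name, acc) for i, (name, acc) in enumerate(ordered)]


def margin_over_chance(rows, chance_name="Chance"):
    """Return each subject's accuracy minus the chance baseline, in points."""
    chance = dict(rows)[chance_name]
    return {name: round(acc - chance, 1) for name, acc in rows}


ranked = rank_rows(ROWS)
margins = margin_over_chance(ROWS)

for rank, name, acc in ranked:
    print(f"{rank:>2}  {name:<18} {acc:5.1f}  (+{margins[name]:.1f} over chance)")
```

Read this way, the top-ranked baseline (BioBERT-Large, 36.7) sits 11.7 points above the 25.0 chance level.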