MMLU Professional Medicine

MMLU Professional Medicine: the professional-medicine subset of the MMLU multiple-choice benchmark. It evaluates clinical, biomedical, medical-exam, coding, and healthcare-document reasoning.

5 rows
Primary metric: accuracy
Sampled: 2026-05-27


Metrics

MMLU Professional Medicine accuracy
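Accuracy is the share of multiple-choice questions answered correctly. The minimal sketch below shows that calculation. It assumes each question has a single gold option letter and that the model's reply has already been reduced to one letter. The function name and toy data are hypothetical and are not taken from the source's evaluation code.

```python
def accuracy(predictions: list[str], answers: list[str]) -> float:
    """Fraction of items whose predicted option letter matches the gold letter.

    Hypothetical helper: assumes replies were already parsed into letters (A-D).
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot score an empty set")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


# Hypothetical toy data: four questions, three answered correctly.
preds = ["A", "C", "B", "D"]
gold = ["A", "C", "D", "D"]
print(f"{accuracy(preds, gold):.2%}")  # 75.00%
```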

Latest Results

Rows are transcribed from Table 4 of the public report on GPT-4 and medical challenge problems. The values shown are MMLU Professional Medicine accuracy.

Rank  Subject                    Accuracy  Model match           Provenance  Sampled
1     GPT-4 (5-shot)             93.75%    GPT-4 (openai-gpt-4)  Imported    2026-05-27
2     GPT-4 (zero-shot)          93.01%    GPT-4 (openai-gpt-4)  Imported    2026-05-27
3     Flan-PaLM 540B (few-shot)  83.8%     -                     Imported    2026-05-27
4     GPT-3.5 (zero-shot)        70.22%    -                     Imported    2026-05-27
5     GPT-3.5 (5-shot)           69.85%    -                     Imported    2026-05-27
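The Rank column orders rows by descending accuracy. Below is a minimal sketch of that ordering, together with the percentage-point gap between 5-shot and zero-shot prompting for each GPT model. It uses the transcribed accuracy values as input. The variable names are illustrative and not part of the source.

```python
# (subject, accuracy in percent) as transcribed from the table, deliberately unordered.
rows = [
    ("GPT-3.5 (5-shot)", 69.85),
    ("GPT-4 (zero-shot)", 93.01),
    ("Flan-PaLM 540B (few-shot)", 83.8),
    ("GPT-4 (5-shot)", 93.75),
    ("GPT-3.5 (zero-shot)", 70.22),
]

# Rank by descending accuracy; sorted() is stable, so ties would keep input order.
ranked = sorted(rows, key=lambda r: r[1], reverse=True)
for rank, (subject, acc) in enumerate(ranked, start=1):
    print(f"{rank}  {subject:<26}{acc:.2f}%")

# Percentage-point difference between 5-shot and zero-shot for the same model.
scores = dict(rows)
for model in ("GPT-4", "GPT-3.5"):
    delta = scores[f"{model} (5-shot)"] - scores[f"{model} (zero-shot)"]
    print(f"{model}: 5-shot minus zero-shot = {delta:+.2f} points")
```

With these values, 5-shot prompting adds 0.74 points over zero-shot for GPT-4 and loses 0.37 points for GPT-3.5.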