MedXpertQA

A comprehensive benchmark to evaluate expert-level medical knowledge and advanced reasoning, featuring 4,460 questions spanning 17 specialties and 11 body systems. Includes both text-only and multimodal subsets with expert-level exam questions incorporating diverse medical images and rich clinical information.

Rows: 9
Primary metric: Score
Sampled: 2026-05-06

Metadata

Metrics

Score, Normalized Score
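
This page does not say how Score is computed. A minimal sketch, assuming Score is plain accuracy (the fraction of multiple-choice questions answered correctly, on a 0 to 1 scale), might look like the following. The function name `accuracy` and the sample answers are hypothetical and for illustration only. Normalized Score is not defined here, so the sketch leaves it out.

```python
def accuracy(predictions, answers):
    """Return the fraction of predictions that match the gold answers.

    Assumption: each item is a single option letter such as "A" or "c";
    comparison ignores case and surrounding whitespace.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)


# Hypothetical example: four questions, three answered correctly.
predicted = ["A", "c", "B", "E"]
gold = ["A", "C", "D", "E"]
print(accuracy(predicted, gold))  # 0.75
```

Under this assumption, a reported score of 0.78 would correspond to 78% of questions answered correctly.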

Latest Results

Rank | Subject | Score | Model Match | Provenance | Sampled
1 | Muse Spark | 0.78 | | Self-reported | 2026-05-06
2 | Qwen3.5-122B-A10B | 0.67 | Qwen3.5-122B-A10B (qwen-qwen3.5-122b-a10b) | Self-reported | 2026-05-06
3 | Qwen3.5-27B | 0.62 | Qwen3.5-27B (qwen-qwen3.5-27b) | Self-reported | 2026-05-06
4 | Qwen3.5-35B-A3B | 0.61 | Qwen3.5-35B-A3B (qwen-qwen3.5-35b-a3b) | Self-reported | 2026-05-06
5 | Gemma 4 31B | 0.61 | Gemma 4 31B (google-gemma-4-31b-it) | Self-reported | 2026-05-06
6 | Gemma 4 26B-A4B | 0.58 | Gemma 4 26B A4B (google-gemma-4-26b-a4b-it) | Self-reported | 2026-05-06
7 | Gemma 4 E4B | 0.29 | | Self-reported | 2026-05-06
8 | Gemma 4 E2B | 0.23 | | Self-reported | 2026-05-06
9 | MedGemma 4B IT | 0.19 | | Self-reported | 2026-05-06