MMLU-STEM

STEM-focused subset of the Massive Multitask Language Understanding (MMLU) benchmark. It evaluates language models on science, technology, engineering, and mathematics topics, including physics, chemistry, and other technical subjects.

Rows: 2
Primary metric: score
Sampled: 2026-05-06

Metadata

Metrics

Score, Normalized Score

Latest Results

Rank  Subject               Score  Model Match  Provenance     Sampled
1     Qwen2.5 32B Instruct  0.81                Self-reported  2026-05-06
2     Qwen2.5 14B Instruct  0.76                Self-reported  2026-05-06