MMLU-STEM
STEM-focused subset of the Massive Multitask Language Understanding (MMLU) benchmark. It evaluates language models on science, technology, engineering, and mathematics subjects such as physics, chemistry, and mathematics.
2 rows
Primary metric: score
Sampled: 2026-05-06
Metadata
Metrics: Score, Normalized Score
| Rank | Model | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Qwen2.5 32B Instruct | 0.81 | — | Self-reported | 2026-05-06 |
| 2 | Qwen2.5 14B Instruct | 0.76 | — | Self-reported | 2026-05-06 |