MMMLU | BenchmarkList

Showing 2 latest source slices.

Rank	Subject	Score	Model Match	Provenance	Sampled
1	Claude Opus 4.6 Max	90.6%	Claude Opus 4.6 anthropic-claude-opus-4.6	Self-reported	2026-05-28
2	Qwen3.7 Max	90.3%	Qwen3.7 Max qwen-qwen3.7-max	Self-reported	2026-05-28
3	Qwen3.6 Plus	89.5%	Qwen3.6 Plus qwen-qwen3.6-plus	Self-reported	2026-05-28
4	DeepSeek V4 Pro Max	87.9%	DeepSeek V4 Pro deepseek-deepseek-v4-pro	Self-reported	2026-05-28
5	Kimi K2.6 Thinking	87.5%	KIMI MoonshotAI: Kimi K2.6 moonshotai-kimi-k2.6	Self-reported	2026-05-28
6	GLM-5.1 Thinking	87.2%	GLM GLM 5.1 z-ai-glm-5.1	Self-reported	2026-05-28

Rank	Subject	Score	Model Match	Provenance	Sampled
1	Gemini 3.1 Pro Preview	92.6%	Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview	Launch post	2026-04-16
2	Claude Opus 4.7	91.5%	Claude Opus 4.7 anthropic-claude-opus-4.7	Launch post	2026-04-16
3	Claude Opus 4.6	91.1%	Claude Opus 4.6 anthropic-claude-opus-4.6	Launch post	2026-04-16