MMMLU
Multilingual Massive Multitask Language Understanding: the MMLU test set translated into 14 non-English languages, used to evaluate multilingual knowledge and reasoning.
9 rows
Primary metric: score
Sampled: 2026-05-28
Showing the 2 latest source slices. Ranks restart within each slice.
Self-reported results, sampled 2026-05-28:

| Rank | Subject | Score | Matched model | Model ID | Provenance | Sampled |
|---|---|---|---|---|---|---|
| 1 | Claude Opus 4.6 Max | 90.6% | Claude Opus 4.6 | anthropic-claude-opus-4.6 | Self-reported | 2026-05-28 |
| 2 | Qwen3.7 Max | 90.3% | Qwen3.7 Max | qwen-qwen3.7-max | Self-reported | 2026-05-28 |
| 3 | Qwen3.6 Plus | 89.5% | Qwen3.6 Plus | qwen-qwen3.6-plus | Self-reported | 2026-05-28 |
| 4 | DeepSeek V4 Pro Max | 87.9% | DeepSeek V4 Pro | deepseek-deepseek-v4-pro | Self-reported | 2026-05-28 |
| 5 | Kimi K2.6 Thinking | 87.5% | MoonshotAI: Kimi K2.6 | moonshotai-kimi-k2.6 | Self-reported | 2026-05-28 |
| 6 | GLM-5.1 Thinking | 87.2% | GLM 5.1 | z-ai-glm-5.1 | Self-reported | 2026-05-28 |

Launch-post results, sampled 2026-04-16:

| Rank | Subject | Score | Matched model | Model ID | Provenance | Sampled |
|---|---|---|---|---|---|---|
| 1 | Gemini 3.1 Pro Preview | 92.6% | Gemini 3.1 Pro Preview | google-gemini-3.1-pro-preview | Launch post | 2026-04-16 |
| 2 | Claude Opus 4.7 | 91.5% | Claude Opus 4.7 | anthropic-claude-opus-4.7 | Launch post | 2026-04-16 |
| 3 | Claude Opus 4.6 | 91.1% | Claude Opus 4.6 | anthropic-claude-opus-4.6 | Launch post | 2026-04-16 |
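Rank 1 appears in more than one place because each source slice is ranked separately. The sketch below shows one way to reproduce that ranking in Python. It assumes that a slice is identified by its provenance and sample date, because the page does not state its exact grouping rule. It also assumes that ties get no special handling. `rank_within_slices` is a hypothetical helper, and its rows are transcribed from the table.

```python
from itertools import groupby
from operator import itemgetter

# Rows transcribed from the leaderboard: (subject, score in %, provenance, sampled date).
ROWS = [
    ("Claude Opus 4.6 Max", 90.6, "Self-reported", "2026-05-28"),
    ("Qwen3.7 Max", 90.3, "Self-reported", "2026-05-28"),
    ("Qwen3.6 Plus", 89.5, "Self-reported", "2026-05-28"),
    ("DeepSeek V4 Pro Max", 87.9, "Self-reported", "2026-05-28"),
    ("Kimi K2.6 Thinking", 87.5, "Self-reported", "2026-05-28"),
    ("GLM-5.1 Thinking", 87.2, "Self-reported", "2026-05-28"),
    ("Gemini 3.1 Pro Preview", 92.6, "Launch post", "2026-04-16"),
    ("Claude Opus 4.7", 91.5, "Launch post", "2026-04-16"),
    ("Claude Opus 4.6", 91.1, "Launch post", "2026-04-16"),
]

# Hypothetical slice key: (provenance, sampled date).
slice_key = itemgetter(2, 3)


def rank_within_slices(rows):
    """Return {slice: [(rank, subject, score), ...]}, ranking by descending score
    and restarting at 1 for every slice. Ties keep their input order (stable sort)."""
    ranked = {}
    # groupby only merges adjacent items, so sort by the slice key first.
    for key, group in groupby(sorted(rows, key=slice_key), key=slice_key):
        ordered = sorted(group, key=itemgetter(1), reverse=True)
        ranked[key] = [(i, r[0], r[1]) for i, r in enumerate(ordered, start=1)]
    return ranked


if __name__ == "__main__":
    for (provenance, sampled), entries in rank_within_slices(ROWS).items():
        print(f"{provenance}, sampled {sampled}")
        for rank, subject, score in entries:
            print(f"  {rank}. {subject}: {score}%")
```

If the scores were pooled into one ranking instead, the launch-post and self-reported figures would be mixed. That comparison may be misleading, because the two sources can use different evaluation setups.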