Multilingual MGSM (CoT)
The Multilingual Grade School Math (MGSM) benchmark evaluates language models' chain-of-thought reasoning across ten typologically diverse languages. It contains 250 grade-school math problems manually translated from the GSM8K dataset into languages including Bengali and Swahili.
Rows: 3
Primary metric: score
Sampled: 2026-05-06
Metadata
Metrics: Score, Normalized Score
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Llama 3.1 405B Instruct | 0.92 | — | Self-reported | 2026-05-06 |
| 2 | Llama 3.1 70B Instruct | 0.87 | Llama 3.1 70B Instruct (`meta-llama-llama-3.1-70b-instruct`) | Self-reported | 2026-05-06 |
| 3 | Llama 3.1 8B Instruct | 0.69 | Llama 3.1 8B Instruct (`meta-llama-llama-3.1-8b-instruct`) | Self-reported | 2026-05-06 |