Multilingual MGSM (CoT) | BenchmarkList

Rank	Subject	Score	Model Match	Provenance	Sampled
1	Llama 3.1 405B Instruct	0.92	—	Self-reported	2026-05-06
2	Llama 3.1 70B Instruct	0.87	Llama 3.1 70B Instruct (meta-llama-llama-3.1-70b-instruct)	Self-reported	2026-05-06
3	Llama 3.1 8B Instruct	0.69	Llama 3.1 8B Instruct (meta-llama-llama-3.1-8b-instruct)	Self-reported	2026-05-06