GSM8K
Grade-school math word-problem benchmark for evaluating multi-step arithmetic and reasoning performance.
Rows: 25
Primary metric: score
Sampled: 2026-05-06
GSM8K score
| Rank | Subject | GSM8K score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | XiaomiMiMo/MiMo-V2.5-Pro | 99.60 | — | Imported | 2026-05-06 |
| 2 | meta-llama/Llama-3.1-405B | 96.80 | — | Imported | 2026-05-06 |
| 3 | ibm-granite/granite-4.1-30b | 94.16 | — | Imported | 2026-05-06 |
| 4 | deepseek-ai/DeepSeek-V4-Pro | 92.60 | DeepSeek V4 Pro (`deepseek-deepseek-v4-pro`) | Imported | 2026-05-06 |
| 5 | ibm-granite/granite-4.1-8b | 92.49 | Granite 4.1 8B (`ibm-granite-granite-4.1-8b`) | Imported | 2026-05-06 |
| 6 | microsoft/Phi-3-medium-4k-instruct | 91 | — | Imported | 2026-05-06 |
| 7 | prism-ml/Ternary-Bonsai-8B-mlx-2bit | 91 | — | Imported | 2026-05-06 |
| 8 | prism-ml/Ternary-Bonsai-8B-gguf | 91 | — | Imported | 2026-05-06 |
| 9 | prism-ml/Ternary-Bonsai-4B-mlx-2bit | 90.50 | — | Imported | 2026-05-06 |
| 10 | prism-ml/Ternary-Bonsai-4B-gguf | 90.50 | — | Imported | 2026-05-06 |
| 11 | Qwen/Qwen2-72B | 89.50 | — | Imported | 2026-05-06 |
| 12 | deepseek-ai/DeepSeek-V3 | 89.30 | — | Imported | 2026-05-06 |
| 13 | prism-ml/Bonsai-8B-gguf | 88 | — | Imported | 2026-05-06 |
| 14 | prism-ml/Bonsai-8B-mlx-1bit | 88 | — | Imported | 2026-05-06 |
| 15 | prism-ml/Bonsai-4B-gguf | 87.30 | — | Imported | 2026-05-06 |
| 16 | ibm-granite/granite-4.1-3b | 86.88 | — | Imported | 2026-05-06 |
| 17 | microsoft/Phi-3.5-mini-instruct | 86.20 | — | Imported | 2026-05-06 |
| 18 | internlm/internlm2_5-7b-chat | 86 | — | Imported | 2026-05-06 |
| 19 | microsoft/Phi-3-mini-4k-instruct | 85.70 | — | Imported | 2026-05-06 |
| 20 | meta-llama/Llama-3.1-8B-Instruct | 84.50 | Llama 3.1 8B Instruct (`meta-llama-llama-3.1-8b-instruct`) | Imported | 2026-05-06 |
| 21 | Qwen/Qwen2-7B | 79.90 | — | Imported | 2026-05-06 |
| 22 | internlm/internlm2-chat-20b | 79.60 | — | Imported | 2026-05-06 |
| 23 | deepseek-ai/DeepSeek-V2 | 79.20 | — | Imported | 2026-05-06 |
| 24 | prism-ml/Ternary-Bonsai-1.7B-mlx-2bit | 74.20 | — | Imported | 2026-05-06 |
| 25 | prism-ml/Ternary-Bonsai-1.7B-gguf | 74.20 | — | Imported | 2026-05-06 |
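
This page does not state how each imported GSM8K score was computed. GSM8K results are commonly reported as exact-match accuracy on the final numeric answer, expressed as a percentage, and the dataset's reference solutions end with a `#### <answer>` line. The sketch below assumes that convention. The function names and the "last number in the output" extraction heuristic are hypothetical choices for illustration, not the method behind any row above, and the inputs in the usage note are toy strings rather than benchmark data.

```python
import re
from typing import Optional, Sequence

# Matches integers and decimals, optionally negative, with optional thousands commas.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_reference(solution: str) -> str:
    """Return the answer that follows the '####' marker in a reference solution."""
    return solution.split("####")[-1].strip().replace(",", "")


def extract_prediction(output: str) -> Optional[str]:
    """Hypothetical heuristic: treat the last number in the model output as its answer."""
    matches = _NUMBER.findall(output)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def same_number(a: Optional[str], b: Optional[str]) -> bool:
    """Compare two answer strings numerically; unparsable or missing values never match."""
    try:
        return float(a) == float(b)
    except (TypeError, ValueError):
        return False


def gsm8k_score(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Exact-match accuracy on final answers, scaled to 0-100."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one example is required")
    correct = sum(
        same_number(extract_prediction(p), extract_reference(r))
        for p, r in zip(predictions, references)
    )
    return 100.0 * correct / len(references)
```

Calling `gsm8k_score(["... so she earns $18.", "No idea."], ["... #### 18", "... #### 7"])` counts the first item as correct and the second as incorrect. Scores from different sources can differ in prompt format, number of few-shot examples, and answer extraction, so small gaps between imported rows may not be directly comparable.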