MATH (CoT)
The MATH dataset contains 12,500 challenging competition mathematics problems drawn from AMC 10, AMC 12, AIME, and other competitions. Each problem comes with a full step-by-step solution and is labeled with a difficulty level (1-5) and one of seven mathematical subjects. This variant uses Chain-of-Thought (CoT) prompting to encourage step-by-step reasoning.
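As a rough illustration of what a Chain-of-Thought evaluation on this kind of data can look like, the sketch below builds a prompt and scores a response by comparing final answers. It assumes that both the reference solution and the model output wrap the final answer in `\boxed{...}`. The prompt wording and the helper names (`build_cot_prompt`, `extract_boxed`, `is_correct`) are hypothetical and are not taken from any specific evaluation harness.

```python
from typing import Optional


def build_cot_prompt(problem: str) -> str:
    """Wrap a problem in a hypothetical step-by-step instruction."""
    return (
        "Solve the following problem. Reason step by step, then give the "
        "final answer in \\boxed{}.\n\n"
        f"Problem: {problem}\n\nSolution:"
    )


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1  # we are already inside the opening brace of \boxed{
    chars = []
    for c in text[start + len(marker):]:
        if c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(c)
    return None  # braces never balanced


def is_correct(model_output: str, reference_solution: str) -> bool:
    """Exact string match on the extracted answers (an assumed, strict rule)."""
    pred = extract_boxed(model_output)
    gold = extract_boxed(reference_solution)
    return pred is not None and gold is not None and pred.strip() == gold.strip()
```

Exact string matching is deliberately strict here: it would treat `0.5` and `\frac{1}{2}` as different answers. An actual harness would likely normalize equivalent forms before comparing, so scores from this sketch should not be expected to reproduce the reported numbers.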
6 rows. Primary metric: score. Sampled: 2026-05-06.
Metrics: Score, Normalized Score
| Rank | Model | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Llama 3.1 70B Instruct | 0.68 | Llama 3.1 70B Instruct (`meta-llama-llama-3.1-70b-instruct`) | Self-reported | 2026-05-06 |
| 2 | Ministral 3 (14B Base 2512) | 0.68 | — | Self-reported | 2026-05-06 |
| 2 | Mistral Large 3 | 0.68 | — | Self-reported | 2026-05-06 |
| 4 | Ministral 3 (8B Base 2512) | 0.63 | — | Self-reported | 2026-05-06 |
| 5 | Ministral 3 (3B Base 2512) | 0.60 | — | Self-reported | 2026-05-06 |
| 6 | Llama 3.1 8B Instruct | 0.52 | Llama 3.1 8B Instruct (`meta-llama-llama-3.1-8b-instruct`) | Self-reported | 2026-05-06 |