Omni-MATH
Omni-MATH evaluates olympiad-level mathematical reasoning across domains and difficulty bands, using a GPT-4o-based evaluation protocol.
Rows: 15
Primary metric: overall_acc
Sampled: 2026-05-06
Metadata
Metrics
Overall Acc, Algebra, Precalculus, Calculus, Geometry, Discrete Mathematics, Number Theory, Applied Mathematics, Difficulty: 1-3, Difficulty: 3-5, Difficulty: 5-8, Difficulty: 8-10
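The metrics above are accuracy figures reported overall, per domain, and per difficulty band. As a minimal sketch of how such breakdowns could be derived from per-problem judgments, the Python below groups records by a label and reports the percentage judged correct. The record fields (`domain`, `band`, `correct`) and the sample records are hypothetical, and treating each metric as a plain percentage of correct answers is an assumption, not the benchmark's documented aggregation.

```python
from collections import defaultdict


def accuracy(records):
    """Return the percentage of records judged correct, rounded to two decimals."""
    if not records:
        return 0.0
    return round(100.0 * sum(r["correct"] for r in records) / len(records), 2)


def accuracy_by(records, key):
    """Group records by the given field and compute accuracy within each group."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)
    return {label: accuracy(rs) for label, rs in groups.items()}


# Hypothetical per-problem judgments; field names are illustrative only.
records = [
    {"domain": "Algebra", "band": "1-3", "correct": True},
    {"domain": "Algebra", "band": "5-8", "correct": False},
    {"domain": "Geometry", "band": "1-3", "correct": True},
    {"domain": "Number Theory", "band": "8-10", "correct": False},
]

overall = accuracy(records)
by_domain = accuracy_by(records, "domain")
by_band = accuracy_by(records, "band")
```

Here the band label is taken as already assigned to each problem, so the sketch makes no claim about how boundary difficulties (for example, exactly 3 or 5) are bucketed.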
| Rank | Subject | Overall Acc | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | OpenAI o1-mini | 60.54 | — | Imported | 2026-05-06 |
| 2 | OpenAI o1-preview | 52.55 | o1-preview (`openai-o1-preview`) | Imported | 2026-05-06 |
| 3 | Qwen2.5-MATH-72b-Instruct | 36.20 | — | Imported | 2026-05-06 |
| 4 | Qwen2-MATH-72b-Instruct | 33.68 | — | Imported | 2026-05-06 |
| 5 | Qwen2.5-MATH-7b-Instruct | 33.22 | — | Imported | 2026-05-06 |
| 6 | GPT-4o | 30.49 | GPT-4o (`openai-gpt-4o`) | Imported | 2026-05-06 |
| 7 | Qwen2-MATH-7b-Instruct | 29.36 | — | Imported | 2026-05-06 |
| 8 | NuminaMATH-72B-COT | 28.45 | — | Imported | 2026-05-06 |
| 9 | Claude-3.5-SONNET | 26.23 | Claude 3.5 Sonnet (`anthropic-claude-3.5-sonnet`) | Imported | 2026-05-06 |
| 10 | DeepSeek-Coder-V2 | 25.78 | — | Imported | 2026-05-06 |
| 11 | MetaLlama-3.1-70B-instruct | 24.16 | — | Imported | 2026-05-06 |
| 12 | DeepSeek-Coder-V2-Lite-Instruct | 19.73 | — | Imported | 2026-05-06 |
| 13 | Mathstral-7B-v0.1 | 19.13 | — | Imported | 2026-05-06 |
| 14 | DeepSeekMATH-7b-RL | 16.12 | — | Imported | 2026-05-06 |
| 15 | InternLM2-math-plus-mixtral8*22B | 14.24 | — | Imported | 2026-05-06 |