Omni-MATH

Omni-MATH evaluates olympiad-level mathematical reasoning across mathematical domains and difficulty bands, with answers scored using a GPT-4o-based evaluation protocol.

Rows: 15
Primary metric: overall_acc
Sampled: 2026-05-06

Metadata

Metrics

Overall Acc, Algebra, Precalculus, Calculus, Geometry, Discrete Mathematics, Number Theory, Applied Mathematics, Difficulty: 1-3, Difficulty: 3-5, Difficulty: 5-8, Difficulty: 8-10

Latest Results

Rows are ranked by Overall Acc from the main Omni-MATH leaderboard. Source model display names are preserved without canonical mapping.

Rank | Subject | Overall Acc | Model Match | Provenance | Sampled
1 | OpenAI o1-mini | 60.54 | - | Imported | 2026-05-06
2 | OpenAI o1-preview | 52.55 | o1-preview (openai-o1-preview) | Imported | 2026-05-06
3 | Qwen2.5-MATH-72b-Instruct | 36.20 | - | Imported | 2026-05-06
4 | Qwen2-MATH-72b-Instruct | 33.68 | - | Imported | 2026-05-06
5 | Qwen2.5-MATH-7b-Instruct | 33.22 | - | Imported | 2026-05-06
6 | GPT-4o | 30.49 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-06
7 | Qwen2-MATH-7b-Instruct | 29.36 | - | Imported | 2026-05-06
8 | NuminaMATH-72B-COT | 28.45 | - | Imported | 2026-05-06
9 | Claude-3.5-SONNET | 26.23 | Claude 3.5 Sonnet (anthropic-claude-3.5-sonnet) | Imported | 2026-05-06
10 | DeepSeek-Coder-V2 | 25.78 | - | Imported | 2026-05-06
11 | MetaLlama-3.1-70B-instruct | 24.16 | - | Imported | 2026-05-06
12 | DeepSeek-Coder-V2-Lite-Instruct | 19.73 | - | Imported | 2026-05-06
13 | Mathstral-7B-v0.1 | 19.13 | - | Imported | 2026-05-06
14 | DeepSeekMATH-7b-RL | 16.12 | - | Imported | 2026-05-06
15 | InternLM2-math-plus-mixtral8*22B | 14.24 | - | Imported | 2026-05-06
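
The ranking rule stated above (rows ordered by Overall Acc, highest first) can be sketched in a few lines of Python. This is only an illustration, not the leaderboard's actual code: the tuple layout and the function name `rank_by_overall_acc` are assumptions, and the four rows are a subset copied from the table, so the ranks printed are relative to that subset rather than to the full 15-row leaderboard. Ties are not treated specially here.

```python
# Hypothetical row layout: (source display name, Overall Acc in percent).
# The values are copied from the leaderboard table; the layout is an assumption.
rows = [
    ("GPT-4o", 30.49),
    ("OpenAI o1-mini", 60.54),
    ("Qwen2.5-MATH-72b-Instruct", 36.20),
    ("OpenAI o1-preview", 52.55),
]


def rank_by_overall_acc(rows):
    """Return (rank, name, overall_acc) tuples, highest accuracy first."""
    # sorted() is stable, so rows with equal accuracy keep their input order.
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return [(rank, name, acc) for rank, (name, acc) in enumerate(ordered, start=1)]


ranked = rank_by_overall_acc(rows)
for rank, name, acc in ranked:
    print(f"{rank:>2}  {name:<28} {acc:.2f}")
```

Display names are passed through unchanged, matching the note that source model names are preserved without canonical mapping.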