PolyMath-en

PolyMath is a multilingual mathematical reasoning benchmark covering 18 languages and 4 difficulty levels, from easy to hard. It is designed for broad difficulty coverage, language diversity, and high-quality translation. The benchmark evaluates the mathematical reasoning capabilities of large language models across diverse linguistic contexts and is intended to discriminate strongly between models.

Rows: 2
Primary metric: score
Sampled: 2026-05-06

Metadata

Metrics

Score, Normalized Score

Latest Results

Rank | Subject | Score | Model Match | Provenance | Sampled
1 | Kimi K2 Instruct | 0.65 | KIMI MoonshotAI: Kimi K2 0711 (moonshotai-kimi-k2) | Self-reported | 2026-05-06
1 | Kimi K2-Instruct-0905 | 0.65 | KIMI MoonshotAI: Kimi K2 0905 (moonshotai-kimi-k2-0905) | Self-reported | 2026-05-06
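
Both entries above share rank 1 because their scores are equal. The page does not say how ranks are computed, so the following is only a minimal sketch of one plausible scheme (standard competition ranking, where tied scores share the same rank). The `Result` class and `competition_rank` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Result:
    subject: str
    score: float


def competition_rank(results):
    """Return (rank, result) pairs sorted by descending score.

    Assumption: tied scores share a rank, and the next distinct
    score takes its 1-based position in the sorted list.
    """
    # sorted() is stable, so tied entries keep their input order.
    ordered = sorted(results, key=lambda r: r.score, reverse=True)
    ranked = []
    for position, result in enumerate(ordered, start=1):
        if ranked and result.score == ranked[-1][1].score:
            rank = ranked[-1][0]  # same score as previous entry: reuse its rank
        else:
            rank = position
        ranked.append((rank, result))
    return ranked


# The two rows listed in the results table.
results = [
    Result("Kimi K2 Instruct", 0.65),
    Result("Kimi K2-Instruct-0905", 0.65),
]

for rank, result in competition_rank(results):
    print(rank, result.subject, result.score)
```

Under this assumed scheme, both listed entries receive rank 1, matching the table.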