FunctionalMATH

A functional variant of the MATH benchmark. It tests whether language models can generalize reasoning patterns across different problem instances, and it exposes the reasoning gap: the difference between a model's performance on the static benchmark and its performance on the functional variant.
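As a rough illustration of the reasoning gap, the sketch below expresses it as the relative drop from static to functional accuracy. This definition is an assumption made for illustration, as are the function name `reasoning_gap` and the input values; this page reports only functional scores, so the numbers are hypothetical and not taken from the results table.

```python
def reasoning_gap(static_acc: float, functional_acc: float) -> float:
    """Relative drop from static to functional accuracy.

    Assumed definition for illustration: (static - functional) / static.
    Both inputs are accuracies expressed as fractions in [0, 1].
    """
    if not 0.0 < static_acc <= 1.0:
        raise ValueError("static_acc must be in (0, 1]")
    if not 0.0 <= functional_acc <= 1.0:
        raise ValueError("functional_acc must be in [0, 1]")
    return (static_acc - functional_acc) / static_acc


# Hypothetical accuracies, not taken from the leaderboard.
gap = reasoning_gap(static_acc=0.80, functional_acc=0.60)
print(f"Reasoning gap: {gap:.2%}")
```

Under this definition, a gap near zero would suggest the model solves regenerated instances about as well as the original ones, while a large gap would suggest that part of the static score does not carry over to new instances.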

Rows: 2
Primary metric: score
Sampled: 2026-05-06

Metadata

Metrics

Score, Normalized Score

Latest Results

Rank  Subject           Score  Model Match  Provenance     Sampled
1     Gemini 1.5 Pro    0.65                Self-reported  2026-05-06
2     Gemini 1.5 Flash  0.54                Self-reported  2026-05-06