HiddenMath
Google DeepMind's internal mathematical reasoning benchmark. It uses novel problems not encountered during model training, so that scores reflect mathematical reasoning rather than memorization.
Rows: 13
Primary metric: score
Sampled: 2026-05-06
Metadata
Metrics: Score, Normalized Score
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Gemini 2.0 Flash | 0.63 | Gemini 2.0 Flash (`google-gemini-2.0-flash`) | Self-reported | 2026-05-06 |
| 2 | Gemma 3 27B | 0.60 | Gemma 3 27B (`google-gemma-3-27b-it`) | Self-reported | 2026-05-06 |
| 3 | Gemini 2.0 Flash-Lite | 0.55 | Gemini 2.0 Flash Lite (`google-gemini-2.0-flash-lite-001`) | Self-reported | 2026-05-06 |
| 4 | Gemma 3 12B | 0.55 | Gemma 3 12B (`google-gemma-3-12b-it`) | Self-reported | 2026-05-06 |
| 5 | Gemini 1.5 Pro | 0.52 | — | Self-reported | 2026-05-06 |
| 6 | Gemini 1.5 Flash | 0.47 | — | Self-reported | 2026-05-06 |
| 7 | Gemma 3 4B | 0.43 | Gemma 3 4B (`google-gemma-3-4b-it`) | Self-reported | 2026-05-06 |
| 8 | Gemma 3n E4B Instructed LiteRT Preview | 0.38 | Gemma 3n 4B (`google-gemma-3n-e4b-it`) | Self-reported | 2026-05-06 |
| 8 | Gemma 3n E4B Instructed | 0.38 | Gemma 3n 4B (`google-gemma-3n-e4b-it`) | Self-reported | 2026-05-06 |
| 10 | Gemini 1.5 Flash 8B | 0.33 | — | Self-reported | 2026-05-06 |
| 11 | Gemma 3n E2B Instructed | 0.28 | Gemma 3n 2B (`google-gemma-3n-e2b-it`) | Self-reported | 2026-05-06 |
| 11 | Gemma 3n E2B Instructed LiteRT (Preview) | 0.28 | Gemma 3n 2B (`google-gemma-3n-e2b-it`) | Self-reported | 2026-05-06 |
| 13 | Gemma 3 1B | 0.16 | — | Self-reported | 2026-05-06 |