PutnamBench
PutnamBench: Measures mathematical reasoning, symbolic problem solving, and proof construction on competition-style problems.
Rows: 37
Primary metric: total_solved_with_solutions
Sampled: 2026-05-27
Metrics: Total solved with solutions, Lean solved with solutions, Isabelle solved with solutions, Coq solved with solutions
| Rank | Subject | Total solved with solutions | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Aleph Prover (Logical Intelligence) | 668 solved | — | Imported | 2026-05-27 |
| 2 | Aleph Prover (Logical Intelligence) | 637 solved | — | Imported | 2026-05-27 |
| 3 | Seed-Prover 1.5 (ByteDance) | 581 solved | — | Imported | 2026-05-27 |
| 4 | Aleph Prover (Logical Intelligence) | 500 solved | — | Imported | 2026-05-27 |
| 5 | Hilbert | 462 solved | — | Imported | 2026-05-27 |
| 6 | AxProverBase (Axiomatic AI) | 365 solved | — | Imported | 2026-05-27 |
| 7 | Seed-Prover (ByteDance) | 329 solved | — | Imported | 2026-05-27 |
| 8 | Ax-Prover (Axiomatic AI) | 91 solved | — | Imported | 2026-05-27 |
| 9 | Goedel-Prover-V2 | 86 solved | — | Imported | 2026-05-27 |
| 10 | DeepSeek-Prover-V2 | 47 solved | — | Imported | 2026-05-27 |
| 11 | GPT-5 (ReAct, 10 turns) | 28 solved | — | Imported | 2026-05-27 |
| 12 | DSP+ | 23 solved | — | Imported | 2026-05-27 |
| 13 | Bourbaki | 14 solved | — | Imported | 2026-05-27 |
| 14 | Kimina-Prover-7B-Distill | 10 solved | — | Imported | 2026-05-27 |
| 15 | Self-play Theorem Prover | 8 solved | — | Imported | 2026-05-27 |
| 16 | ABEL | 7 solved | — | Imported | 2026-05-27 |
| 17 | Goedel-Prover-SFT | 7 solved | — | Imported | 2026-05-27 |
| 18 | InternLM2.5-StepProver | 6 solved | — | Imported | 2026-05-27 |
| 19 | DSP (GPT-4o) | 4 solved | — | Imported | 2026-05-27 |
| 20 | InternLM 7B | 4 solved | — | Imported | 2026-05-27 |
| 21 | gemini-2.5-pro-exp-0325 | 3 solved | — | Imported | 2026-05-27 |
| 22 | GPT-4o | 3 solved | — | Imported | 2026-05-27 |
| 23 | Sledgehammer | 3 solved | — | Imported | 2026-05-27 |
| 24 | COPRA (GPT-4o) | 2 solved | — | Imported | 2026-05-27 |
| 25 | o4-mini-high | 2 solved | — | Imported | 2026-05-27 |
| 26 | Deepseek R1 | 1 solved | — | Imported | 2026-05-27 |
| 27 | gemini-2.0-flash-thinking-121 | 1 solved | — | Imported | 2026-05-27 |
| 28 | claude-3.7-sonnet | 0 solved | — | Imported | 2026-05-27 |
| 29 | CoqHammer | 0 solved | — | Imported | 2026-05-27 |
| 30 | DeepSeek-V3-0324 | 0 solved | — | Imported | 2026-05-27 |
| 31 | GPT-4o-mini | 0 solved | — | Imported | 2026-05-27 |
| 32 | Grok-3-mini | 0 solved | — | Imported | 2026-05-27 |
| 33 | o3-mini | 0 solved | — | Imported | 2026-05-27 |
| 34 | ReProver w/ retrieval | 0 solved | — | Imported | 2026-05-27 |
| 35 | ReProver w/o retrieval | 0 solved | — | Imported | 2026-05-27 |
| 36 | Tactician (LSH) | 0 solved | — | Imported | 2026-05-27 |
| 37 | TIR Conjecturor | 0 solved | — | Imported | 2026-05-27 |