ProofNet

ProofNet: Measures mathematical reasoning through autoformalization and formal proving of undergraduate-level mathematics.

Rows: 6
Primary metric: accuracy
Sampled: 2026-05-27

Metadata

Metrics

Accuracy, Typecheck Rate

Latest Results

Rows are parsed from the public ProofNet Hugging Face dataset-card leaderboard. The source publishes separate statement autoformalization and informalization rows; BenchmarkList stores the source task in metadata.

Rank  Subject                                                               Accuracy  Model Match  Provenance  Sampled
1     Code-davinci-002 (in-context learning) (statement informalization)    62.3%     -            Imported    2026-05-27
2     Code-davinci-002 (prompt retrieval) (statement autoformalization)     16.1%     -            Imported    2026-05-27
3     Code-davinci-002 (in-context learning) (statement autoformalization)  13.4%     -            Imported    2026-05-27
4     proofGPT-6.7B (in-context learning) (statement informalization)       6.5%      -            Imported    2026-05-27
5     proofGPT-1.3B (in-context learning) (statement informalization)       4.3%      -            Imported    2026-05-27
6     proofGPT-1.3B (statement autoformalization)                           3.2%      -            Imported    2026-05-27
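
Because the ranking mixes two source tasks, it can help to split each subject label into its parts and compare rows within a task only. The following is a minimal sketch, not BenchmarkList's own parser: the `Entry` type, `parse_subject`, and `best_per_task` are hypothetical names, and the label layout (model name followed by parenthesized tags, the last of which is the task) is an assumption read off the six rows above.

```python
import re
from collections import defaultdict
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    model: str
    method: Optional[str]  # e.g. "in-context learning"; None when the label has no method tag
    task: str              # e.g. "statement autoformalization"
    accuracy: float        # percent, as shown on the leaderboard


# Subject labels and accuracies transcribed from the leaderboard rows.
ROWS = [
    ("Code-davinci-002 (in-context learning) (statement informalization)", 62.3),
    ("Code-davinci-002 (prompt retrieval) (statement autoformalization)", 16.1),
    ("Code-davinci-002 (in-context learning) (statement autoformalization)", 13.4),
    ("proofGPT-6.7B (in-context learning) (statement informalization)", 6.5),
    ("proofGPT-1.3B (in-context learning) (statement informalization)", 4.3),
    ("proofGPT-1.3B (statement autoformalization)", 3.2),
]


def parse_subject(label: str, accuracy: float) -> Entry:
    """Split 'model (method) (task)' into fields; the method tag is optional."""
    tags = re.findall(r"\(([^()]*)\)", label)
    if not tags:
        raise ValueError(f"no parenthesized task tag in label: {label!r}")
    model = label.split("(", 1)[0].strip()
    # Assumption: the last tag is the task, an earlier tag (if any) is the method.
    method = tags[0] if len(tags) > 1 else None
    return Entry(model=model, method=method, task=tags[-1], accuracy=accuracy)


def best_per_task(entries):
    """Return the highest-accuracy entry for each source task."""
    by_task = defaultdict(list)
    for entry in entries:
        by_task[entry.task].append(entry)
    return {task: max(group, key=lambda e: e.accuracy) for task, group in by_task.items()}


entries = [parse_subject(label, acc) for label, acc in ROWS]
best = best_per_task(entries)

for task, entry in best.items():
    print(f"{task}: {entry.model} ({entry.method}) at {entry.accuracy}%")
```

Grouping by task avoids reading rank 1 (an informalization row) as directly comparable to rank 2 (an autoformalization row); within the autoformalization rows alone, the prompt-retrieval entry leads.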