SVAMP

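SVAMP is a challenge set of elementary-level arithmetic word problems that measures mathematical reasoning. The results below are broken down by operation type (ADD, SUB, MUL, DIV) and by number of operations (One-Op, Two-Op).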

Rows: 14
Primary metric: accuracy
Sampled: 2026-05-27

Metadata

Metrics

Accuracy

Latest Results

Rows are parsed from the challenge-set table in the supplementary LaTeX source of the SVAMP paper.
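As a rough illustration of that import step, the sketch below turns LaTeX table rows into (subject, accuracy) pairs. The row layout in SAMPLE_LATEX, the column order in MODELS, and the helper name parse_rows are all assumptions made for this example; the real supplementary table may be laid out differently.

```python
# Hypothetical LaTeX rows: one subset per line, one accuracy column per model.
# The actual supplementary table's layout is not shown on this page.
SAMPLE_LATEX = r"""
\hline
Full Set & 18.4 & 38.9 \\
One-Op & 18.6 & 40.5 \\
\hline
"""

# Assumed column order for the sketch above.
MODELS = ["Transformer scratch", "Transformer RoBERTa"]


def parse_rows(latex, models):
    """Return (subject, accuracy) pairs from 'subset & value & value' rows."""
    rows = []
    for line in latex.splitlines():
        line = line.strip()
        if not line.endswith(r"\\"):
            continue  # skip blank lines, \hline, and other non-data lines
        # Drop the trailing row terminator, then split the cells on '&'.
        cells = [cell.strip() for cell in line[:-2].split("&")]
        subset, values = cells[0], cells[1:]
        for model, value in zip(models, values):
            rows.append((f"{model} ({subset})", float(value)))
    return rows


if __name__ == "__main__":
    for subject, accuracy in parse_rows(SAMPLE_LATEX, MODELS):
        print(subject, accuracy)
```

Each subject is built as "model (subset)", which matches the naming used in the Subject column of the table.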

Rank  Subject                          Accuracy (%)  Model Match  Provenance  Sampled
1     Transformer RoBERTa (DIV)        53.3          -            Imported    2026-05-27
2     Transformer RoBERTa (One-Op)     40.5          -            Imported    2026-05-27
3     Transformer RoBERTa (Full Set)   38.9          -            Imported    2026-05-27
4     Transformer RoBERTa (SUB)        37.5          -            Imported    2026-05-27
5     Transformer RoBERTa (ADD)        36.3          -            Imported    2026-05-27
6     Transformer RoBERTa (Two-Op)     33.9          -            Imported    2026-05-27
7     Transformer RoBERTa (MUL)        28.3          -            Imported    2026-05-27
8     Transformer scratch (ADD)        22.3          -            Imported    2026-05-27
9     Transformer scratch (DIV)        18.6          -            Imported    2026-05-27
10    Transformer scratch (One-Op)     18.6          -            Imported    2026-05-27
11    Transformer scratch (Full Set)   18.4          -            Imported    2026-05-27
12    Transformer scratch (MUL)        17.9          -            Imported    2026-05-27
13    Transformer scratch (Two-Op)     17.8          -            Imported    2026-05-27
14    Transformer scratch (SUB)        17.1          -            Imported    2026-05-27
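
Because the ranking mixes two models across seven subsets, a per-subset comparison can be easier to read. The following sketch regroups the 14 rows above by subset and computes the accuracy gap between the two models. The function names pivot and gaps are hypothetical helpers introduced for this example, not part of any leaderboard tooling.

```python
import re

# (subject, accuracy) pairs copied from the results table above.
ROWS = [
    ("Transformer RoBERTa (DIV)", 53.3),
    ("Transformer RoBERTa (One-Op)", 40.5),
    ("Transformer RoBERTa (Full Set)", 38.9),
    ("Transformer RoBERTa (SUB)", 37.5),
    ("Transformer RoBERTa (ADD)", 36.3),
    ("Transformer RoBERTa (Two-Op)", 33.9),
    ("Transformer RoBERTa (MUL)", 28.3),
    ("Transformer scratch (ADD)", 22.3),
    ("Transformer scratch (DIV)", 18.6),
    ("Transformer scratch (One-Op)", 18.6),
    ("Transformer scratch (Full Set)", 18.4),
    ("Transformer scratch (MUL)", 17.9),
    ("Transformer scratch (Two-Op)", 17.8),
    ("Transformer scratch (SUB)", 17.1),
]

# Splits "Model name (Subset)" into its two parts.
SUBJECT = re.compile(r"^(?P<model>.+) \((?P<subset>[^()]+)\)$")


def pivot(rows):
    """Regroup rows as {subset: {model: accuracy}}."""
    table = {}
    for subject, accuracy in rows:
        match = SUBJECT.match(subject)
        if match is None:
            raise ValueError(f"unexpected subject format: {subject!r}")
        table.setdefault(match["subset"], {})[match["model"]] = accuracy
    return table


def gaps(table, better="Transformer RoBERTa", baseline="Transformer scratch"):
    """Accuracy difference (percentage points) per subset, rounded to 0.1."""
    return {
        subset: round(scores[better] - scores[baseline], 1)
        for subset, scores in table.items()
    }


if __name__ == "__main__":
    for subset, gap in sorted(gaps(pivot(ROWS)).items(), key=lambda kv: -kv[1]):
        print(f"{subset:<9} +{gap:.1f}")
```

On these rows the gap is largest on DIV and smallest on MUL, which the flat ranking does not make obvious.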