SVAMP
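SVAMP: A challenge set of elementary-level arithmetic word problems that measures mathematical reasoning. Results are reported for the full set and for subsets split by operation type (ADD, SUB, MUL, DIV) and by number of operations (One-Op, Two-Op).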
Rows: 14
Primary metric: accuracy
Sampled: 2026-05-27
Metrics
Accuracy
| Rank | Subject | Accuracy (%) | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Transformer RoBERTa (DIV) | 53.3 | — | Imported | 2026-05-27 |
| 2 | Transformer RoBERTa (One-Op) | 40.5 | — | Imported | 2026-05-27 |
| 3 | Transformer RoBERTa (Full Set) | 38.9 | — | Imported | 2026-05-27 |
| 4 | Transformer RoBERTa (SUB) | 37.5 | — | Imported | 2026-05-27 |
| 5 | Transformer RoBERTa (ADD) | 36.3 | — | Imported | 2026-05-27 |
| 6 | Transformer RoBERTa (Two-Op) | 33.9 | — | Imported | 2026-05-27 |
| 7 | Transformer RoBERTa (MUL) | 28.3 | — | Imported | 2026-05-27 |
| 8 | Transformer scratch (ADD) | 22.3 | — | Imported | 2026-05-27 |
| 9 | Transformer scratch (DIV) | 18.6 | — | Imported | 2026-05-27 |
| 10 | Transformer scratch (One-Op) | 18.6 | — | Imported | 2026-05-27 |
| 11 | Transformer scratch (Full Set) | 18.4 | — | Imported | 2026-05-27 |
| 12 | Transformer scratch (MUL) | 17.9 | — | Imported | 2026-05-27 |
| 13 | Transformer scratch (Two-Op) | 17.8 | — | Imported | 2026-05-27 |
| 14 | Transformer scratch (SUB) | 17.1 | — | Imported | 2026-05-27 |
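
The table reports accuracy without stating how it is scored. The following is a minimal sketch, assuming accuracy means the percentage of problems whose predicted numeric answer matches the reference answer within a small tolerance. The function name `accuracy`, the `tol` parameter, and the sample answers are hypothetical and are not taken from the benchmark or its evaluation code.

```python
from math import isclose


def accuracy(predictions, references, tol=1e-6):
    """Return the percentage of predictions that match their reference answer.

    Assumption: answers are plain numbers and a match means equality
    within an absolute tolerance `tol`.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot score an empty problem set")
    correct = sum(
        isclose(p, r, abs_tol=tol) for p, r in zip(predictions, references)
    )
    return 100.0 * correct / len(references)


# Made-up answers for four hypothetical problems (not SVAMP data).
preds = [8.0, 15.0, 3.0, 42.0]
golds = [8.0, 12.0, 3.0, 40.0]
score = accuracy(preds, golds)
print(f"{score:.1f}")
```

Under this assumption, each subset row (for example ADD or Two-Op) would be scored by applying the same function to only the problems in that subset.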