TheoremQA
A theorem-driven question answering dataset containing 800 high-quality questions covering 350+ theorems from Math, Physics, EE&CS, and Finance. It is designed to evaluate how well AI models can apply theorems to solve challenging university-level science problems.
Rows: 6
Primary metric: score
Sampled: 2026-05-06
Metadata
Metrics
Score, Normalized Score
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Qwen2 72B Instruct | 0.44 | — | Self-reported | 2026-05-06 |
| 2 | Qwen2.5 32B Instruct | 0.44 | — | Self-reported | 2026-05-06 |
| 3 | Qwen2.5-Coder 32B Instruct | 0.43 | Qwen2.5 Coder 32B Instruct (`qwen-qwen-2.5-coder-32b-instruct`) | Self-reported | 2026-05-06 |
| 4 | Qwen2.5 14B Instruct | 0.43 | — | Self-reported | 2026-05-06 |
| 5 | Qwen2.5-Coder 7B Instruct | 0.34 | — | Self-reported | 2026-05-06 |
| 6 | Qwen2 7B Instruct | 0.25 | — | Self-reported | 2026-05-06 |