SuperGPQA
Graduate-level knowledge and reasoning benchmark covering 285 disciplines with expert-filtered multiple-choice questions.
Rows: 21
Primary metric: overall_sample
Sampled: 2026-05-28
Metadata
Metrics: Overall (sample), Overall (subfield), Overall (field), Overall (discipline), Easy (sample), Middle (sample), Hard (sample)
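The metric names above suggest two ways of aggregating accuracy: per question ("sample") and per group ("subfield", "field", "discipline"). The sketch below assumes that "sample" means a micro average over all questions and that the grouped variants are macro averages over equally weighted groups. This page does not confirm that reading. The record layout, the function names, and the toy data are all hypothetical.

```python
from collections import defaultdict


def sample_average(records):
    """Micro average: every question counts equally."""
    return sum(r["correct"] for r in records) / len(records)


def group_average(records, key):
    """Macro average: accuracy per group, then an unweighted mean over groups."""
    totals = defaultdict(lambda: [0, 0])  # group -> [num_correct, num_total]
    for r in records:
        totals[r[key]][0] += r["correct"]
        totals[r[key]][1] += 1
    return sum(correct / total for correct, total in totals.values()) / len(totals)


# Hypothetical toy data: 4 Engineering questions (3 correct), 1 Philosophy question (0 correct).
records = (
    [{"discipline": "Engineering", "correct": True}] * 3
    + [{"discipline": "Engineering", "correct": False}]
    + [{"discipline": "Philosophy", "correct": False}]
)

print(f"{sample_average(records):.3f}")               # 0.600
print(f"{group_average(records, 'discipline'):.3f}")  # 0.375
```

Under this assumption the two numbers diverge whenever groups differ in size, which is one reason a leaderboard might report both.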
Showing the 2 latest source slices. Ranks restart at 1 within each slice.
| Rank | Subject | Overall (sample) | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Qwen3.7 Max | 73.6% | Qwen3.7 Max (`qwen-qwen3.7-max`) | Self-reported | 2026-05-28 |
| 2 | Claude Opus 4.6 Max | 72.5% | Claude Opus 4.6 (`anthropic-claude-opus-4.6`) | Self-reported | 2026-05-28 |
| 3 | Qwen3.6 Plus | 71.6% | Qwen3.6 Plus (`qwen-qwen3.6-plus`) | Self-reported | 2026-05-28 |
| 4 | Kimi K2.6 Thinking | 71.3% | MoonshotAI: Kimi K2.6 (`moonshotai-kimi-k2.6`) | Self-reported | 2026-05-28 |
| 5 | DeepSeek V4 Pro Max | 69.9% | DeepSeek V4 Pro (`deepseek-deepseek-v4-pro`) | Self-reported | 2026-05-28 |
| 6 | GLM-5.1 Thinking | 68% | GLM 5.1 (`z-ai-glm-5.1`) | Self-reported | 2026-05-28 |
| 1 | DeepSeek-R1 | 61.82% | R1 (`deepseek-r1`) | Imported | 2026-05-06 |
| 2 | o1-2024-12-17 | 60.24% | o1 (`openai-o1`) | Imported | 2026-05-06 |
| 3 | DeepSeek-R1-Zero | 60.24% | — | Imported | 2026-05-06 |
| 4 | o3-mini-2025-01-31-high | 55.22% | o3 Mini High (`openai-o3-mini-high`) | Imported | 2026-05-06 |
| 5 | Doubao-1.5-pro-32k-250115 | 55.09% | — | Imported | 2026-05-06 |
| 6 | o3-mini-2025-01-31-medium | 52.69% | o3-mini (`openai-o3-mini`) | Imported | 2026-05-06 |
| 7 | Doubao-1.5-pro-32k-241225 | 50.93% | — | Imported | 2026-05-06 |
| 8 | Qwen-max-2025-01-25 | 50.08% | Qwen-Max (`qwen-qwen-max`) | Imported | 2026-05-06 |
| 9 | Claude-3-5-sonnet-20241022 | 48.16% | Claude 3.5 Sonnet (`anthropic-claude-3.5-sonnet`) | Imported | 2026-05-06 |
| 10 | Gemini-2.0-flash | 47.73% | Gemini 2.0 Flash (`google-gemini-2.0-flash`) | Imported | 2026-05-06 |
| 11 | Qwen2.5-72B | 34.33% | Qwen2.5 72B Instruct (`qwen-qwen-2.5-72b-instruct`) | Imported | 2026-05-06 |
| 12 | Qwen2.5-32B | 33.16% | — | Imported | 2026-05-06 |
| 13 | DeepSeek-V3-Base | 32.14% | — | Imported | 2026-05-06 |
| 14 | Qwen2.5-14B | 30.19% | — | Imported | 2026-05-06 |
| 15 | Yi-1.5-34B | 27.62% | — | Imported | 2026-05-06 |