SimpleQA
SimpleQA: Evaluates short-form factual accuracy, measuring how often a language model answers short, fact-seeking questions correctly.
23 rows
Primary metric: simpleqa_accuracy
Sampled: 2026-05-27
| Rank | Subject | SimpleQA accuracy | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | gpt-4.5-preview-2025-02-27 | 62.5% | GPT-4.5 (`openai-gpt-4.5-preview`) | Imported | 2026-05-27 |
| 2 | o3 [^9] [^10] | 49.4% | — | Imported | 2026-05-27 |
| 3 | o3-low [^10] | 49.4% | — | Imported | 2026-05-27 |
| 4 | o3-high [^10] | 48.6% | — | Imported | 2026-05-27 |
| 5 | o1 | 42.6% | o1 (`openai-o1`) | Imported | 2026-05-27 |
| 6 | o1-preview | 42.4% | o1-preview (`openai-o1-preview`) | Imported | 2026-05-27 |
| 7 | gpt-4.1-2025-04-14 | 41.6% | GPT-4.1 (`openai-gpt-4.1`) | Imported | 2026-05-27 |
| 8 | gpt-4o-2024-08-06 | 40.1% | GPT-4o (`openai-gpt-4o`) | Imported | 2026-05-27 |
| 9 | gpt-4o-2024-05-13 | 39% | GPT-4o (`openai-gpt-4o`) | Imported | 2026-05-27 |
| 10 | gpt-4o-2024-11-20 | 38.8% | GPT-4o (`openai-gpt-4o`) | Imported | 2026-05-27 |
| 11 | [Claude 3.5 Sonnet](https://www.anthropic.com/news/claude-3-5-sonnet) | 28.9% | Claude 3.5 Sonnet (`anthropic-claude-3.5-sonnet`) | Imported | 2026-05-27 |
| 12 | gpt-4-turbo-2024-04-09 | 24.2% | GPT-4 Turbo (`openai-gpt-4-turbo`) | Imported | 2026-05-27 |
| 13 | [Claude 3 Opus](https://www.anthropic.com/news/claude-3-family) | 23.5% | — | Imported | 2026-05-27 |
| 14 | o4-mini [^9] [^10] | 20.2% | — | Imported | 2026-05-27 |
| 15 | o4-mini-low [^10] | 20.2% | — | Imported | 2026-05-27 |
| 16 | o4-mini-high [^9] [^10] | 19.3% | — | Imported | 2026-05-27 |
| 17 | gpt-4.1-mini-2025-04-14 | 16.8% | GPT-4.1 Mini (`openai-gpt-4.1-mini`) | Imported | 2026-05-27 |
| 18 | o3-mini-high | 13.8% | o3 Mini High (`openai-o3-mini-high`) | Imported | 2026-05-27 |
| 19 | o3-mini | 13.4% | o3-mini (`openai-o3-mini`) | Imported | 2026-05-27 |
| 20 | o3-mini-low | 13% | — | Imported | 2026-05-27 |
| 21 | gpt-4o-mini-2024-07-18 | 9.5% | GPT-4o-mini (`openai-gpt-4o-mini`) | Imported | 2026-05-27 |
| 22 | o1-mini | 7.6% | — | Imported | 2026-05-27 |
| 23 | gpt-4.1-nano-2025-04-14 | 7.6% | GPT-4.1 Nano (`openai-gpt-4.1-nano`) | Imported | 2026-05-27 |