TriviaQA
A large-scale reading comprehension dataset containing over 650K question-answer-evidence triples. TriviaQA includes 95K question-answer pairs authored by trivia enthusiasts, together with independently gathered evidence documents (six per question on average) that provide high-quality distant supervision for answering the questions. The questions are relatively complex and compositional, with considerable syntactic and lexical variability, and finding the answers requires cross-sentence reasoning.
17 rows
Primary metric: score
Sampled: 2026-05-06
Metadata
Metrics: Score, Normalized Score
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Kimi K2 Base | 0.85 | — | Self-reported | 2026-05-06 |
| 2 | Gemma 2 27B | 0.84 | Gemma 2 27B google-gemma-2-27b-it | Self-reported | 2026-05-06 |
| 3 | Mistral Small 3.1 24B Instruct | 0.81 | Mistral: Mistral Small 3.1 24B mistralai-mistral-small-3.1-24b-instruct | Self-reported | 2026-05-06 |
| 3 | Mistral Small 3.1 24B Base | 0.81 | — | Self-reported | 2026-05-06 |
| 5 | Mistral Small 3 24B Base | 0.80 | — | Self-reported | 2026-05-06 |
| 6 | Granite 3.3 8B Base | 0.78 | — | Self-reported | 2026-05-06 |
| 7 | Gemma 2 9B | 0.77 | — | Self-reported | 2026-05-06 |
| 8 | Mistral Large 3 | 0.75 | — | Self-reported | 2026-05-06 |
| 8 | Ministral 3 (14B Base 2512) | 0.75 | — | Self-reported | 2026-05-06 |
| 10 | Mistral NeMo Instruct | 0.74 | Mistral: Mistral Nemo mistralai-mistral-nemo | Self-reported | 2026-05-06 |
| 11 | Gemma 3n E4B Instructed LiteRT Preview | 0.70 | Gemma 3n 4B google-gemma-3n-e4b-it | Self-reported | 2026-05-06 |
| 11 | Gemma 3n E4B | 0.70 | — | Self-reported | 2026-05-06 |
| 13 | Ministral 3 (8B Base 2512) | 0.68 | — | Self-reported | 2026-05-06 |
| 14 | Ministral 8B Instruct | 0.66 | — | Self-reported | 2026-05-06 |
| 15 | Gemma 3n E2B | 0.61 | — | Self-reported | 2026-05-06 |
| 15 | Gemma 3n E2B Instructed LiteRT (Preview) | 0.61 | Gemma 3n 2B google-gemma-3n-e2b-it | Self-reported | 2026-05-06 |
| 17 | Ministral 3 (3B Base 2512) | 0.59 | — | Self-reported | 2026-05-06 |
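
The Rank column above appears to follow standard competition ranking: tied scores share a rank, and the next distinct score skips ahead (for example 3, 3, 5). The following is a minimal sketch of that scheme, not the leaderboard's actual implementation. The function name `competition_rank` and the `sample` list are hypothetical; the scores are copied from the top five rows of the table.

```python
from typing import List, Tuple


def competition_rank(rows: List[Tuple[str, float]]) -> List[Tuple[int, str, float]]:
    """Assign "1, 2, 3, 3, 5"-style ranks to (name, score) rows.

    Hypothetical helper: rows are sorted by score, highest first, and
    rows with equal scores receive the same rank.
    """
    # sorted() is stable, so tied rows keep their input order.
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    ranked: List[Tuple[int, str, float]] = []
    for position, (name, score) in enumerate(ordered):
        if position > 0 and score == ordered[position - 1][1]:
            # Same score as the previous row: reuse its rank.
            rank = ranked[-1][0]
        else:
            # Otherwise the rank is the 1-based position in the sorted list.
            rank = position + 1
        ranked.append((rank, name, score))
    return ranked


# Top five rows of the table, used here only as illustrative input.
sample = [
    ("Kimi K2 Base", 0.85),
    ("Gemma 2 27B", 0.84),
    ("Mistral Small 3.1 24B Instruct", 0.81),
    ("Mistral Small 3.1 24B Base", 0.81),
    ("Mistral Small 3 24B Base", 0.80),
]

for rank, name, score in competition_rank(sample):
    print(f"{rank:>2}  {name:<32} {score:.2f}")
```

Because every score in the table is self-reported, ties at two decimal places may reflect rounding rather than identical underlying results.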