TriviaQA

TriviaQA is a large-scale reading comprehension dataset containing over 650K question-answer-evidence triples. It includes 95K question-answer pairs authored by trivia enthusiasts, together with independently gathered evidence documents (six per question on average) that provide high-quality distant supervision for answering the questions. The questions are relatively complex and compositional, show considerable syntactic and lexical variability, and often require cross-sentence reasoning to answer.

Rows: 17
Primary metric: Score
Sampled: 2026-05-06

Metadata

Metrics

Score, Normalized Score

Latest Results

Rank | Subject | Score | Model Match | Provenance | Sampled
1 | Kimi K2 Base | 0.85 | - | Self-reported | 2026-05-06
2 | Gemma 2 27B | 0.84 | Gemma 2 27B (google-gemma-2-27b-it) | Self-reported | 2026-05-06
3 | Mistral Small 3.1 24B Instruct | 0.81 | Mistral: Mistral Small 3.1 24B (mistralai-mistral-small-3.1-24b-instruct) | Self-reported | 2026-05-06
3 | Mistral Small 3.1 24B Base | 0.81 | - | Self-reported | 2026-05-06
5 | Mistral Small 3 24B Base | 0.80 | - | Self-reported | 2026-05-06
6 | Granite 3.3 8B Base | 0.78 | - | Self-reported | 2026-05-06
7 | Gemma 2 9B | 0.77 | - | Self-reported | 2026-05-06
8 | Mistral Large 3 | 0.75 | - | Self-reported | 2026-05-06
8 | Ministral 3 (14B Base 2512) | 0.75 | - | Self-reported | 2026-05-06
10 | Mistral NeMo Instruct | 0.74 | Mistral: Mistral Nemo (mistralai-mistral-nemo) | Self-reported | 2026-05-06
11 | Gemma 3n E4B Instructed LiteRT Preview | 0.70 | Gemma 3n 4B (google-gemma-3n-e4b-it) | Self-reported | 2026-05-06
11 | Gemma 3n E4B | 0.70 | - | Self-reported | 2026-05-06
13 | Ministral 3 (8B Base 2512) | 0.68 | - | Self-reported | 2026-05-06
14 | Ministral 8B Instruct | 0.66 | - | Self-reported | 2026-05-06
15 | Gemma 3n E2B | 0.61 | - | Self-reported | 2026-05-06
15 | Gemma 3n E2B Instructed LiteRT (Preview) | 0.61 | Gemma 3n 2B (google-gemma-3n-e2b-it) | Self-reported | 2026-05-06
17 | Ministral 3 (3B Base 2512) | 0.59 | - | Self-reported | 2026-05-06
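
The Rank column appears to follow standard competition ranking: subjects with equal scores share a rank, and the next distinct score takes the rank of its position in the list (3, 3, 5 rather than 3, 3, 4). The page does not state its ranking rule, so the following is only a minimal sketch under that assumption. The function name `competition_rank` is hypothetical, and the sample rows are the top five entries from the results above.

```python
from typing import List, Tuple


def competition_rank(scores: List[Tuple[str, float]]) -> List[Tuple[int, str, float]]:
    """Assign 1-based ranks; tied scores share a rank and the next rank is skipped.

    Assumption: ties are decided on the displayed two-decimal score.
    """
    # sorted() is stable, so tied entries keep their input order.
    ordered = sorted(scores, key=lambda item: item[1], reverse=True)
    ranked: List[Tuple[int, str, float]] = []
    for position, (name, score) in enumerate(ordered, start=1):
        if ranked and ranked[-1][2] == score:
            rank = ranked[-1][0]  # tie: reuse the previous rank
        else:
            rank = position  # new score: rank equals 1-based position
        ranked.append((rank, name, score))
    return ranked


# Top five rows from the leaderboard above.
rows = [
    ("Kimi K2 Base", 0.85),
    ("Gemma 2 27B", 0.84),
    ("Mistral Small 3.1 24B Instruct", 0.81),
    ("Mistral Small 3.1 24B Base", 0.81),
    ("Mistral Small 3 24B Base", 0.80),
]
ranks = [rank for rank, _, _ in competition_rank(rows)]
print(ranks)
```

Under this rule the two Mistral Small 3.1 24B entries tie at rank 3 and Mistral Small 3 24B Base lands at rank 5, matching the ranks listed.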