TriviaQA

A large-scale reading comprehension dataset containing over 650K question-answer-evidence triples. TriviaQA includes 95K question-answer pairs authored by trivia enthusiasts, together with independently gathered evidence documents (six per question on average) that provide high-quality distant supervision for answering the questions. The questions are relatively complex and compositional, show considerable syntactic and lexical variability, and often require cross-sentence reasoning to answer.

Rows: 17
Primary metric: Score
Sampled: 2026-05-06

Metadata

ID: triviaqa
Category: General Knowledge
Release: 2017-05-09
Source: Source page
Snapshot: Snapshot source
Post: Announcement post

Metrics

Score, Normalized Score

Rank	Subject	Score	Model Match	Provenance	Sampled
1	Kimi K2 Base	0.85	—	Self-reported	2026-05-06
2	Gemma 2 27B	0.84	Gemma 2 27B google-gemma-2-27b-it	Self-reported	2026-05-06
3	Mistral Small 3.1 24B Instruct	0.81	Mistral: Mistral Small 3.1 24B mistralai-mistral-small-3.1-24b-instruct	Self-reported	2026-05-06
3	Mistral Small 3.1 24B Base	0.81	—	Self-reported	2026-05-06
5	Mistral Small 3 24B Base	0.80	—	Self-reported	2026-05-06
6	Granite 3.3 8B Base	0.78	—	Self-reported	2026-05-06
7	Gemma 2 9B	0.77	—	Self-reported	2026-05-06
8	Mistral Large 3	0.75	—	Self-reported	2026-05-06
8	Ministral 3 (14B Base 2512)	0.75	—	Self-reported	2026-05-06
10	Mistral NeMo Instruct	0.74	Mistral: Mistral Nemo mistralai-mistral-nemo	Self-reported	2026-05-06
11	Gemma 3n E4B Instructed LiteRT Preview	0.70	Gemma 3n 4B google-gemma-3n-e4b-it	Self-reported	2026-05-06
11	Gemma 3n E4B	0.70	—	Self-reported	2026-05-06
13	Ministral 3 (8B Base 2512)	0.68	—	Self-reported	2026-05-06
14	Ministral 8B Instruct	0.66	—	Self-reported	2026-05-06
15	Gemma 3n E2B	0.61	—	Self-reported	2026-05-06
15	Gemma 3n E2B Instructed LiteRT (Preview)	0.61	Gemma 3n 2B google-gemma-3n-e2b-it	Self-reported	2026-05-06
17	Ministral 3 (3B Base 2512)	0.59	—	Self-reported	2026-05-06
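
The Rank column above appears to follow standard competition ranking: entries with equal scores share a rank, and the next rank skips ahead (3, 3, 5). The sketch below reproduces that numbering for the top five rows. The function name `competition_rank` is hypothetical, and the sketch assumes ties are decided on the displayed two-decimal scores, which the page does not confirm.

```python
from typing import List, Tuple


def competition_rank(rows: List[Tuple[str, float]]) -> List[Tuple[int, str, float]]:
    """Assign 1-based ranks; tied scores share a rank and the next rank skips ahead."""
    # sorted() is stable, so tied rows keep their original relative order.
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    ranked: List[Tuple[int, str, float]] = []
    for i, (name, score) in enumerate(ordered):
        if i > 0 and score == ordered[i - 1][1]:
            rank = ranked[-1][0]  # same score as the previous row: reuse its rank
        else:
            rank = i + 1  # otherwise the rank is the 1-based position
        ranked.append((rank, name, score))
    return ranked


# Top five rows copied from the table above (displayed, rounded scores).
rows = [
    ("Kimi K2 Base", 0.85),
    ("Gemma 2 27B", 0.84),
    ("Mistral Small 3.1 24B Instruct", 0.81),
    ("Mistral Small 3.1 24B Base", 0.81),
    ("Mistral Small 3 24B Base", 0.80),
]

for rank, name, score in competition_rank(rows):
    print(rank, name, score)
```

If the leaderboard ranks on unrounded scores, two rows that show the same two-decimal value could still receive different ranks, so this is only an approximation of the page's behavior.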
