PopQA
PopQA is an entity-centric open-domain question-answering dataset of 14,000 QA pairs, designed to evaluate how well language models memorize and recall factual knowledge about entities of varying popularity. It probes the effectiveness of both parametric memory (knowledge stored in model parameters) and non-parametric memory (knowledge retrieved from external sources). The dataset was created by sampling knowledge triples from Wikidata across 16 diverse relationship types and converting them to natural-language questions using templates, with a focus on long-tail entities, in order to understand LMs' strengths and limitations in memorizing factual knowledge.
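The template-based conversion described above can be sketched roughly as follows. This is a minimal illustration, not the dataset authors' code: the `TEMPLATES` mapping, the `Triple` class, the `triple_to_qa` helper, and the example triple are all hypothetical, and the template wording used in the real dataset may differ.

```python
from dataclasses import dataclass

# Hypothetical templates, keyed by relation name (assumed wording;
# the actual PopQA templates may be phrased differently).
TEMPLATES = {
    "occupation": "What is {subject}'s occupation?",
    "place of birth": "In what city was {subject} born?",
    "author": "Who is the author of {subject}?",
}


@dataclass(frozen=True)
class Triple:
    """A (subject, relation, object) knowledge triple, as in Wikidata."""
    subject: str
    relation: str
    obj: str


def triple_to_qa(triple: Triple) -> tuple[str, str]:
    """Turn a triple into a (question, answer) pair via its relation template."""
    template = TEMPLATES[triple.relation]  # KeyError if the relation has no template
    question = template.format(subject=triple.subject)
    return question, triple.obj


# Illustrative triple, not taken from the dataset.
example = Triple("Ada Lovelace", "occupation", "mathematician")
question, answer = triple_to_qa(example)
print(question)
print(answer)
```

In this sketch the object of the triple serves as the gold answer, so each relation type needs only one template to produce questions for any number of entities.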
3 rows
Primary metric: score
Sampled: 2026-05-06
Metadata
Metrics
Score, Normalized Score
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Granite 3.3 8B Base | 0.26 | — | Self-reported | 2026-05-06 |
| 1 | Granite 3.3 8B Instruct | 0.26 | — | Self-reported | 2026-05-06 |
| 3 | IBM Granite 4.0 Tiny Preview | 0.23 | — | Self-reported | 2026-05-06 |