PopQA

PopQA is an entity-centric open-domain question-answering dataset of about 14,000 QA pairs, designed to evaluate how well language models memorize and recall factual knowledge about entities of varying popularity. It probes both parametric memory (knowledge stored in model parameters) and the effectiveness of non-parametric memory (retrieved external text), with questions covering 16 relationship types. The dataset was created by sampling knowledge triples from Wikidata and converting them to natural-language questions with templates, with a focus on long-tail entities, to expose LMs' strengths and limitations in memorizing factual knowledge.
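The template-based conversion described above can be sketched roughly as follows. This is a minimal illustration, not the dataset's actual construction code: the `Triple` type, the `TEMPLATES` wording, and the `triple_to_qa` helper are all hypothetical names introduced here, and only three of the 16 relationship types are shown.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A (subject, relation, object) knowledge triple, as in Wikidata."""
    subject: str
    relation: str
    obj: str


# Hypothetical templates for illustration only; the dataset's real
# wording for each relationship type may differ.
TEMPLATES = {
    "occupation": "What is {subject}'s occupation?",
    "director": "Who was the director of {subject}?",
    "capital": "What is the capital of {subject}?",
}


def triple_to_qa(triple: Triple) -> dict:
    """Fill the relation's template with the subject; the object is the answer."""
    template = TEMPLATES[triple.relation]
    return {
        "question": template.format(subject=triple.subject),
        "answer": triple.obj,
    }


if __name__ == "__main__":
    qa = triple_to_qa(Triple("France", "capital", "Paris"))
    print(qa["question"], "->", qa["answer"])
```

Because each question is generated from a single triple, the subject entity is always known, which is what makes it possible to relate a model's accuracy to that entity's popularity.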

Rows: 3
Primary metric: score
Sampled: 2026-05-06

Metadata

ID: popqa
Category: General Knowledge
Release: 2022-12-20

Metrics

Score, Normalized Score

Rank	Subject	Score	Model Match	Provenance	Sampled
1	Granite 3.3 8B Base	0.26	—	Self-reported	2026-05-06
1	Granite 3.3 8B Instruct	0.26	—	Self-reported	2026-05-06
3	IBM Granite 4.0 Tiny Preview	0.23	—	Self-reported	2026-05-06
