STaRK

Semi-structured Retrieval Benchmark over textual and relational knowledge bases, covering Amazon product search, academic paper search, and biomedicine inquiries.

Rows: 13
Primary metric: average_mrr
Sampled: 2026-05-06

Metadata

Metrics

Average MRR, STARK-AMAZON Hit@1, STARK-AMAZON Hit@5, STARK-AMAZON R@20, STARK-AMAZON MRR, STARK-MAG Hit@1, STARK-MAG Hit@5, STARK-MAG R@20, STARK-MAG MRR, STARK-PRIME Hit@1, STARK-PRIME Hit@5, STARK-PRIME R@20, STARK-PRIME MRR
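For readers unfamiliar with the metric names, the following is a minimal sketch of how Hit@k, R@k (recall at k), and MRR are conventionally computed for a single query. It uses the standard textbook definitions and is not taken from the STaRK evaluation code. The function names and the example document IDs are hypothetical.

```python
from typing import Sequence, Set


def hit_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """1.0 if at least one relevant item appears in the top k, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


# Hypothetical query: the retriever returns four candidates, two items are relevant.
ranked = ["d3", "d7", "d1", "d9"]
relevant = {"d1", "d4"}

hit1 = hit_at_k(ranked, relevant, 1)
hit5 = hit_at_k(ranked, relevant, 5)
r20 = recall_at_k(ranked, relevant, 20)
rr = reciprocal_rank(ranked, relevant)
```

MRR for a dataset is the mean of `reciprocal_rank` over all of its queries. Leaderboards such as this one usually report these values as percentages.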

Latest Results

Rows are parsed from the human-generated leaderboard dictionary in app.py of the public STaRK Space. The score (Average MRR) is the mean of the MRR values on STARK-AMAZON, STARK-MAG, and STARK-PRIME.
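The averaging step can be sketched as below. This is an illustration under stated assumptions, not the code that produced the table: the function name `average_mrr`, the dictionary layout, and the rounding to two decimals are hypothetical, and the input numbers are placeholders rather than leaderboard values.

```python
from statistics import mean
from typing import Mapping

# The three datasets the score is averaged over, as named on this page.
DATASETS = ("STARK-AMAZON", "STARK-MAG", "STARK-PRIME")


def average_mrr(per_dataset_mrr: Mapping[str, float]) -> float:
    """Unweighted mean of the per-dataset MRR values.

    Rounding to two decimals is an assumption made to match the
    precision shown in the table.
    """
    missing = [name for name in DATASETS if name not in per_dataset_mrr]
    if missing:
        raise KeyError(f"missing MRR for: {', '.join(missing)}")
    return round(mean(per_dataset_mrr[name] for name in DATASETS), 2)


# Placeholder values for illustration only (not real results).
example = {"STARK-AMAZON": 50.0, "STARK-MAG": 40.0, "STARK-PRIME": 30.0}
score = average_mrr(example)
```

Because the mean is unweighted, each dataset contributes equally regardless of how many queries it contains.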

Rank  Subject              Average MRR  Model Match  Provenance  Sampled
1     AvaTaR(gpt-4-turbo)  48.51                     Imported    2026-05-06
2     Claude3 Reranker     46.81                     Imported    2026-05-06
3     GPT4 Reranker        45.51                     Imported    2026-05-06
4     GritLM-7b            42.07                     Imported    2026-05-06
5     multi-ada-002        41.05                     Imported    2026-05-06
6     ada-002              38.27                     Imported    2026-05-06
7     voyage-l2-instruct   33.95                     Imported    2026-05-06
8     ColBERTv2            33.14                     Imported    2026-05-06
9     BM25                 28.86                     Imported    2026-05-06
10    LLM2Vec              25.15                     Imported    2026-05-06
11    ANCE (roberta)       25.06                     Imported    2026-05-06
12    QAGNN (roberta)      22.08                     Imported    2026-05-06
13    DPR (roberta)        14.05                     Imported    2026-05-06