DROP

DROP (Discrete Reasoning Over Paragraphs) is a reading comprehension benchmark whose questions require discrete reasoning over the content of a paragraph. The questions are crowdsourced and adversarially created: answering them requires resolving references and performing discrete operations such as addition, counting, or sorting, which demands a comprehensive understanding of the paragraph rather than paraphrase-matching or entity-typing shortcuts.
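To make "discrete operations" concrete, here is a minimal sketch of the three operations named above, applied to facts a reader might have pulled from a passage. The event list and the `discrete_ops` helper are invented for illustration; they are not taken from the DROP dataset or its tooling.

```python
def discrete_ops(events):
    """Apply the three operation types to (name, value) pairs.

    `events` is a hypothetical list of facts already extracted from a
    passage; the extraction step itself is out of scope for this sketch.
    """
    counted = len(events)                                        # counting
    total = sum(value for _, value in events)                    # addition
    ordered = sorted(events, key=lambda e: e[1], reverse=True)   # sorting
    return counted, total, ordered


# Toy facts, invented for illustration (e.g. field goals and their yardage).
events = [("Smith", 23), ("Jones", 41), ("Smith", 37)]
count, total, ordered = discrete_ops(events)
print(count, total, ordered[0])  # 3 101 ('Jones', 41)
```

The hard part of the benchmark is deciding which facts to extract and which operation to apply; the arithmetic itself is trivial, as the sketch shows.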

Rows: 29
Primary metric: score
Sampled: 2026-05-06

Metadata

Metrics

Score, Normalized Score

Latest Results

| Rank | Subject | Score | Model Match | Model ID | Provenance | Sampled |
|---|---|---|---|---|---|---|
| 1 | DeepSeek-V3 | 0.92 | DeepSeek V3 | deepseek-deepseek-chat | Self-reported | 2026-05-06 |
| 2 | Claude 3.5 Sonnet | 0.87 | Claude 3.5 Sonnet | anthropic-claude-3.5-sonnet | Self-reported | 2026-05-06 |
| 2 | Claude 3.5 Sonnet | 0.87 | Claude 3.5 Sonnet | anthropic-claude-3.5-sonnet | Self-reported | 2026-05-06 |
| 4 | GPT-4 Turbo | 0.86 | GPT-4 Turbo | openai-gpt-4-turbo | Self-reported | 2026-05-06 |
| 5 | Nova Pro | 0.85 | Nova Pro 1.0 | amazon-nova-pro-v1 | Self-reported | 2026-05-06 |
| 6 | Llama 3.1 405B Instruct | 0.85 | | | Self-reported | 2026-05-06 |
| 7 | GPT-4o | 0.83 | GPT-4o (2024-05-13) | openai-gpt-4o-2024-05-13 | Self-reported | 2026-05-06 |
| 8 | Claude 3 Opus | 0.83 | | | Self-reported | 2026-05-06 |
| 8 | Claude 3.5 Haiku | 0.83 | Claude 3.5 Haiku | anthropic-claude-3.5-haiku | Self-reported | 2026-05-06 |
| 10 | GPT-4 | 0.81 | GPT-4 | openai-gpt-4 | Self-reported | 2026-05-06 |
| 11 | Nova Lite | 0.80 | Nova Lite 1.0 | amazon-nova-lite-v1 | Self-reported | 2026-05-06 |
| 12 | GPT-4o mini | 0.80 | GPT-4o-mini (2024-07-18) | openai-gpt-4o-mini-2024-07-18 | Self-reported | 2026-05-06 |
| 13 | Llama 3.1 70B Instruct | 0.80 | Llama 3.1 70B Instruct | meta-llama-llama-3.1-70b-instruct | Self-reported | 2026-05-06 |
| 14 | Nova Micro | 0.79 | Nova Micro 1.0 | amazon-nova-micro-v1 | Self-reported | 2026-05-06 |
| 15 | LongCat-Flash-Chat | 0.79 | | | Self-reported | 2026-05-06 |
| 16 | Claude 3 Sonnet | 0.79 | | | Self-reported | 2026-05-06 |
| 17 | Claude 3 Haiku | 0.78 | Claude 3 Haiku | anthropic-claude-3-haiku | Self-reported | 2026-05-06 |
| 18 | Phi 4 | 0.76 | Phi 4 | microsoft-phi-4 | Self-reported | 2026-05-06 |
| 19 | Gemini 1.5 Pro | 0.75 | | | Self-reported | 2026-05-06 |
| 20 | GPT-3.5 Turbo | 0.70 | GPT-3.5 Turbo | openai-gpt-3.5-turbo | Imported | 2026-05-06 |
| 21 | Gemma 3n E4B | 0.61 | | | Self-reported | 2026-05-06 |
| 21 | Gemma 3n E4B Instructed LiteRT Preview | 0.61 | Gemma 3n 4B | google-gemma-3n-e4b-it | Self-reported | 2026-05-06 |
| 23 | Llama 3.1 8B Instruct | 0.59 | Llama 3.1 8B Instruct | meta-llama-llama-3.1-8b-instruct | Self-reported | 2026-05-06 |
| 24 | Granite 3.3 8B Instruct | 0.59 | | | Self-reported | 2026-05-06 |
| 25 | Gemma 3n E2B Instructed LiteRT (Preview) | 0.54 | Gemma 3n 2B | google-gemma-3n-e2b-it | Self-reported | 2026-05-06 |
| 25 | Gemma 3n E2B | 0.54 | | | Self-reported | 2026-05-06 |
| 27 | IBM Granite 4.0 Tiny Preview | 0.46 | | | Self-reported | 2026-05-06 |
| 28 | Granite 3.3 8B Base | 0.36 | | | Self-reported | 2026-05-06 |
| 29 | ERNIE 4.5 | 0.29 | ERNIE 4.5 300B A47B | baidu-ernie-4.5-300b-a47b | Self-reported | 2026-05-06 |
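
The rank column skips numbers after ties (2, 2, 4), which is consistent with standard competition ranking. Some rows with the same displayed score hold different ranks (0.85 at ranks 5 and 6), which suggests ranks are computed on unrounded scores; that is an assumption, since the page does not say how ranks are derived. A minimal sketch of competition ranking, with the hypothetical helper name `competition_rank`:

```python
def competition_rank(scores):
    """Return 1-based ranks, highest score first.

    Equal scores share a rank, and the following rank is skipped
    (the "1, 2, 2, 4" pattern). This is an illustrative sketch, not
    the leaderboard's actual ranking code.
    """
    # Indices sorted by descending score; sorted() is stable, so tied
    # entries keep their original relative order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for pos, idx in enumerate(order):
        prev = order[pos - 1] if pos > 0 else None
        if prev is not None and scores[idx] == scores[prev]:
            ranks[idx] = ranks[prev]   # tie: reuse the previous rank
        else:
            ranks[idx] = pos + 1       # otherwise rank equals 1-based position
    return ranks


# Top four displayed scores from the table above.
print(competition_rank([0.92, 0.87, 0.87, 0.86]))  # [1, 2, 2, 4]
```

Under this scheme, two entries would receive ranks 5 and 6 only if their underlying scores differ, even when both round to 0.85.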