DROP
DROP (Discrete Reasoning Over Paragraphs) is a reading comprehension benchmark of crowdsourced, adversarially created questions. Answering them requires resolving references within a paragraph and performing discrete operations over them, such as addition, counting, or sorting. This demands a comprehensive understanding of the paragraph and cannot be handled by paraphrase matching and entity typing alone.
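To make "discrete operations" concrete, here is a minimal sketch of the kind of reasoning a DROP-style question calls for. The passage and the questions are hypothetical, written only for illustration and not taken from the dataset, and the regular expression is a toy stand-in for real reference resolution.

```python
import re

# Hypothetical passage, invented for illustration (not from the DROP dataset).
passage = (
    "The Hawks scored a 20-yard field goal in the first quarter, "
    "a 35-yard field goal in the second quarter, "
    "and a 48-yard field goal in the fourth quarter."
)

# Toy stand-in for reference resolution: collect every field-goal distance.
yards = [int(n) for n in re.findall(r"(\d+)-yard field goal", passage)]

count = len(yards)                # "How many field goals were scored?"
total = sum(yards)                # "How many yards of field goals in total?"
longest = sorted(yards)[-1]       # "How long was the longest field goal?"
spread = max(yards) - min(yards)  # "How many yards longer was the longest than the shortest?"

print(count, total, longest, spread)
```

None of these answers appears verbatim in the passage: each has to be computed from several mentions, which is what separates such questions from span-extraction reading comprehension.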
Rows: 29
Primary metric: Score
Sampled: 2026-05-06
Metadata
Metrics: Score, Normalized Score
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | DeepSeek-V3 | 0.92 | DeepSeek V3 deepseek-deepseek-chat | Self-reported | 2026-05-06 |
| 2 | Claude 3.5 Sonnet | 0.87 | Claude 3.5 Sonnet anthropic-claude-3.5-sonnet | Self-reported | 2026-05-06 |
| 2 | Claude 3.5 Sonnet | 0.87 | Claude 3.5 Sonnet anthropic-claude-3.5-sonnet | Self-reported | 2026-05-06 |
| 4 | GPT-4 Turbo | 0.86 | GPT-4 Turbo openai-gpt-4-turbo | Self-reported | 2026-05-06 |
| 5 | Nova Pro | 0.85 | Nova Pro 1.0 amazon-nova-pro-v1 | Self-reported | 2026-05-06 |
| 6 | Llama 3.1 405B Instruct | 0.85 | — | Self-reported | 2026-05-06 |
| 7 | GPT-4o | 0.83 | GPT-4o (2024-05-13) openai-gpt-4o-2024-05-13 | Self-reported | 2026-05-06 |
| 8 | Claude 3 Opus | 0.83 | — | Self-reported | 2026-05-06 |
| 8 | Claude 3.5 Haiku | 0.83 | Claude 3.5 Haiku anthropic-claude-3.5-haiku | Self-reported | 2026-05-06 |
| 10 | GPT-4 | 0.81 | GPT-4 openai-gpt-4 | Self-reported | 2026-05-06 |
| 11 | Nova Lite | 0.80 | Nova Lite 1.0 amazon-nova-lite-v1 | Self-reported | 2026-05-06 |
| 12 | GPT-4o mini | 0.80 | GPT-4o-mini (2024-07-18) openai-gpt-4o-mini-2024-07-18 | Self-reported | 2026-05-06 |
| 13 | Llama 3.1 70B Instruct | 0.80 | Llama 3.1 70B Instruct meta-llama-llama-3.1-70b-instruct | Self-reported | 2026-05-06 |
| 14 | Nova Micro | 0.79 | Nova Micro 1.0 amazon-nova-micro-v1 | Self-reported | 2026-05-06 |
| 15 | LongCat-Flash-Chat | 0.79 | — | Self-reported | 2026-05-06 |
| 16 | Claude 3 Sonnet | 0.79 | — | Self-reported | 2026-05-06 |
| 17 | Claude 3 Haiku | 0.78 | Claude 3 Haiku anthropic-claude-3-haiku | Self-reported | 2026-05-06 |
| 18 | Phi 4 | 0.76 | Phi 4 microsoft-phi-4 | Self-reported | 2026-05-06 |
| 19 | Gemini 1.5 Pro | 0.75 | — | Self-reported | 2026-05-06 |
| 20 | GPT-3.5 Turbo | 0.70 | GPT-3.5 Turbo openai-gpt-3.5-turbo | Imported | 2026-05-06 |
| 21 | Gemma 3n E4B | 0.61 | — | Self-reported | 2026-05-06 |
| 21 | Gemma 3n E4B Instructed LiteRT Preview | 0.61 | Gemma 3n 4B google-gemma-3n-e4b-it | Self-reported | 2026-05-06 |
| 23 | Llama 3.1 8B Instruct | 0.59 | Llama 3.1 8B Instruct meta-llama-llama-3.1-8b-instruct | Self-reported | 2026-05-06 |
| 24 | Granite 3.3 8B Instruct | 0.59 | — | Self-reported | 2026-05-06 |
| 25 | Gemma 3n E2B Instructed LiteRT (Preview) | 0.54 | Gemma 3n 2B google-gemma-3n-e2b-it | Self-reported | 2026-05-06 |
| 25 | Gemma 3n E2B | 0.54 | — | Self-reported | 2026-05-06 |
| 27 | IBM Granite 4.0 Tiny Preview | 0.46 | — | Self-reported | 2026-05-06 |
| 28 | Granite 3.3 8B Base | 0.36 | — | Self-reported | 2026-05-06 |
| 29 | ERNIE 4.5 | 0.29 | ERNIE 4.5 300B A47B baidu-ernie-4.5-300b-a47b | Self-reported | 2026-05-06 |
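The Rank column appears to follow competition-style ranking: tied rows share a rank and the following rank is skipped (for example 2, 2, 4). The sketch below reproduces that pattern for the top four scores. The function name `competition_ranks` is hypothetical, and the page does not state how ranks are computed. Rows that show the same rounded score but different ranks (such as ranks 5 and 6, both displayed as 0.85) presumably differ in their unrounded scores, which is an assumption.

```python
def competition_ranks(scores):
    """Rank scores so that ties share a rank and the next rank is skipped.

    Assumed scheme (not confirmed by the page): a row's rank is one plus
    the number of rows with a strictly higher score.
    """
    return [1 + sum(other > s for other in scores) for s in scores]


# Displayed scores of the first four rows in the table above.
top_scores = [0.92, 0.87, 0.87, 0.86]
ranks = competition_ranks(top_scores)
print(ranks)  # [1, 2, 2, 4]
```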