Data Agent Benchmark

UC Berkeley EPIC Data Lab benchmark for data agents answering complex real-world data tasks across 12 datasets, 9 domains, and multiple database systems.

Rows: 11
Primary metric: pass_at_1
Sampled: 2026-05-06

Metadata

Metrics

Pass@1, Trials

Latest Results

Overall leaderboard rows from the public DataAgentBench leaderboards.json. Pass@1 ratios are converted to percentages.

| Rank | Subject | Pass@1 (%) | Model Match | Provenance | Sampled |
|------|---------|------------|-------------|------------|---------|
| 1 | Pi Coding Agent + Claude Opus 4.6 | 56.03 | | Imported | 2026-05-06 |
| 2 | PromptQL + Gemini 3.1 Pro | 54.30 | | Imported | 2026-05-06 |
| 3 | PromptQL + Claude Opus 4.6 | 50.80 | | Imported | 2026-05-06 |
| 4 | Oracle Forge (Tenacious Intelligence) + Claude Sonnet 4.6 | 45.54 | | Imported | 2026-05-06 |
| 5 | Claude Opus 4.6 ReAct | 43.76 | | Imported | 2026-05-06 |
| 6 | Gemini-3-Pro ReAct | 38 | | Imported | 2026-05-06 |
| 7 | GPT-5-mini ReAct | 30 | | Imported | 2026-05-06 |
| 8 | GPT-5.2 ReAct | 25 | | Imported | 2026-05-06 |
| 9 | Kimi-K2 ReAct | 23 | | Imported | 2026-05-06 |
| 10 | Oracle Forge (Team Cohere) + Gemini 2.0 Flash | 12.80 | | Imported | 2026-05-06 |
| 11 | Gemini-2.5-Flash ReAct | 9 | | Imported | 2026-05-06 |
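
The ratio-to-percentage conversion and ranking mentioned for this table could be sketched roughly as follows. This is a minimal illustration, not the actual import code: the schema of leaderboards.json is not shown here, so the field names `subject` and `pass_at_1` and the sample entries are hypothetical.

```python
def to_leaderboard(rows):
    """Rank entries by Pass@1 and express each ratio as a percentage.

    Assumption: each row is a dict with hypothetical keys "subject" (str)
    and "pass_at_1" (a ratio between 0 and 1).
    """
    # Highest Pass@1 first; sorted() is stable, so ties keep input order.
    ranked = sorted(rows, key=lambda r: r["pass_at_1"], reverse=True)
    return [
        # Ratio 0.5 becomes 50.0 (percent), rounded to two decimals.
        (rank, r["subject"], round(r["pass_at_1"] * 100, 2))
        for rank, r in enumerate(ranked, start=1)
    ]


# Made-up sample data for illustration only (not real leaderboard values).
sample = [
    {"subject": "agent-b", "pass_at_1": 0.25},
    {"subject": "agent-a", "pass_at_1": 0.5},
    {"subject": "agent-c", "pass_at_1": 0.125},
]

board = to_leaderboard(sample)
for rank, subject, pct in board:
    print(rank, subject, pct)
```

Rounding to two decimals matches the precision of the most detailed Pass@1 values in the table; entries shown as whole numbers there may simply have been reported at lower precision upstream.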