Data Agent Benchmark
UC Berkeley EPIC Data Lab benchmark for data agents answering complex real-world data tasks across 12 datasets, 9 domains, and multiple database systems.
Rows: 11
Primary metric: pass_at_1
Sampled: 2026-05-06
Metadata
Metrics: Pass@1, Trials
| Rank | Subject | Pass@1 (%) | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Pi Coding Agent + Claude Opus 4.6 | 56.03 | — | Imported | 2026-05-06 |
| 2 | PromptQL + Gemini 3.1 Pro | 54.30 | — | Imported | 2026-05-06 |
| 3 | PromptQL + Claude Opus 4.6 | 50.80 | — | Imported | 2026-05-06 |
| 4 | Oracle Forge (Tenacious Intelligence) + Claude Sonnet 4.6 | 45.54 | — | Imported | 2026-05-06 |
| 5 | Claude Opus 4.6 ReAct | 43.76 | — | Imported | 2026-05-06 |
| 6 | Gemini-3-Pro ReAct | 38 | — | Imported | 2026-05-06 |
| 7 | GPT-5-mini ReAct | 30 | — | Imported | 2026-05-06 |
| 8 | GPT-5.2 ReAct | 25 | — | Imported | 2026-05-06 |
| 9 | Kimi-K2 ReAct | 23 | — | Imported | 2026-05-06 |
| 10 | Oracle Forge (Team Cohere) + Gemini 2.0 Flash | 12.80 | — | Imported | 2026-05-06 |
| 11 | Gemini-2.5-Flash ReAct | 9 | — | Imported | 2026-05-06 |