Data Agent Benchmark | BenchmarkList

Metadata

Pass@1, Trials

Rank	Subject	Pass@1 (%)	Model Match	Provenance	Sampled
1	Pi Coding Agent + Claude Opus 4.6	56.03	—	Imported	2026-05-06
2	PromptQL + Gemini 3.1 Pro	54.30	—	Imported	2026-05-06
3	PromptQL + Claude Opus 4.6	50.80	—	Imported	2026-05-06
4	Oracle Forge (Tenacious Intelligence) + Claude Sonnet 4.6	45.54	—	Imported	2026-05-06
5	Claude Opus 4.6 ReAct	43.76	—	Imported	2026-05-06
6	Gemini-3-Pro ReAct	38	—	Imported	2026-05-06
7	GPT-5-mini ReAct	30	—	Imported	2026-05-06
8	GPT-5.2 ReAct	25	—	Imported	2026-05-06
9	Kimi-K2 ReAct	23	—	Imported	2026-05-06
10	Oracle Forge (Team Cohere) + Gemini 2.0 Flash	12.80	—	Imported	2026-05-06
11	Gemini-2.5-Flash ReAct	9	—	Imported	2026-05-06