DABstep
DABstep (Data Agent Benchmark for Multi-step Reasoning) evaluates data-analysis agents on real-world, multi-step tasks over structured and unstructured business data.
Rows: 100
Primary metric: hard_accuracy
Sampled: 2026-05-06
Metadata
Metrics
Hard Level Accuracy, Easy Level Accuracy
| Rank | Subject | Hard Level Accuracy | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Genesis Computing \| user jaymiller96734 - Genesis Data Agent / gpt5-2 | 100 | — | Imported | 2026-05-06 |
| 2 | OceanBase \| user ConfuseG - DataPilot / Qwen3 | 100 | — | Imported | 2026-05-06 |
| 3 | Think Evolve Labs LLC \| user skylord - ThinkEvolve Spoofer / MiniMax Agent | 99.21 | — | Imported | 2026-05-06 |
| 4 | personal - q_test_da_pass8_simple | 99.21 | — | Imported | 2026-05-06 |
| 5 | personal - Clalude code and gemini | 98.94 | — | Imported | 2026-05-06 |
| 6 | personal - Claude code and Gemini | 98.94 | — | Imported | 2026-05-06 |
| 7 | personal - ConfuseAgent | 98.68 | — | Imported | 2026-05-06 |
| 8 | aa \| user fatihozturk - da_1775991891_rnd / aaa | 95.77 | — | Imported | 2026-05-06 |
| 9 | OceanBase AntGroup \| user ConfuseG - DataPilot / Qwen3 | 95.50 | — | Imported | 2026-05-06 |
| 10 | Glitchcraft Inc \| user Navgameramp - Actioneer v0.5 Agent / Claude Opus 4.6 | 94.44 | — | Imported | 2026-05-06 |
| 11 | llmtech \| user daxiongshu - llmtech-data-explorer-2-19-pp / claude haiku 4.5 | 94.18 | — | Imported | 2026-05-06 |
| 12 | aa \| user fatihozturk - da_1775608274_rnd / aa | 93.12 | — | Imported | 2026-05-06 |
| 13 | llmtech \| user daxiongshu - llmtech-data-explorer-nat / claude haiku | 92.86 | — | Imported | 2026-05-06 |
| 14 | aa \| user fatihozturk - da_1775991891_lo / aa | 91.53 | — | Imported | 2026-05-06 |
| 15 | llmtech \| user daxiongshu - llmtech-ds-agent / claude haiku 4.5 | 90.74 | — | Imported | 2026-05-06 |
| 16 | NVIDIA LLM-tech Agent Research - NVIDIA KGMON (NeMo Agent Toolkit) Data Explorer / claude haiku 4.5 | 89.95 | — | Imported | 2026-05-06 |
| 17 | Ant Digital Technologies \| user stcezhou - MLE_Agent_V1.3 / claude-4.5-opus | 89.42 | — | Imported | 2026-05-06 |
| 18 | Ant Digital Technologies \| user stcezhou - MLE_Agent_V1.3.2 / claude-4.5-opus | 89.15 | — | Imported | 2026-05-06 |
| 19 | Ant Digital Technologies \| user stcezhou - MLE_Agent_V1.3.1 / claude-4.5-opus | 89.15 | — | Imported | 2026-05-06 |
| 20 | getdot.ai \| user zurfer - GetDot 0.3 / gpt + claude | 88.62 | — | Imported | 2026-05-06 |
| 21 | Zoom \| user zg77 - ZDA_V6 / gpt-oss-120b | 88.62 | — | Imported | 2026-05-06 |
| 22 | OceanBase AntGroup \| user ConfuseG - DataPilot / Qwen3 | 87.57 | — | Imported | 2026-05-06 |
| 23 | llmtech \| user daxiongshu - llmtech-data-explorer-nat-2-18 / claude haiku 4.5 | 87.30 | — | Imported | 2026-05-06 |
| 24 | Glitchcraft Inc \| user Navgameramp - Actioneer v0.5 / Claude Opus 4.6 | 85.98 | — | Imported | 2026-05-06 |
| 25 | null - DataPilot | 85.71 | — | Imported | 2026-05-06 |
| 26 | AntGroup \| user stcezhou - MLE_Agent_V1.2 / Claude | 85.71 | — | Imported | 2026-05-06 |
| 27 | test00001 - test00001 | 85.71 | — | Imported | 2026-05-06 |
| 28 | aa \| user fatihozturk - da_1775991891 / aaa | 85.45 | — | Imported | 2026-05-06 |
| 29 | aa \| user fatihozturk - da_1775991891_mltplr / aa | 85.45 | — | Imported | 2026-05-06 |
| 30 | getdot.ai \| user zurfer - GetDot 0.2 / gpt + claude | 85.45 | — | Imported | 2026-05-06 |
| 31 | llmtech \| user daxiongshu - llmtech-ds-agent-no-pp / claude haiku 4.5 | 83.07 | — | Imported | 2026-05-06 |
| 32 | aa \| user fatihozturk - da_1775608274 / aa | 82.80 | — | Imported | 2026-05-06 |
| 33 | Zoom \| user zg77 - ZDA-V5 / gpt-oss-120b | 82.80 | — | Imported | 2026-05-06 |
| 34 | mooz \| user pinkknip - mooz_v5 / gpt-oss-120b | 82.80 | — | Imported | 2026-05-06 |
| 35 | antgroup \| user stcezhou - qiyu_3 / qwen-max | 80.95 | — | Imported | 2026-05-06 |
| 36 | antgroup \| user stcezhou - qiyu1227_2 / claude | 80.69 | — | Imported | 2026-05-06 |
| 37 | NA \| user mjeblicknvidia - ds_agent / nemotron | 80.42 | — | Imported | 2026-05-06 |
| 38 | AntGroup - MLE_AGENT_V1.1.1 | 80.16 | — | Imported | 2026-05-06 |
| 39 | getdot.ai \| user zurfer - GetDot / gpt + claude | 79.89 | — | Imported | 2026-05-06 |
| 40 | test - KGv1 DKv1.6 | 79.89 | — | Imported | 2026-05-06 |
| 41 | AntGroup \| user DechowWen - MLE_AGENT_V1.1 / Claude | 79.10 | — | Imported | 2026-05-06 |
| 42 | test - v1.6 | 78.57 | — | Imported | 2026-05-06 |
| 43 | antgroup \| user stcezhou - qiyu_1229_v2 / claude | 78.57 | — | Imported | 2026-05-06 |
| 44 | AI - test-Agent_test-2 | 78.31 | — | Imported | 2026-05-06 |
| 45 | AI_test - Agent_test-3 | 78.31 | — | Imported | 2026-05-06 |
| 46 | aa \| user fatihozturk - da_1776848093_4 / aa | 77.51 | — | Imported | 2026-05-06 |
| 47 | NA \| user mjeblicknvidia - Data explorer agent / nt | 77.25 | — | Imported | 2026-05-06 |
| 48 | test - kg04r-dkv1.6 | 76.72 | — | Imported | 2026-05-06 |
| 49 | test - DK1.6-KGv04 | 75.13 | — | Imported | 2026-05-06 |
| 50 | test - V1.7-fixed | 74.60 | — | Imported | 2026-05-06 |
| 51 | antgroup \| user stcezhou - qiyu1227 / qwen | 74.60 | — | Imported | 2026-05-06 |
| 52 | llmtech \| user daxiongshu - llmtech-data-explorer-haiku-4-5-pp / claude-haiku-4-5 | 71.43 | — | Imported | 2026-05-06 |
| 53 | test - v1.7 | 71.16 | — | Imported | 2026-05-06 |
| 54 | null - test-AgenticData | 70.63 | — | Imported | 2026-05-06 |
| 55 | agent_test \| user IncredibleMe - agent_test / claude | 70.37 | — | Imported | 2026-05-06 |
| 56 | AI_test - Agent_test-1 | 69.84 | — | Imported | 2026-05-06 |
| 57 | gg-org \| user geo11 - gg-agent-gpt5-workspace-1210-thresh8 / gg-family | 69.58 | — | Imported | 2026-05-06 |
| 58 | gg-org \| user geo11 - gg-agent-gpt5-workspace-1210-thresh7 / gg-family | 69.31 | — | Imported | 2026-05-06 |
| 59 | DataCloud - Powerdrill Agents Team | 67.99 | — | Imported | 2026-05-06 |
| 60 | dsvx \| user billxu0424 - test_agent / qwen_plus | 67.99 | — | Imported | 2026-05-06 |
| 61 | gg-org \| user geo11 - gg-agent-gpt-mini-1217-1 / gg-family | 67.72 | — | Imported | 2026-05-06 |
| 62 | gg-org \| user geo11 - gg-agent-gpt-mini-1217 / gg-family | 67.72 | — | Imported | 2026-05-06 |
| 63 | qiyu - qiyu1224 | 67.46 | — | Imported | 2026-05-06 |
| 64 | llmtech \| user daxiongshu - ds-agent-test-123 / claude | 66.93 | — | Imported | 2026-05-06 |
| 65 | Alibaba Cloud \| user DataAgentForAnalytics - Data Agent for Analytics_v0.6.1 / Qwen3 | 65.34 | — | Imported | 2026-05-06 |
| 66 | gg-org \| user geo11 - gg-agent-gpt5-workspace-1210-thresh9 / gg-family | 65.34 | — | Imported | 2026-05-06 |
| 67 | antgroup \| user stcezhou - qiyu_1229 / claude | 65.34 | — | Imported | 2026-05-06 |
| 68 | antgroup \| user stcezhou - qiyu_4 / qwen3 | 65.34 | — | Imported | 2026-05-06 |
| 69 | test - org-Test Agent | 64.81 | — | Imported | 2026-05-06 |
| 70 | test - org-Test Agent2 | 64.81 | — | Imported | 2026-05-06 |
| 71 | Alibaba Cloud \| user DataAgentForAnalytics - Data Agent for Analytics_v0.1_1 / Qwen3 | 64.55 | — | Imported | 2026-05-06 |
| 72 | llmtech \| user daxiongshu - llmtech-data-explorer-haiku-4-5.jsonl / claude-haiku-4-5 | 64.55 | — | Imported | 2026-05-06 |
| 73 | DataCloud - Powerdrill Agents Team-0922 | 64.29 | — | Imported | 2026-05-06 |
| 74 | Alibaba Cloud \| user DataAgentForAnalytics - Data Agent for Analytics_0917_1 / Qwen3 | 62.96 | — | Imported | 2026-05-06 |
| 75 | Alibaba Cloud \| user DataAgentForAnalytics - Data Agent for Analytics_v0.1 / Qwen3 | 62.96 | — | Imported | 2026-05-06 |
| 76 | gg-org \| user geo11 - gg-agent-gpt5-1104-1 / [Doubao](https://artificialanalysis.ai/models/doubao-seed-code) | 62.96 | — | Imported | 2026-05-06 |
| 77 | gg-org \| user geo11 - gg-agent-db-mini-1217-2 / gg-family | 62.70 | — | Imported | 2026-05-06 |
| 78 | gg-org \| user geo11 - gg-agent-db-1217-1 / gg-family | 62.70 | — | Imported | 2026-05-06 |
| 79 | gg-org \| user geo11 - gg-agent-gpt5-no-workspace-1206 / gg-family | 61.90 | — | Imported | 2026-05-06 |
| 80 | Sphinx \| user rkodialam - sphinx-0.8 / Sphinx | 61.38 | — | Imported | 2026-05-06 |
| 81 | uh - uh | 61.38 | — | Imported | 2026-05-06 |
| 82 | rising - start-22-rising-start-22 | 61.38 | — | Imported | 2026-05-06 |
| 83 | MagicAgent \| user Chelsea007 - Magic_Agent_0910 / deepseek | 61.11 | — | Imported | 2026-05-06 |
| 84 | MagicAgent \| user Chelsea007 - Magic_Agent_0918 / deepseek | 61.11 | — | Imported | 2026-05-06 |
| 85 | personal - Sup_rev_8 | 61.11 | — | Imported | 2026-05-06 |
| 86 | gg-org \| user geo11 - gg-agent-gpt5-1104 / gg-family | 60.32 | — | Imported | 2026-05-06 |
| 87 | null - test-AgenticData-0918 | 60.05 | — | Imported | 2026-05-06 |
| 88 | personal - 0421_iter2_stage3_qwen3.5_9b | 58.99 | — | Imported | 2026-05-06 |
| 89 | gg-org \| user geo11 - gg-agent-cl-1017-rerun1-temp6 / gg-family | 58.73 | — | Imported | 2026-05-06 |
| 90 | rising - start-21-rising-start-21 | 58.73 | — | Imported | 2026-05-06 |
| 91 | agent_for_real \| user IncredibleMe - agent_for_real / agent_for_real | 58.47 | — | Imported | 2026-05-06 |
| 92 | CambioML - CambioML energent.ai DS Agent | 57.67 | — | Imported | 2026-05-06 |
| 93 | gg-org \| user geo11 - gg-agent-cl-1017-rerun1 / gg-family | 57.41 | — | Imported | 2026-05-06 |
| 94 | gg-org \| user geo11 - gg-agent-cl-1017-rerun / gg-family | 57.41 | — | Imported | 2026-05-06 |
| 95 | rising - start-25-rising-start-25 | 57.41 | — | Imported | 2026-05-06 |
| 96 | lol - lol | 57.14 | — | Imported | 2026-05-06 |
| 97 | lol2 - lol2 | 57.14 | — | Imported | 2026-05-06 |
| 98 | burak \| user karakanb - codex / gpt-5.4 | 56.88 | — | Imported | 2026-05-06 |
| 99 | Test Agent - Test Agent | 56.61 | — | Imported | 2026-05-06 |
| 100 | baptiste - baptiste | 56.61 | — | Imported | 2026-05-06 |