ADBench
A real-world advertising analytics benchmark for LLM agents, with 100 tasks, 11 domain tools, three difficulty levels, and trajectory-aware evaluation.
Rows: 10
Primary metric: pass_at_3
Sampled: 2026-05-06
Metadata
Metrics
Pass@3, Pass@1, L1 Pass@3, L2 Pass@3, L3 Pass@3
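
The page does not state how Pass@3 and Pass@1 are computed. A minimal sketch, assuming Pass@k counts a task as solved when at least one of its first k attempts passes, and that the score is the percentage of solved tasks. The function name `pass_at_k`, the task ids, and the outcomes below are hypothetical and are not taken from ADBench:

```python
from typing import Dict, List


def pass_at_k(results: Dict[str, List[bool]], k: int) -> float:
    """Percentage of tasks with at least one passing attempt among the first k.

    Assumption: this "any of k attempts" definition is illustrative only;
    the benchmark's actual scoring rule is not given on this page.
    """
    if not results:
        raise ValueError("results must contain at least one task")
    solved = sum(1 for attempts in results.values() if any(attempts[:k]))
    return 100.0 * solved / len(results)


# Hypothetical per-task outcomes for three attempts each (not real data).
example = {
    "task-001": [False, True, False],
    "task-002": [False, False, False],
    "task-003": [True, True, True],
    "task-004": [False, False, True],
}

print(pass_at_k(example, 3))  # 75.0
print(pass_at_k(example, 1))  # 25.0
```

Under this assumption, the per-level metrics (L1, L2, L3 Pass@3) would be the same calculation restricted to the tasks of one difficulty level.
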
| Rank | Subject | Pass@3 | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Gemini-3-Pro | 83 | Gemini 3 (`google-gemini-3`) | Imported | 2026-05-06 |
| 2 | o3 | 82 | o3 (`openai-o3`) | Imported | 2026-05-06 |
| 3 | HY-2.0 | 82 | — | Imported | 2026-05-06 |
| 4 | GPT-5.1 | 82 | GPT-5.1 (`openai-gpt-5.1`) | Imported | 2026-05-06 |
| 5 | GLM-4.7 | 81 | GLM 4.7 (`z-ai-glm-4.7`) | Imported | 2026-05-06 |
| 6 | DeepSeek-V3 | 80 | DeepSeek V3 (`deepseek-deepseek-chat`) | Imported | 2026-05-06 |
| 7 | Kimi-K2 | 79 | MoonshotAI: Kimi K2 0711 (`moonshotai-kimi-k2`) | Imported | 2026-05-06 |
| 8 | Qwen3-235B | 68 | Qwen3 235B A22B (`qwen-qwen3-235b-a22b`) | Imported | 2026-05-06 |
| 9 | Qwen3-32B | 59 | Qwen3 32B (`qwen-qwen3-32b`) | Imported | 2026-05-06 |
| 10 | Qwen3-8B | 58 | Qwen3 8B (`qwen-qwen3-8b`) | Imported | 2026-05-06 |