ADBench

A real-world advertising analytics benchmark for LLM agents, comprising 100 tasks, 11 domain tools, three difficulty levels, and trajectory-aware evaluation.

Rows: 10
Primary metric: pass_at_3
Sampled: 2026-05-06

Metadata

Metrics

Pass@3, Pass@1, L1 Pass@3, L2 Pass@3, L3 Pass@3

Latest Results

Overall rows ranked by Pass@3, then Pass@1. Source display names are preserved from the static leaderboard.

Rank  Subject       Pass@3  Model Match                                         Provenance  Sampled
1     Gemini-3-Pro  83      Gemini 3 (google-gemini-3)                          Imported    2026-05-06
2     o3            82      o3 (openai-o3)                                      Imported    2026-05-06
3     HY-2.0        82      -                                                   Imported    2026-05-06
4     GPT-5.1       82      GPT-5.1 (openai-gpt-5.1)                            Imported    2026-05-06
5     GLM-4.7       81      GLM GLM 4.7 (z-ai-glm-4.7)                          Imported    2026-05-06
6     DeepSeek-V3   80      DeepSeek V3 (deepseek-deepseek-chat)                Imported    2026-05-06
7     Kimi-K2       79      KIMI MoonshotAI: Kimi K2 0711 (moonshotai-kimi-k2)  Imported    2026-05-06
8     Qwen3-235B    68      Qwen3 235B A22B (qwen-qwen3-235b-a22b)              Imported    2026-05-06
9     Qwen3-32B     59      Qwen3 32B (qwen-qwen3-32b)                          Imported    2026-05-06
10    Qwen3-8B      58      Qwen3 8B (qwen-qwen3-8b)                            Imported    2026-05-06
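
The ordering rule stated for this table (Pass@3 first, then Pass@1 as the tie-breaker) can be sketched as a two-key descending sort. This is a minimal illustration, not the leaderboard's actual code: the `Row` type, the `rank_rows` helper, and the placeholder subjects and scores are all hypothetical, since Pass@1 values are not shown on this page.

```python
from typing import List, NamedTuple, Tuple


class Row(NamedTuple):
    """Hypothetical leaderboard row; field names are assumptions."""
    subject: str
    pass_at_3: float
    pass_at_1: float


def rank_rows(rows: List[Row]) -> List[Tuple[int, Row]]:
    """Order rows by Pass@3 descending, then Pass@1 descending, and number them from 1."""
    ordered = sorted(rows, key=lambda r: (-r.pass_at_3, -r.pass_at_1))
    return [(position + 1, row) for position, row in enumerate(ordered)]


# Placeholder data only; these are not ADBench results.
rows = [
    Row("model-a", pass_at_3=82, pass_at_1=70),
    Row("model-b", pass_at_3=83, pass_at_1=65),
    Row("model-c", pass_at_3=82, pass_at_1=74),
]

for rank, row in rank_rows(rows):
    print(rank, row.subject, row.pass_at_3, row.pass_at_1)
```

In this sketch, model-a and model-c tie on Pass@3, so the higher Pass@1 places model-c ahead. That is presumably how the three subjects tied at 82 in the table are separated.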