DAXBench
Benchmark for evaluating how well language models write production-grade DAX for Power BI and Analysis Services business-intelligence scenarios.
50 rows
Primary metric: score
Sampled: 2026-05-28
Metadata
Metrics: Score, Accuracy, Syntax, Tasks Solved
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Gemini 3.1 Flash Lite Preview HIGH Google | 97.4% | — | Imported | 2026-05-28 |
| 2 | GPT-5.3 Chat OpenAI | 96.2% | GPT-5.3 Chat openai-gpt-5.3-chat | Imported | 2026-05-28 |
| 3 | GLM 5 Z.AI | 96.2% | — | Imported | 2026-05-28 |
| 4 | GPT-5.4 Mini OpenAI | 96.2% | GPT-5.4 Mini openai-gpt-5.4-mini | Imported | 2026-05-28 |
| 5 | Gemma 4 31B Google | 94.5% | — | Imported | 2026-05-28 |
| 6 | Gemini 3.1 Pro Preview HIGH Google | 93.8% | — | Imported | 2026-05-28 |
| 7 | Qwen3.6 Plus Preview (free) Qwen | 93.3% | — | Imported | 2026-05-28 |
| 8 | Qwen3.5-Flash MED Qwen | 93.2% | — | Imported | 2026-05-28 |
| 9 | GLM 5V Turbo Z.AI | 91.5% | — | Imported | 2026-05-28 |
| 10 | Qwen3.6 Max Preview Qwen | 90.9% | — | Imported | 2026-05-28 |
| 11 | GLM 5.1 Z.AI | 90.3% | — | Imported | 2026-05-28 |
| 12 | Qwen3.5 397B A17B Qwen | 90.3% | — | Imported | 2026-05-28 |
| 13 | Qwen3.5 Plus 2026-02-15 MED Qwen | 89.7% | — | Imported | 2026-05-28 |
| 14 | Qwen3.6 Plus (free) Qwen | 89% | — | Imported | 2026-05-28 |
| 15 | GPT-5.3-Codex HIGH OpenAI | 88.6% | GPT-5.3-Codex openai-gpt-5.3-codex | Imported | 2026-05-28 |
| 16 | GPT-5.5 OpenAI | 86.7% | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-28 |
| 17 | gpt-oss-120b OpenAI | 85.6% | — | Imported | 2026-05-28 |
| 18 | GPT-5.1-Codex-Max OpenAI | 85% | GPT-5.1-Codex-Max openai-gpt-5.1-codex-max | Imported | 2026-05-28 |
| 19 | KAT-Coder-Pro V2 Kwaipilot | 84.8% | — | Imported | 2026-05-28 |
| 20 | Claude Sonnet 4.6 MED Anthropic | 84.5% | — | Imported | 2026-05-28 |
| 21 | GLM 5 Turbo Z.AI | 84.3% | — | Imported | 2026-05-28 |
| 22 | Gemini 3 Flash Preview Google | 83.8% | — | Imported | 2026-05-28 |
| 23 | Gemini 2.5 Flash Preview 09-2025 Google | 83.8% | — | Imported | 2026-05-28 |
| 24 | DeepSeek V4 Pro DeepSeek | 83.2% | — | Imported | 2026-05-28 |
| 25 | GPT-5.4 HIGH OpenAI | 83.2% | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-28 |
| 26 | Qwen3.6 Flash Qwen | 83.1% | — | Imported | 2026-05-28 |
| 27 | Claude Opus 4.5 Anthropic | 82.7% | — | Imported | 2026-05-28 |
| 28 | Claude Opus 4.6 Anthropic | 82% | — | Imported | 2026-05-28 |
| 29 | Gemini 3.1 Flash Lite Google | 82% | — | Imported | 2026-05-28 |
| 30 | R1 DeepSeek | 81.3% | — | Imported | 2026-05-28 |
| 31 | Grok 4.3 xAI | 81.3% | — | Imported | 2026-05-28 |
| 32 | Claude Sonnet 4 Anthropic | 81.3% | — | Imported | 2026-05-28 |
| 33 | Gemini 3 Pro Preview Google | 81.3% | — | Imported | 2026-05-28 |
| 34 | o3 OpenAI | 80.2% | — | Imported | 2026-05-28 |
| 35 | Grok 4.20 Beta HIGH xAI | 80.1% | — | Imported | 2026-05-28 |
| 36 | GPT-5.2 Chat OpenAI | 80.1% | GPT-5.2 Chat openai-gpt-5.2-chat | Imported | 2026-05-28 |
| 37 | DeepSeek V3.2 DeepSeek | 79.9% | — | Imported | 2026-05-28 |
| 38 | Qwen3.6 35B A3B Qwen | 79.2% | — | Imported | 2026-05-28 |
| 39 | GPT-5.2 OpenAI | 78.4% | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-28 |
| 40 | Kimi K2 Thinking Moonshot AI | 78.4% | — | Imported | 2026-05-28 |
| 41 | Aurora Alpha Openrouter | 78.2% | — | Imported | 2026-05-28 |
| 42 | Claude Sonnet 4.5 Anthropic | 77.9% | — | Imported | 2026-05-28 |
| 43 | Grok 4.20 Multi-Agent Beta HIGH xAI | 77.9% | — | Imported | 2026-05-28 |
| 44 | Hunter Alpha Openrouter | 77.8% | — | Imported | 2026-05-28 |
| 45 | Grok 4 xAI | 76.7% | — | Imported | 2026-05-28 |
| 46 | Gemini 2.0 Flash Google | 76.6% | — | Imported | 2026-05-28 |
| 47 | Gemini 2.0 Flash Experimental (free) Google | 76.6% | — | Imported | 2026-05-28 |
| 48 | o4 Mini OpenAI | 76.5% | — | Imported | 2026-05-28 |
| 49 | Gemini 2.5 Flash Google | 76% | — | Imported | 2026-05-28 |
| 50 | DeepSeek V3.1 DeepSeek | 75.9% | — | Imported | 2026-05-28 |