LiveSQLBench
A dynamic, contamination-free text-to-SQL benchmark for real-world database tasks, including business-intelligence queries, CRUD/management SQL, hierarchical knowledge bases, and large industrial-scale database variants.
34 rows
Primary metric: success_rate
Sampled: 2026-05-06
Metadata
Metrics: Success Rate (higher is better), Average Cost per Task (lower is better)
| Rank | Subject | Success Rate | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Gemini 3.1 Pro | 43.10 | Gemini 3.1 Pro Preview (`google-gemini-3.1-pro-preview`) | Imported | 2026-05-06 |
| 2 | Claude Opus 4.6 | 39.43 | Claude Opus 4.6 (`anthropic-claude-opus-4.6`) | Imported | 2026-05-06 |
| 3 | GPT-5.5 (xhigh) | 37.36 | GPT-5.5 (`openai-gpt-5.5`) | Imported | 2026-05-06 |
| 4 | GPT-5.5 (low) | 37.24 | GPT-5.5 (`openai-gpt-5.5`) | Imported | 2026-05-06 |
| 5 | Kimi K2.6 | 36.43 | MoonshotAI: Kimi K2.6 (`moonshotai-kimi-k2.6`) | Imported | 2026-05-06 |
| 6 | GLM-5.1 | 35.29 | GLM 5.1 (`z-ai-glm-5.1`) | Imported | 2026-05-06 |
| 7 | Qwen3.6 Max | 33.79 | — | Imported | 2026-05-06 |
| 8 | GPT-5.4 | 33.56 | GPT-5.4 (`openai-gpt-5.4`) | Imported | 2026-05-06 |
| 9 | GPT-5.3 Codex | 33.33 | GPT-5.3-Codex (`openai-gpt-5.3-codex`) | Imported | 2026-05-06 |
| 10 | o3-mini | 31.15 | o3-mini (`openai-o3-mini`) | Imported | 2026-05-06 |
| 11 | GPT-5 | 31.15 | GPT-5 (`openai-gpt-5`) | Imported | 2026-05-06 |
| 12 | Claude Sonnet 4.5 | 30.46 | Claude Sonnet 4.5 (`anthropic-claude-sonnet-4.5`) | Imported | 2026-05-06 |
| 13 | Kimi 2.5 | 29.89 | — | Imported | 2026-05-06 |
| 14 | o4-mini | 29.54 | o4 Mini (`openai-o4-mini`) | Imported | 2026-05-06 |
| 15 | o3 | 29.54 | o3 (`openai-o3`) | Imported | 2026-05-06 |
| 16 | Claude Sonnet 4 | 27.01 | Claude Sonnet 4 (`anthropic-claude-sonnet-4`) | Imported | 2026-05-06 |
| 17 | Qwen3-235B-A22B | 26.90 | Qwen3 235B A22B (`qwen-qwen3-235b-a22b`) | Imported | 2026-05-06 |
| 18 | DeepSeek R1 | 26.90 | R1 (`deepseek-r1`) | Imported | 2026-05-06 |
| 19 | Claude 3.7 Sonnet (Thinking) | 26.55 | Claude 3.7 Sonnet (thinking) (`anthropic-claude-3.7-sonnet-thinking`) | Imported | 2026-05-06 |
| 20 | Qwen3 Coder 480B | 26.21 | — | Imported | 2026-05-06 |
| 21 | Claude 3.7 Sonnet | 25.75 | Claude 3.7 Sonnet (`anthropic-claude-3.7-sonnet`) | Imported | 2026-05-06 |
| 22 | GLM 4.7 | 25.52 | GLM 4.7 (`z-ai-glm-4.7`) | Imported | 2026-05-06 |
| 23 | MiniMax M2.1 | 24.37 | MiniMax M2.1 (`minimax-minimax-m2.1`) | Imported | 2026-05-06 |
| 24 | DeepSeek V3 | 23.68 | DeepSeek V3 (`deepseek-deepseek-chat`) | Imported | 2026-05-06 |
| 25 | QwQ-32B | 22.30 | — | Imported | 2026-05-06 |
| 26 | GPT-4o | 21.38 | GPT-4o (`openai-gpt-4o`) | Imported | 2026-05-06 |
| 27 | Llama 4 Scout | 18.55 | Llama 4 Scout (`meta-llama-llama-4-scout`) | Imported | 2026-05-06 |
| 28 | Llama 4 Maverick | 18.05 | Llama 4 Maverick (`meta-llama-4-maverick`) | Imported | 2026-05-06 |
| 29 | Llama 3.3 70B Instruct | 15.86 | Llama 3.3 70B Instruct (`meta-llama-llama-3.3-70b-instruct`) | Imported | 2026-05-06 |
| 30 | Qwen2.5 Coder 32B | 15.75 | — | Imported | 2026-05-06 |
| 31 | Codestral 22B | 12.53 | — | Imported | 2026-05-06 |
| 32 | Qwen2.5 Coder 7B | 8.16 | — | Imported | 2026-05-06 |
| 33 | Mistral 7B Instruct | 3.10 | — | Imported | 2026-05-06 |
| 34 | Mixtral 8x7B Instruct | 2.41 | Mistral: Mixtral 8x7B Instruct (`mistralai-mixtral-8x7b-instruct`) | Imported | 2026-05-06 |
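
The Rank column numbers rows sequentially, so subjects with identical success rates (o3-mini and GPT-5 at 31.15, o4-mini and o3 at 29.54, Qwen3-235B-A22B and DeepSeek R1 at 26.90) are listed under different ranks. Readers who want tied scores to share a rank could recompute it along the following lines. This is only a sketch: `competition_rank` is a hypothetical helper, not part of the benchmark's tooling, and the sample rows are copied from the table above.

```python
# Hypothetical helper: competition ("1, 2, 2, 4") ranking over
# (subject, success_rate) pairs. Not part of any LiveSQLBench tooling.
def competition_rank(rows):
    """Return (rank, subject, score) tuples; tied scores share the lowest rank."""
    # sorted() is stable, so tied rows keep their original relative order.
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    ranked = []
    for position, (subject, score) in enumerate(ordered, start=1):
        if ranked and ranked[-1][2] == score:
            rank = ranked[-1][0]  # same score as the previous row: reuse its rank
        else:
            rank = position       # new score: rank equals 1-based position
        ranked.append((rank, subject, score))
    return ranked


# Four rows taken from the table (original ranks 9 to 12).
sample = [
    ("GPT-5.3 Codex", 33.33),
    ("o3-mini", 31.15),
    ("GPT-5", 31.15),
    ("Claude Sonnet 4.5", 30.46),
]

for rank, subject, score in competition_rank(sample):
    print(rank, subject, score)
```

Within this four-row sample, o3-mini and GPT-5 both receive rank 2 and Claude Sonnet 4.5 receives rank 4, since the position after a tie is skipped.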