LiveSQLBench

Dynamic contamination-free text-to-SQL benchmark for real-world database tasks, including business-intelligence queries, CRUD/management SQL, hierarchical knowledge bases, and large industrial-scale database variants.

34 rows
Primary metric: success_rate
Sampled: 2026-05-06

Metadata

Metrics

Success Rate (higher is better); Average Cost per Task (lower is better)
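As a rough illustration of how the two metrics are aggregated, the sketch below computes both from per-task records. It assumes a hypothetical `TaskResult` record carrying a pass/fail flag and a per-task cost; how LiveSQLBench decides that a task passes is not described on this page, so the flag is simply taken as given.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    # Hypothetical record: not part of any published LiveSQLBench API.
    passed: bool     # whether the model's SQL was judged correct for this task
    cost_usd: float  # cost incurred for this task


def success_rate(results: list[TaskResult]) -> float:
    """Percentage of tasks solved, rounded to two decimals like the leaderboard."""
    if not results:
        raise ValueError("no task results")
    solved = sum(r.passed for r in results)  # True counts as 1, False as 0
    return round(100.0 * solved / len(results), 2)


def average_cost_per_task(results: list[TaskResult]) -> float:
    """Mean cost over all attempted tasks (lower is better)."""
    if not results:
        raise ValueError("no task results")
    return sum(r.cost_usd for r in results) / len(results)
```

For instance, solving 2 of 4 tasks gives a success rate of 50.0. Averaging cost over all attempted tasks, not only solved ones, keeps a model from looking cheap simply by failing fast.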

Latest Results

Rows are parsed from the public LiveSQLBench homepage leaderboard table for the active Base benchmark window. Scores are success rates as displayed by the source.

Rank  Subject                       Success Rate  Model Match                     Model ID                              Provenance  Sampled
1     Gemini 3.1 Pro                43.10         Gemini 3.1 Pro Preview          google-gemini-3.1-pro-preview         Imported    2026-05-06
2     Claude Opus 4.6               39.43         Claude Opus 4.6                 anthropic-claude-opus-4.6             Imported    2026-05-06
3     GPT-5.5 (xhigh)               37.36         GPT-5.5                         openai-gpt-5.5                        Imported    2026-05-06
4     GPT-5.5 (low)                 37.24         GPT-5.5                         openai-gpt-5.5                        Imported    2026-05-06
5     Kimi K2.6                     36.43         MoonshotAI: Kimi K2.6           moonshotai-kimi-k2.6                  Imported    2026-05-06
6     GLM-5.1                       35.29         GLM 5.1                         z-ai-glm-5.1                          Imported    2026-05-06
7     Qwen3.6 Max                   33.79         -                               -                                     Imported    2026-05-06
8     GPT-5.4                       33.56         GPT-5.4                         openai-gpt-5.4                        Imported    2026-05-06
9     GPT-5.3 Codex                 33.33         GPT-5.3-Codex                   openai-gpt-5.3-codex                  Imported    2026-05-06
10    o3-mini                       31.15         o3-mini                         openai-o3-mini                        Imported    2026-05-06
11    GPT-5                         31.15         GPT-5                           openai-gpt-5                          Imported    2026-05-06
12    Claude Sonnet 4.5             30.46         Claude Sonnet 4.5               anthropic-claude-sonnet-4.5           Imported    2026-05-06
13    Kimi 2.5                      29.89         -                               -                                     Imported    2026-05-06
14    o4-mini                       29.54         o4 Mini                         openai-o4-mini                        Imported    2026-05-06
15    o3                            29.54         o3                              openai-o3                             Imported    2026-05-06
16    Claude Sonnet 4               27.01         Claude Sonnet 4                 anthropic-claude-sonnet-4             Imported    2026-05-06
17    Qwen3-235B-A22B               26.90         Qwen3 235B A22B                 qwen-qwen3-235b-a22b                  Imported    2026-05-06
18    DeepSeek R1                   26.90         R1                              deepseek-r1                           Imported    2026-05-06
19    Claude 3.7 Sonnet (Thinking)  26.55         Claude 3.7 Sonnet (thinking)    anthropic-claude-3.7-sonnet-thinking  Imported    2026-05-06
20    Qwen3 Coder 480B              26.21         -                               -                                     Imported    2026-05-06
21    Claude 3.7 Sonnet             25.75         Claude 3.7 Sonnet               anthropic-claude-3.7-sonnet           Imported    2026-05-06
22    GLM 4.7                       25.52         GLM 4.7                         z-ai-glm-4.7                          Imported    2026-05-06
23    MiniMax M2.1                  24.37         MiniMax M2.1                    minimax-minimax-m2.1                  Imported    2026-05-06
24    DeepSeek V3                   23.68         DeepSeek V3                     deepseek-deepseek-chat                Imported    2026-05-06
25    QwQ-32B                       22.30         -                               -                                     Imported    2026-05-06
26    GPT-4o                        21.38         GPT-4o                          openai-gpt-4o                         Imported    2026-05-06
27    Llama 4 Scout                 18.55         Llama 4 Scout                   meta-llama-llama-4-scout              Imported    2026-05-06
28    Llama 4 Maverick              18.05         Llama 4 Maverick                meta-llama-4-maverick                 Imported    2026-05-06
29    Llama 3.3 70B Instruct        15.86         Llama 3.3 70B Instruct          meta-llama-llama-3.3-70b-instruct     Imported    2026-05-06
30    Qwen2.5 Coder 32B             15.75         -                               -                                     Imported    2026-05-06
31    Codestral 22B                 12.53         -                               -                                     Imported    2026-05-06
32    Qwen2.5 Coder 7B              8.16          -                               -                                     Imported    2026-05-06
33    Mistral 7B Instruct           3.10          -                               -                                     Imported    2026-05-06
34    Mixtral 8x7B Instruct         2.41          Mistral: Mixtral 8x7B Instruct  mistralai-mixtral-8x7b-instruct       Imported    2026-05-06
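The table assigns consecutive ranks even where the displayed scores tie (o3-mini and GPT-5 at 31.15, o4-mini and o3 at 29.54, Qwen3-235B-A22B and DeepSeek R1 at 26.90). If tie-aware ranks are wanted, the following is a minimal sketch of standard competition ranking ("1224" ranking); it is an illustration, not how the source assigns its ranks. It compares scores exactly, which is safe here because the values are parsed from the same two-decimal display text.

```python
def competition_ranks(scores: list[float]) -> list[int]:
    """Standard competition ranking: tied scores share the best rank,
    and the next distinct score skips ahead by the number of ties."""
    ordered = sorted(scores, reverse=True)
    first_position: dict[float, int] = {}
    for position, score in enumerate(ordered, start=1):
        # Record only the first (best) position at which each score appears.
        first_position.setdefault(score, position)
    # Map each input score back to its shared rank, preserving input order.
    return [first_position[s] for s in scores]
```

Applied to the scores at ranks 9 to 12 above (33.33, 31.15, 31.15, 30.46), it returns [1, 2, 2, 4].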