TableBench

TableBench measures structured-data reasoning over tables, spreadsheets, charts, databases, and data analysis tasks.

36 rows
Primary metric: Overall
Sampled: 2026-05-27

Metadata

Metrics

Overall, Fact checking, Numerical reasoning, Data analysis, Visualization

Latest Results

Rows are parsed from the "Leaderboard - Methodology" table on the official TableBench GitHub Pages site. Score is the Overall metric; the category metrics are Fact checking (FC), Numerical reasoning (NR), Data analysis (DA), and Visualization (VIZ).

Rank | Subject | Overall | Model Match | Provenance | Sampled
1 | Human Performance | 85.91% | - | Imported | 2026-05-27
2 | ButtonAgent | 64.14% | - | Imported | 2026-05-27
3 | RankAgent | 62.14% | - | Imported | 2026-05-27
4 | o4-mini-high + DP | 61.69% | - | Imported | 2026-05-27
5 | o4-mini + DP | 60.75% | - | Imported | 2026-05-27
6 | GPT-5 + DP | 59.94% | GPT-5 (openai-gpt-5) | Imported | 2026-05-27
7 | o3-mini + DP | 59.9% | - | Imported | 2026-05-27
8 | Grok4 + DP | 57.8% | - | Imported | 2026-05-27
9 | Gemini-2.5-Pro + DP | 57.18% | - | Imported | 2026-05-27
10 | Deepseek-R1 + DP | 56.31% | - | Imported | 2026-05-27
11 | Claude4-Sonnet + DP | 54.75% | - | Imported | 2026-05-27
12 | Llama-4-Maverick-17B-128E-Instruct + TCoT | 52.73% | - | Imported | 2026-05-27
13 | Qwen3-32B | 52.45% | Qwen3 32B (qwen-qwen3-32b) | Imported | 2026-05-27
14 | GPT-4o + TCoT | 51.96% | GPT-4o (openai-gpt-4o) | Imported | 2026-05-27
15 | GPT-4-Turbo + TCoT | 51.5% | GPT-4 Turbo (openai-gpt-4-turbo) | Imported | 2026-05-27
16 | Deepseek-Chat-V3 + TCoT | 50.56% | - | Imported | 2026-05-27
17 | Llama-3.1-405B-Instruct + TCoT | 48.87% | - | Imported | 2026-05-27
18 | Qwen2.5-72B-Instruct + TCoT | 48.79% | - | Imported | 2026-05-27
19 | Llama-4-Scout-17B-16E-Instruct + TCoT | 46.53% | - | Imported | 2026-05-27
20 | Qwen2.5-Coder-32B-Instruct + TCoT | 45.51% | - | Imported | 2026-05-27
21 | QWQ-32B + DP | 43.87% | - | Imported | 2026-05-27
22 | Llama3.1-70B-Instruct + TCoT | 41.05% | - | Imported | 2026-05-27
23 | TableGPT2-7B + TCoT | 41.05% | - | Imported | 2026-05-27
24 | Llama3-70B-Chat + TCoT | 38.68% | - | Imported | 2026-05-27
25 | GPT-3.5-Turbo + PoT | 37.15% | GPT-3.5 Turbo (openai-gpt-3.5-turbo) | Imported | 2026-05-27
26 | Qwen2.5-Coder-7B-Instruct + TCoT | 35.12% | - | Imported | 2026-05-27
27 | TableLLM-Qwen2-7B + TCoT | 31.9% | - | Imported | 2026-05-27
28 | TableLLM-Llama3.1-8B + TCoT | 30.77% | - | Imported | 2026-05-27
29 | TableLLM-DeepseekCoder-7B + TCoT | 30.51% | - | Imported | 2026-05-27
30 | TableLLM-Llama3-8B + TCoT | 29.8% | - | Imported | 2026-05-27
31 | TableLLM-CodeQwen-7B + TCoT | 24.81% | - | Imported | 2026-05-27
32 | Llama3-8B-Chat + SCoT | 22.2% | - | Imported | 2026-05-27
33 | Qwen2.5-7B-Instruct + TCoT | 22.14% | - | Imported | 2026-05-27
34 | Mixtral-8x7B-Instruct + PoT | 21.7% | - | Imported | 2026-05-27
35 | Llama3.1-8B-Instruct + DP | 15.42% | - | Imported | 2026-05-27
36 | Mistral-7B-Instruct + SCoT | 10.97% | - | Imported | 2026-05-27
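
Many Subject values in the results above combine a model name with a suffix after " + " (DP, TCoT, SCoT, or PoT), which appears to name the prompting setup used for that row. The following is a minimal Python sketch of how such rows could be split into model and method and how the Overall percentage could be read as a number. The `Entry` type and the `split_subject` and `make_entry` helpers are hypothetical names introduced for illustration, not part of TableBench; only a handful of rows from the table are included.

```python
from typing import NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    """One leaderboard row (hypothetical structure, for illustration only)."""
    rank: int
    model: str
    method: Optional[str]  # suffix after " + ", or None when the subject has none
    overall: float         # Overall score in percent


def split_subject(subject: str) -> Tuple[str, Optional[str]]:
    """Split 'GPT-5 + DP' into ('GPT-5', 'DP').

    Subjects without a ' + ' separator (e.g. 'Human Performance')
    are returned unchanged with no method.
    """
    model, sep, method = subject.rpartition(" + ")
    if not sep:
        return subject, None
    return model, method


def make_entry(rank: int, subject: str, overall: str) -> Entry:
    """Build an Entry from the Rank, Subject, and Overall columns."""
    model, method = split_subject(subject)
    return Entry(rank, model, method, float(overall.rstrip("%")))


# A few rows copied from the table above.
rows = [
    (1, "Human Performance", "85.91%"),
    (2, "ButtonAgent", "64.14%"),
    (6, "GPT-5 + DP", "59.94%"),
    (13, "Qwen3-32B", "52.45%"),
    (25, "GPT-3.5-Turbo + PoT", "37.15%"),
]
entries = [make_entry(*row) for row in rows]

# Gap in percentage points between the human baseline and the top-ranked system.
human_gap = round(entries[0].overall - entries[1].overall, 2)

for e in entries:
    print(e.rank, e.model, e.method, e.overall)
print("human vs. top system gap:", human_gap)
```

Splitting from the right with `rpartition` keeps any earlier " + " inside a model name intact, on the assumption that the method is always the last component.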