MMTU

MMTU (Massive Multi-Task Table Understanding and Reasoning) is a benchmark of around 28K questions spanning real-world table tasks, including data cleaning, table QA, joins, transformations, and NL-to-code.

Rows: 26
Primary metric: Overall
Sampled: 2026-05-06

Metadata

Metrics

Overall, Column Relationship, Column Transform, Data Cleaning, KB mapping, NL-2-code, Table Join, Table Matching, Table QA, Table Transform, Table Understanding

Latest Results

Rows are parsed from the public MMTU leaderboard results.json file. Source model display names, model type, model size, data source, and date are preserved.

| Rank | Subject | Overall | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | gpt-5 | 0.70 | GPT-5 (openai-gpt-5) | Imported | 2026-05-06 |
| 2 | o3 | 0.69 | o3 (openai-o3) | Imported | 2026-05-06 |
| 3 | gpt-5-mini | 0.67 | GPT-5 Mini (openai-gpt-5-mini) | Imported | 2026-05-06 |
| 4 | Gemini-2.5-pro | 0.66 | Gemini 2.5 Pro (google-gemini-2.5-pro) | Imported | 2026-05-06 |
| 5 | o4-mini (2024-11-20) | 0.66 | o4 Mini (openai-o4-mini) | Imported | 2026-05-06 |
| 6 | Grok-3-mini | 0.65 | Grok 3 Mini (x-ai-grok-3-mini) | Imported | 2026-05-06 |
| 7 | Gemini-2.5-flash | 0.63 | Gemini 2.5 Flash (google-gemini-2.5-flash) | Imported | 2026-05-06 |
| 8 | Deepseek-R1-0528 | 0.58 | R1 0528 (deepseek-deepseek-r1-0528) | Imported | 2026-05-06 |
| 9 | gpt-5-chat | 0.58 | GPT-5 Chat (openai-gpt-5-chat) | Imported | 2026-05-06 |
| 10 | gpt-oss-120b | 0.54 | gpt-oss-120b (openai-gpt-oss-120b) | Imported | 2026-05-06 |
| 11 | Qwen3-235B-A22B-Thinking-2507 | 0.53 | Qwen3 235B A22B Thinking 2507 (qwen-qwen3-235b-a22b-thinking-2507) | Imported | 2026-05-06 |
| 12 | Qwen3-235B-A22B-Instruct-2507 | 0.52 | Qwen3 235B A22B Instruct 2507 (qwen-qwen3-235b-a22b-2507) | Imported | 2026-05-06 |
| 13 | GPT-4o (2024-11-20) | 0.51 | GPT-4o (2024-11-20) (openai-gpt-4o-2024-11-20) | Imported | 2026-05-06 |
| 14 | Qwen3-32B | 0.51 | Qwen3 32B (qwen-qwen3-32b) | Imported | 2026-05-06 |
| 15 | Llama-4-Maverick-17B-128E-Instruct-FP8 | 0.49 | | Imported | 2026-05-06 |
| 16 | gpt-oss-20b | 0.48 | gpt-oss-20b (openai-gpt-oss-20b) | Imported | 2026-05-06 |
| 17 | Qwen3-8B | 0.48 | Qwen3 8B (qwen-qwen3-8b) | Imported | 2026-05-06 |
| 18 | Llama3.3-70B | 0.45 | Llama 3.3 70B Instruct (meta-llama-llama-3.3-70b-instruct) | Imported | 2026-05-06 |
| 19 | Mistral-Large-2411 | 0.45 | Mistral Large 2411 (mistralai-mistral-large-2411) | Imported | 2026-05-06 |
| 20 | Mistral-Small-2503 | 0.42 | | Imported | 2026-05-06 |
| 21 | GPT-4o-mini | 0.40 | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-06 |
| 22 | Llama-4-Scout-17B-16E-Instruct | 0.39 | Llama 4 Scout (meta-llama-llama-4-scout) | Imported | 2026-05-06 |
| 23 | Qwen3-32B (no_thinking) | 0.38 | Qwen3 32B (qwen-qwen3-32b) | Imported | 2026-05-06 |
| 24 | Qwen3-8B (no_thinking) | 0.35 | Qwen3 8B (qwen-qwen3-8b) | Imported | 2026-05-06 |
| 25 | Qwen2.5-7B-Instruct | 0.31 | Qwen2.5 7B Instruct (qwen-qwen-2.5-7b-instruct) | Imported | 2026-05-06 |
| 26 | Llama-3.1-8B | 0.27 | | Imported | 2026-05-06 |
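
The import step described under Latest Results (rows parsed from a results.json file, then ranked by the Overall score) could look roughly like the sketch below. This is only an illustration under stated assumptions: the actual schema of the MMTU results.json is not shown on this page, so the field names `model` and `overall`, the helper `rank_rows`, and the inline `SAMPLE` payload are all hypothetical. The four sample scores are taken from the table above.

```python
import json

# Hypothetical payload: the real results.json schema may differ.
# The "model" and "overall" keys are assumptions made for this sketch.
SAMPLE = """[
  {"model": "gpt-5", "overall": 0.70},
  {"model": "o3", "overall": 0.69},
  {"model": "Llama-3.1-8B", "overall": 0.27},
  {"model": "gpt-5-mini", "overall": 0.67}
]"""


def rank_rows(raw_json, name_key="model", score_key="overall"):
    """Parse leaderboard records and assign 1-based ranks by descending score."""
    records = json.loads(raw_json)
    # sorted() is stable, so records with equal scores keep their source order.
    ordered = sorted(records, key=lambda r: r[score_key], reverse=True)
    return [
        {"rank": i, "subject": r[name_key], "overall": r[score_key]}
        for i, r in enumerate(ordered, start=1)
    ]


rows = rank_rows(SAMPLE)
for row in rows:
    print(f'{row["rank"]:>2}  {row["subject"]:<14} {row["overall"]:.2f}')
```

Because the sort is stable, tied scores receive consecutive ranks in source order rather than a shared rank, which is consistent with how tied entries (for example, the two 0.66 rows) are numbered in the table. Whether the page actually breaks ties this way is an assumption.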