MMTU
Massive Multi-Task Table Understanding and Reasoning benchmark with around 28K questions across real-world table tasks including data cleaning, table QA, joins, transformations, and NL-to-code.
26 rows
Primary metric: Overall
Sampled: 2026-05-06
Metadata
Metrics: Overall, Column Relationship, Column Transform, Data Cleaning, KB mapping, NL-2-code, Table Join, Table Matching, Table QA, Table Transform, Table Understanding
| Rank | Subject | Overall | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | gpt-5 | 0.70 | GPT-5 openai-gpt-5 | Imported | 2026-05-06 |
| 2 | o3 | 0.69 | o3 openai-o3 | Imported | 2026-05-06 |
| 3 | gpt-5-mini | 0.67 | GPT-5 Mini openai-gpt-5-mini | Imported | 2026-05-06 |
| 4 | Gemini-2.5-pro | 0.66 | Gemini 2.5 Pro google-gemini-2.5-pro | Imported | 2026-05-06 |
| 5 | o4-mini (2024-11-20) | 0.66 | o4 Mini openai-o4-mini | Imported | 2026-05-06 |
| 6 | Grok-3-mini | 0.65 | Grok 3 Mini x-ai-grok-3-mini | Imported | 2026-05-06 |
| 7 | Gemini-2.5-flash | 0.63 | Gemini 2.5 Flash google-gemini-2.5-flash | Imported | 2026-05-06 |
| 8 | Deepseek-R1-0528 | 0.58 | R1 0528 deepseek-deepseek-r1-0528 | Imported | 2026-05-06 |
| 9 | gpt-5-chat | 0.58 | GPT-5 Chat openai-gpt-5-chat | Imported | 2026-05-06 |
| 10 | gpt-oss-120b | 0.54 | gpt-oss-120b openai-gpt-oss-120b | Imported | 2026-05-06 |
| 11 | Qwen3-235B-A22B-Thinking-2507 | 0.53 | Qwen3 235B A22B Thinking 2507 qwen-qwen3-235b-a22b-thinking-2507 | Imported | 2026-05-06 |
| 12 | Qwen3-235B-A22B-Instruct-2507 | 0.52 | Qwen3 235B A22B Instruct 2507 qwen-qwen3-235b-a22b-2507 | Imported | 2026-05-06 |
| 13 | GPT-4o (2024-11-20) | 0.51 | GPT-4o (2024-11-20) openai-gpt-4o-2024-11-20 | Imported | 2026-05-06 |
| 14 | Qwen3-32B | 0.51 | Qwen3 32B qwen-qwen3-32b | Imported | 2026-05-06 |
| 15 | Llama-4-Maverick-17B-128E-Instruct-FP8 | 0.49 | — | Imported | 2026-05-06 |
| 16 | gpt-oss-20b | 0.48 | gpt-oss-20b openai-gpt-oss-20b | Imported | 2026-05-06 |
| 17 | Qwen3-8B | 0.48 | Qwen3 8B qwen-qwen3-8b | Imported | 2026-05-06 |
| 18 | Llama3.3-70B | 0.45 | Llama 3.3 70B Instruct meta-llama-llama-3.3-70b-instruct | Imported | 2026-05-06 |
| 19 | Mistral-Large-2411 | 0.45 | Mistral Large 2411 mistralai-mistral-large-2411 | Imported | 2026-05-06 |
| 20 | Mistral-Small-2503 | 0.42 | — | Imported | 2026-05-06 |
| 21 | GPT-4o-mini | 0.40 | GPT-4o-mini openai-gpt-4o-mini | Imported | 2026-05-06 |
| 22 | Llama-4-Scout-17B-16E-Instruct | 0.39 | Llama 4 Scout meta-llama-llama-4-scout | Imported | 2026-05-06 |
| 23 | Qwen3-32B (no_thinking) | 0.38 | Qwen3 32B qwen-qwen3-32b | Imported | 2026-05-06 |
| 24 | Qwen3-8B (no_thinking) | 0.35 | Qwen3 8B qwen-qwen3-8b | Imported | 2026-05-06 |
| 25 | Qwen2.5-7B-Instruct | 0.31 | Qwen2.5 7B Instruct qwen-qwen-2.5-7b-instruct | Imported | 2026-05-06 |
| 26 | Llama-3.1-8B | 0.27 | — | Imported | 2026-05-06 |
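
Overall scores are displayed to two decimal places, so some adjacent ranks are tied at the shown precision (for example, ranks 4 and 5 both show 0.66). The sketch below is one way to compute tie-aware "competition" ranks from the displayed values. It assumes a handful of rows copied by hand from the table above; the `competition_ranks` helper is hypothetical and is not part of any MMTU tooling, and the underlying unrounded scores may well break these ties.

```python
from itertools import groupby

# A few (subject, overall) pairs copied from the table above; not the full leaderboard.
rows = [
    ("gpt-5", 0.70),
    ("o3", 0.69),
    ("gpt-5-mini", 0.67),
    ("Gemini-2.5-pro", 0.66),
    ("o4-mini (2024-11-20)", 0.66),
    ("Grok-3-mini", 0.65),
]


def competition_ranks(rows):
    """Assign competition ranks (1, 2, 2, 4, ...) by descending displayed score."""
    # sorted() is stable, so tied rows keep their original relative order.
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    ranked = []
    position = 1
    for score, group in groupby(ordered, key=lambda r: r[1]):
        group = list(group)
        for subject, _ in group:
            ranked.append((position, subject, score))
        # Skip as many positions as there were tied entries.
        position += len(group)
    return ranked


ranked = competition_ranks(rows)
for rank, subject, score in ranked:
    print(f"{rank:>2}  {subject:<24} {score:.2f}")
```

With this scheme the two models at 0.66 share rank 4 and the next model is ranked 6, whereas the table above numbers rows sequentially.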