MMMU Pro

Multimodal Multi-task Benchmark

73 rows
Primary metric: score
Sampled: 2026-05-28

Metadata

Metrics

Score, Std. error (lower is better), Latency (lower is better), Cost per test (lower is better)

Latest Results

Full leaderboard rows decoded from the Vals.ai benchmark detail page. The primary score is the overall accuracy percentage.

Columns: Rank, Subject, Score, Model Match (display name, with its slug on the following line; blank when no match exists), Provenance, Sampled
1 Gemini 3.5 Flash 88.266% Gemini 3.5 Flash
google-gemini-3.5-flash
Imported 2026-05-28
2 GPT 5.5 88.266% GPT-5.5
openai-gpt-5.5
Imported 2026-05-28
3 Gemini 3.1 Pro Preview 88.208% Gemini 3.1 Pro Preview
google-gemini-3.1-pro-preview
Imported 2026-05-28
4 Gemini 3 Flash Preview 87.63% Gemini 3 Flash Preview
google-gemini-3-flash-preview
Imported 2026-05-28
5 Gemini 3 Pro Preview 87.514% Gemini 3
google-gemini-3
Imported 2026-05-28
6 GPT 5.4 2026-03-05 87.514% GPT-5.4
openai-gpt-5.4
Imported 2026-05-28
7 Muse Spark 87.399% Imported 2026-05-28
8 GPT 5.2 2025-12-11 86.667% GPT-5.2
openai-gpt-5.2
Imported 2026-05-28
9 Claude Opus 4.8 86.59% Claude Opus 4.8
anthropic-claude-opus-4.8
Imported 2026-05-28
10 Kimi K2.6 Thinking 86.301% KIMI MoonshotAI: Kimi K2.6
moonshotai-kimi-k2.6
Imported 2026-05-28
11 Claude Opus 4.7 85.549% Claude Opus 4.7
anthropic-claude-opus-4.7
Imported 2026-05-28
12 Kimi K2.5 Thinking 84.335% KIMI MoonshotAI: Kimi K2.5
moonshotai-kimi-k2.5
Imported 2026-05-28
13 Qwen 3.6 Plus 84.162% Qwen3.6 Plus
qwen-qwen3.6-plus
Imported 2026-05-28
14 Claude Opus 4.6 Thinking 83.873% Claude Opus 4.6
anthropic-claude-opus-4.6
Imported 2026-05-28
15 Claude Sonnet 4.6 83.584% Claude Sonnet 4.6
anthropic-claude-sonnet-4.6
Imported 2026-05-28
16 Grok 4.20 0309 Reasoning 83.468% GROK Grok 4.20
x-ai-grok-4.20
Imported 2026-05-28
17 GPT 5.1 2025-11-13 83.179% GPT-5.1
openai-gpt-5.1
Imported 2026-05-28
18 Grok 4.3 83.064% GROK Grok 4.3
x-ai-grok-4.3
Imported 2026-05-28
19 Claude Opus 4.5 20251101 Thinking 82.948% Claude Opus 4.5
anthropic-claude-opus-4.5
Imported 2026-05-28
20 Gemini 3.1 Flash Lite Preview 82.486% Gemini 3.1 Flash Lite Preview
google-gemini-3.1-flash-lite-preview
Imported 2026-05-28
21 Qwen 3.5 Flash 81.908% Qwen3.5-Flash
qwen-qwen3.5-flash-02-23
Imported 2026-05-28
22 GPT 5 2025-08-07 81.503% GPT-5
openai-gpt-5
Imported 2026-05-28
23 Gemini 2.5 Pro Exp 03 25 81.34% Imported 2026-05-28
24 Claude Opus 4.5 20251101 81.098% Claude Opus 4.5
anthropic-claude-opus-4.5
Imported 2026-05-28
25 Gemini 2.5 Flash Preview 09 2025 Thinking 80.751% Imported 2026-05-28
26 O3 2025-04-16 80.416% o3
openai-o3
Imported 2026-05-28
27 O4 Mini 2025-04-16 79.665% o4 Mini
openai-o4-mini
Imported 2026-05-28
28 Gemini 2.5 Flash Preview 09 2025 79.48% Imported 2026-05-28
29 Claude Sonnet 4.5 20250929 Thinking 79.306% Claude Sonnet 4.5
anthropic-claude-sonnet-4.5
Imported 2026-05-28
30 GPT 5.4 Mini 2026-03-17 79.249% GPT-5.4 Mini
openai-gpt-5.4-mini
Imported 2026-05-28
31 GPT 5 Mini 2025-08-07 78.914% GPT-5 Mini
openai-gpt-5-mini
Imported 2026-05-28
32 Claude Opus 4.1 20250805 Thinking 77.514% Claude Opus 4.1
anthropic-claude-opus-4.1
Imported 2026-05-28
33 O1 2024-12-17 77.412% o1
openai-o1
Imported 2026-05-28
34 Grok 4 0709 76.27% GROK Grok 4
x-ai-grok-4
Imported 2026-05-28
35 Gemini 2.5 Flash Lite Preview 09 2025 Thinking 75.434% Imported 2026-05-28
36 Claude 3 7 Sonnet 20250219 Thinking 75.101% Imported 2026-05-28
37 Claude Sonnet 4 20250514 Thinking 74.928% Imported 2026-05-28
38 Claude Opus 4.1 20250805 73.715% Claude Opus 4.1
anthropic-claude-opus-4.1
Imported 2026-05-28
39 GPT 5.4 Nano 2026-03-17 73.584% GPT-5.4 Nano
openai-gpt-5.4-nano
Imported 2026-05-28
40 Claude Opus 4 20250514 73.31% Claude Opus 4
anthropic-claude-opus-4
Imported 2026-05-28
41 Grok 4 Fast Reasoning 72.775% GROK Grok 4 Fast
x-ai-grok-4-fast
Imported 2026-05-28
42 Grok 4.1 Fast Reasoning 72.659% GROK Grok 4.1 Fast
x-ai-grok-4.1-fast
Imported 2026-05-28
43 Gemini 2.5 Flash Lite Preview 09 2025 72.543% Gemini 2.5 Flash Lite Preview 09-2025
google-gemini-2.5-flash-lite-preview-09-2025
Imported 2026-05-28
44 Claude Sonnet 4 20250514 72.386% Claude Sonnet 4
anthropic-claude-sonnet-4
Imported 2026-05-28
45 GPT 4.1 2025-04-14 72.386% GPT-4.1
openai-gpt-4.1
Imported 2026-05-28
46 Gemini 2.5 Flash Preview 04 17 Thinking 71.924% Imported 2026-05-28
47 Llama4 Maverick Instruct Basic 71.693% Imported 2026-05-28
48 Claude 3 7 Sonnet 20250219 71.519% Claude 3.7 Sonnet
anthropic-claude-3.7-sonnet
Imported 2026-05-28
49 GPT 5 Nano 2025-08-07 70.942% GPT-5 Nano
openai-gpt-5-nano
Imported 2026-05-28
50 Command A Plus 05 2026 70.636% Imported 2026-05-28
51 GPT 4.1 Mini 2025-04-14 70.537% GPT-4.1 Mini
openai-gpt-4.1-mini
Imported 2026-05-28
52 Gemini 2.0 Flash 001 69.786% Gemini 2.0 Flash
google-gemini-2.0-flash
Imported 2026-05-28
53 Claude 3 5 Sonnet 20241022 68.804% Claude 3.5 Sonnet
anthropic-claude-3.5-sonnet
Imported 2026-05-28
54 Gemini 2.5 Flash Preview 04 17 67.995% Imported 2026-05-28
55 Mistral Large 2512 66.185% Mistral: Mistral Large 3 2512
mistralai-mistral-large-2512
Imported 2026-05-28
56 Gemini 1.5 Pro 002 65.511% Imported 2026-05-28
57 Magistral Small 2509 65.202% Imported 2026-05-28
58 Magistral Medium 2509 64.566% Imported 2026-05-28
59 GPT 4O 2024-08-06 64.009% GPT-4o (2024-08-06)
openai-gpt-4o-2024-08-06
Imported 2026-05-28
60 Grok 4.1 Fast Non Reasoning 63.699% GROK Grok 4.1 Fast
x-ai-grok-4.1-fast
Imported 2026-05-28
61 Grok 4 Fast Non Reasoning 63.41% GROK Grok 4 Fast
x-ai-grok-4-fast
Imported 2026-05-28
62 Mistral Medium 2505 62.969% Imported 2026-05-28
63 GPT 4O 2024-11-20 62.161% GPT-4o (2024-11-20)
openai-gpt-4o-2024-11-20
Imported 2026-05-28
64 Mistral Small 2503 60.081% Imported 2026-05-28
65 Llama 4 Scout 17B 16E Instruct 58.752% Llama 4 Scout
meta-llama-llama-4-scout
Imported 2026-05-28
66 Grok 2 Vision 1212 57.25% Imported 2026-05-28
67 Gemini 1.5 Flash 002 57.192% Imported 2026-05-28
68 GPT 4O Mini 2024-07-18 56.557% GPT-4o-mini (2024-07-18)
openai-gpt-4o-mini-2024-07-18
Imported 2026-05-28
69 GPT 4.1 Nano 2025-04-14 55.055% GPT-4.1 Nano
openai-gpt-4.1-nano
Imported 2026-05-28
70 Llama 3.2 90B Vision Instruct Turbo 48.065% Imported 2026-05-28
71 Claude Haiku 4.5 20251001 Thinking 46.069% Claude Haiku 4.5
anthropic-claude-haiku-4.5
Imported 2026-05-28
72 Llama 3.2 11B Vision Instruct Turbo 38.821% Imported 2026-05-28
73 Qwen 3.5 Plus Thinking 22.775% Imported 2026-05-28
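
The rows above follow a regular layout: a first line with rank, subject, score, and an optional matched model name, then (for matched rows) a slug line and an "Imported <date>" line. Unmatched rows carry "Imported <date>" on the first line instead. A minimal sketch of reading that layout into records is shown below. The `Row` class, the `parse_rows` function, and both regular expressions are hypothetical names introduced for illustration; they are inferred from the rows as printed here, not taken from Vals.ai or any published schema.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed shape of a row's first line: "<rank> <subject> <score>% <rest>".
ROW_RE = re.compile(r"^(\d+) (.+?) (\d+(?:\.\d+)?)% ?(.*)$")
# Assumed shape of the provenance text: "Imported YYYY-MM-DD".
IMPORTED_RE = re.compile(r"^Imported (\d{4}-\d{2}-\d{2})$")


@dataclass
class Row:
    rank: int
    subject: str
    score: float                      # overall accuracy, in percent
    model_match: Optional[str] = None
    slug: Optional[str] = None
    sampled: Optional[str] = None     # date following "Imported"


def parse_rows(lines: List[str]) -> List[Row]:
    """Group the one-to-three-line records of the listing into Row objects."""
    rows: List[Row] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        first = ROW_RE.match(line)
        if first:
            rank, subject, score, rest = first.groups()
            row = Row(int(rank), subject, float(score))
            imported = IMPORTED_RE.match(rest)
            if imported:
                # Unmatched row: provenance sits on the first line.
                row.sampled = imported.group(1)
            elif rest:
                row.model_match = rest
            rows.append(row)
            continue
        if not rows:
            continue  # skip anything before the first row
        imported = IMPORTED_RE.match(line)
        if imported:
            rows[-1].sampled = imported.group(1)
        else:
            # Any other continuation line is treated as the slug.
            rows[-1].slug = line
    return rows


sample = [
    "1 Gemini 3.5 Flash 88.266% Gemini 3.5 Flash",
    "google-gemini-3.5-flash",
    "Imported 2026-05-28",
    "7 Muse Spark 87.399% Imported 2026-05-28",
]
parsed = parse_rows(sample)
```

The sketch relies on the score being the only token that ends in `%`, which holds for every row in this listing; a subject name containing a percent sign would need a stricter pattern.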