MMMU-Pro

MMMU-Pro evaluates expert-level multimodal understanding with vision and standard variants derived from the MMMU benchmark.

76 rows
Primary metric: pro_overall
Sampled: 2026-05-06

Metadata

Metrics

MMMU-Pro Overall, MMMU-Pro Vision, MMMU-Pro Standard, MMMU Val Overall, MMMU Val Art & Design, MMMU Val Business, MMMU Val Science, MMMU Val Health & Medicine, MMMU Val Human. & Social Sci., MMMU Val Tech & Eng., MMMU Test Overall, MMMU Test Art & Design, MMMU Test Business, MMMU Test Science, MMMU Test Health & Medicine, MMMU Test Human. & Social Sci., MMMU Test Tech & Eng.

Showing 2 latest source slices.

Latest Results

Rows are limited to entries that report an MMMU-Pro Overall score and are ranked by pro_overall in descending order. Source display names are preserved without canonical model mapping.
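The scoping and ranking rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the page's actual pipeline: the `rank_rows` helper, the dictionary field names, and the tie-breaking rule (ties keep their input order) are hypothetical, and the row without a score is invented purely to show the filter.

```python
from typing import Optional, TypedDict


class Row(TypedDict):
    subject: str                    # source display name, kept verbatim
    pro_overall: Optional[float]    # None when the source reports no MMMU-Pro Overall


def rank_rows(rows: list[Row]) -> list[tuple[int, str, float]]:
    """Keep rows that have pro_overall, then rank them from highest to lowest."""
    scoped = [r for r in rows if r["pro_overall"] is not None]
    # sorted() is stable, so tied scores keep their input order (an assumption
    # about tie-breaking; the page does not state its rule).
    ordered = sorted(scoped, key=lambda r: r["pro_overall"], reverse=True)
    return [(i, r["subject"], r["pro_overall"]) for i, r in enumerate(ordered, start=1)]


rows: list[Row] = [
    {"subject": "GPT-5.4 Thinking w/ tools", "pro_overall": 82.10},
    {"subject": "Hypothetical entry without a score", "pro_overall": None},  # illustrative only
    {"subject": "Human Expert (High)", "pro_overall": 85.40},
    {"subject": "GPT-5.2 Thinking w/o Python", "pro_overall": 80.40},
    {"subject": "Muse Spark Thinking", "pro_overall": 80.40},
]

ranked = rank_rows(rows)
for rank, subject, score in ranked:
    print(f"{rank} {subject} {score:.2f}")
```

The entry without a score is dropped before ranking, which is why the rank numbers stay contiguous.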

Imported slice (sampled 2026-05-06)

Rank | Subject | MMMU-Pro Overall | Model Match | Provenance | Sampled
--- | --- | --- | --- | --- | ---
1 | Human Expert (High) | 85.40 | | Imported | 2026-05-06
2 | GPT-5.4 Thinking w/ tools | 82.10 | GPT-5.4 (openai-gpt-5.4) | Imported | 2026-05-06
3 | GPT-5.4 Thinking w/o tools | 81.20 | GPT-5.4 (openai-gpt-5.4) | Imported | 2026-05-06
4 | Gemini 3.0 Pro | 81.00 | | Imported | 2026-05-06
5 | Human Expert (Medium) | 80.80 | | Imported | 2026-05-06
6 | Gemini 3.1 Pro Thinking (High) | 80.50 | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Imported | 2026-05-06
7 | GPT-5.2 Thinking w/o Python | 80.40 | GPT-5.2 (openai-gpt-5.2) | Imported | 2026-05-06
8 | Muse Spark Thinking | 80.40 | | Imported | 2026-05-06
9 | GPT-5.2 Thinking w/o tools | 79.50 | GPT-5.2 (openai-gpt-5.2) | Imported | 2026-05-06
10 | GPT-5.1 Thinking | 79.00 | GPT-5.1 (openai-gpt-5.1) | Imported | 2026-05-06
11 | GPT-5 w/ thinking | 78.40 | GPT-5 (openai-gpt-5) | Imported | 2026-05-06
12 | Claude Opus 4.6 w/ tools | 77.30 | | Imported | 2026-05-06
13 | Gemma 4 31B | 76.90 | Gemma 4 31B (google-gemma-4-31b-it) | Imported | 2026-05-06
14 | o3 | 76.40 | o3 (openai-o3) | Imported | 2026-05-06
15 | GPT-5.1 | 76.00 | GPT-5.1 (openai-gpt-5.1) | Imported | 2026-05-06
16 | Claude Sonnet 4.6 w/ tools | 75.60 | | Imported | 2026-05-06
17 | Claude Sonnet 4.6 w/o tools | 74.50 | | Imported | 2026-05-06
18 | Claude Opus 4.5 | 73.90 | Claude Opus 4.5 (anthropic-claude-opus-4.5) | Imported | 2026-05-06
19 | Claude Opus 4.6 w/o tools | 73.90 | | Imported | 2026-05-06
20 | Gemma 4 26B A4B | 73.80 | Gemma 4 26B A4B (google-gemma-4-26b-a4b-it) | Imported | 2026-05-06
21 | Human Expert (Low) | 73.00 | | Imported | 2026-05-06
22 | dots.vlm1 | 70.10 | | Imported | 2026-05-06
23 | Claude Sonnet 4.5 | 68.90 | Claude Sonnet 4.5 (anthropic-claude-sonnet-4.5) | Imported | 2026-05-06
24 | Qwen3-VL 235B-A22B | 68.10 | | Imported | 2026-05-06
25 | Gemini 2.5 Pro 05-06 | 68.00 | | Imported | 2026-05-06
26 | Seed 1.5-VL Thinking | 67.60 | | Imported | 2026-05-06
27 | Seed 1.6-Thinking | 66.40 | | Imported | 2026-05-06
28 | GLM-4.5V w/ Thinking | 65.20 | | Imported | 2026-05-06
29 | Claude Sonnet 4.5 w/o tools | 63.40 | | Imported | 2026-05-06
30 | GPT-5 w/o thinking | 62.70 | GPT-5 (openai-gpt-5) | Imported | 2026-05-06
31 | Seed 1.5-VL | 59.90 | | Imported | 2026-05-06
32 | GLM-4.1V w/ Thinking | 57.10 | | Imported | 2026-05-06
33 | Skywork-R1V3-38B | 55.40 | | Imported | 2026-05-06
34 | Gemma 4 E4B | 52.60 | | Imported | 2026-05-06
35 | GPT-4o (0513) | 51.90 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-06
36 | Claude 3.5 Sonnet | 51.50 | Claude 3.5 Sonnet (anthropic-claude-3.5-sonnet) | Imported | 2026-05-06
37 | InternVL2.5-78B | 48.60 | | Imported | 2026-05-06
38 | Gemini 1.5 Pro (0801) | 46.90 | | Imported | 2026-05-06
39 | Qwen2-VL-72B | 46.20 | | Imported | 2026-05-06
40 | Qwen2.5-VL 72B | 46.20 | Qwen2.5 VL 72B Instruct (qwen-qwen2.5-vl-72b-instruct) | Imported | 2026-05-06
41 | InternVL2.5-38B | 46.00 | | Imported | 2026-05-06
42 | Gemma 4 E2B | 44.20 | | Imported | 2026-05-06
43 | EVLM-KTO | 43.80 | | Imported | 2026-05-06
44 | Gemini 1.5 Pro (0523) | 43.50 | | Imported | 2026-05-06
45 | MiMo-VL 7B-RL | 43.30 | | Imported | 2026-05-06
46 | MiMo-VL 7B-SFT | 42.30 | | Imported | 2026-05-06
47 | InternVL2-Llama3-76B | 40.00 | | Imported | 2026-05-06
48 | Llama 3.2 90B | 39.50 | | Imported | 2026-05-06
49 | Qwen2.5-VL 7B | 38.30 | | Imported | 2026-05-06
50 | GPT-4o mini | 37.60 | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-06
51 | InternVL2.5-26B | 37.10 | | Imported | 2026-05-06
52 | InternVL2.5-8B | 34.30 | | Imported | 2026-05-06
53 | InternVL2-40B | 34.20 | | Imported | 2026-05-06
54 | NVILA | 33.70 | | Imported | 2026-05-06
55 | Qwen2.5-VL 3B | 31.60 | | Imported | 2026-05-06
56 | LLaVA-OneVision-72B | 31.00 | | Imported | 2026-05-06
57 | InternVL2-8B | 29.00 | | Imported | 2026-05-06
58 | Llama 3.2 11B | 28.40 | | Imported | 2026-05-06
59 | MiniCPM-V 2.6 | 27.20 | | Imported | 2026-05-06
60 | MAmmoTH-VL-8B | 25.30 | | Imported | 2026-05-06
61 | LlaVA-NEXT-72B | 25.10 | | Imported | 2026-05-06
62 | LLaVA-OneVision-7B | 24.10 | | Imported | 2026-05-06
63 | LLaVA-NEXT-34B | 23.80 | | Imported | 2026-05-06
64 | InternVL2.5-2B | 23.70 | | Imported | 2026-05-06
65 | Idefics3-8B-Llama3 | 22.90 | | Imported | 2026-05-06
66 | Phi-3.5-Vision | 19.70 | | Imported | 2026-05-06
67 | MiniCPM-Llama3-V 2.5 | 19.60 | | Imported | 2026-05-06
68 | InternVL2.5-1B | 19.40 | | Imported | 2026-05-06
69 | LLaVA-NeXT-13B | 17.20 | | Imported | 2026-05-06
70 | LLaVA-NeXT-mistral-7B | 17.00 | | Imported | 2026-05-06
71 | LLaVA-NeXT-Vicuna-7B | 16.10 | | Imported | 2026-05-06
72 | Random Choice | 12.60 | | Imported | 2026-05-06
73 | Frequent Choice | 12.10 | | Imported | 2026-05-06

Launch post slice (sampled 2026-04-23)

Rank | Subject | MMMU-Pro Overall | Model Match | Provenance | Sampled
--- | --- | --- | --- | --- | ---
1 | GPT-5.5 | 83.2% | GPT-5.5 (openai-gpt-5.5) | Launch post | 2026-04-23
2 | GPT-5.4 | 82.1% | GPT-5.4 (openai-gpt-5.4) | Launch post | 2026-04-23
3 | Gemini 3.1 Pro Preview | 80.5% | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Launch post | 2026-04-23