MMMU-Pro | BenchmarkList

Metadata

ID: mmmu_pro
Category: Multimodal
Release: 2024-09-04

Metrics

MMMU-Pro Overall, MMMU-Pro Vision, MMMU-Pro Standard, MMMU Val Overall, MMMU Val Art & Design, MMMU Val Business, MMMU Val Science, MMMU Val Health & Medicine, MMMU Val Human. & Social Sci., MMMU Val Tech & Eng., MMMU Test Overall, MMMU Test Art & Design, MMMU Test Business, MMMU Test Science, MMMU Test Health & Medicine, MMMU Test Human. & Social Sci., MMMU Test Tech & Eng.

Showing the 2 latest source slices.

Rank	Subject	MMMU-Pro Overall (%)	Model Match	Provenance	Sampled
1	Human Expert (High)	85.40	—	Imported	2026-05-06
2	GPT-5.4 Thinking w/ tools	82.10	GPT-5.4 openai-gpt-5.4	Imported	2026-05-06
3	GPT-5.4 Thinking w/o tools	81.20	GPT-5.4 openai-gpt-5.4	Imported	2026-05-06
4	Gemini 3.0 Pro	81.00	—	Imported	2026-05-06
5	Human Expert (Medium)	80.80	—	Imported	2026-05-06
6	Gemini 3.1 Pro Thinking (High)	80.50	Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview	Imported	2026-05-06
7	GPT-5.2 Thinking w/o Python	80.40	GPT-5.2 openai-gpt-5.2	Imported	2026-05-06
8	Muse Spark Thinking	80.40	—	Imported	2026-05-06
9	GPT-5.2 Thinking w/o tools	79.50	GPT-5.2 openai-gpt-5.2	Imported	2026-05-06
10	GPT-5.1 Thinking	79.00	GPT-5.1 openai-gpt-5.1	Imported	2026-05-06
11	GPT-5 w/ thinking	78.40	GPT-5 openai-gpt-5	Imported	2026-05-06
12	Claude Opus 4.6 w/ tools	77.30	—	Imported	2026-05-06
13	Gemma 4 31B	76.90	Gemma 4 31B google-gemma-4-31b-it	Imported	2026-05-06
14	o3	76.40	o3 openai-o3	Imported	2026-05-06
15	GPT-5.1	76.00	GPT-5.1 openai-gpt-5.1	Imported	2026-05-06
16	Claude Sonnet 4.6 w/ tools	75.60	—	Imported	2026-05-06
17	Claude Sonnet 4.6 w/o tools	74.50	—	Imported	2026-05-06
18	Claude Opus 4.5	73.90	Claude Opus 4.5 anthropic-claude-opus-4.5	Imported	2026-05-06
19	Claude Opus 4.6 w/o tools	73.90	—	Imported	2026-05-06
20	Gemma 4 26B A4B	73.80	Gemma 4 26B A4B google-gemma-4-26b-a4b-it	Imported	2026-05-06
21	Human Expert (Low)	73.00	—	Imported	2026-05-06
22	dots.vlm1	70.10	—	Imported	2026-05-06
23	Claude Sonnet 4.5	68.90	Claude Sonnet 4.5 anthropic-claude-sonnet-4.5	Imported	2026-05-06
24	Qwen3-VL 235B-A22B	68.10	—	Imported	2026-05-06
25	Gemini 2.5 Pro 05-06	68.00	—	Imported	2026-05-06
26	Seed 1.5-VL Thinking	67.60	—	Imported	2026-05-06
27	Seed 1.6-Thinking	66.40	—	Imported	2026-05-06
28	GLM-4.5V w/ Thinking	65.20	—	Imported	2026-05-06
29	Claude Sonnet 4.5 w/o tools	63.40	—	Imported	2026-05-06
30	GPT-5 w/o thinking	62.70	GPT-5 openai-gpt-5	Imported	2026-05-06
31	Seed 1.5-VL	59.90	—	Imported	2026-05-06
32	GLM-4.1V w/ Thinking	57.10	—	Imported	2026-05-06
33	Skywork-R1V3-38B	55.40	—	Imported	2026-05-06
34	Gemma 4 E4B	52.60	—	Imported	2026-05-06
35	GPT-4o (0513)	51.90	GPT-4o openai-gpt-4o	Imported	2026-05-06
36	Claude 3.5 Sonnet	51.50	Claude 3.5 Sonnet anthropic-claude-3.5-sonnet	Imported	2026-05-06
37	InternVL2.5-78B	48.60	—	Imported	2026-05-06
38	Gemini 1.5 Pro (0801)	46.90	—	Imported	2026-05-06
39	Qwen2-VL-72B	46.20	—	Imported	2026-05-06
40	Qwen2.5-VL 72B	46.20	Qwen2.5 VL 72B Instruct qwen-qwen2.5-vl-72b-instruct	Imported	2026-05-06
41	InternVL2.5-38B	46.00	—	Imported	2026-05-06
42	Gemma 4 E2B	44.20	—	Imported	2026-05-06
43	EVLM-KTO	43.80	—	Imported	2026-05-06
44	Gemini 1.5 Pro (0523)	43.50	—	Imported	2026-05-06
45	MiMo-VL 7B-RL	43.30	—	Imported	2026-05-06
46	MiMo-VL 7B-SFT	42.30	—	Imported	2026-05-06
47	InternVL2-Llama3-76B	40.00	—	Imported	2026-05-06
48	Llama 3.2 90B	39.50	—	Imported	2026-05-06
49	Qwen2.5-VL 7B	38.30	—	Imported	2026-05-06
50	GPT-4o mini	37.60	GPT-4o-mini openai-gpt-4o-mini	Imported	2026-05-06
51	InternVL2.5-26B	37.10	—	Imported	2026-05-06
52	InternVL2.5-8B	34.30	—	Imported	2026-05-06
53	InternVL2-40B	34.20	—	Imported	2026-05-06
54	NVILA	33.70	—	Imported	2026-05-06
55	Qwen2.5-VL 3B	31.60	—	Imported	2026-05-06
56	LLaVA-OneVision-72B	31.00	—	Imported	2026-05-06
57	InternVL2-8B	29.00	—	Imported	2026-05-06
58	Llama 3.2 11B	28.40	—	Imported	2026-05-06
59	MiniCPM-V 2.6	27.20	—	Imported	2026-05-06
60	MAmmoTH-VL-8B	25.30	—	Imported	2026-05-06
61	LLaVA-NeXT-72B	25.10	—	Imported	2026-05-06
62	LLaVA-OneVision-7B	24.10	—	Imported	2026-05-06
63	LLaVA-NeXT-34B	23.80	—	Imported	2026-05-06
64	InternVL2.5-2B	23.70	—	Imported	2026-05-06
65	Idefics3-8B-Llama3	22.90	—	Imported	2026-05-06
66	Phi-3.5-Vision	19.70	—	Imported	2026-05-06
67	MiniCPM-Llama3-V 2.5	19.60	—	Imported	2026-05-06
68	InternVL2.5-1B	19.40	—	Imported	2026-05-06
69	LLaVA-NeXT-13B	17.20	—	Imported	2026-05-06
70	LLaVA-NeXT-mistral-7B	17.00	—	Imported	2026-05-06
71	LLaVA-NeXT-Vicuna-7B	16.10	—	Imported	2026-05-06
72	Random Choice	12.60	—	Imported	2026-05-06
73	Frequent Choice	12.10	—	Imported	2026-05-06

Rank	Subject	MMMU-Pro Overall (%)	Model Match	Provenance	Sampled
1	GPT-5.5	83.20	GPT-5.5 openai-gpt-5.5	Launch post	2026-04-23
2	GPT-5.4	82.10	GPT-5.4 openai-gpt-5.4	Launch post	2026-04-23
3	Gemini 3.1 Pro Preview	80.50	Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview	Launch post	2026-04-23
