MMMU-Pro
MMMU-Pro evaluates expert-level multimodal understanding with vision and standard variants derived from the MMMU benchmark.
76 rows
Primary metric: pro_overall
Sampled: 2026-05-06
Metadata
Metrics
MMMU-Pro Overall, MMMU-Pro Vision, MMMU-Pro Standard, MMMU Val Overall, MMMU Val Art & Design, MMMU Val Business, MMMU Val Science, MMMU Val Health & Medicine, MMMU Val Human. & Social Sci., MMMU Val Tech & Eng., MMMU Test Overall, MMMU Test Art & Design, MMMU Test Business, MMMU Test Science, MMMU Test Health & Medicine, MMMU Test Human. & Social Sci., MMMU Test Tech & Eng.
Showing 2 latest source slices.
| Rank | Subject | MMMU-Pro Overall | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Human Expert (High) | 85.40 | — | Imported | 2026-05-06 |
| 2 | GPT-5.4 Thinking w/ tools | 82.10 | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-06 |
| 3 | GPT-5.4 Thinking w/o tools | 81.20 | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-06 |
| 4 | Gemini 3.0 Pro | 81.00 | — | Imported | 2026-05-06 |
| 5 | Human Expert (Medium) | 80.80 | — | Imported | 2026-05-06 |
| 6 | Gemini 3.1 Pro Thinking (High) | 80.50 | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Imported | 2026-05-06 |
| 7 | GPT-5.2 Thinking w/o Python | 80.40 | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-06 |
| 8 | Muse Spark Thinking | 80.40 | — | Imported | 2026-05-06 |
| 9 | GPT-5.2 Thinking w/o tools | 79.50 | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-06 |
| 10 | GPT-5.1 Thinking | 79.00 | GPT-5.1 openai-gpt-5.1 | Imported | 2026-05-06 |
| 11 | GPT-5 w/ thinking | 78.40 | GPT-5 openai-gpt-5 | Imported | 2026-05-06 |
| 12 | Claude Opus 4.6 w/ tools | 77.30 | — | Imported | 2026-05-06 |
| 13 | Gemma 4 31B | 76.90 | Gemma 4 31B google-gemma-4-31b-it | Imported | 2026-05-06 |
| 14 | o3 | 76.40 | o3 openai-o3 | Imported | 2026-05-06 |
| 15 | GPT-5.1 | 76.00 | GPT-5.1 openai-gpt-5.1 | Imported | 2026-05-06 |
| 16 | Claude Sonnet 4.6 w/ tools | 75.60 | — | Imported | 2026-05-06 |
| 17 | Claude Sonnet 4.6 w/o tools | 74.50 | — | Imported | 2026-05-06 |
| 18 | Claude Opus 4.5 | 73.90 | Claude Opus 4.5 anthropic-claude-opus-4.5 | Imported | 2026-05-06 |
| 19 | Claude Opus 4.6 w/o tools | 73.90 | — | Imported | 2026-05-06 |
| 20 | Gemma 4 26B A4B | 73.80 | Gemma 4 26B A4B google-gemma-4-26b-a4b-it | Imported | 2026-05-06 |
| 21 | Human Expert (Low) | 73.00 | — | Imported | 2026-05-06 |
| 22 | dots.vlm1 | 70.10 | — | Imported | 2026-05-06 |
| 23 | Claude Sonnet 4.5 | 68.90 | Claude Sonnet 4.5 anthropic-claude-sonnet-4.5 | Imported | 2026-05-06 |
| 24 | Qwen3-VL 235B-A22B | 68.10 | — | Imported | 2026-05-06 |
| 25 | Gemini 2.5 Pro 05-06 | 68.00 | — | Imported | 2026-05-06 |
| 26 | Seed 1.5-VL Thinking | 67.60 | — | Imported | 2026-05-06 |
| 27 | Seed 1.6-Thinking | 66.40 | — | Imported | 2026-05-06 |
| 28 | GLM-4.5V w/ Thinking | 65.20 | — | Imported | 2026-05-06 |
| 29 | Claude Sonnet 4.5 w/o tools | 63.40 | — | Imported | 2026-05-06 |
| 30 | GPT-5 w/o thinking | 62.70 | GPT-5 openai-gpt-5 | Imported | 2026-05-06 |
| 31 | Seed 1.5-VL | 59.90 | — | Imported | 2026-05-06 |
| 32 | GLM-4.1V w/ Thinking | 57.10 | — | Imported | 2026-05-06 |
| 33 | Skywork-R1V3-38B | 55.40 | — | Imported | 2026-05-06 |
| 34 | Gemma 4 E4B | 52.60 | — | Imported | 2026-05-06 |
| 35 | GPT-4o (0513) | 51.90 | GPT-4o openai-gpt-4o | Imported | 2026-05-06 |
| 36 | Claude 3.5 Sonnet | 51.50 | Claude 3.5 Sonnet anthropic-claude-3.5-sonnet | Imported | 2026-05-06 |
| 37 | InternVL2.5-78B | 48.60 | — | Imported | 2026-05-06 |
| 38 | Gemini 1.5 Pro (0801) | 46.90 | — | Imported | 2026-05-06 |
| 39 | Qwen2-VL-72B | 46.20 | — | Imported | 2026-05-06 |
| 40 | Qwen2.5-VL 72B | 46.20 | Qwen2.5 VL 72B Instruct qwen-qwen2.5-vl-72b-instruct | Imported | 2026-05-06 |
| 41 | InternVL2.5-38B | 46.00 | — | Imported | 2026-05-06 |
| 42 | Gemma 4 E2B | 44.20 | — | Imported | 2026-05-06 |
| 43 | EVLM-KTO | 43.80 | — | Imported | 2026-05-06 |
| 44 | Gemini 1.5 Pro (0523) | 43.50 | — | Imported | 2026-05-06 |
| 45 | MiMo-VL 7B-RL | 43.30 | — | Imported | 2026-05-06 |
| 46 | MiMo-VL 7B-SFT | 42.30 | — | Imported | 2026-05-06 |
| 47 | InternVL2-Llama3-76B | 40.00 | — | Imported | 2026-05-06 |
| 48 | Llama 3.2 90B | 39.50 | — | Imported | 2026-05-06 |
| 49 | Qwen2.5-VL 7B | 38.30 | — | Imported | 2026-05-06 |
| 50 | GPT-4o mini | 37.60 | GPT-4o-mini openai-gpt-4o-mini | Imported | 2026-05-06 |
| 51 | InternVL2.5-26B | 37.10 | — | Imported | 2026-05-06 |
| 52 | InternVL2.5-8B | 34.30 | — | Imported | 2026-05-06 |
| 53 | InternVL2-40B | 34.20 | — | Imported | 2026-05-06 |
| 54 | NVILA | 33.70 | — | Imported | 2026-05-06 |
| 55 | Qwen2.5-VL 3B | 31.60 | — | Imported | 2026-05-06 |
| 56 | LLaVA-OneVision-72B | 31.00 | — | Imported | 2026-05-06 |
| 57 | InternVL2-8B | 29.00 | — | Imported | 2026-05-06 |
| 58 | Llama 3.2 11B | 28.40 | — | Imported | 2026-05-06 |
| 59 | MiniCPM-V 2.6 | 27.20 | — | Imported | 2026-05-06 |
| 60 | MAmmoTH-VL-8B | 25.30 | — | Imported | 2026-05-06 |
| 61 | LLaVA-NEXT-72B | 25.10 | — | Imported | 2026-05-06 |
| 62 | LLaVA-OneVision-7B | 24.10 | — | Imported | 2026-05-06 |
| 63 | LLaVA-NEXT-34B | 23.80 | — | Imported | 2026-05-06 |
| 64 | InternVL2.5-2B | 23.70 | — | Imported | 2026-05-06 |
| 65 | Idefics3-8B-Llama3 | 22.90 | — | Imported | 2026-05-06 |
| 66 | Phi-3.5-Vision | 19.70 | — | Imported | 2026-05-06 |
| 67 | MiniCPM-Llama3-V 2.5 | 19.60 | — | Imported | 2026-05-06 |
| 68 | InternVL2.5-1B | 19.40 | — | Imported | 2026-05-06 |
| 69 | LLaVA-NeXT-13B | 17.20 | — | Imported | 2026-05-06 |
| 70 | LLaVA-NeXT-mistral-7B | 17.00 | — | Imported | 2026-05-06 |
| 71 | LLaVA-NeXT-Vicuna-7B | 16.10 | — | Imported | 2026-05-06 |
| 72 | Random Choice | 12.60 | — | Imported | 2026-05-06 |
| 73 | Frequent Choice | 12.10 | — | Imported | 2026-05-06 |

| Rank | Subject | MMMU-Pro Overall | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-5.5 | 83.2% | GPT-5.5 openai-gpt-5.5 | Launch post | 2026-04-23 |
| 2 | GPT-5.4 | 82.1% | GPT-5.4 openai-gpt-5.4 | Launch post | 2026-04-23 |
| 3 | Gemini 3.1 Pro Preview | 80.5% | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Launch post | 2026-04-23 |
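The rows above come from two source slices that write scores differently (`82.10`, `81`, `83.2%`), so comparing them requires normalizing the score cell first. The following is a minimal sketch of one way to do that, assuming the rows are available as pipe-delimited text exactly as listed here. The helper names `parse_score` and `parse_row` and the dictionary keys are hypothetical and not part of any published tooling for this leaderboard.

```python
from typing import Optional


def parse_score(cell: str) -> float:
    """Convert a score cell such as '82.10', '81', or '83.2%' to a float."""
    return float(cell.strip().rstrip("%"))


def parse_row(line: str) -> dict:
    """Split one pipe-delimited leaderboard row into named fields.

    Assumes exactly six cells per row, matching the table header:
    Rank, Subject, MMMU-Pro Overall, Model Match, Provenance, Sampled.
    """
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    rank, subject, score, match, provenance, sampled = cells
    # An em-dash cell marks a row with no model match; map it to None.
    model_match: Optional[str] = None if match == "—" else match
    return {
        "rank": int(rank),
        "subject": subject,
        "score": parse_score(score),
        "model_match": model_match,
        "provenance": provenance,
        "sampled": sampled,
    }


rows = [
    parse_row("| 4 | Gemini 3.0 Pro | 81 | — | Imported | 2026-05-06 |"),
    parse_row("| 1 | GPT-5.5 | 83.2% | GPT-5.5 openai-gpt-5.5 | Launch post | 2026-04-23 |"),
]

# Sort across slices by the normalized score, highest first.
rows.sort(key=lambda r: r["score"], reverse=True)
```

Because each row keeps its `provenance` and `sampled` fields, a combined ranking can still be traced back to the slice it came from.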