MMMU Pro
Multimodal Multi-task Benchmark
Rows: 73
Primary metric: score
Sampled: 2026-05-28
Metrics: Score, Std. error (lower is better), Latency (lower is better), Cost per test (lower is better)
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Gemini 3.5 Flash | 88.266% | Gemini 3.5 Flash google-gemini-3.5-flash | Imported | 2026-05-28 |
| 2 | GPT 5.5 | 88.266% | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-28 |
| 3 | Gemini 3.1 Pro Preview | 88.208% | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Imported | 2026-05-28 |
| 4 | Gemini 3 Flash Preview | 87.63% | Gemini 3 Flash Preview google-gemini-3-flash-preview | Imported | 2026-05-28 |
| 5 | Gemini 3 Pro Preview | 87.514% | Gemini 3 google-gemini-3 | Imported | 2026-05-28 |
| 6 | GPT 5.4 2026-03-05 | 87.514% | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-28 |
| 7 | Muse Spark | 87.399% | — | Imported | 2026-05-28 |
| 8 | GPT 5.2 2025-12-11 | 86.667% | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-28 |
| 9 | Claude Opus 4.8 | 86.59% | Claude Opus 4.8 anthropic-claude-opus-4.8 | Imported | 2026-05-28 |
| 10 | Kimi K2.6 Thinking | 86.301% | MoonshotAI: Kimi K2.6 moonshotai-kimi-k2.6 | Imported | 2026-05-28 |
| 11 | Claude Opus 4.7 | 85.549% | Claude Opus 4.7 anthropic-claude-opus-4.7 | Imported | 2026-05-28 |
| 12 | Kimi K2.5 Thinking | 84.335% | MoonshotAI: Kimi K2.5 moonshotai-kimi-k2.5 | Imported | 2026-05-28 |
| 13 | Qwen 3.6 Plus | 84.162% | Qwen3.6 Plus qwen-qwen3.6-plus | Imported | 2026-05-28 |
| 14 | Claude Opus 4.6 Thinking | 83.873% | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-28 |
| 15 | Claude Sonnet 4.6 | 83.584% | Claude Sonnet 4.6 anthropic-claude-sonnet-4.6 | Imported | 2026-05-28 |
| 16 | Grok 4.20 0309 Reasoning | 83.468% | Grok 4.20 x-ai-grok-4.20 | Imported | 2026-05-28 |
| 17 | GPT 5.1 2025-11-13 | 83.179% | GPT-5.1 openai-gpt-5.1 | Imported | 2026-05-28 |
| 18 | Grok 4.3 | 83.064% | Grok 4.3 x-ai-grok-4.3 | Imported | 2026-05-28 |
| 19 | Claude Opus 4.5 20251101 Thinking | 82.948% | Claude Opus 4.5 anthropic-claude-opus-4.5 | Imported | 2026-05-28 |
| 20 | Gemini 3.1 Flash Lite Preview | 82.486% | Gemini 3.1 Flash Lite Preview google-gemini-3.1-flash-lite-preview | Imported | 2026-05-28 |
| 21 | Qwen 3.5 Flash | 81.908% | Qwen3.5-Flash qwen-qwen3.5-flash-02-23 | Imported | 2026-05-28 |
| 22 | GPT 5 2025-08-07 | 81.503% | GPT-5 openai-gpt-5 | Imported | 2026-05-28 |
| 23 | Gemini 2.5 Pro Exp 03 25 | 81.34% | — | Imported | 2026-05-28 |
| 24 | Claude Opus 4.5 20251101 | 81.098% | Claude Opus 4.5 anthropic-claude-opus-4.5 | Imported | 2026-05-28 |
| 25 | Gemini 2.5 Flash Preview 09 2025 Thinking | 80.751% | — | Imported | 2026-05-28 |
| 26 | o3 2025-04-16 | 80.416% | o3 openai-o3 | Imported | 2026-05-28 |
| 27 | o4 Mini 2025-04-16 | 79.665% | o4 Mini openai-o4-mini | Imported | 2026-05-28 |
| 28 | Gemini 2.5 Flash Preview 09 2025 | 79.48% | — | Imported | 2026-05-28 |
| 29 | Claude Sonnet 4.5 20250929 Thinking | 79.306% | Claude Sonnet 4.5 anthropic-claude-sonnet-4.5 | Imported | 2026-05-28 |
| 30 | GPT 5.4 Mini 2026-03-17 | 79.249% | GPT-5.4 Mini openai-gpt-5.4-mini | Imported | 2026-05-28 |
| 31 | GPT 5 Mini 2025-08-07 | 78.914% | GPT-5 Mini openai-gpt-5-mini | Imported | 2026-05-28 |
| 32 | Claude Opus 4.1 20250805 Thinking | 77.514% | Claude Opus 4.1 anthropic-claude-opus-4.1 | Imported | 2026-05-28 |
| 33 | o1 2024-12-17 | 77.412% | o1 openai-o1 | Imported | 2026-05-28 |
| 34 | Grok 4 0709 | 76.27% | Grok 4 x-ai-grok-4 | Imported | 2026-05-28 |
| 35 | Gemini 2.5 Flash Lite Preview 09 2025 Thinking | 75.434% | — | Imported | 2026-05-28 |
| 36 | Claude 3.7 Sonnet 20250219 Thinking | 75.101% | — | Imported | 2026-05-28 |
| 37 | Claude Sonnet 4 20250514 Thinking | 74.928% | — | Imported | 2026-05-28 |
| 38 | Claude Opus 4.1 20250805 | 73.715% | Claude Opus 4.1 anthropic-claude-opus-4.1 | Imported | 2026-05-28 |
| 39 | GPT 5.4 Nano 2026-03-17 | 73.584% | GPT-5.4 Nano openai-gpt-5.4-nano | Imported | 2026-05-28 |
| 40 | Claude Opus 4 20250514 | 73.31% | Claude Opus 4 anthropic-claude-opus-4 | Imported | 2026-05-28 |
| 41 | Grok 4 Fast Reasoning | 72.775% | Grok 4 Fast x-ai-grok-4-fast | Imported | 2026-05-28 |
| 42 | Grok 4.1 Fast Reasoning | 72.659% | Grok 4.1 Fast x-ai-grok-4.1-fast | Imported | 2026-05-28 |
| 43 | Gemini 2.5 Flash Lite Preview 09 2025 | 72.543% | Gemini 2.5 Flash Lite Preview 09-2025 google-gemini-2.5-flash-lite-preview-09-2025 | Imported | 2026-05-28 |
| 44 | Claude Sonnet 4 20250514 | 72.386% | Claude Sonnet 4 anthropic-claude-sonnet-4 | Imported | 2026-05-28 |
| 45 | GPT 4.1 2025-04-14 | 72.386% | GPT-4.1 openai-gpt-4.1 | Imported | 2026-05-28 |
| 46 | Gemini 2.5 Flash Preview 04 17 Thinking | 71.924% | — | Imported | 2026-05-28 |
| 47 | Llama4 Maverick Instruct Basic | 71.693% | — | Imported | 2026-05-28 |
| 48 | Claude 3.7 Sonnet 20250219 | 71.519% | Claude 3.7 Sonnet anthropic-claude-3.7-sonnet | Imported | 2026-05-28 |
| 49 | GPT 5 Nano 2025-08-07 | 70.942% | GPT-5 Nano openai-gpt-5-nano | Imported | 2026-05-28 |
| 50 | Command A Plus 05 2026 | 70.636% | — | Imported | 2026-05-28 |
| 51 | GPT 4.1 Mini 2025-04-14 | 70.537% | GPT-4.1 Mini openai-gpt-4.1-mini | Imported | 2026-05-28 |
| 52 | Gemini 2.0 Flash 001 | 69.786% | Gemini 2.0 Flash google-gemini-2.0-flash | Imported | 2026-05-28 |
| 53 | Claude 3.5 Sonnet 20241022 | 68.804% | Claude 3.5 Sonnet anthropic-claude-3.5-sonnet | Imported | 2026-05-28 |
| 54 | Gemini 2.5 Flash Preview 04 17 | 67.995% | — | Imported | 2026-05-28 |
| 55 | Mistral Large 2512 | 66.185% | Mistral: Mistral Large 3 2512 mistralai-mistral-large-2512 | Imported | 2026-05-28 |
| 56 | Gemini 1.5 Pro 002 | 65.511% | — | Imported | 2026-05-28 |
| 57 | Magistral Small 2509 | 65.202% | — | Imported | 2026-05-28 |
| 58 | Magistral Medium 2509 | 64.566% | — | Imported | 2026-05-28 |
| 59 | GPT 4o 2024-08-06 | 64.009% | GPT-4o (2024-08-06) openai-gpt-4o-2024-08-06 | Imported | 2026-05-28 |
| 60 | Grok 4.1 Fast Non Reasoning | 63.699% | Grok 4.1 Fast x-ai-grok-4.1-fast | Imported | 2026-05-28 |
| 61 | Grok 4 Fast Non Reasoning | 63.41% | Grok 4 Fast x-ai-grok-4-fast | Imported | 2026-05-28 |
| 62 | Mistral Medium 2505 | 62.969% | — | Imported | 2026-05-28 |
| 63 | GPT 4o 2024-11-20 | 62.161% | GPT-4o (2024-11-20) openai-gpt-4o-2024-11-20 | Imported | 2026-05-28 |
| 64 | Mistral Small 2503 | 60.081% | — | Imported | 2026-05-28 |
| 65 | Llama 4 Scout 17B 16E Instruct | 58.752% | Llama 4 Scout meta-llama-llama-4-scout | Imported | 2026-05-28 |
| 66 | Grok 2 Vision 1212 | 57.25% | — | Imported | 2026-05-28 |
| 67 | Gemini 1.5 Flash 002 | 57.192% | — | Imported | 2026-05-28 |
| 68 | GPT 4o Mini 2024-07-18 | 56.557% | GPT-4o-mini (2024-07-18) openai-gpt-4o-mini-2024-07-18 | Imported | 2026-05-28 |
| 69 | GPT 4.1 Nano 2025-04-14 | 55.055% | GPT-4.1 Nano openai-gpt-4.1-nano | Imported | 2026-05-28 |
| 70 | Llama 3.2 90B Vision Instruct Turbo | 48.065% | — | Imported | 2026-05-28 |
| 71 | Claude Haiku 4.5 20251001 Thinking | 46.069% | Claude Haiku 4.5 anthropic-claude-haiku-4.5 | Imported | 2026-05-28 |
| 72 | Llama 3.2 11B Vision Instruct Turbo | 38.821% | — | Imported | 2026-05-28 |
| 73 | Qwen 3.5 Plus Thinking | 22.775% | — | Imported | 2026-05-28 |
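
Several rows in the table share an identical score but sit at consecutive ranks (for example, ranks 1 and 2 both show 88.266%). The page does not state how such ties are ordered. For readers who want tied scores to share a rank, the following is a minimal Python sketch. It assumes the pipe-delimited layout shown above; the names `Row`, `parse_rows`, and `competition_ranks` are hypothetical helpers and not part of any leaderboard tooling.

```python
from typing import NamedTuple


class Row(NamedTuple):
    rank: int       # rank as printed in the table
    subject: str    # model name from the Subject column
    score: float    # score in percent, e.g. 88.266


def parse_rows(table: str) -> list[Row]:
    """Parse pipe-delimited rows, skipping the header and separator lines."""
    rows = []
    for line in table.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) < 3 or not cells[0].isdigit():
            continue  # header, separator, or malformed line
        rows.append(Row(int(cells[0]), cells[1], float(cells[2].rstrip("%"))))
    return rows


def competition_ranks(rows: list[Row]) -> dict[str, int]:
    """Give equal scores the same rank (1, 1, 3, ...), highest score first."""
    ordered = sorted(rows, key=lambda r: -r.score)
    ranks: dict[str, int] = {}
    prev_score = None
    prev_rank = 0
    for position, row in enumerate(ordered, start=1):
        if row.score != prev_score:
            prev_rank = position
            prev_score = row.score
        ranks[row.subject] = prev_rank
    return ranks


# First three data rows copied from the table above.
SAMPLE = """
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Gemini 3.5 Flash | 88.266% | Gemini 3.5 Flash google-gemini-3.5-flash | Imported | 2026-05-28 |
| 2 | GPT 5.5 | 88.266% | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-28 |
| 3 | Gemini 3.1 Pro Preview | 88.208% | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Imported | 2026-05-28 |
"""

rows = parse_rows(SAMPLE)
ranks = competition_ranks(rows)
```

With this scheme the two 88.266% entries both receive rank 1 and the next entry receives rank 3. Whether a shared rank is appropriate depends on the standard error of each score, which this table does not list per row.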