MMStar
MMStar evaluates vision-indispensable multimodal capabilities across perception, reasoning, STEM, and math axes.
19 rows
Primary metric: average
Sampled: 2026-05-06
Metadata
Metrics
Coarse perception, Fine-grained perception, Instance reasoning, Logical reasoning, Science and technology, Math, Average, Multi-modal gain, Multi-modal leakage (lower is better)
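As a rough illustration of how the summary metrics listed above relate to one another, here is a minimal Python sketch. It rests on assumptions not stated on this page: that Average is the unweighted mean of the six capability scores, that multi-modal gain is the score with images minus the score of the same model without images, and that multi-modal leakage is the image-free score minus the score of the underlying text-only base model, floored at zero. The function names and the numbers in the usage note are hypothetical and are not taken from the leaderboard.

```python
from statistics import mean

# The six capability axes listed under "Metrics".
AXES = (
    "Coarse perception",
    "Fine-grained perception",
    "Instance reasoning",
    "Logical reasoning",
    "Science and technology",
    "Math",
)


def average_score(per_axis: dict[str, float]) -> float:
    """Unweighted mean over the six axes (assumed definition of Average)."""
    missing = [axis for axis in AXES if axis not in per_axis]
    if missing:
        raise KeyError(f"missing axis scores: {missing}")
    return round(mean(float(per_axis[axis]) for axis in AXES), 2)


def multimodal_gain(score_with_images: float, score_without_images: float) -> float:
    """Assumed definition: improvement attributable to seeing the image."""
    return score_with_images - score_without_images


def multimodal_leakage(score_without_images: float, text_only_base_score: float) -> float:
    """Assumed definition: image-free score above the text-only base model,
    floored at zero. Lower is better."""
    return max(0.0, score_without_images - text_only_base_score)
```

For example, with made-up axis scores of 60, 50, 50, 40, 30, and 40, `average_score` returns 45.0. A model scoring 45.0 with images and 30.0 without would have a gain of 15.0, and if its text-only base model scores 25.0, a leakage of 5.0.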
| Rank | Subject | Average | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT4V (high) 🥇 | 57.10 | GPT-4 openai-gpt-4 | Imported | 2026-05-06 |
| 2 | InternLM-XComposer2 🥈 | 55.40 | — | Imported | 2026-05-06 |
| 3 | LLaVA-Next 🥉 | 52.10 | — | Imported | 2026-05-06 |
| 4 | GPT4V (low) | 46.10 | GPT-4 openai-gpt-4 | Imported | 2026-05-06 |
| 5 | InternVL-Chat-v1.2 | 43.70 | — | Imported | 2026-05-06 |
| 6 | GeminiPro-Vision | 42.60 | — | Imported | 2026-05-06 |
| 7 | MiniCPM-V-2 | 40.70 | — | Imported | 2026-05-06 |
| 8 | Sphinx-X-MoE | 38.90 | — | Imported | 2026-05-06 |
| 9 | Monkey-Chat | 38.30 | — | Imported | 2026-05-06 |
| 10 | Yi-VL | 37.90 | — | Imported | 2026-05-06 |
| 11 | Qwen-VL-Chat | 37.50 | — | Imported | 2026-05-06 |
| 12 | Deepseek-VL | 37.10 | — | Imported | 2026-05-06 |
| 13 | CogVLM-Chat | 36.50 | — | Imported | 2026-05-06 |
| 14 | Yi-VL | 36.10 | — | Imported | 2026-05-06 |
| 15 | TinyLLaVA | 36.00 | — | Imported | 2026-05-06 |
| 16 | ShareGPT4V | 33.00 | — | Imported | 2026-05-06 |
| 17 | LLaVA-1.5 | 32.80 | — | Imported | 2026-05-06 |
| 18 | LLaVA-1.5 | 30.30 | — | Imported | 2026-05-06 |
| 19 | Random Choice | 24.60 | — | Imported | 2026-05-06 |