MMStar

MMStar evaluates vision-indispensable multimodal capabilities across perception, reasoning, STEM, and math axes.

Rows: 19
Primary metric: Average
Sampled: 2026-05-06

Metadata

Metrics

Coarse perception, Fine-grained perception, Instance reasoning, Logical reasoning, Science and technology, Math, Average, Multi-modal gain, Multi-modal leakage (lower is better)
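
The page lists these metrics without defining them. The sketch below shows one plausible way to derive the three aggregate metrics. It rests on assumptions the page does not confirm: that Average is the unweighted mean of the six capability axes, that multi-modal gain is the score with images minus the score without images, and that multi-modal leakage is the score without images minus the text-only base model's score, floored at zero. All function and key names are hypothetical.

```python
from statistics import mean

# Hypothetical keys for the six capability axes listed above.
AXES = (
    "coarse_perception",
    "fine_grained_perception",
    "instance_reasoning",
    "logical_reasoning",
    "science_and_technology",
    "math",
)


def average_score(axis_scores):
    """Unweighted mean over the six axes (assumed definition of Average)."""
    missing = [axis for axis in AXES if axis not in axis_scores]
    if missing:
        raise ValueError(f"missing axis scores: {missing}")
    return mean(axis_scores[axis] for axis in AXES)


def multimodal_gain(score_with_images, score_without_images):
    """Assumed definition: how much the score improves when images are shown."""
    return score_with_images - score_without_images


def multimodal_leakage(score_without_images, score_text_only_base):
    """Assumed definition: score the model reaches without images beyond its
    text-only base model, floored at zero. Lower is better."""
    return max(0.0, score_without_images - score_text_only_base)
```

Under these assumptions, a model that scores no higher without images than its text-only base model has a leakage of zero, which is the best possible value for that metric.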

Latest Results

Rows are listed in source table order, which here coincides with descending Average.

| Rank | Subject | Average | Model Match | Provenance | Sampled |
|------|---------|---------|-------------|------------|---------|
| 1 | GPT4V (high) 🥇 | 57.10 | GPT-4 (openai-gpt-4) | Imported | 2026-05-06 |
| 2 | InternLM-XComposer2 🥈 | 55.40 | — | Imported | 2026-05-06 |
| 3 | LLaVA-Next 🥉 | 52.10 | — | Imported | 2026-05-06 |
| 4 | GPT4V (low) | 46.10 | GPT-4 (openai-gpt-4) | Imported | 2026-05-06 |
| 5 | InternVL-Chat-v1.2 | 43.70 | — | Imported | 2026-05-06 |
| 6 | GeminiPro-Vision | 42.60 | — | Imported | 2026-05-06 |
| 7 | MiniCPM-V-2 | 40.70 | — | Imported | 2026-05-06 |
| 8 | Sphinx-X-MoE | 38.90 | — | Imported | 2026-05-06 |
| 9 | Monkey-Chat | 38.30 | — | Imported | 2026-05-06 |
| 10 | Yi-VL | 37.90 | — | Imported | 2026-05-06 |
| 11 | Qwen-VL-Chat | 37.50 | — | Imported | 2026-05-06 |
| 12 | Deepseek-VL | 37.10 | — | Imported | 2026-05-06 |
| 13 | CogVLM-Chat | 36.50 | — | Imported | 2026-05-06 |
| 14 | Yi-VL | 36.10 | — | Imported | 2026-05-06 |
| 15 | TinyLLaVA | 36.00 | — | Imported | 2026-05-06 |
| 16 | ShareGPT4V | 33.00 | — | Imported | 2026-05-06 |
| 17 | LLaVA-1.5 | 32.80 | — | Imported | 2026-05-06 |
| 18 | LLaVA-1.5 | 30.30 | — | Imported | 2026-05-06 |
| 19 | Random Choice | 24.60 | — | Imported | 2026-05-06 |
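
To read the table against its Random Choice baseline, the short sketch below copies the Subject and Average columns from the rows above, checks that the listed order is descending by Average, and computes each entry's margin over the baseline. The helper names are hypothetical, and the margin is a convenience for reading the table rather than a metric the benchmark reports.

```python
# (Subject, Average) pairs copied from the results table above, in listed order.
ROWS = [
    ("GPT4V (high)", 57.10),
    ("InternLM-XComposer2", 55.40),
    ("LLaVA-Next", 52.10),
    ("GPT4V (low)", 46.10),
    ("InternVL-Chat-v1.2", 43.70),
    ("GeminiPro-Vision", 42.60),
    ("MiniCPM-V-2", 40.70),
    ("Sphinx-X-MoE", 38.90),
    ("Monkey-Chat", 38.30),
    ("Yi-VL", 37.90),
    ("Qwen-VL-Chat", 37.50),
    ("Deepseek-VL", 37.10),
    ("CogVLM-Chat", 36.50),
    ("Yi-VL", 36.10),
    ("TinyLLaVA", 36.00),
    ("ShareGPT4V", 33.00),
    ("LLaVA-1.5", 32.80),
    ("LLaVA-1.5", 30.30),
    ("Random Choice", 24.60),
]


def is_sorted_descending(rows):
    """True if every Average is >= the Average of the row that follows it."""
    return all(a[1] >= b[1] for a, b in zip(rows, rows[1:]))


def margin_over_baseline(rows, baseline="Random Choice"):
    """Points above the baseline row for every non-baseline entry."""
    base = next(score for name, score in rows if name == baseline)
    return [(name, round(score - base, 2)) for name, score in rows if name != baseline]


if __name__ == "__main__":
    print("descending by Average:", is_sorted_descending(ROWS))
    for name, margin in margin_over_baseline(ROWS):
        print(f"{name:<22} +{margin:.2f}")
```

Subjects that appear twice (Yi-VL, LLaVA-1.5) are kept as separate rows because the table does not say how they differ.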