MMBench-CN

MMBench-CN evaluates multimodal understanding across image, text, chart, diagram, and cross-modal reasoning tasks.

Rows: 30
Primary metric: overall_accuracy
Sampled: 2026-05-27

Metadata

Metrics

Overall accuracy, Coarse perception, Fine-grained perception (single), Fine-grained perception (cross), Attribute reasoning, Logic reasoning, Relation reasoning

Latest Results

Rows are parsed from the MMBench-CN test-set table in the arXiv LaTeX source of the MMBench paper.

Rank | Subject | Overall accuracy | Model Match | Provenance | Sampled
1 | InternLM-XComposer2* | 77.2 | | Imported | 2026-05-27
2 | Qwen-VL-Max | 75.9 | Qwen VL Max (qwen-qwen-vl-max) | Imported | 2026-05-27
3 | GPT-4v | 73.3 | GPT-4 (openai-gpt-4) | Imported | 2026-05-27
4 | LLaVA-InternLM2-20B | 71.7 | | Imported | 2026-05-27
5 | InternLM-XComposer* | 71.3 | | Imported | 2026-05-27
6 | LLaVA-InternLM2-7B | 70.0 | | Imported | 2026-05-27
7 | Gemini-Pro-V | 69.3 | | Imported | 2026-05-27
8 | Qwen-VL-Plus | 67.5 | Qwen VL Plus (qwen-qwen-vl-plus) | Imported | 2026-05-27
9 | Yi-VL-34B* | 67.0 | | Imported | 2026-05-27
10 | Yi-VL-6B* | 65.3 | | Imported | 2026-05-27
11 | Monkey-Chat | 65.1 | | Imported | 2026-05-27
12 | LLaVA-InternLM-7B | 63.0 | | Imported | 2026-05-27
13 | MiniCPM-V | 63.0 | | Imported | 2026-05-27
14 | LLaVA-v1.5-13B | 62.5 | | Imported | 2026-05-27
15 | ShareGPT4V-13B | 62.4 | | Imported | 2026-05-27
16 | OmniLMM-12B* | 60.6 | | Imported | 2026-05-27
17 | ShareGPT4V-7B | 59.7 | | Imported | 2026-05-27
18 | mPLUG-Owl2 | 58.1 | | Imported | 2026-05-27
19 | Qwen-VL-Chat* | 57.6 | | Imported | 2026-05-27
20 | LLaVA-v1.5-7B | 57.0 | | Imported | 2026-05-27
21 | CogVLM-Chat-17B | 52.9 | | Imported | 2026-05-27
22 | VisualGLM-6B | 40.6 | | Imported | 2026-05-27
23 | PandaGPT | 31.0 | | Imported | 2026-05-27
24 | IDEFICS-80B-Instruct | 29.2 | | Imported | 2026-05-27
25 | IDEFICS-9B-Instruct | 18.7 | | Imported | 2026-05-27
26 | InstructBLIP-7B | 18.1 | | Imported | 2026-05-27
27 | InstructBLIP-13B | 15.1 | | Imported | 2026-05-27
28 | OpenFlamingo v2 | 14.3 | | Imported | 2026-05-27
29 | MiniGPT4-7B | 11.9 | | Imported | 2026-05-27
30 | MiniGPT4-13B | 11.8 | | Imported | 2026-05-27
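
As a rough illustration of how rows like these could be parsed from a LaTeX table and ranked by overall accuracy, here is a minimal Python sketch. It is not the importer this page uses. The function names (`parse_latex_rows`, `rank_rows`) are hypothetical, and the two-column `Model & Overall \\` layout in `SAMPLE` is a simplifying assumption; the real table in the paper also carries the per-category metrics listed under Metrics. The three sample scores are taken from the rows above.

```python
import re


def parse_latex_rows(latex: str) -> list[tuple[str, float]]:
    """Extract (model, score) pairs from a tabular body (assumed layout)."""
    rows = []
    for line in latex.splitlines():
        line = line.strip()
        # Skip blank lines and rule commands such as \hline or \midrule.
        if not line or line.startswith("\\"):
            continue
        # Drop the row terminator (two backslashes) and anything after it.
        line = re.sub(r"\\\\.*$", "", line)
        cells = [cell.strip() for cell in line.split("&")]
        if len(cells) < 2:
            continue
        try:
            score = float(cells[1])
        except ValueError:
            continue  # header row or non-numeric cell
        rows.append((cells[0], score))
    return rows


def rank_rows(rows: list[tuple[str, float]]) -> list[tuple[int, str, float]]:
    """Sort by score, highest first, and attach 1-based ranks."""
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return [(i + 1, name, score) for i, (name, score) in enumerate(ordered)]


# Hypothetical, simplified excerpt; scores copied from the table above.
SAMPLE = r"""
\hline
Model & Overall \\
\hline
GPT-4v & 73.3 \\
InternLM-XComposer2* & 77.2 \\
Qwen-VL-Max & 75.9 \\
\hline
"""

ranked = rank_rows(parse_latex_rows(SAMPLE))
for rank, name, score in ranked:
    print(rank, name, score)
```

Because Python's `sorted` is stable, models with equal scores (such as the two entries at 63.0) keep their source order in this sketch and receive consecutive ranks, which matches how ranks 12 and 13 appear in the table.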