MM-Vet

MM-Vet is an evaluation benchmark that tests large multimodal models on complicated multimodal tasks requiring integrated capabilities. It assesses six core vision-language capabilities (recognition, knowledge, spatial awareness, language generation, OCR, and math) using questions that each require one or more of them.
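Because a single question can require several capabilities, its score can count toward more than one capability. The sketch below illustrates that bookkeeping, assuming each question already has a score in [0, 1] and a set of required capabilities. The function name, the sample data, and the simple averaging are illustrative assumptions; this page does not state how its Score or Normalized Score is computed.

```python
from collections import defaultdict

# The six capabilities named in the benchmark description.
CAPABILITIES = ("recognition", "knowledge", "spatial awareness",
                "language generation", "OCR", "math")

def per_capability_scores(samples):
    """Average the per-question score over every question that needs each capability.

    `samples` is an iterable of (required_capabilities, score) pairs.
    This helper is hypothetical and not part of any official tooling.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for required, score in samples:
        for cap in required:
            # A multi-capability question contributes to each capability it needs.
            totals[cap] += score
            counts[cap] += 1
    # Capabilities with no questions are omitted rather than reported as zero.
    return {cap: totals[cap] / counts[cap] for cap in CAPABILITIES if counts[cap]}

# Hypothetical graded questions, for illustration only (not real results).
samples = [
    ({"recognition", "OCR"}, 1.0),
    ({"OCR", "math"}, 0.5),
    ({"recognition", "knowledge", "language generation"}, 0.0),
]

scores = per_capability_scores(samples)
# The overall score counts each question once, regardless of how many capabilities it needs.
overall = sum(score for _, score in samples) / len(samples)
```

In this sketch the overall figure is a plain mean over questions, so a question that needs three capabilities does not weigh more than one that needs a single capability.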

Rows: 2
Primary metric: score
Sampled: 2026-05-06

Metadata

Metrics

Score, Normalized Score

Latest Results

Rank  Subject                  Score  Model Match                                             Provenance     Sampled
1     Qwen2.5 VL 72B Instruct  0.76   Qwen2.5 VL 72B Instruct (qwen-qwen2.5-vl-72b-instruct)  Self-reported  2026-05-06
2     Qwen2.5 VL 7B Instruct   0.67   -                                                       Self-reported  2026-05-06