Math-VR

Mathematical visual reasoning benchmark for VLMs, unified models, and LLMs, reporting answer correctness and process scores on text and multimodal questions.

Rows: 31
Primary metric: overall_answer_correctness
Sampled: 2026-05-27

Metadata

Metrics

Overall Answer Correctness, Overall Process Score, Text Answer Correctness, Text Process Score, Multimodal Answer Correctness, Multimodal Process Score

Latest Results

Rows are parsed from the Math-VR public leaderboard tables and ranked by Overall Answer Correctness, the primary metric.

Rank | Subject | Overall Answer Correctness | Model Match | Match Slug | Provenance | Sampled
1 | Qwen3-VL-235B-A22B-Thinking | 66.8 | Qwen3 VL 235B A22B Thinking | qwen-qwen3-vl-235b-a22b-thinking | Imported | 2026-05-27
2 | Qwen3-VL-235B-A22B-Instruct | 65.0 | Qwen3 VL 235B A22B Instruct | qwen-qwen3-vl-235b-a22b-instruct | Imported | 2026-05-27
3 | Gemini-2.5-Pro | 64.7 | Gemini 2.5 Pro | google-gemini-2.5-pro | Imported | 2026-05-27
4 | Gemini-2.5-Flash | 60.5 | Gemini 2.5 Flash | google-gemini-2.5-flash | Imported | 2026-05-27
5 | GPT-o3 | 59.3 | - | - | Imported | 2026-05-27
6 | Seed-1.6-Thinking | 58.4 | - | - | Imported | 2026-05-27
7 | GPT-5-Thinking | 58.1 | GPT-5 | openai-gpt-5 | Imported | 2026-05-27
8 | Claude Opus4.1 | 54.3 | Claude Opus 4.1 | anthropic-claude-opus-4.1 | Imported | 2026-05-27
9 | Nano Banana | 53.4 | Nano Banana (Gemini 2.5 Flash Image) | google-gemini-2.5-flash-image | Imported | 2026-05-27
10 | Gemini-2.5-Flash-No-Thinking | 52.3 | - | - | Imported | 2026-05-27
11 | GLM-4.5V | 49.6 | GLM GLM 4.5V | z-ai-glm-4.5v | Imported | 2026-05-27
12 | Deepseek-R1 | 49.5 | R1 | deepseek-r1 | Imported | 2026-05-27
13 | Mimo-VL-7B-RL | 48.3 | - | - | Imported | 2026-05-27
14 | InternVL-3.5-8B | 40.8 | - | - | Imported | 2026-05-27
15 | GPT-4.1-mini | 33.3 | GPT-4.1 Mini | openai-gpt-4.1-mini | Imported | 2026-05-27
16 | GLM-4.1V-9B | 29.0 | - | - | Imported | 2026-05-27
17 | Claude-Sonnet-4 | 28.1 | Claude Sonnet 4 | anthropic-claude-sonnet-4 | Imported | 2026-05-27
18 | GPT-4.1 | 26.0 | GPT-4.1 | openai-gpt-4.1 | Imported | 2026-05-27
19 | CodePlot-CoT | 22.1 | - | - | Imported | 2026-05-27
20 | Gemini-2.0-Flash | 20.6 | Gemini 2.0 Flash | google-gemini-2.0-flash | Imported | 2026-05-27
21 | Keye-VL-1.5 | 17.3 | - | - | Imported | 2026-05-27
22 | Gemma3 | 16.1 | - | - | Imported | 2026-05-27
23 | Qwen-2.5-VL-72B | 13.7 | Qwen2.5 VL 72B Instruct | qwen-qwen2.5-vl-72b-instruct | Imported | 2026-05-27
24 | Bagel-Zebra-CoT | 10.1 | - | - | Imported | 2026-05-27
25 | Qwen-2.5-VL-32B | 10.0 | - | - | Imported | 2026-05-27
26 | GPT-4.1-nano | 9.1 | GPT-4.1 Nano | openai-gpt-4.1-nano | Imported | 2026-05-27
27 | InternVL-3.5-8B-No-Thinking | 7.9 | - | - | Imported | 2026-05-27
28 | Bagel | 7.6 | - | - | Imported | 2026-05-27
29 | Qwen-2.5-VL-3B | 5.3 | - | - | Imported | 2026-05-27
30 | GPT-4o | 4.3 | GPT-4o | openai-gpt-4o | Imported | 2026-05-27
31 | Qwen-2.5-VL-7B | 3.0 | - | - | Imported | 2026-05-27
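
For readers who want to work with these rows programmatically, the following is a minimal Python sketch, not the leaderboard's own tooling. It holds a handful of rows copied from the results above, checks that rank order agrees with the primary metric, and lists subjects that have no model match. The `Row` class, its field names (including `match_slug`), and both helper functions are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Row:
    """One leaderboard row; field names are illustrative assumptions."""
    rank: int
    subject: str
    overall_answer_correctness: float
    match_slug: Optional[str] = None  # None when the row has no model match


# A few rows copied from the results above (not the full 31).
ROWS: List[Row] = [
    Row(1, "Qwen3-VL-235B-A22B-Thinking", 66.8, "qwen-qwen3-vl-235b-a22b-thinking"),
    Row(2, "Qwen3-VL-235B-A22B-Instruct", 65.0, "qwen-qwen3-vl-235b-a22b-instruct"),
    Row(3, "Gemini-2.5-Pro", 64.7, "google-gemini-2.5-pro"),
    Row(5, "GPT-o3", 59.3),
    Row(31, "Qwen-2.5-VL-7B", 3.0),
]


def ranks_follow_metric(rows: List[Row]) -> bool:
    """Return True if scores never increase as rank increases."""
    by_rank = sorted(rows, key=lambda r: r.rank)
    scores = [r.overall_answer_correctness for r in by_rank]
    return all(a >= b for a, b in zip(scores, scores[1:]))


def unmatched_subjects(rows: List[Row]) -> List[str]:
    """Return subjects that carry no model match, in input order."""
    return [r.subject for r in rows if r.match_slug is None]


if __name__ == "__main__":
    print(ranks_follow_metric(ROWS))
    print(unmatched_subjects(ROWS))
```

The same check can be extended to all 31 rows by filling `ROWS` from the listing; a frozen dataclass is used here only to keep the sample rows immutable.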