MathVision

MathVision evaluates multimodal mathematical reasoning on a full 3,040-example visual math test set.

Rows: 160
Primary metric: All
Sampled: 2026-05-06

Metadata

Metrics

All, Algebra, Analytic geometry, Arithmetic, Combinatorial geometry, Combinatorics, Counting, Descriptive geometry, Graph theory, Logical reasoning, Angle, Area, Length, Solid geometry, Statistics, Topology, Transformation geometry

Latest Results

Rows are from the main leaderboard for the full 3,040-example MATH-Vision test set.

| Rank | Subject | All | Model Match | Provenance | Sampled |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-5.4 (xhigh reasoning, w/ Python) (3rd-party eval) 🥇 | 96.10 | GPT-5.4 (openai-gpt-5.4) | Imported | 2026-05-06 |
| 2 | Gemini 3.1 Pro (thinking high, w/ Python) (3rd-party eval) 🥈 | 95.70 | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Imported | 2026-05-06 |
| 3 | Kimi K2.6 (w/ Python) 🥉 | 93.20 | KIMI MoonshotAI: Kimi K2.6 (moonshotai-kimi-k2.6) | Imported | 2026-05-06 |
| 4 | GPT-5.4 (xhigh reasoning) (3rd-party eval) | 92 | GPT-5.4 (openai-gpt-5.4) | Imported | 2026-05-06 |
| 5 | Gemini 3.1 Pro (thinking high) (3rd-party eval) | 89.80 | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Imported | 2026-05-06 |
| 6 | Qwen3.5-397B-A17B | 88.60 | Qwen3.5 397B A17B (qwen-qwen3.5-397b-a17b) | Imported | 2026-05-06 |
| 7 | Kimi K2.6 (no tools) | 87.40 | KIMI MoonshotAI: Kimi K2.6 (moonshotai-kimi-k2.6) | Imported | 2026-05-06 |
| 8 | Gemini 3 Pro (3rd-party eval) | 86.60 | Gemini 3 (google-gemini-3) | Imported | 2026-05-06 |
| 9 | Qwen3.5-122B-A10B | 86.20 | Qwen3.5-122B-A10B (qwen-qwen3.5-122b-a10b) | Imported | 2026-05-06 |
| 10 | Qwen3.5-27B | 86 | Qwen3.5-27B (qwen-qwen3.5-27b) | Imported | 2026-05-06 |
| 11 | Gemma 4 31B | 85.60 | Gemma 4 31B (google-gemma-4-31b-it) | Imported | 2026-05-06 |
| 12 | Kimi K2.5 (thinking, w/ Python) | 85 | KIMI MoonshotAI: Kimi K2.5 (moonshotai-kimi-k2.5) | Imported | 2026-05-06 |
| 13 | Claude Opus 4.6 (max effort, w/ Python) (3rd-party eval) | 84.60 | Claude Opus 4.6 (anthropic-claude-opus-4.6) | Imported | 2026-05-06 |
| 14 | Kimi K2.5 | 84.20 | KIMI MoonshotAI: Kimi K2.5 (moonshotai-kimi-k2.5) | Imported | 2026-05-06 |
| 15 | Qwen3.5-35B-A3B | 83.90 | Qwen3.5-35B-A3B (qwen-qwen3.5-35b-a3b) | Imported | 2026-05-06 |
| 16 | GPT-5.2 (3rd-party eval) | 83 | GPT-5.2 (openai-gpt-5.2) | Imported | 2026-05-06 |
| 17 | Gemma 4 26B A4B | 82.40 | Gemma 4 26B A4B (google-gemma-4-26b-a4b-it) | Imported | 2026-05-06 |
| 18 | Seed-1.8 | 81.30 | — | Imported | 2026-05-06 |
| 19 | Seed 1.6-Thinking | 77.20 | — | Imported | 2026-05-06 |
| 20 | Claude Opus 4.5 (3rd-party eval) | 77.10 | Claude Opus 4.5 (anthropic-claude-opus-4.5) | Imported | 2026-05-06 |
| 21 | Step3-VL-10B (PaCoRe) | 75.95 | — | Imported | 2026-05-06 |
| 22 | Qwen3-VL-235B-A22B-Thinking | 74.60 | Qwen3 VL 235B A22B Thinking (qwen-qwen3-vl-235b-a22b-thinking) | Imported | 2026-05-06 |
| 23 | Gemini 2.5 Pro | 73.30 | Gemini 2.5 Pro (google-gemini-2.5-pro) | Imported | 2026-05-06 |
| 24 | GPT-5 | 72 | GPT-5 (openai-gpt-5) | Imported | 2026-05-06 |
| 25 | GPT-5-mini (2025-08-07) (3rd-party eval) | 71.90 | GPT-5 Mini (openai-gpt-5-mini) | Imported | 2026-05-06 |
| 26 | Claude Opus 4.6 (max effort) (3rd-party eval) | 71.20 | Claude Opus 4.6 (anthropic-claude-opus-4.6) | Imported | 2026-05-06 |
| 27 | Claude Sonnet 4.5 (3rd-party eval) | 71.10 | Claude Sonnet 4.5 (anthropic-claude-sonnet-4.5) | Imported | 2026-05-06 |
| 28 | Step3-VL-10B (SeRe) | 70.81 | — | Imported | 2026-05-06 |
| 29 | dots.vlm1 | 69.64 | — | Imported | 2026-05-06 |
| 30 | Human | 68.82 | — | Imported | 2026-05-06 |
| 31 | Seed1.5-VL | 68.70 | — | Imported | 2026-05-06 |
| 32 | Claude Opus 4.1 (thinking) (3rd-party eval) | 66 | Claude Opus 4.1 (anthropic-claude-opus-4.1) | Imported | 2026-05-06 |
| 33 | Qwen3-VL-235B-A22B-Instruct | 66 | Qwen3 VL 235B A22B Instruct (qwen-qwen3-vl-235b-a22b-instruct) | Imported | 2026-05-06 |
| 34 | GLM-4.5V (106B-A12B) | 65.60 | GLM GLM 4.5V (z-ai-glm-4.5v) | Imported | 2026-05-06 |
| 35 | Step-3 (321B-A38B) | 64.80 | — | Imported | 2026-05-06 |
| 36 | InternVL3.5 (241B-A28B) | 63.90 | — | Imported | 2026-05-06 |
| 37 | GLM-4.6V (106B-A12B) | 63.50 | GLM GLM 4.6V (z-ai-glm-4.6v) | Imported | 2026-05-06 |
| 38 | MiMo-VL-RL | 60.40 | — | Imported | 2026-05-06 |
| 39 | OpenAI o1 | 60.30 | o1 (openai-o1) | Imported | 2026-05-06 |
| 40 | MiMo-VL-RL-2508 | 59.65 | — | Imported | 2026-05-06 |
| 41 | Qwen3-VL-8B-Thinking | 59.60 | Qwen3 VL 8B Thinking (qwen-qwen3-vl-8b-thinking) | Imported | 2026-05-06 |
| 42 | Gemma 4 E4B | 59.50 | — | Imported | 2026-05-06 |
| 43 | Claude 3.7 Sonnet (3rd-party eval, Skywork) | 58.60 | Claude 3.7 Sonnet (anthropic-claude-3.7-sonnet) | Imported | 2026-05-06 |
| 44 | OpenAI o4-mini (3rd-party eval) | 58 | o4 Mini (openai-o4-mini) | Imported | 2026-05-06 |
| 45 | MiMo-VL-7B-SFT | 57.90 | — | Imported | 2026-05-06 |
| 46 | Kimi-VL-A3B-Thinking-2506 | 56.90 | — | Imported | 2026-05-06 |
| 47 | Step R1-V-Mini | 56.60 | — | Imported | 2026-05-06 |
| 48 | InternVL3.5 (30B-A3B) | 55.70 | — | Imported | 2026-05-06 |
| 49 | SenseNova V6 Reasoner | 55.39 | — | Imported | 2026-05-06 |
| 50 | GLM-4.1V (9B) | 54.40 | — | Imported | 2026-05-06 |
| 51 | GLM-4.6V-Flash (9B) | 54.05 | — | Imported | 2026-05-06 |
| 52 | InternVL3.5-38B | 54 | — | Imported | 2026-05-06 |
| 53 | AStar-7B (training-free, Qwen2.5-VL-7B) | 53.90 | — | Imported | 2026-05-06 |
| 54 | Ovis2.5-9B | 53.90 | — | Imported | 2026-05-06 |
| 55 | Kimi k1.6 Preview | 53.29 | — | Imported | 2026-05-06 |
| 56 | Decoupled LLM-LMM (Qwen2.5-VL-72B + Qwen3-32B) | 52.60 | — | Imported | 2026-05-06 |
| 57 | Skywork-R1V3-38B | 52.60 | — | Imported | 2026-05-06 |
| 58 | Gemma 4 E2B | 52.40 | — | Imported | 2026-05-06 |
| 59 | InternVL3.5-8B | 52.05 | — | Imported | 2026-05-06 |
| 60 | Open-Vision-Reasoner-7B | 51.80 | — | Imported | 2026-05-06 |
| 61 | Qianfan-VL-70B | 50.29 | — | Imported | 2026-05-06 |
| 62 | Skywork-R1V2-38B | 49.70 | — | Imported | 2026-05-06 |
| 63 | Doubao-1.5-pro | 48.62 | — | Imported | 2026-05-06 |
| 64 | Gemini 2.0 Pro (3rd-party eval) | 48.10 | — | Imported | 2026-05-06 |
| 65 | GPT-4.5 | 47.30 | GPT-4.5 (openai-gpt-4.5-preview) | Imported | 2026-05-06 |
| 66 | Gemma 3 27B (no think) | 46 | Gemma 3 27B (google-gemma-3-27b-it) | Imported | 2026-05-06 |
| 67 | Keye-VL-8B | 46 | — | Imported | 2026-05-06 |
| 68 | GPT-5 (minimal) (3rd-party eval) | 45.80 | GPT-5 (openai-gpt-5) | Imported | 2026-05-06 |
| 69 | VL-Rethinker-72B | 44.93 | — | Imported | 2026-05-06 |
| 70 | LLaVA-Critic-R1 (Qwen2.5-VL-7B base) | 44.10 | — | Imported | 2026-05-06 |
| 71 | Vision-R1-7B (contamination-flagged) | 43.90 | — | Imported | 2026-05-06 |
| 72 | InternVL3-78B | 43.10 | — | Imported | 2026-05-06 |
| 73 | Skywork-R1V2-38B-AWQ | 42.90 | — | Imported | 2026-05-06 |
| 74 | INFRL-Qwen2.5-VL-72B-Preview | 42.73 | — | Imported | 2026-05-06 |
| 75 | Gemini-2 Flash | 41.30 | — | Imported | 2026-05-06 |
| 76 | ProxyThinker (Qwen2.5-VL-32B + OpenVLThinker-7B expert) | 40.80 | — | Imported | 2026-05-06 |
| 77 | VL-Rethinker-32B | 40.50 | — | Imported | 2026-05-06 |
| 78 | ViCrit-RL-72B | 40.10 | — | Imported | 2026-05-06 |
| 79 | InternVL3.5-4B | 40 | — | Imported | 2026-05-06 |
| 80 | Kimi k1.5 | 38.60 | — | Imported | 2026-05-06 |
| 81 | Virgo-72B | 38.40 | — | Imported | 2026-05-06 |
| 82 | Qwen2.5-VL-72B | 38.10 | Qwen2.5 VL 72B Instruct (qwen-qwen2.5-vl-72b-instruct) | Imported | 2026-05-06 |
| 83 | Claude3.5-Sonnet | 37.99 | Claude 3.5 Sonnet (anthropic-claude-3.5-sonnet) | Imported | 2026-05-06 |
| 84 | AVAR-Thinker-7B | 37.40 | — | Imported | 2026-05-06 |
| 85 | Ovis2.5-2B | 37.40 | — | Imported | 2026-05-06 |
| 86 | InternVL3-14B | 37 | — | Imported | 2026-05-06 |
| 87 | Kimi-VL-A3B-Thinking | 36.80 | — | Imported | 2026-05-06 |
| 88 | Phi-4-reasoning-vision-15B (testmini) | 36.20 | — | Imported | 2026-05-06 |
| 89 | QvQ-72B-Preview | 35.90 | — | Imported | 2026-05-06 |
| 90 | URSA-8B + URSA-RM (BoN=32) | 35.10 | — | Imported | 2026-05-06 |
| 91 | InternVL3-38B | 34.50 | — | Imported | 2026-05-06 |
| 92 | ThinkLite-VL-7B | 32.90 | — | Imported | 2026-05-06 |
| 93 | Qianfan-VL-8B | 32.82 | — | Imported | 2026-05-06 |
| 94 | InternVL2.5-78B | 32.20 | — | Imported | 2026-05-06 |
| 95 | Ovis2-34B | 31.90 | — | Imported | 2026-05-06 |
| 96 | InternVL2.5-38B | 31.80 | — | Imported | 2026-05-06 |
| 97 | URSA-8B-PS-GRPO | 31.50 | — | Imported | 2026-05-06 |
| 98 | TBAC-VLR1-7B | 31.40 | — | Imported | 2026-05-06 |
| 99 | LLaVA-Critic-R1 (LLaMA-3.2-11B-V base) | 30.90 | — | Imported | 2026-05-06 |
| 100 | GPT-4o | 30.39 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-06 |
| 101 | GPT-4 Turbo | 30.26 | GPT-4 Turbo (openai-gpt-4-turbo) | Imported | 2026-05-06 |
| 102 | DualMindVLM-7B | 30.20 | — | Imported | 2026-05-06 |
| 103 | MMR1-Math-v0-7B | 30.20 | — | Imported | 2026-05-06 |
| 104 | Ovis2-16B | 30.10 | — | Imported | 2026-05-06 |
| 105 | R1-Onevision-7B | 29.90 | — | Imported | 2026-05-06 |
| 106 | InternVL3-8B | 29 | — | Imported | 2026-05-06 |
| 107 | VOLD (Qwen2.5-VL-3B, text-only RL) | 28 | — | Imported | 2026-05-06 |
| 108 | InternVL3-9B | 27.60 | — | Imported | 2026-05-06 |
| 109 | Claude3-Opus | 27.13 | — | Imported | 2026-05-06 |
| 110 | MM-Eureka-Qwen-7B | 26.90 | — | Imported | 2026-05-06 |
| 111 | Qwen2.5-VL-DP-7B (MathV-DP) | 26.90 | — | Imported | 2026-05-06 |
| 112 | Vision-SR1-7B | 26.70 | — | Imported | 2026-05-06 |
| 113 | MathGLM-Vision-32B | 26.50 | — | Imported | 2026-05-06 |
| 114 | VLAA-Thinker-Qwen2.5VL-7B | 26.40 | — | Imported | 2026-05-06 |
| 115 | MathCoder-VL-8B | 26.10 | — | Imported | 2026-05-06 |
| 116 | OpenVLThinker-7B | 25.90 | — | Imported | 2026-05-06 |
| 117 | Ovis2-8B (testmini) | 25.90 | — | Imported | 2026-05-06 |
| 118 | Qwen2-VL-72B | 25.90 | — | Imported | 2026-05-06 |
| 119 | LLaVA-OneVision-72B | 25.30 | — | Imported | 2026-05-06 |
| 120 | Qwen2.5-VL-7B | 25.10 | — | Imported | 2026-05-06 |
| 121 | TBAC-VLR1-3B-preview | 25 | — | Imported | 2026-05-06 |
| 122 | R1-VL-7B (contamination-flagged) | 24.70 | — | Imported | 2026-05-06 |
| 123 | X-Reasoner-3B (text-only) | 24.40 | — | Imported | 2026-05-06 |
| 124 | CoT GPT4V | 23.98 | GPT-4 (openai-gpt-4) | Imported | 2026-05-06 |
| 125 | R1-Onevision-3B | 23.60 | — | Imported | 2026-05-06 |
| 126 | MiniCPM-V 2.6 | 23.40 | — | Imported | 2026-05-06 |
| 127 | InternVL2.5-26B | 23.10 | — | Imported | 2026-05-06 |
| 128 | GPT4V | 22.76 | GPT-4 (openai-gpt-4) | Imported | 2026-05-06 |
| 129 | InternVL3-2B | 21.70 | — | Imported | 2026-05-06 |
| 130 | MathCoder-VL-2B | 21.70 | — | Imported | 2026-05-06 |
| 131 | MiniCPM-o 2.6 | 21.70 | — | Imported | 2026-05-06 |
| 132 | Kimi-VL (base, non-thinking) | 21.40 | — | Imported | 2026-05-06 |
| 133 | Qwen2.5-VL-3B | 21.20 | — | Imported | 2026-05-06 |
| 134 | Multimath-7B | 20.70 | — | Imported | 2026-05-06 |
| 135 | InternVL2.5-8B | 19.70 | — | Imported | 2026-05-06 |
| 136 | Mulberry-Qwen2VL-7B (contamination-flagged) | 19.50 | — | Imported | 2026-05-06 |
| 137 | Gemini-1.5 Pro | 19.24 | — | Imported | 2026-05-06 |
| 138 | InternVL3-1B | 18.80 | — | Imported | 2026-05-06 |
| 139 | Ovis1.6-Gemma2-9B | 18.78 | — | Imported | 2026-05-06 |
| 140 | MAVIS-7B | 18.60 | — | Imported | 2026-05-06 |
| 141 | Aquila-VL-2B | 17.90 | — | Imported | 2026-05-06 |
| 142 | Ovis2-2B (testmini) | 17.70 | — | Imported | 2026-05-06 |
| 143 | Qwen2-VL-DP-7B (MathV-DP) | 17.70 | — | Imported | 2026-05-06 |
| 144 | Gemini Pro | 17.66 | — | Imported | 2026-05-06 |
| 145 | InternVL-Chat-V1-2-Plus | 16.97 | — | Imported | 2026-05-06 |
| 146 | Qwen2-VL-7B | 16.30 | — | Imported | 2026-05-06 |
| 147 | Math-LLaVA-13B | 15.69 | — | Imported | 2026-05-06 |
| 148 | Qwen-VL-Max | 15.59 | Qwen VL Max (qwen-qwen-vl-max) | Imported | 2026-05-06 |
| 149 | InternVL2.5-2B | 14.70 | — | Imported | 2026-05-06 |
| 150 | InternLM-XComposer2-VL | 14.54 | — | Imported | 2026-05-06 |
| 151 | InternVL2.5-1B | 14.40 | — | Imported | 2026-05-06 |
| 152 | GPT 4-CoT (caption) | 13.10 | GPT-4 (openai-gpt-4) | Imported | 2026-05-06 |
| 153 | Qwen2-VL-2B | 12.40 | — | Imported | 2026-05-06 |
| 154 | ShareGPT4V-13B | 11.88 | — | Imported | 2026-05-06 |
| 155 | LLaVA-v1.5-13B | 11.12 | — | Imported | 2026-05-06 |
| 156 | Qwen-VL-Plus | 10.72 | Qwen VL Plus (qwen-qwen-vl-plus) | Imported | 2026-05-06 |
| 157 | ShareGPT4V-7B | 10.53 | — | Imported | 2026-05-06 |
| 158 | SPHINX (V2) | 9.70 | — | Imported | 2026-05-06 |
| 159 | LLaVA-v1.5-7B | 8.52 | — | Imported | 2026-05-06 |
| 160 | Random Chance | 7.17 | — | Imported | 2026-05-06 |