MathVision
MathVision evaluates multimodal mathematical reasoning on a full 3,040-example visual math test set.
Rows: 160
Primary metric: All
Sampled: 2026-05-06
Metrics: All, Algebra, Analytic geometry, Arithmetic, Combinatorial geometry, Combinatorics, Counting, Descriptive geometry, Graph theory, Logical reasoning, Angle, Area, Length, Solid geometry, Statistics, Topology, Transformation geometry
| Rank | Subject | All (accuracy, %) | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-5.4 (xhigh reasoning, w/ Python) (3rd-party eval) 🥇 | 96.10 | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-06 |
| 2 | Gemini 3.1 Pro (thinking high, w/ Python) (3rd-party eval) 🥈 | 95.70 | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Imported | 2026-05-06 |
| 3 | Kimi K2.6 (w/ Python) 🥉 | 93.20 | MoonshotAI: Kimi K2.6 moonshotai-kimi-k2.6 | Imported | 2026-05-06 |
| 4 | GPT-5.4 (xhigh reasoning) (3rd-party eval) | 92.00 | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-06 |
| 5 | Gemini 3.1 Pro (thinking high) (3rd-party eval) | 89.80 | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Imported | 2026-05-06 |
| 6 | Qwen3.5-397B-A17B | 88.60 | Qwen3.5 397B A17B qwen-qwen3.5-397b-a17b | Imported | 2026-05-06 |
| 7 | Kimi K2.6 (no tools) | 87.40 | MoonshotAI: Kimi K2.6 moonshotai-kimi-k2.6 | Imported | 2026-05-06 |
| 8 | Gemini 3 Pro (3rd-party eval) | 86.60 | Gemini 3 google-gemini-3 | Imported | 2026-05-06 |
| 9 | Qwen3.5-122B-A10B | 86.20 | Qwen3.5-122B-A10B qwen-qwen3.5-122b-a10b | Imported | 2026-05-06 |
| 10 | Qwen3.5-27B | 86.00 | Qwen3.5-27B qwen-qwen3.5-27b | Imported | 2026-05-06 |
| 11 | Gemma 4 31B | 85.60 | Gemma 4 31B google-gemma-4-31b-it | Imported | 2026-05-06 |
| 12 | Kimi K2.5 (thinking, w/ Python) | 85.00 | MoonshotAI: Kimi K2.5 moonshotai-kimi-k2.5 | Imported | 2026-05-06 |
| 13 | Claude Opus 4.6 (max effort, w/ Python) (3rd-party eval) | 84.60 | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-06 |
| 14 | Kimi K2.5 | 84.20 | MoonshotAI: Kimi K2.5 moonshotai-kimi-k2.5 | Imported | 2026-05-06 |
| 15 | Qwen3.5-35B-A3B | 83.90 | Qwen3.5-35B-A3B qwen-qwen3.5-35b-a3b | Imported | 2026-05-06 |
| 16 | GPT-5.2 (3rd-party eval) | 83.00 | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-06 |
| 17 | Gemma 4 26B A4B | 82.40 | Gemma 4 26B A4B google-gemma-4-26b-a4b-it | Imported | 2026-05-06 |
| 18 | Seed-1.8 | 81.30 | — | Imported | 2026-05-06 |
| 19 | Seed 1.6-Thinking | 77.20 | — | Imported | 2026-05-06 |
| 20 | Claude Opus 4.5 (3rd-party eval) | 77.10 | Claude Opus 4.5 anthropic-claude-opus-4.5 | Imported | 2026-05-06 |
| 21 | Step3-VL-10B (PaCoRe) | 75.95 | — | Imported | 2026-05-06 |
| 22 | Qwen3-VL-235B-A22B-Thinking | 74.60 | Qwen3 VL 235B A22B Thinking qwen-qwen3-vl-235b-a22b-thinking | Imported | 2026-05-06 |
| 23 | Gemini 2.5 Pro | 73.30 | Gemini 2.5 Pro google-gemini-2.5-pro | Imported | 2026-05-06 |
| 24 | GPT-5 | 72.00 | GPT-5 openai-gpt-5 | Imported | 2026-05-06 |
| 25 | GPT-5-mini (2025-08-07) (3rd-party eval) | 71.90 | GPT-5 Mini openai-gpt-5-mini | Imported | 2026-05-06 |
| 26 | Claude Opus 4.6 (max effort) (3rd-party eval) | 71.20 | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-06 |
| 27 | Claude Sonnet 4.5 (3rd-party eval) | 71.10 | Claude Sonnet 4.5 anthropic-claude-sonnet-4.5 | Imported | 2026-05-06 |
| 28 | Step3-VL-10B (SeRe) | 70.81 | — | Imported | 2026-05-06 |
| 29 | dots.vlm1 | 69.64 | — | Imported | 2026-05-06 |
| 30 | Human | 68.82 | — | Imported | 2026-05-06 |
| 31 | Seed1.5-VL | 68.70 | — | Imported | 2026-05-06 |
| 32 | Claude Opus 4.1 (thinking) (3rd-party eval) | 66.00 | Claude Opus 4.1 anthropic-claude-opus-4.1 | Imported | 2026-05-06 |
| 33 | Qwen3-VL-235B-A22B-Instruct | 66.00 | Qwen3 VL 235B A22B Instruct qwen-qwen3-vl-235b-a22b-instruct | Imported | 2026-05-06 |
| 34 | GLM-4.5V (106B-A12B) | 65.60 | GLM 4.5V z-ai-glm-4.5v | Imported | 2026-05-06 |
| 35 | Step-3 (321B-A38B) | 64.80 | — | Imported | 2026-05-06 |
| 36 | InternVL3.5 (241B-A28B) | 63.90 | — | Imported | 2026-05-06 |
| 37 | GLM-4.6V (106B-A12B) | 63.50 | GLM 4.6V z-ai-glm-4.6v | Imported | 2026-05-06 |
| 38 | MiMo-VL-RL | 60.40 | — | Imported | 2026-05-06 |
| 39 | OpenAI o1 | 60.30 | o1 openai-o1 | Imported | 2026-05-06 |
| 40 | MiMo-VL-RL-2508 | 59.65 | — | Imported | 2026-05-06 |
| 41 | Qwen3-VL-8B-Thinking | 59.60 | Qwen3 VL 8B Thinking qwen-qwen3-vl-8b-thinking | Imported | 2026-05-06 |
| 42 | Gemma 4 E4B | 59.50 | — | Imported | 2026-05-06 |
| 43 | Claude 3.7 Sonnet (3rd-party eval, Skywork) | 58.60 | Claude 3.7 Sonnet anthropic-claude-3.7-sonnet | Imported | 2026-05-06 |
| 44 | OpenAI o4-mini (3rd-party eval) | 58.00 | o4 Mini openai-o4-mini | Imported | 2026-05-06 |
| 45 | MiMo-VL-7B-SFT | 57.90 | — | Imported | 2026-05-06 |
| 46 | Kimi-VL-A3B-Thinking-2506 | 56.90 | — | Imported | 2026-05-06 |
| 47 | Step R1-V-Mini | 56.60 | — | Imported | 2026-05-06 |
| 48 | InternVL3.5 (30B-A3B) | 55.70 | — | Imported | 2026-05-06 |
| 49 | SenseNova V6 Reasoner | 55.39 | — | Imported | 2026-05-06 |
| 50 | GLM-4.1V (9B) | 54.40 | — | Imported | 2026-05-06 |
| 51 | GLM-4.6V-Flash (9B) | 54.05 | — | Imported | 2026-05-06 |
| 52 | InternVL3.5-38B | 54.00 | — | Imported | 2026-05-06 |
| 53 | AStar-7B (training-free, Qwen2.5-VL-7B) | 53.90 | — | Imported | 2026-05-06 |
| 54 | Ovis2.5-9B | 53.90 | — | Imported | 2026-05-06 |
| 55 | Kimi k1.6 Preview | 53.29 | — | Imported | 2026-05-06 |
| 56 | Decoupled LLM-LMM (Qwen2.5-VL-72B + Qwen3-32B) | 52.60 | — | Imported | 2026-05-06 |
| 57 | Skywork-R1V3-38B | 52.60 | — | Imported | 2026-05-06 |
| 58 | Gemma 4 E2B | 52.40 | — | Imported | 2026-05-06 |
| 59 | InternVL3.5-8B | 52.05 | — | Imported | 2026-05-06 |
| 60 | Open-Vision-Reasoner-7B | 51.80 | — | Imported | 2026-05-06 |
| 61 | Qianfan-VL-70B | 50.29 | — | Imported | 2026-05-06 |
| 62 | Skywork-R1V2-38B | 49.70 | — | Imported | 2026-05-06 |
| 63 | Doubao-1.5-pro | 48.62 | — | Imported | 2026-05-06 |
| 64 | Gemini 2.0 Pro (3rd-party eval) | 48.10 | — | Imported | 2026-05-06 |
| 65 | GPT-4.5 | 47.30 | GPT-4.5 openai-gpt-4.5-preview | Imported | 2026-05-06 |
| 66 | Gemma 3 27B (no think) | 46.00 | Gemma 3 27B google-gemma-3-27b-it | Imported | 2026-05-06 |
| 67 | Keye-VL-8B | 46.00 | — | Imported | 2026-05-06 |
| 68 | GPT-5 (minimal) (3rd-party eval) | 45.80 | GPT-5 openai-gpt-5 | Imported | 2026-05-06 |
| 69 | VL-Rethinker-72B | 44.93 | — | Imported | 2026-05-06 |
| 70 | LLaVA-Critic-R1 (Qwen2.5-VL-7B base) | 44.10 | — | Imported | 2026-05-06 |
| 71 | Vision-R1-7B (contamination-flagged) | 43.90 | — | Imported | 2026-05-06 |
| 72 | InternVL3-78B | 43.10 | — | Imported | 2026-05-06 |
| 73 | Skywork-R1V2-38B-AWQ | 42.90 | — | Imported | 2026-05-06 |
| 74 | INFRL-Qwen2.5-VL-72B-Preview | 42.73 | — | Imported | 2026-05-06 |
| 75 | Gemini-2 Flash | 41.30 | — | Imported | 2026-05-06 |
| 76 | ProxyThinker (Qwen2.5-VL-32B + OpenVLThinker-7B expert) | 40.80 | — | Imported | 2026-05-06 |
| 77 | VL-Rethinker-32B | 40.50 | — | Imported | 2026-05-06 |
| 78 | ViCrit-RL-72B | 40.10 | — | Imported | 2026-05-06 |
| 79 | InternVL3.5-4B | 40.00 | — | Imported | 2026-05-06 |
| 80 | Kimi k1.5 | 38.60 | — | Imported | 2026-05-06 |
| 81 | Virgo-72B | 38.40 | — | Imported | 2026-05-06 |
| 82 | Qwen2.5-VL-72B | 38.10 | Qwen2.5 VL 72B Instruct qwen-qwen2.5-vl-72b-instruct | Imported | 2026-05-06 |
| 83 | Claude3.5-Sonnet | 37.99 | Claude 3.5 Sonnet anthropic-claude-3.5-sonnet | Imported | 2026-05-06 |
| 84 | AVAR-Thinker-7B | 37.40 | — | Imported | 2026-05-06 |
| 85 | Ovis2.5-2B | 37.40 | — | Imported | 2026-05-06 |
| 86 | InternVL3-14B | 37.00 | — | Imported | 2026-05-06 |
| 87 | Kimi-VL-A3B-Thinking | 36.80 | — | Imported | 2026-05-06 |
| 88 | Phi-4-reasoning-vision-15B (testmini) | 36.20 | — | Imported | 2026-05-06 |
| 89 | QvQ-72B-Preview | 35.90 | — | Imported | 2026-05-06 |
| 90 | URSA-8B + URSA-RM (BoN=32) | 35.10 | — | Imported | 2026-05-06 |
| 91 | InternVL3-38B | 34.50 | — | Imported | 2026-05-06 |
| 92 | ThinkLite-VL-7B | 32.90 | — | Imported | 2026-05-06 |
| 93 | Qianfan-VL-8B | 32.82 | — | Imported | 2026-05-06 |
| 94 | InternVL2.5-78B | 32.20 | — | Imported | 2026-05-06 |
| 95 | Ovis2-34B | 31.90 | — | Imported | 2026-05-06 |
| 96 | InternVL2.5-38B | 31.80 | — | Imported | 2026-05-06 |
| 97 | URSA-8B-PS-GRPO | 31.50 | — | Imported | 2026-05-06 |
| 98 | TBAC-VLR1-7B | 31.40 | — | Imported | 2026-05-06 |
| 99 | LLaVA-Critic-R1 (LLaMA-3.2-11B-V base) | 30.90 | — | Imported | 2026-05-06 |
| 100 | GPT-4o | 30.39 | GPT-4o openai-gpt-4o | Imported | 2026-05-06 |
| 101 | GPT-4 Turbo | 30.26 | GPT-4 Turbo openai-gpt-4-turbo | Imported | 2026-05-06 |
| 102 | DualMindVLM-7B | 30.20 | — | Imported | 2026-05-06 |
| 103 | MMR1-Math-v0-7B | 30.20 | — | Imported | 2026-05-06 |
| 104 | Ovis2-16B | 30.10 | — | Imported | 2026-05-06 |
| 105 | R1-Onevision-7B | 29.90 | — | Imported | 2026-05-06 |
| 106 | InternVL3-8B | 29.00 | — | Imported | 2026-05-06 |
| 107 | VOLD (Qwen2.5-VL-3B, text-only RL) | 28.00 | — | Imported | 2026-05-06 |
| 108 | InternVL3-9B | 27.60 | — | Imported | 2026-05-06 |
| 109 | Claude3-Opus | 27.13 | — | Imported | 2026-05-06 |
| 110 | MM-Eureka-Qwen-7B | 26.90 | — | Imported | 2026-05-06 |
| 111 | Qwen2.5-VL-DP-7B (MathV-DP) | 26.90 | — | Imported | 2026-05-06 |
| 112 | Vision-SR1-7B | 26.70 | — | Imported | 2026-05-06 |
| 113 | MathGLM-Vision-32B | 26.50 | — | Imported | 2026-05-06 |
| 114 | VLAA-Thinker-Qwen2.5VL-7B | 26.40 | — | Imported | 2026-05-06 |
| 115 | MathCoder-VL-8B | 26.10 | — | Imported | 2026-05-06 |
| 116 | OpenVLThinker-7B | 25.90 | — | Imported | 2026-05-06 |
| 117 | Ovis2-8B (testmini) | 25.90 | — | Imported | 2026-05-06 |
| 118 | Qwen2-VL-72B | 25.90 | — | Imported | 2026-05-06 |
| 119 | LLaVA-OneVision-72B | 25.30 | — | Imported | 2026-05-06 |
| 120 | Qwen2.5-VL-7B | 25.10 | — | Imported | 2026-05-06 |
| 121 | TBAC-VLR1-3B-preview | 25.00 | — | Imported | 2026-05-06 |
| 122 | R1-VL-7B (contamination-flagged) | 24.70 | — | Imported | 2026-05-06 |
| 123 | X-Reasoner-3B (text-only) | 24.40 | — | Imported | 2026-05-06 |
| 124 | CoT GPT4V | 23.98 | GPT-4 openai-gpt-4 | Imported | 2026-05-06 |
| 125 | R1-Onevision-3B | 23.60 | — | Imported | 2026-05-06 |
| 126 | MiniCPM-V 2.6 | 23.40 | — | Imported | 2026-05-06 |
| 127 | InternVL2.5-26B | 23.10 | — | Imported | 2026-05-06 |
| 128 | GPT4V | 22.76 | GPT-4 openai-gpt-4 | Imported | 2026-05-06 |
| 129 | InternVL3-2B | 21.70 | — | Imported | 2026-05-06 |
| 130 | MathCoder-VL-2B | 21.70 | — | Imported | 2026-05-06 |
| 131 | MiniCPM-o 2.6 | 21.70 | — | Imported | 2026-05-06 |
| 132 | Kimi-VL (base, non-thinking) | 21.40 | — | Imported | 2026-05-06 |
| 133 | Qwen2.5-VL-3B | 21.20 | — | Imported | 2026-05-06 |
| 134 | Multimath-7B | 20.70 | — | Imported | 2026-05-06 |
| 135 | InternVL2.5-8B | 19.70 | — | Imported | 2026-05-06 |
| 136 | Mulberry-Qwen2VL-7B (contamination-flagged) | 19.50 | — | Imported | 2026-05-06 |
| 137 | Gemini-1.5 Pro | 19.24 | — | Imported | 2026-05-06 |
| 138 | InternVL3-1B | 18.80 | — | Imported | 2026-05-06 |
| 139 | Ovis1.6-Gemma2-9B | 18.78 | — | Imported | 2026-05-06 |
| 140 | MAVIS-7B | 18.60 | — | Imported | 2026-05-06 |
| 141 | Aquila-VL-2B | 17.90 | — | Imported | 2026-05-06 |
| 142 | Ovis2-2B (testmini) | 17.70 | — | Imported | 2026-05-06 |
| 143 | Qwen2-VL-DP-7B (MathV-DP) | 17.70 | — | Imported | 2026-05-06 |
| 144 | Gemini Pro | 17.66 | — | Imported | 2026-05-06 |
| 145 | InternVL-Chat-V1-2-Plus | 16.97 | — | Imported | 2026-05-06 |
| 146 | Qwen2-VL-7B | 16.30 | — | Imported | 2026-05-06 |
| 147 | Math-LLaVA-13B | 15.69 | — | Imported | 2026-05-06 |
| 148 | Qwen-VL-Max | 15.59 | Qwen VL Max qwen-qwen-vl-max | Imported | 2026-05-06 |
| 149 | InternVL2.5-2B | 14.70 | — | Imported | 2026-05-06 |
| 150 | InternLM-XComposer2-VL | 14.54 | — | Imported | 2026-05-06 |
| 151 | InternVL2.5-1B | 14.40 | — | Imported | 2026-05-06 |
| 152 | GPT 4-CoT (caption) | 13.10 | GPT-4 openai-gpt-4 | Imported | 2026-05-06 |
| 153 | Qwen2-VL-2B | 12.40 | — | Imported | 2026-05-06 |
| 154 | ShareGPT4V-13B | 11.88 | — | Imported | 2026-05-06 |
| 155 | LLaVA-v1.5-13B | 11.12 | — | Imported | 2026-05-06 |
| 156 | Qwen-VL-Plus | 10.72 | Qwen VL Plus qwen-qwen-vl-plus | Imported | 2026-05-06 |
| 157 | ShareGPT4V-7B | 10.53 | — | Imported | 2026-05-06 |
| 158 | SPHINX (V2) | 9.70 | — | Imported | 2026-05-06 |
| 159 | LLaVA-v1.5-7B | 8.52 | — | Imported | 2026-05-06 |
| 160 | Random Chance | 7.17 | — | Imported | 2026-05-06 |