MathVista

MathVista evaluates mathematical reasoning over visual contexts, including figure QA, geometry, word problems, textbook QA, and visual QA.

80 rows
Primary metric: All
Sampled: 2026-05-06

Metrics

All, Figure QA, Geometry problem solving, Math word problem, Textbook QA, Visual QA, Algebra, Arithmetic, Geometry, Logical reasoning, Numeric commonsense, Scientific reasoning, Statistical reasoning

Latest Results

Rows are ranked by All score in descending order. Human Performance and Random Chance are kept as baseline rows.

Rank Subject All Model Match Provenance Sampled
1 DreamPRM (o4-mini) 🥇 85.20 — Imported 2026-05-06
2 VL-Rethinker 🥈 80.30 — Imported 2026-05-06
3 Step R1-V-Mini 🥉 80.10 — Imported 2026-05-06
4 Kimi-k1.6-preview-20250308 80.00 — Imported 2026-05-06
5 Doubao-pro-1.5 79.50 — Imported 2026-05-06
6 Ovis2_34B 77.10 — Imported 2026-05-06
7 Kimi-k1.5 74.90 — Imported 2026-05-06
8 OpenAI o1 73.90 o1 (openai-o1) Imported 2026-05-06
9 Llama 4 Maverick 73.70 Llama 4 Maverick (meta-llama-4-maverick) Imported 2026-05-06
10 Vision-R1-7B 73.20 — Imported 2026-05-06
11 Gemini 2.0 Flash 73.10 Gemini 2.0 Flash (google-gemini-2.0-flash) Imported 2026-05-06
12 QVQ-72B-Preview 71.40 — Imported 2026-05-06
13 Qwen2VL-72B 70.50 — Imported 2026-05-06
14 Pixtral Large (124B) 69.40 — Imported 2026-05-06
15 Grok-2 69.00 — Imported 2026-05-06
16 Grok-2 mini 68.10 — Imported 2026-05-06
17 Claude 3.5 Sonnet 67.70 Claude 3.5 Sonnet (anthropic-claude-3.5-sonnet) Imported 2026-05-06
18 LLaVA-OneVision 67.50 — Imported 2026-05-06
19 InternVL2-8B-MPO 67.30 — Imported 2026-05-06
20 Ovis1.6-Gemma2-9B 67.20 — Imported 2026-05-06
21 InternVL2-Pro 66.80 — Imported 2026-05-06
22 Chimera-Reasoner-8B 66.20 — Imported 2026-05-06
23 TextGrad (GPT-4o) 66.10 GPT-4o (openai-gpt-4o) Imported 2026-05-06
24 TBAC-VLR1-3B-preview 64.80 — Imported 2026-05-06
25 Gemini 1.5 Pro (May 2024) 63.90 — Imported 2026-05-06
26 GPT-4o 63.80 GPT-4o (openai-gpt-4o) Imported 2026-05-06
27 Phi-4-multimodal-ins 62.40 — Imported 2026-05-06
28 Human Performance* 60.30 — Imported 2026-05-06
29 InternVL-Chat-V1.2-Plus 59.90 — Imported 2026-05-06
30 Gemini 1.5 Flash (May 2024) 58.40 — Imported 2026-05-06
31 GPT-4T 2024-04-09 58.10 GPT-4 (openai-gpt-4) Imported 2026-05-06
32 Pixtral-12B 58.00 — Imported 2026-05-06
33 InternLM-XComposer2-VL-7B 57.60 — Imported 2026-05-06
34 Gemini 1.0 Ultra 53.00 — Imported 2026-05-06
35 Grok-1.5V 52.80 — Imported 2026-05-06
36 Gemini 1.5 Pro (Feb 2024) 52.10 — Imported 2026-05-06
37 Claude 3 Opus 50.50 — Imported 2026-05-06
38 GPT-4V (Playground) 49.90 GPT-4 (openai-gpt-4) Imported 2026-05-06
39 Claude 3 Sonnet 47.90 — Imported 2026-05-06
40 InternVL-Chat-V1.2 47.70 — Imported 2026-05-06
41 Math-LLaVA-13B 46.60 — Imported 2026-05-06
42 LLaVA-NeXT-34B 46.50 — Imported 2026-05-06
43 Claude 3 Haiku 46.40 Claude 3 Haiku (anthropic-claude-3-haiku) Imported 2026-05-06
44 Gemini 1.0 Pro 45.20 — Imported 2026-05-06
45 Phi-3-Vision-128K-In 44.50 — Imported 2026-05-06
46 Phi-3.5-Vision 4.2B 43.90 — Imported 2026-05-06
47 Mini-Gemini-HD (Hermes-2-Yi-34B) 43.30 — Imported 2026-05-06
48 Qwen-VL-Plus 43.30 Qwen VL Plus (qwen-qwen-vl-plus) Imported 2026-05-06
49 SPHINX-MoE 42.30 — Imported 2026-05-06
50 Mini-Gemini (Mixtral-8x7B) 41.80 — Imported 2026-05-06
51 MM1-7B-MoE-Chat 40.90 — Imported 2026-05-06
52 MiniCPM-V-2 (2.8B) 40.60 — Imported 2026-05-06
53 MM1-30B-Chat 39.40 — Imported 2026-05-06
54 SPHINX-Plus 36.80 — Imported 2026-05-06
55 SPHINX (V2) 36.70 — Imported 2026-05-06
56 MM1-7B-Chat 35.90 — Imported 2026-05-06
57 SPHINX-Intern2 35.50 — Imported 2026-05-06
58 OmniLMM-12B 34.90 — Imported 2026-05-06
59 Multimodal Bard 34.80 — Imported 2026-05-06
60 LLaVA-NeXT-Vicuna-7B 34.60 — Imported 2026-05-06
61 PoT GPT-4 (Caption+OCR) 33.90 GPT-4 (openai-gpt-4) Imported 2026-05-06
62 CoT ChatGPT (Caption+OCR) 33.20 — Imported 2026-05-06
63 CoT Claude (Caption+OCR) 33.20 — Imported 2026-05-06
64 CoT GPT4 (Caption+OCR) 33.20 GPT-4 (openai-gpt-4) Imported 2026-05-06
65 MM1-3B-MoE-Chat 32.60 — Imported 2026-05-06
66 MM1-3B-Chat 32.00 — Imported 2026-05-06
67 Gemini 1.0 Nano 2 30.60 — Imported 2026-05-06
68 LLaVA-1.5-13B 27.60 — Imported 2026-05-06
69 SPHINX (V1) 27.50 — Imported 2026-05-06
70 Gemini 1.0 Nano 1 27.30 — Imported 2026-05-06
71 PoT ChatGPT (Caption+OCR) 26.80 — Imported 2026-05-06
72 SPHINX-Tiny 26.40 — Imported 2026-05-06
73 LLaVA (LLaMA-2-13B) 26.10 — Imported 2026-05-06
74 InstructBLIP (Vicuna-7B) 25.30 — Imported 2026-05-06
75 LLaVAR 25.20 — Imported 2026-05-06
76 LLaMA-Adapter-V2 (7B) 23.90 — Imported 2026-05-06
77 miniGPT4 (LLaMA-2-7B) 23.10 — Imported 2026-05-06
78 mPLUG-Owl (LLaMA-7B) 22.20 — Imported 2026-05-06
79 IDEFICS (9B-Instruct) 19.80 — Imported 2026-05-06
80 Random Chance 17.90 — Imported 2026-05-06
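
The ranking rule stated above the table (rows ordered by All score, with Human Performance and Random Chance kept in the list as baselines) can be sketched roughly as follows. This is a minimal illustration, not the site's actual implementation: the `Row` class, the `BASELINES` set, and the `rank_rows` helper are hypothetical names, and the tie handling (input order is kept for equal scores) is an assumption, since the page does not say how ties are broken. The sample scores are taken from the table.

```python
from dataclasses import dataclass

# Subjects treated as baseline rows rather than models (names as shown in the table).
BASELINES = {"Human Performance*", "Random Chance"}


@dataclass
class Row:
    subject: str
    all_score: float


def rank_rows(rows):
    """Sort rows by All score, highest first, and number them from 1.

    Returns a list of (rank, row, is_baseline) tuples. Baseline rows stay
    in the ranking; they are only flagged. Python's sort is stable, so rows
    with equal scores keep their input order (an assumption about ties).
    """
    ordered = sorted(rows, key=lambda r: r.all_score, reverse=True)
    return [(i, r, r.subject in BASELINES) for i, r in enumerate(ordered, start=1)]


# A few rows copied from the table, deliberately out of order.
rows = [
    Row("Random Chance", 17.90),
    Row("GPT-4o", 63.80),
    Row("Human Performance*", 60.30),
    Row("DreamPRM (o4-mini)", 85.20),
    Row("Phi-4-multimodal-ins", 62.40),
]

for rank, row, is_baseline in rank_rows(rows):
    tag = " (baseline)" if is_baseline else ""
    print(f"{rank} {row.subject}{tag} {row.all_score:.2f}")
```

Flagging baselines instead of filtering them out keeps the reference points visible in the ranking, which matches how Human Performance appears at rank 28 and Random Chance at rank 80.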