MathVista
MathVista evaluates mathematical reasoning in visual contexts across five task types: figure QA, geometry problem solving, math word problems, textbook QA, and visual QA.
80 rows
Primary metric: All
Sampled: 2026-05-06
Metadata
Metrics: All, Figure QA, Geometry problem solving, Math word problem, Textbook QA, Visual QA, Algebra, Arithmetic, Geometry, Logical reasoning, Numeric commonsense, Scientific reasoning, Statistical reasoning
| Rank | Subject | All | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | DreamPRM (o4-mini) 🥇 | 85.20 | — | Imported | 2026-05-06 |
| 2 | VL-Rethinker 🥈 | 80.30 | — | Imported | 2026-05-06 |
| 3 | Step R1-V-Mini 🥉 | 80.10 | — | Imported | 2026-05-06 |
| 4 | Kimi-k1.6-preview-20250308 | 80.00 | — | Imported | 2026-05-06 |
| 5 | Doubao-pro-1.5 | 79.50 | — | Imported | 2026-05-06 |
| 6 | Ovis2_34B | 77.10 | — | Imported | 2026-05-06 |
| 7 | Kimi-k1.5 | 74.90 | — | Imported | 2026-05-06 |
| 8 | OpenAI o1 | 73.90 | o1 (openai-o1) | Imported | 2026-05-06 |
| 9 | Llama 4 Maverick | 73.70 | Llama 4 Maverick (meta-llama-4-maverick) | Imported | 2026-05-06 |
| 10 | Vision-R1-7B | 73.20 | — | Imported | 2026-05-06 |
| 11 | Gemini 2.0 Flash | 73.10 | Gemini 2.0 Flash (google-gemini-2.0-flash) | Imported | 2026-05-06 |
| 12 | QVQ-72B-Preview | 71.40 | — | Imported | 2026-05-06 |
| 13 | Qwen2VL-72B | 70.50 | — | Imported | 2026-05-06 |
| 14 | Pixtral Large (124B) | 69.40 | — | Imported | 2026-05-06 |
| 15 | Grok-2 | 69.00 | — | Imported | 2026-05-06 |
| 16 | Grok-2 mini | 68.10 | — | Imported | 2026-05-06 |
| 17 | Claude 3.5 Sonnet | 67.70 | Claude 3.5 Sonnet (anthropic-claude-3.5-sonnet) | Imported | 2026-05-06 |
| 18 | LLaVA-OneVision | 67.50 | — | Imported | 2026-05-06 |
| 19 | InternVL2-8B-MPO | 67.30 | — | Imported | 2026-05-06 |
| 20 | Ovis1.6-Gemma2-9B | 67.20 | — | Imported | 2026-05-06 |
| 21 | InternVL2-Pro | 66.80 | — | Imported | 2026-05-06 |
| 22 | Chimera-Reasoner-8B | 66.20 | — | Imported | 2026-05-06 |
| 23 | TextGrad (GPT-4o) | 66.10 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-06 |
| 24 | TBAC-VLR1-3B-preview | 64.80 | — | Imported | 2026-05-06 |
| 25 | Gemini 1.5 Pro (May 2024) | 63.90 | — | Imported | 2026-05-06 |
| 26 | GPT-4o | 63.80 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-06 |
| 27 | Phi-4-multimodal-ins | 62.40 | — | Imported | 2026-05-06 |
| 28 | Human Performance* | 60.30 | — | Imported | 2026-05-06 |
| 29 | InternVL-Chat-V1.2-Plus | 59.90 | — | Imported | 2026-05-06 |
| 30 | Gemini 1.5 Flash (May 2024) | 58.40 | — | Imported | 2026-05-06 |
| 31 | GPT-4T 2024-04-09 | 58.10 | GPT-4 (openai-gpt-4) | Imported | 2026-05-06 |
| 32 | Pixtral-12B | 58.00 | — | Imported | 2026-05-06 |
| 33 | InternLM-XComposer2-VL-7B | 57.60 | — | Imported | 2026-05-06 |
| 34 | Gemini 1.0 Ultra | 53.00 | — | Imported | 2026-05-06 |
| 35 | Grok-1.5V | 52.80 | — | Imported | 2026-05-06 |
| 36 | Gemini 1.5 Pro (Feb 2024) | 52.10 | — | Imported | 2026-05-06 |
| 37 | Claude 3 Opus | 50.50 | — | Imported | 2026-05-06 |
| 38 | GPT-4V (Playground) | 49.90 | GPT-4 (openai-gpt-4) | Imported | 2026-05-06 |
| 39 | Claude 3 Sonnet | 47.90 | — | Imported | 2026-05-06 |
| 40 | InternVL-Chat-V1.2 | 47.70 | — | Imported | 2026-05-06 |
| 41 | Math-LLaVA-13B | 46.60 | — | Imported | 2026-05-06 |
| 42 | LLaVA-NeXT-34B | 46.50 | — | Imported | 2026-05-06 |
| 43 | Claude 3 Haiku | 46.40 | Claude 3 Haiku (anthropic-claude-3-haiku) | Imported | 2026-05-06 |
| 44 | Gemini 1.0 Pro | 45.20 | — | Imported | 2026-05-06 |
| 45 | Phi-3-Vision-128K-In | 44.50 | — | Imported | 2026-05-06 |
| 46 | Phi-3.5-Vision 4.2B | 43.90 | — | Imported | 2026-05-06 |
| 47 | Mini-Gemini-HD (Hermes-2-Yi-34B) | 43.30 | — | Imported | 2026-05-06 |
| 48 | Qwen-VL-Plus | 43.30 | Qwen VL Plus (qwen-qwen-vl-plus) | Imported | 2026-05-06 |
| 49 | SPHINX-MoE | 42.30 | — | Imported | 2026-05-06 |
| 50 | Mini-Gemini (Mixtral-8x7B) | 41.80 | — | Imported | 2026-05-06 |
| 51 | MM1-7B-MoE-Chat | 40.90 | — | Imported | 2026-05-06 |
| 52 | MiniCPM-V-2 (2.8B) | 40.60 | — | Imported | 2026-05-06 |
| 53 | MM1-30B-Chat | 39.40 | — | Imported | 2026-05-06 |
| 54 | SPHINX-Plus | 36.80 | — | Imported | 2026-05-06 |
| 55 | SPHINX (V2) | 36.70 | — | Imported | 2026-05-06 |
| 56 | MM1-7B-Chat | 35.90 | — | Imported | 2026-05-06 |
| 57 | SPHINX-Intern2 | 35.50 | — | Imported | 2026-05-06 |
| 58 | OmniLMM-12B | 34.90 | — | Imported | 2026-05-06 |
| 59 | Multimodal Bard | 34.80 | — | Imported | 2026-05-06 |
| 60 | LLaVA-NeXT-Vicuna-7B | 34.60 | — | Imported | 2026-05-06 |
| 61 | PoT GPT-4 (Caption+OCR) | 33.90 | GPT-4 (openai-gpt-4) | Imported | 2026-05-06 |
| 62 | CoT ChatGPT (Caption+OCR) | 33.20 | — | Imported | 2026-05-06 |
| 63 | CoT Claude (Caption+OCR) | 33.20 | — | Imported | 2026-05-06 |
| 64 | CoT GPT4 (Caption+OCR) | 33.20 | GPT-4 (openai-gpt-4) | Imported | 2026-05-06 |
| 65 | MM1-3B-MoE-Chat | 32.60 | — | Imported | 2026-05-06 |
| 66 | MM1-3B-Chat | 32.00 | — | Imported | 2026-05-06 |
| 67 | Gemini 1.0 Nano 2 | 30.60 | — | Imported | 2026-05-06 |
| 68 | LLaVA-1.5-13B | 27.60 | — | Imported | 2026-05-06 |
| 69 | SPHINX (V1) | 27.50 | — | Imported | 2026-05-06 |
| 70 | Gemini 1.0 Nano 1 | 27.30 | — | Imported | 2026-05-06 |
| 71 | PoT ChatGPT (Caption+OCR) | 26.80 | — | Imported | 2026-05-06 |
| 72 | SPHINX-Tiny | 26.40 | — | Imported | 2026-05-06 |
| 73 | LLaVA (LLaMA-2-13B) | 26.10 | — | Imported | 2026-05-06 |
| 74 | InstructBLIP (Vicuna-7B) | 25.30 | — | Imported | 2026-05-06 |
| 75 | LLaVAR | 25.20 | — | Imported | 2026-05-06 |
| 76 | LLaMA-Adapter-V2 (7B) | 23.90 | — | Imported | 2026-05-06 |
| 77 | miniGPT4 (LLaMA-2-7B) | 23.10 | — | Imported | 2026-05-06 |
| 78 | mPLUG-Owl (LLaMA-7B) | 22.20 | — | Imported | 2026-05-06 |
| 79 | IDEFICS (9B-Instruct) | 19.80 | — | Imported | 2026-05-06 |
| 80 | Random Chance | 17.90 | — | Imported | 2026-05-06 |