Math-VR
Mathematical visual reasoning benchmark for VLMs, unified models, and LLMs, reporting answer correctness and process scores on text and multimodal questions.
31 rows
Primary metric: overall_answer_correctness
Sampled: 2026-05-27
Metadata
Metrics
Overall Answer Correctness, Overall Process Score, Text Answer Correctness, Text Process Score, Multimodal Answer Correctness, Multimodal Process Score
| Rank | Subject | Overall Answer Correctness | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Qwen3-VL-235B-A22B-Thinking | 66.8 | Qwen3 VL 235B A22B Thinking `qwen-qwen3-vl-235b-a22b-thinking` | Imported | 2026-05-27 |
| 2 | Qwen3-VL-235B-A22B-Instruct | 65.0 | Qwen3 VL 235B A22B Instruct `qwen-qwen3-vl-235b-a22b-instruct` | Imported | 2026-05-27 |
| 3 | Gemini-2.5-Pro | 64.7 | Gemini 2.5 Pro `google-gemini-2.5-pro` | Imported | 2026-05-27 |
| 4 | Gemini-2.5-Flash | 60.5 | Gemini 2.5 Flash `google-gemini-2.5-flash` | Imported | 2026-05-27 |
| 5 | GPT-o3 | 59.3 | — | Imported | 2026-05-27 |
| 6 | Seed-1.6-Thinking | 58.4 | — | Imported | 2026-05-27 |
| 7 | GPT-5-Thinking | 58.1 | GPT-5 `openai-gpt-5` | Imported | 2026-05-27 |
| 8 | Claude Opus4.1 | 54.3 | Claude Opus 4.1 `anthropic-claude-opus-4.1` | Imported | 2026-05-27 |
| 9 | Nano Banana | 53.4 | Nano Banana (Gemini 2.5 Flash Image) `google-gemini-2.5-flash-image` | Imported | 2026-05-27 |
| 10 | Gemini-2.5-Flash-No-Thinking | 52.3 | — | Imported | 2026-05-27 |
| 11 | GLM-4.5V | 49.6 | GLM 4.5V `z-ai-glm-4.5v` | Imported | 2026-05-27 |
| 12 | Deepseek-R1 | 49.5 | R1 `deepseek-r1` | Imported | 2026-05-27 |
| 13 | Mimo-VL-7B-RL | 48.3 | — | Imported | 2026-05-27 |
| 14 | InternVL-3.5-8B | 40.8 | — | Imported | 2026-05-27 |
| 15 | GPT-4.1-mini | 33.3 | GPT-4.1 Mini `openai-gpt-4.1-mini` | Imported | 2026-05-27 |
| 16 | GLM-4.1V-9B | 29.0 | — | Imported | 2026-05-27 |
| 17 | Claude-Sonnet-4 | 28.1 | Claude Sonnet 4 `anthropic-claude-sonnet-4` | Imported | 2026-05-27 |
| 18 | GPT-4.1 | 26.0 | GPT-4.1 `openai-gpt-4.1` | Imported | 2026-05-27 |
| 19 | CodePlot-CoT | 22.1 | — | Imported | 2026-05-27 |
| 20 | Gemini-2.0-Flash | 20.6 | Gemini 2.0 Flash `google-gemini-2.0-flash` | Imported | 2026-05-27 |
| 21 | Keye-VL-1.5 | 17.3 | — | Imported | 2026-05-27 |
| 22 | Gemma3 | 16.1 | — | Imported | 2026-05-27 |
| 23 | Qwen-2.5-VL-72B | 13.7 | Qwen2.5 VL 72B Instruct `qwen-qwen2.5-vl-72b-instruct` | Imported | 2026-05-27 |
| 24 | Bagel-Zebra-CoT | 10.1 | — | Imported | 2026-05-27 |
| 25 | Qwen-2.5-VL-32B | 10.0 | — | Imported | 2026-05-27 |
| 26 | GPT-4.1-nano | 9.1 | GPT-4.1 Nano `openai-gpt-4.1-nano` | Imported | 2026-05-27 |
| 27 | InternVL-3.5-8B-No-Thinking | 7.9 | — | Imported | 2026-05-27 |
| 28 | Bagel | 7.6 | — | Imported | 2026-05-27 |
| 29 | Qwen-2.5-VL-3B | 5.3 | — | Imported | 2026-05-27 |
| 30 | GPT-4o | 4.3 | GPT-4o `openai-gpt-4o` | Imported | 2026-05-27 |
| 31 | Qwen-2.5-VL-7B | 3.0 | — | Imported | 2026-05-27 |