Math-VR | BenchmarkList

Metadata

ID: math_vr
Category: Multimodal
Release: Unknown

Metrics

Overall Answer Correctness, Overall Process Score, Text Answer Correctness, Text Process Score, Multimodal Answer Correctness, Multimodal Process Score

Rank	Subject	Overall Answer Correctness	Model Match	Provenance	Sampled
1	Qwen3-VL-235B-A22B-Thinking	66.8	Qwen3 VL 235B A22B Thinking qwen-qwen3-vl-235b-a22b-thinking	Imported	2026-05-27
2	Qwen3-VL-235B-A22B-Instruct	65.0	Qwen3 VL 235B A22B Instruct qwen-qwen3-vl-235b-a22b-instruct	Imported	2026-05-27
3	Gemini-2.5-Pro	64.7	Gemini 2.5 Pro google-gemini-2.5-pro	Imported	2026-05-27
4	Gemini-2.5-Flash	60.5	Gemini 2.5 Flash google-gemini-2.5-flash	Imported	2026-05-27
5	GPT-o3	59.3	—	Imported	2026-05-27
6	Seed-1.6-Thinking	58.4	—	Imported	2026-05-27
7	GPT-5-Thinking	58.1	GPT-5 openai-gpt-5	Imported	2026-05-27
8	Claude Opus4.1	54.3	Claude Opus 4.1 anthropic-claude-opus-4.1	Imported	2026-05-27
9	Nano Banana	53.4	Nano Banana (Gemini 2.5 Flash Image) google-gemini-2.5-flash-image	Imported	2026-05-27
10	Gemini-2.5-Flash-No-Thinking	52.3	—	Imported	2026-05-27
11	GLM-4.5V	49.6	GLM 4.5V z-ai-glm-4.5v	Imported	2026-05-27
12	Deepseek-R1	49.5	R1 deepseek-r1	Imported	2026-05-27
13	Mimo-VL-7B-RL	48.3	—	Imported	2026-05-27
14	InternVL-3.5-8B	40.8	—	Imported	2026-05-27
15	GPT-4.1-mini	33.3	GPT-4.1 Mini openai-gpt-4.1-mini	Imported	2026-05-27
16	GLM-4.1V-9B	29.0	—	Imported	2026-05-27
17	Claude-Sonnet-4	28.1	Claude Sonnet 4 anthropic-claude-sonnet-4	Imported	2026-05-27
18	GPT-4.1	26.0	GPT-4.1 openai-gpt-4.1	Imported	2026-05-27
19	CodePlot-CoT	22.1	—	Imported	2026-05-27
20	Gemini-2.0-Flash	20.6	Gemini 2.0 Flash google-gemini-2.0-flash	Imported	2026-05-27
21	Keye-VL-1.5	17.3	—	Imported	2026-05-27
22	Gemma3	16.1	—	Imported	2026-05-27
23	Qwen-2.5-VL-72B	13.7	Qwen2.5 VL 72B Instruct qwen-qwen2.5-vl-72b-instruct	Imported	2026-05-27
24	Bagel-Zebra-CoT	10.1	—	Imported	2026-05-27
25	Qwen-2.5-VL-32B	10.0	—	Imported	2026-05-27
26	GPT-4.1-nano	9.1	GPT-4.1 Nano openai-gpt-4.1-nano	Imported	2026-05-27
27	InternVL-3.5-8B-No-Thinking	7.9	—	Imported	2026-05-27
28	Bagel	7.6	—	Imported	2026-05-27
29	Qwen-2.5-VL-3B	5.3	—	Imported	2026-05-27
30	GPT-4o	4.3	GPT-4o openai-gpt-4o	Imported	2026-05-27
31	Qwen-2.5-VL-7B	3.0	—	Imported	2026-05-27
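
The table lists two models in both a default and a "No-Thinking" configuration, so the score difference between each pair can be read off directly. The following is a minimal sketch of that comparison, assuming the rows are available as tab-separated text. The function names `load_scores` and `thinking_gaps`, and the idea of pairing rows by the `-No-Thinking` suffix, are illustrative assumptions and not part of the benchmark or the site.

```python
import csv
import io

# Four rows copied from the table above (Rank, Subject, score), tab-separated.
ROWS = (
    "Rank\tSubject\tOverall Answer Correctness\n"
    "4\tGemini-2.5-Flash\t60.5\n"
    "10\tGemini-2.5-Flash-No-Thinking\t52.3\n"
    "14\tInternVL-3.5-8B\t40.8\n"
    "27\tInternVL-3.5-8B-No-Thinking\t7.9\n"
)


def load_scores(text):
    """Parse tab-separated leaderboard text into {subject: score}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {
        row["Subject"]: float(row["Overall Answer Correctness"])
        for row in reader
    }


def thinking_gaps(scores, suffix="-No-Thinking"):
    """Return {base model: default score minus no-thinking score}.

    Pairing by name suffix is an assumption made for this sketch.
    """
    gaps = {}
    for name, score in scores.items():
        if name.endswith(suffix):
            base = name[: -len(suffix)]
            if base in scores:
                gaps[base] = round(scores[base] - score, 1)
    return gaps


scores = load_scores(ROWS)
gaps = thinking_gaps(scores)
for model, gap in gaps.items():
    print(f"{model}: {gap} points")
```

On these rows the difference is 8.2 points for Gemini-2.5-Flash and 32.9 points for InternVL-3.5-8B. The sketch only subtracts the listed scores and says nothing about why the configurations differ.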
