RoboBench | BenchmarkList

Metadata

ID: robobench
Category: Embodied
Release: Unknown

Metrics

Overall Dimension Average, Perception Reasoning Avg, Instruction Comprehension Avg, Generalized Planning Avg, Affordance Prediction Avg, Failure Analysis Avg

Rank	Subject	Overall Dimension Average	Model Match	Provenance	Sampled
1	Human Evaluation	67.19	—	Imported	2026-05-27
2	Gemini-2.5-Pro	50.10	Gemini 2.5 Pro (google-gemini-2.5-pro)	Imported	2026-05-27
3	Gemini-2.5-Flash	45.06	Gemini 2.5 Flash (google-gemini-2.5-flash)	Imported	2026-05-27
4	Gemini-2.0-Flash	45.04	Gemini 2.0 Flash (google-gemini-2.0-flash)	Imported	2026-05-27
5	Qwen-VL-Max	42.43	Qwen VL Max (qwen-qwen-vl-max)	Imported	2026-05-27
6	Claude-3.7-Sonnet	40.53	Claude 3.7 Sonnet (anthropic-claude-3.7-sonnet)	Imported	2026-05-27
7	Qwen2.5-VL-72B-Ins	40.51	—	Imported	2026-05-27
8	GPT-4o	40.16	GPT-4o (openai-gpt-4o)	Imported	2026-05-27
9	Claude-3.5-Sonnet	37.82	Claude 3.5 Sonnet (anthropic-claude-3.5-sonnet)	Imported	2026-05-27
10	RoboBrain-2.0-7B	36.59	—	Imported	2026-05-27
11	GPT-4o-Mini	34.40	GPT-4o-mini (openai-gpt-4o-mini)	Imported	2026-05-27
12	Qwen-VL-Plus	31.64	Qwen VL Plus (qwen-qwen-vl-plus)	Imported	2026-05-27
13	GPT-4o-text-only	30.23	GPT-4o (openai-gpt-4o)	Imported	2026-05-27
14	Qwen2.5-VL-7B-Ins	25.57	—	Imported	2026-05-27
15	LLaVA-OneVision-7B	24.91	—	Imported	2026-05-27
16	LLaVA-OneVision-0.5B	16.96	—	Imported	2026-05-27
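
One way to read the table is as a gap to the human baseline. The sketch below copies a handful of rows from the leaderboard and subtracts each score from the Human Evaluation score. The names `ROWS` and `gap_to_human` are hypothetical and introduced only for illustration; they are not part of any RoboBench tooling.

```python
# A few rows copied from the leaderboard above:
# (subject, Overall Dimension Average).
ROWS = [
    ("Human Evaluation", 67.19),
    ("Gemini-2.5-Pro", 50.10),
    ("Claude-3.7-Sonnet", 40.53),
    ("GPT-4o", 40.16),
    ("GPT-4o-text-only", 30.23),
]


def gap_to_human(rows, human_label="Human Evaluation"):
    """Return {subject: human_score - subject_score} for every non-human row.

    Hypothetical helper: it only does arithmetic on the published scores.
    """
    scores = dict(rows)
    human = scores[human_label]
    return {
        name: round(human - score, 2)
        for name, score in rows
        if name != human_label
    }


gaps = gap_to_human(ROWS)
for name, gap in gaps.items():
    print(f"{name}: {gap:.2f} points below the human baseline")
```

The same subtraction applied to the GPT-4o and GPT-4o-text-only rows shows how much of that model's score depends on the visual input.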