Physical AI Bench Understanding

The Physical AI Bench Understanding leaderboard ranks models on embodied physical reasoning across common sense, space, time, physics, robotics, autonomous driving, and video QA datasets.

Rows: 25
Primary metric: Overall
Sampled: 2026-05-06

Metadata

Metrics

Overall, Common Sense, Embodied Reasoning, Space, Time, Physics, BridgeData V2, RoboVQA, RoboFail, Agibot, HoloAssist, AV

Latest Results

Rank | Subject | Overall | Model Match | Provenance | Sampled
1 | Cosmos-Reason2-32B | 70.80 | - | Imported | 2026-05-06
2 | GPT-5 | 69.80 | GPT-5 (openai-gpt-5) | Imported | 2026-05-06
3 | Cosmos-Reason2-8B | 65.70 | - | Imported | 2026-05-06
4 | Qwen3-VL-235B-A22B-Instruct | 64.70 | Qwen3 VL 235B A22B Instruct (qwen-qwen3-vl-235b-a22b-instruct) | Imported | 2026-05-06
5 | Qwen3-VL-235B-A22B-Thinking | 63.70 | Qwen3 VL 235B A22B Thinking (qwen-qwen3-vl-235b-a22b-thinking) | Imported | 2026-05-06
6 | Qwen3-VL-32B-Instruct | 62.00 | Qwen3 VL 32B Instruct (qwen-qwen3-vl-32b-instruct) | Imported | 2026-05-06
7 | Qwen3-VL-32B-Thinking | 61.00 | - | Imported | 2026-05-06
8 | Qwen2.5-VL-72B-Instruct | 60.80 | Qwen2.5 VL 72B Instruct (qwen-qwen2.5-vl-72b-instruct) | Imported | 2026-05-06
9 | Qwen3-VL-30B-A3B-Instruct | 59.50 | Qwen3 VL 30B A3B Instruct (qwen-qwen3-vl-30b-a3b-instruct) | Imported | 2026-05-06
10 | GLM-4.5V | 59.20 | GLM GLM 4.5V (z-ai-glm-4.5v) | Imported | 2026-05-06
11 | Qwen3-VL-30B-A3B-Thinking | 57.30 | Qwen3 VL 30B A3B Thinking (qwen-qwen3-vl-30b-a3b-thinking) | Imported | 2026-05-06
12 | Qwen3-VL-8B-Thinking | 57.30 | Qwen3 VL 8B Thinking (qwen-qwen3-vl-8b-thinking) | Imported | 2026-05-06
13 | Qwen3-VL-8B-Instruct | 56.80 | Qwen3 VL 8B Instruct (qwen-qwen3-vl-8b-instruct) | Imported | 2026-05-06
14 | Cosmos-Reason2-2B | 56.40 | - | Imported | 2026-05-06
15 | InternVL3.5-241B-A28B | 56.30 | - | Imported | 2026-05-06
16 | GPT-4o | 56.20 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-06
17 | InternVL3.5-38B | 55.80 | - | Imported | 2026-05-06
18 | Cosmos-Reason1-7B | 55.70 | - | Imported | 2026-05-06
19 | Qwen2.5-VL-32B-Instruct | 55.30 | - | Imported | 2026-05-06
20 | Qwen2.5-VL-7B-Instruct | 51.00 | - | Imported | 2026-05-06
21 | InternVL3.5-8B | 49.70 | - | Imported | 2026-05-06
22 | InternVL3.5-30B-A3B | 49.40 | - | Imported | 2026-05-06
23 | InternVL3.5-14B | 48.80 | - | Imported | 2026-05-06
24 | Qwen3-VL-2B-Instruct | 48.70 | - | Imported | 2026-05-06
25 | Claude-3.5-Sonnet | 46.00 | Claude 3.5 Sonnet (anthropic-claude-3.5-sonnet) | Imported | 2026-05-06