From Perception to Action
Interactive 3D vision-reasoning benchmark where models plan physical actions in puzzle and stacking environments.
Rows: 16
Primary metric: pass_at_1
Sampled: 2026-05-28
Metadata
Metrics
pass@1, Successful Tasks, Puzzle Success Rate, Stacking Success Rate, Average Steps (lower is better), Distance to Optimal (lower is better), Normalized Distance (lower is better), Solved/Tokens (Reported), Solved/USD
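As a rough illustration of how several of these metrics could be derived from per-task outcomes, here is a minimal sketch. The `TaskResult` record, its field names, and the environment labels are hypothetical. It also assumes that Average Steps and Distance to Optimal are averaged over solved tasks only, which the benchmark may define differently.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskResult:
    """Hypothetical per-task record; field names are illustrative only."""
    environment: str    # assumed labels: "puzzle" or "stacking"
    solved: bool        # True if the single attempt solved the task
    steps: int          # actions the model took
    optimal_steps: int  # length of a shortest known solution


def success_rate(results):
    """Fraction of tasks solved; 0.0 for an empty list."""
    if not results:
        return 0.0
    return sum(r.solved for r in results) / len(results)


def summarize(results):
    """Compute leaderboard-style metrics under the assumptions stated above."""
    solved = [r for r in results if r.solved]
    puzzle = [r for r in results if r.environment == "puzzle"]
    stacking = [r for r in results if r.environment == "stacking"]
    return {
        # With one attempt per task, pass@1 is the overall success rate.
        "pass_at_1": success_rate(results),
        "successful_tasks": len(solved),
        "puzzle_success_rate": success_rate(puzzle),
        "stacking_success_rate": success_rate(stacking),
        # Assumption: step metrics are averaged over solved tasks only.
        "average_steps": mean(r.steps for r in solved) if solved else None,
        "distance_to_optimal": (
            mean(r.steps - r.optimal_steps for r in solved) if solved else None
        ),
    }
```

Under this reading, pass@1 with a single attempt per task reduces to the plain fraction of tasks solved, which is why one helper serves both the overall and the per-environment rates.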
| Rank | Subject | pass@1 | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-5.2 | 22.9% | GPT-5.2 (`openai-gpt-5.2`) | Imported | 2026-05-28 |
| 2 | Gemini-3-Pro | 19.3% | Gemini 3 (`google-gemini-3`) | Imported | 2026-05-28 |
| 3 | Claude-Opus-4.5 | 15.6% | Claude Opus 4.5 (`anthropic-claude-opus-4.5`) | Imported | 2026-05-28 |
| 4 | Claude-Sonnet-4.5 | 13.8% | Claude Sonnet 4.5 (`anthropic-claude-sonnet-4.5`) | Imported | 2026-05-28 |
| 5 | Kimi-k2.5 | 13.8% | MoonshotAI: Kimi K2.5 (`moonshotai-kimi-k2.5`) | Imported | 2026-05-28 |
| 6 | Gemini-3-Flash | 11.9% | Gemini 3 Flash Preview (`google-gemini-3-flash-preview`) | Imported | 2026-05-28 |
| 7 | GPT-5-mini | 11.0% | GPT-5 Mini (`openai-gpt-5-mini`) | Imported | 2026-05-28 |
| 8 | OpenAI-o3 | 10.1% | o3 (`openai-o3`) | Imported | 2026-05-28 |
| 9 | Qwen3-VL-30B-A3B-Thk. | 10.1% | — | Imported | 2026-05-28 |
| 10 | Seed-1.6 | 10.1% | Seed 1.6 (`bytedance-seed-seed-1.6`) | Imported | 2026-05-28 |
| 11 | Qwen3-VL-235B-A22B-Thk. | 9.2% | — | Imported | 2026-05-28 |
| 12 | Qwen3-VL-235B-A22B-Inst | 8.3% | — | Imported | 2026-05-28 |
| 13 | Qwen3-VL-8B-Thinking | 8.3% | Qwen3 VL 8B Thinking (`qwen-qwen3-vl-8b-thinking`) | Imported | 2026-05-28 |
| 14 | GLM-4.6V | 7.3% | GLM 4.6V (`z-ai-glm-4.6v`) | Imported | 2026-05-28 |
| 15 | Seed-1.6-Flash | 7.3% | Seed 1.6 Flash (`bytedance-seed-seed-1.6-flash`) | Imported | 2026-05-28 |
| 16 | Qwen3-VL-30B-A3B-Inst | 3.7% | — | Imported | 2026-05-28 |
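Several subjects in the table share the same pass@1 score but hold consecutive ranks (for example, Claude-Sonnet-4.5 and Kimi-k2.5 at 13.8%). For readers who prefer tied entries to share a rank, the sketch below applies standard competition ranking to a few scores copied from the table. The function name `competition_ranks` is illustrative, and this is not necessarily how the leaderboard itself orders ties.

```python
def competition_ranks(scores):
    """Map each name to a 1-based rank; equal scores share the same rank.

    Uses standard competition ("1224") ranking: after a tie, the next
    distinct score takes the rank given by its position in the sorted list.
    """
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    previous_score = None
    current_rank = 0
    for position, (name, score) in enumerate(ordered, start=1):
        if score != previous_score:
            current_rank = position
            previous_score = score
        ranks[name] = current_rank
    return ranks


# pass@1 percentages copied from the first six rows of the table.
pass_at_1 = {
    "GPT-5.2": 22.9,
    "Gemini-3-Pro": 19.3,
    "Claude-Opus-4.5": 15.6,
    "Claude-Sonnet-4.5": 13.8,
    "Kimi-k2.5": 13.8,
    "Gemini-3-Flash": 11.9,
}

ranks = competition_ranks(pass_at_1)
```

Here both 13.8% entries receive rank 4 and Gemini-3-Flash keeps rank 6, so the positions of untied subjects match the table.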