MobileWorld

MobileWorld measures browser, desktop, mobile, or GUI agents operating in interactive environments.

Rows: 12
Primary metric: success_rate
Sampled: 2026-05-27

Metadata

Metrics

Success Rate, GUI-Only Success Rate, User-Interaction Success Rate, MCP Success Rate, Average Completion Steps (lower is better), Average User Queries (lower is better), User Interaction Quality, Average MCP Calls (lower is better)

Latest Results

Rows are transcribed from the public MobileWorld arXiv paper Tables 6 and 7. Primary score is overall success rate.

Rank | Subject | Success Rate | Model Match | Provenance | Sampled
1 | GPT-5 + UI-Ins-7B | 51.7% | GPT-5 (openai-gpt-5) | Imported | 2026-05-27
2 | Gemini-3-Pro + UI-Ins-7B | 46.3% | | Imported | 2026-05-27
3 | Claude-4.5-Sonnet + UI-Ins-7B | 43.8% | | Imported | 2026-05-27
4 | Doubao-1.5-UI-TARS | 20.9% | | Imported | 2026-05-27
5 | GELab-Zero-4B | 10.9% | | Imported | 2026-05-27
6 | UI-Venus-72B | 10.4% | | Imported | 2026-05-27
7 | Qwen3-VL-235B-A22B | 9.5% | | Imported | 2026-05-27
8 | Qwen3-VL-32B | 9.0% | | Imported | 2026-05-27
9 | GUI-Owl-32B | 5.5% | | Imported | 2026-05-27
10 | Qwen3-VL-8B | 5.5% | | Imported | 2026-05-27
11 | UI-Venus-7B | 5.5% | | Imported | 2026-05-27
12 | GUI-Owl-7B | 4.5% | | Imported | 2026-05-27