AndroidWorld

AndroidWorld: Measures how well GUI agents complete tasks in interactive Android environments.

Rows: 43
Primary metric: pass_at_1_success_rate
Sampled: 2026-05-27

Metadata

Metrics

Success Rate (pass@1), Success Rate (pass@k), Number of trials
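The page lists these metrics without saying how they are computed. As a minimal sketch, assuming the widely used convention that pass@k is the probability that at least one of k sampled trials succeeds, a per-task estimate from n recorded trials with c successes could look like the following. The function names `pass_at_k` and `benchmark_pass_at_k` are hypothetical, and the leaderboard's submitters may use a different definition.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate P(at least one success in k trials) from c successes in n trials.

    Assumption: this is the common combinatorial estimator
    1 - C(n - c, k) / C(n, k); the leaderboard does not confirm it uses it.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k failures exist, giving 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(successes_per_task: list[int], n: int, k: int) -> float:
    """Average the per-task estimate over all tasks (hypothetical helper)."""
    if not successes_per_task:
        raise ValueError("need at least one task")
    return sum(pass_at_k(n, c, k) for c in successes_per_task) / len(successes_per_task)


if __name__ == "__main__":
    # Three tasks, three trials each, with 0, 3, and 1 successful trials.
    print(benchmark_pass_at_k([0, 3, 1], n=3, k=1))
    print(benchmark_pass_at_k([0, 3, 1], n=3, k=3))
```

With k = 1 the estimate reduces to the plain fraction of successful trials per task, which is why pass@1 is usually read as a single-attempt success rate.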

Latest Results

Rows are parsed from the public AndroidWorld leaderboard Google Sheet. The source warns that results are community-submitted, self-reported, and not independently verified.
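The parsing step is not described in detail. As a rough sketch only, assuming the sheet is available as a CSV export, rows could be read and ranked as below. The column names `subject` and `pass_at_1`, the inline sample, and the helper names are all hypothetical; the real sheet's layout is not shown on this page.

```python
import csv
import io

# Hypothetical CSV export; the three entries are taken from the results table.
SAMPLE_CSV = """subject,pass_at_1
Human,80%
Seed1.8-GUI,97.4%
Qwen2-VL-2B (fine-tuned),9%
"""


def parse_rate(text: str) -> float:
    """Convert a percentage string such as '97.4%' to a float (97.4)."""
    return float(text.strip().rstrip("%"))


def load_rows(csv_text: str) -> list[dict]:
    """Parse rows and sort them by pass@1 success rate, highest first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        {"subject": r["subject"].strip(), "pass_at_1": parse_rate(r["pass_at_1"])}
        for r in reader
    ]
    # sorted() is stable, so tied entries keep their original sheet order.
    return sorted(rows, key=lambda r: r["pass_at_1"], reverse=True)


if __name__ == "__main__":
    for rank, row in enumerate(load_rows(SAMPLE_CSV), start=1):
        print(rank, row["subject"], f'{row["pass_at_1"]}%')
```

Because the results are self-reported, a parser like this can only normalize what was submitted; it does not verify any of the numbers.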

Rank | Subject | Success Rate (pass@1) | Model Match | Provenance | Sampled
1 | AGI-0 | 97.4% | | Imported | 2026-05-27
2 | Gemini 3 flash, Gemini 3 flash lite | 97.4% | | Imported | 2026-05-27
3 | Seed1.8-GUI | 97.4% | | Imported | 2026-05-27
4 | askui AndroidVisionAgent, Claude 4.5 Sonnet + Claude 4.0 Sonnet | 94.8% | | Imported | 2026-05-27
5 | gemini 3 pro + sonnet 4.5 | 94.8% | | Imported | 2026-05-27
6 | GPT5, Gemini 2.5 Pro | 91.4% | | Imported | 2026-05-27
7 | Llama 4-scout, Gemini 2.5 pro, GPT-5 nano | 91.4% | | Imported | 2026-05-27
8 | - | 88.8% | | Imported | 2026-05-27
9 | o3 + holo1.5-72b | 87.1% | | Imported | 2026-05-27
10 | Sonnet 4.5 + Sonnet 4 | 86.2% | | Imported | 2026-05-27
11 | AutoGLM-Mobile | 80.2% | | Imported | 2026-05-27
12 | Human | 80% | | Imported | 2026-05-27
13 | LX-GUIAgent | 79.3% | | Imported | 2026-05-27
14 | Gemini-2.5-Pro+UI-TARS-1.5 | 78% | | Imported | 2026-05-27
15 | MAI-UI-235B-A22B | 76.7% | | Imported | 2026-05-27
16 | Qwen2.5-VL-72B + Qwen2.5-VL-7B | 76.7% | | Imported | 2026-05-27
17 | Hammer-UI-32B | 75% | | Imported | 2026-05-27
18 | GUI-Owl-32B | 73.3% | | Imported | 2026-05-27
19 | MAI-UI-32B | 73.3% | | Imported | 2026-05-27
20 | MAI-UI-8B | 70.7% | | Imported | 2026-05-27
21 | Gemini 2.5 Computer Use | 69.7% | | Imported | 2026-05-27
22 | JT-GUIAgent-V2 | 67.2% | | Imported | 2026-05-27
23 | GUI-Owl-7B | 66.4% | | Imported | 2026-05-27
24 | UI-Venus-Navi-72B | 65.9% | | Imported | 2026-05-27
25 | Qwen2.5-VL-72B | 62.9% | | Imported | 2026-05-27
26 | Seed1.5-VL | 62.1% | | Imported | 2026-05-27
27 | JT-GUIAgent-V1 | 60% | | Imported | 2026-05-27
28 | V-Droid (Llama8B) | 59.5% | | Imported | 2026-05-27
29 | Agent S2 | 54.3% | | Imported | 2026-05-27
30 | MAI-UI-2B | 49.1% | | Imported | 2026-05-27
31 | Venus-Navi-7B | 49.1% | | Imported | 2026-05-27
32 | GPT-4o | 47.4% | | Imported | 2026-05-27
33 | GPT-4o | 46.8% | | Imported | 2026-05-27
34 | UI-TARS | 46.6% | | Imported | 2026-05-27
35 | GPT-4o + Aria-UI | 44.8% | | Imported | 2026-05-27
36 | GPT-4o + UGround | 44% | | Imported | 2026-05-27
37 | ScaleTrack-7B | 44% | | Imported | 2026-05-27
38 | GPT-4o | 42.2% | | Imported | 2026-05-27
39 | GPT-4o | 34.5% | | Imported | 2026-05-27
40 | GPT-4 Turbo | 30.6% | | Imported | 2026-05-27
41 | GPT-4o, OS-Atlas-Pro 4B, Qwen2-VL-2B-Instruct | 27.6% | | Imported | 2026-05-27
42 | Qwen-2.5-VL-7B | 27.6% | | Imported | 2026-05-27
43 | Qwen2-VL-2B (fine-tuned) | 9% | | Imported | 2026-05-27