OSWorld

Benchmark for multimodal computer-use agents performing open-ended tasks in real desktop operating-system environments.

Rows: 104
Primary metric: success_rate
Sampled: 2026-05-27

Metadata

Metrics

Success rate, Success rate std (lower is better), Run count, Max steps, Successful tasks, Task count, Chrome success rate, GIMP success rate, LibreOffice Calc success rate, LibreOffice Impress success rate, LibreOffice Writer success rate, Multi Apps success rate, OS success rate, Thunderbird success rate, VLC success rate, VS Code success rate

Latest Results

Rows are parsed from OSWorld's official osworld_verified_results.xlsx loaded by the project page. Multiple runs with the same model and max-step setting are averaged.
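The averaging step described above can be sketched as follows. This is a minimal illustration, not the page's actual import code: the field names (`model`, `max_steps`, `success_rate`), the sample rows, and the use of the population standard deviation are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def average_runs(rows):
    """Average success rates over runs sharing a model and max-step setting.

    `rows` is a list of dicts with hypothetical keys `model`, `max_steps`
    and `success_rate` (in percent); the real spreadsheet columns may differ.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["model"], row["max_steps"])].append(row["success_rate"])

    summary = []
    for (model, max_steps), rates in groups.items():
        summary.append({
            "subject": f"{model} ({max_steps} steps)",
            "success_rate": mean(rates),
            # Population std is an assumption; it is 0.0 for a single run.
            "success_rate_std": pstdev(rates),
            "run_count": len(rates),
        })

    # Highest average success rate first, as in the ranking table.
    summary.sort(key=lambda entry: entry["success_rate"], reverse=True)
    return summary


# Made-up runs for illustration only; these are not leaderboard values.
example_rows = [
    {"model": "model-a", "max_steps": 100, "success_rate": 60.0},
    {"model": "model-a", "max_steps": 100, "success_rate": 62.0},
    {"model": "model-b", "max_steps": 50, "success_rate": 40.0},
]
example_summary = average_runs(example_rows)
```

Averaging of this kind would explain why some entries in the table carry more decimal places than a single run would report.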

Rank Subject Success rate Model Match Provenance Sampled
1 Pointer Agent w/ Opus 4.7 (100 steps) 83.64% Imported 2026-05-27
2 Pointer Agent w/ Sonnet 4.6 (100 steps) 81.45% Imported 2026-05-27
3 Holo3-35B-A3B (100 steps) 80.355% Imported 2026-05-27
4 OpenAPA w/ gemini-3.1-pro (100 steps) 78.34% Imported 2026-05-27
5 VLAA-GUI w/ Opus 4.5 (100 steps) 76.26% Imported 2026-05-27
6 HIPPO Agent w/ Opus 4.5 (100 steps) 74.48% Imported 2026-05-27
7 Qwen 3.7 Plus (100 steps) 73.3% Imported 2026-05-27
8 Kimi K2.6 (100 steps) 73.06% KIMI MoonshotAI: Kimi K2.6 (moonshotai-kimi-k2.6) Imported 2026-05-27
9 agent s3 w/ Opus 4.5 + GPT-5 bBoN (N=10) (100 steps) 72.58% Imported 2026-05-27
10 claude-sonnet-4-6 (100 steps) 72.11% Claude Sonnet 4.6 (anthropic-claude-sonnet-4.6) Imported 2026-05-27
11 agent s3 w/ GPT-5 bBoN (N=10) (100 steps) 69.9% Imported 2026-05-27
12 UiPath Screen Agent w/ Opus 4.5 (100 steps) 67.14% Imported 2026-05-27
13 agent s3 w/ Opus 4.5 bBoN (N=1) (100 steps) 65.998% Imported 2026-05-27
14 OS-Symphony w/ GPT-5 (50 steps) 65.77% Imported 2026-05-27
15 UiPath Screen Agent w/ Opus 4.5 (50 steps) 64.4% Imported 2026-05-27
16 GBOX Agent (15 steps) 64.22% Imported 2026-05-27
17 GTA1 w/ GPT-5 (100 steps) 63.41% Imported 2026-05-27
18 Kimi K2.5 (100 steps) 63.3% KIMI MoonshotAI: Kimi K2.5 (moonshotai-kimi-k2.5) Imported 2026-05-27
19 claude-sonnet-4-5-20250929 (100 steps) 62.88% Claude Sonnet 4.5 (anthropic-claude-sonnet-4.5) Imported 2026-05-27
20 agent s3 w/ GPT-5 bBoN (N=1) (100 steps) 62.601% Imported 2026-05-27
21 Agentic-Lybic-Maestro (100 steps) 61.93% Imported 2026-05-27
22 Seed-1.8 (100 steps) 61.87% Imported 2026-05-27
23 CoACT-1 (150 steps) 60.76% Imported 2026-05-27
24 CoACT-1 (100 steps) 59.93% Imported 2026-05-27
25 claude-sonnet-4-5-20250929 (50 steps) 58.08% Claude Sonnet 4.5 (anthropic-claude-sonnet-4.5) Imported 2026-05-27
26 aworldGUIAgent-v1 (50 steps) 58.04% Imported 2026-05-27
27 Agentic-Lybic-Maestro (50 steps) 56.9% Imported 2026-05-27
28 EvoCUA-20260105 (50 steps) 56.73% Imported 2026-05-27
29 CoACT-1 (50 steps) 56.39% Imported 2026-05-27
30 agent s2.5 w/ o3 (100 steps) 56.0% Imported 2026-05-27
31 GUI-Owl-1.5 32B (50 steps) 55.44% Imported 2026-05-27
32 agent s2.5 w/ o3 (50 steps) 54.2% Imported 2026-05-27
33 DeepMiner-Mano-72B (100 steps) 53.91% Imported 2026-05-27
34 UiPath Screen Agent w/ GPT-5 (50 steps) 53.63% Imported 2026-05-27
35 GTA1 w/ o3 (100 steps) 53.1% Imported 2026-05-27
36 UI-TARS-2-2509 (100 steps) 53.1% Imported 2026-05-27
37 Jedi-7B w/ o3 (100 steps) 51.0% Imported 2026-05-27
38 Jedi-7B w/ o3 (50 steps) 50.6% Imported 2026-05-27
39 EvoCUA (50 steps) 50.3% Imported 2026-05-27
40 GTA1 w/ o3 (50 steps) 48.59% Imported 2026-05-27
41 autoglm-os-9b-20250925 (50 steps) 48.03% Imported 2026-05-27
42 autoglm-os-9b (50 steps) 47.26% Imported 2026-05-27
43 autoglm-os-9b-20250925 (15 steps) 46.88% Imported 2026-05-27
44 autoglm-os-9b (15 steps) 46.26% Imported 2026-05-27
45 EvoCUA-8B-20260105 (50 steps) 46.06% Imported 2026-05-27
46 agent s2 w/ gemini-2.5-pro (50 steps) 45.76% Imported 2026-05-27
47 opencua-72b-preview (100 steps) 45.0% Imported 2026-05-27
48 opencua-72b-preview (50 steps) 44.9% Imported 2026-05-27
49 claude-4-sonnet-20250514 (50 steps) 43.9% Imported 2026-05-27
50 claude-sonnet-4-5-20250929 (15 steps) 42.88% Claude Sonnet 4.5 (anthropic-claude-sonnet-4.5) Imported 2026-05-27
51 Jedi-7B w/ o3 (15 steps) 42.4% Imported 2026-05-27
52 UI-TARS-250705 (100 steps) 41.84% Imported 2026-05-27
53 qwen3-vl-flash-2025-10-25 (100 steps) 41.57% Imported 2026-05-27
54 claude-4-sonnet-20250514 (100 steps) 41.4% Imported 2026-05-27
55 DART-GUI-7B-0924 (30 steps) 40.47% Imported 2026-05-27
56 DeepMiner-Mano-7B (100 steps) 40.15% Imported 2026-05-27
57 doubao-1-5-thinking-vision-pro-250717 (100 steps) 40.0% Imported 2026-05-27
58 CoACT-1 (15 steps) 39.81% Imported 2026-05-27
59 agent s2.5 w/ o3 (15 steps) 39.0% Imported 2026-05-27
60 opencua-72b-preview (15 steps) 39.0% Imported 2026-05-27
61 mobile-agent-v3 w/ gui-owl-32b (50 steps) 38.91% Imported 2026-05-27
62 claude-3-7-sonnet-20250219 (50 steps) 35.8% Claude 3.7 Sonnet (anthropic-claude-3.7-sonnet) Imported 2026-05-27
63 claude-3-7-sonnet-20250219 (100 steps) 35.6% Claude 3.7 Sonnet (anthropic-claude-3.7-sonnet) Imported 2026-05-27
64 opencua-32b (100 steps) 34.766667% Imported 2026-05-27
65 agent s2 w/ gemini-2.5-pro (15 steps) 34.64% Imported 2026-05-27
66 opencua-32b (50 steps) 34.133333% Imported 2026-05-27
67 doubao-1-5-thinking-vision-pro-250428 (100 steps) 33.8% Imported 2026-05-27
68 gui-owl-7b (15 steps) 32.11% Imported 2026-05-27
69 doubao-1-5-thinking-vision-pro-250717 (15 steps) 31.9% Imported 2026-05-27
70 computer-use-preview (50 steps) 31.3% Imported 2026-05-27
71 claude-4-sonnet-20250514 (15 steps) 31.2% Imported 2026-05-27
72 computer-use-preview (100 steps) 30.515% Imported 2026-05-27
73 TianXi-Action-7B (50 steps) 29.81% Imported 2026-05-27
74 opencua-32b (15 steps) 29.666667% Imported 2026-05-27
75 Jedi-7B w/ gpt-4o (100 steps) 29.3% Imported 2026-05-27
76 opencua-7b (50 steps) 28.166667% Imported 2026-05-27
77 doubao-1-5-thinking-vision-pro-250428 (15 steps) 27.8% Imported 2026-05-27
78 uitars-1.5-7b (100 steps) 27.4% UI-TARS 7B (bytedance-ui-tars-1.5-7b) Imported 2026-05-27
79 uitars-1.5-7b (50 steps) 27.25% UI-TARS 7B (bytedance-ui-tars-1.5-7b) Imported 2026-05-27
80 claude-3-7-sonnet-20250219 (15 steps) 27.1% Claude 3.7 Sonnet (anthropic-claude-3.7-sonnet) Imported 2026-05-27
81 uitars-72b-dpo (100 steps) 27.1% Imported 2026-05-27
82 Jedi-7B w/ gpt-4o (50 steps) 27.0% Imported 2026-05-27
83 Jedi-7B w/ gpt-4o (15 steps) 26.8% Imported 2026-05-27
84 opencua-7b (100 steps) 26.633333% Imported 2026-05-27
85 computer-use-preview (15 steps) 26.0% Imported 2026-05-27
86 uitars-72b-dpo (50 steps) 25.8% Imported 2026-05-27
87 uitars-1.5-7b (15 steps) 24.5% UI-TARS 7B (bytedance-ui-tars-1.5-7b) Imported 2026-05-27
88 opencua-7b (15 steps) 24.266667% Imported 2026-05-27
89 uitars-72b-dpo (15 steps) 24.0% Imported 2026-05-27
90 opencua-qwen2-7b (100 steps) 23.1% Imported 2026-05-27
91 o3 (100 steps) 23.0% o3 (openai-o3) Imported 2026-05-27
92 opencua-qwen2-7b (50 steps) 20.6% Imported 2026-05-27
93 opencua-a3b (50 steps) 19.9% Imported 2026-05-27
94 opencua-qwen2-7b (15 steps) 19.9% Imported 2026-05-27
95 opencua-a3b (100 steps) 17.7% Imported 2026-05-27
96 o3 (50 steps) 17.17% o3 (openai-o3) Imported 2026-05-27
97 opencua-a3b (15 steps) 16.9% Imported 2026-05-27
98 kimi-vl-a3b (100 steps) 10.3% Imported 2026-05-27
99 kimi-vl-a3b (15 steps) 9.7% Imported 2026-05-27
100 o3 (15 steps) 9.1% o3 (openai-o3) Imported 2026-05-27
101 qwen2.5-vl-72b-instruct (100 steps) 5.0% Qwen2.5 VL 72B Instruct (qwen-qwen2.5-vl-72b-instruct) Imported 2026-05-27
102 qwen2.5-vl-72b-instruct (15 steps) 4.43% Qwen2.5 VL 72B Instruct (qwen-qwen2.5-vl-72b-instruct) Imported 2026-05-27
103 qwen2.5-vl-32b-instruct (100 steps) 3.88% Imported 2026-05-27
104 qwen2.5-vl-32b-instruct (15 steps) 3.04% Imported 2026-05-27