OSWorld
Benchmark for multimodal computer-use agents performing open-ended tasks in real desktop operating-system environments.
104 rows
Primary metric: success_rate
Sampled: 2026-05-27
Metadata
Metrics
Success rate, Success rate std (lower is better), Run count, Max steps, Successful tasks, Task count, Chrome success rate, GIMP success rate, LibreOffice Calc success rate, LibreOffice Impress success rate, LibreOffice Writer success rate, Multi-apps success rate, OS success rate, Thunderbird success rate, VLC success rate, VS Code success rate
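The page lists these metrics without saying how they relate. Below is a minimal sketch, under stated assumptions. Success rate is taken to be successful tasks divided by task count within each run, expressed as a percentage and averaged over the runs counted in Run count. The std is assumed to be the population standard deviation of the per-run rates. Per-application rates are assumed to pool task scores by domain. The input layout (a list of runs, each mapping a task id to a `(domain, score)` pair) and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def success_rate(scores):
    """Successful tasks / task count, as a percentage (scores assumed in [0, 1])."""
    return 100.0 * sum(scores) / len(scores)


def summarize(runs):
    """Aggregate hypothetical per-run results: each run maps task_id -> (domain, score)."""
    # One overall success rate per run.
    per_run = [success_rate([score for _, score in run.values()]) for run in runs]

    # Per-domain rates, pooling every run's scores for that domain
    # (assumption: the leaderboard's per-app columns are computed this way).
    by_domain = defaultdict(list)
    for run in runs:
        for domain, score in run.values():
            by_domain[domain].append(score)

    return {
        "success_rate": mean(per_run),
        # Population std across runs (assumption; a sample std is equally plausible).
        "success_rate_std": pstdev(per_run),
        "run_count": len(runs),
        "domain_rates": {d: success_rate(s) for d, s in sorted(by_domain.items())},
    }


# Two hypothetical runs over the same four tasks.
runs = [
    {"t1": ("chrome", 1), "t2": ("chrome", 0), "t3": ("vlc", 1), "t4": ("os", 1)},
    {"t1": ("chrome", 1), "t2": ("chrome", 1), "t3": ("vlc", 0), "t4": ("os", 0)},
]
summary = summarize(runs)
print(summary)
```

With these inputs the runs score 75% and 50%, giving a 62.5% success rate with a std of 12.5 points. Lower std means the agent behaves more consistently from run to run, which is why the metric list marks it "lower is better".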
| Rank | Subject | Success rate | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Pointer Agent w/ Opus 4.7 (100 steps) | 83.64% | — | Imported | 2026-05-27 |
| 2 | Pointer Agent w/ Sonnet 4.6 (100 steps) | 81.45% | — | Imported | 2026-05-27 |
| 3 | Holo3-35B-A3B (100 steps) | 80.355% | — | Imported | 2026-05-27 |
| 4 | OpenAPA w/ gemini-3.1-pro (100 steps) | 78.34% | — | Imported | 2026-05-27 |
| 5 | VLAA-GUI w/ Opus 4.5 (100 steps) | 76.26% | — | Imported | 2026-05-27 |
| 6 | HIPPO Agent w/ Opus 4.5 (100 steps) | 74.48% | — | Imported | 2026-05-27 |
| 7 | Qwen 3.7 Plus (100 steps) | 73.3% | — | Imported | 2026-05-27 |
| 8 | Kimi K2.6 (100 steps) | 73.06% | MoonshotAI: Kimi K2.6 (moonshotai-kimi-k2.6) | Imported | 2026-05-27 |
| 9 | agent s3 w/ Opus 4.5 + GPT-5 bBoN (N=10) (100 steps) | 72.58% | — | Imported | 2026-05-27 |
| 10 | claude-sonnet-4-6 (100 steps) | 72.11% | Claude Sonnet 4.6 (anthropic-claude-sonnet-4.6) | Imported | 2026-05-27 |
| 11 | agent s3 w/ GPT-5 bBoN (N=10) (100 steps) | 69.9% | — | Imported | 2026-05-27 |
| 12 | UiPath Screen Agent w/ Opus 4.5 (100 steps) | 67.14% | — | Imported | 2026-05-27 |
| 13 | agent s3 w/ Opus 4.5 bBoN (N=1) (100 steps) | 65.998% | — | Imported | 2026-05-27 |
| 14 | OS-Symphony w/ GPT-5 (50 steps) | 65.77% | — | Imported | 2026-05-27 |
| 15 | UiPath Screen Agent w/ Opus 4.5 (50 steps) | 64.4% | — | Imported | 2026-05-27 |
| 16 | GBOX Agent (15 steps) | 64.22% | — | Imported | 2026-05-27 |
| 17 | GTA1 w/ GPT-5 (100 steps) | 63.41% | — | Imported | 2026-05-27 |
| 18 | Kimi K2.5 (100 steps) | 63.3% | MoonshotAI: Kimi K2.5 (moonshotai-kimi-k2.5) | Imported | 2026-05-27 |
| 19 | claude-sonnet-4-5-20250929 (100 steps) | 62.88% | Claude Sonnet 4.5 (anthropic-claude-sonnet-4.5) | Imported | 2026-05-27 |
| 20 | agent s3 w/ GPT-5 bBoN (N=1) (100 steps) | 62.601% | — | Imported | 2026-05-27 |
| 21 | Agentic-Lybic-Maestro (100 steps) | 61.93% | — | Imported | 2026-05-27 |
| 22 | Seed-1.8 (100 steps) | 61.87% | — | Imported | 2026-05-27 |
| 23 | CoACT-1 (150 steps) | 60.76% | — | Imported | 2026-05-27 |
| 24 | CoACT-1 (100 steps) | 59.93% | — | Imported | 2026-05-27 |
| 25 | claude-sonnet-4-5-20250929 (50 steps) | 58.08% | Claude Sonnet 4.5 (anthropic-claude-sonnet-4.5) | Imported | 2026-05-27 |
| 26 | aworldGUIAgent-v1 (50 steps) | 58.04% | — | Imported | 2026-05-27 |
| 27 | Agentic-Lybic-Maestro (50 steps) | 56.9% | — | Imported | 2026-05-27 |
| 28 | EvoCUA-20260105 (50 steps) | 56.73% | — | Imported | 2026-05-27 |
| 29 | CoACT-1 (50 steps) | 56.39% | — | Imported | 2026-05-27 |
| 30 | agent s2.5 w/ o3 (100 steps) | 56.0% | — | Imported | 2026-05-27 |
| 31 | GUI-Owl-1.5 32B (50 steps) | 55.44% | — | Imported | 2026-05-27 |
| 32 | agent s2.5 w/ o3 (50 steps) | 54.2% | — | Imported | 2026-05-27 |
| 33 | DeepMiner-Mano-72B (100 steps) | 53.91% | — | Imported | 2026-05-27 |
| 34 | UiPath Screen Agent w/ GPT-5 (50 steps) | 53.63% | — | Imported | 2026-05-27 |
| 35 | GTA1 w/ o3 (100 steps) | 53.1% | — | Imported | 2026-05-27 |
| 36 | UI-TARS-2-2509 (100 steps) | 53.1% | — | Imported | 2026-05-27 |
| 37 | Jedi-7B w/ o3 (100 steps) | 51.0% | — | Imported | 2026-05-27 |
| 38 | Jedi-7B w/ o3 (50 steps) | 50.6% | — | Imported | 2026-05-27 |
| 39 | EvoCUA (50 steps) | 50.3% | — | Imported | 2026-05-27 |
| 40 | GTA1 w/ o3 (50 steps) | 48.59% | — | Imported | 2026-05-27 |
| 41 | autoglm-os-9b-20250925 (50 steps) | 48.03% | — | Imported | 2026-05-27 |
| 42 | autoglm-os-9b (50 steps) | 47.26% | — | Imported | 2026-05-27 |
| 43 | autoglm-os-9b-20250925 (15 steps) | 46.88% | — | Imported | 2026-05-27 |
| 44 | autoglm-os-9b (15 steps) | 46.26% | — | Imported | 2026-05-27 |
| 45 | EvoCUA-8B-20260105 (50 steps) | 46.06% | — | Imported | 2026-05-27 |
| 46 | agent s2 w/ gemini-2.5-pro (50 steps) | 45.76% | — | Imported | 2026-05-27 |
| 47 | opencua-72b-preview (100 steps) | 45.0% | — | Imported | 2026-05-27 |
| 48 | opencua-72b-preview (50 steps) | 44.9% | — | Imported | 2026-05-27 |
| 49 | claude-4-sonnet-20250514 (50 steps) | 43.9% | — | Imported | 2026-05-27 |
| 50 | claude-sonnet-4-5-20250929 (15 steps) | 42.88% | Claude Sonnet 4.5 (anthropic-claude-sonnet-4.5) | Imported | 2026-05-27 |
| 51 | Jedi-7B w/ o3 (15 steps) | 42.4% | — | Imported | 2026-05-27 |
| 52 | UI-TARS-250705 (100 steps) | 41.84% | — | Imported | 2026-05-27 |
| 53 | qwen3-vl-flash-2025-10-25 (100 steps) | 41.57% | — | Imported | 2026-05-27 |
| 54 | claude-4-sonnet-20250514 (100 steps) | 41.4% | — | Imported | 2026-05-27 |
| 55 | DART-GUI-7B-0924 (30 steps) | 40.47% | — | Imported | 2026-05-27 |
| 56 | DeepMiner-Mano-7B (100 steps) | 40.15% | — | Imported | 2026-05-27 |
| 57 | doubao-1-5-thinking-vision-pro-250717 (100 steps) | 40.0% | — | Imported | 2026-05-27 |
| 58 | CoACT-1 (15 steps) | 39.81% | — | Imported | 2026-05-27 |
| 59 | agent s2.5 w/ o3 (15 steps) | 39.0% | — | Imported | 2026-05-27 |
| 60 | opencua-72b-preview (15 steps) | 39.0% | — | Imported | 2026-05-27 |
| 61 | mobile-agent-v3 w/ gui-owl-32b (50 steps) | 38.91% | — | Imported | 2026-05-27 |
| 62 | claude-3-7-sonnet-20250219 (50 steps) | 35.8% | Claude 3.7 Sonnet (anthropic-claude-3.7-sonnet) | Imported | 2026-05-27 |
| 63 | claude-3-7-sonnet-20250219 (100 steps) | 35.6% | Claude 3.7 Sonnet (anthropic-claude-3.7-sonnet) | Imported | 2026-05-27 |
| 64 | opencua-32b (100 steps) | 34.766667% | — | Imported | 2026-05-27 |
| 65 | agent s2 w/ gemini-2.5-pro (15 steps) | 34.64% | — | Imported | 2026-05-27 |
| 66 | opencua-32b (50 steps) | 34.133333% | — | Imported | 2026-05-27 |
| 67 | doubao-1-5-thinking-vision-pro-250428 (100 steps) | 33.8% | — | Imported | 2026-05-27 |
| 68 | gui-owl-7b (15 steps) | 32.11% | — | Imported | 2026-05-27 |
| 69 | doubao-1-5-thinking-vision-pro-250717 (15 steps) | 31.9% | — | Imported | 2026-05-27 |
| 70 | computer-use-preview (50 steps) | 31.3% | — | Imported | 2026-05-27 |
| 71 | claude-4-sonnet-20250514 (15 steps) | 31.2% | — | Imported | 2026-05-27 |
| 72 | computer-use-preview (100 steps) | 30.515% | — | Imported | 2026-05-27 |
| 73 | TianXi-Action-7B (50 steps) | 29.81% | — | Imported | 2026-05-27 |
| 74 | opencua-32b (15 steps) | 29.666667% | — | Imported | 2026-05-27 |
| 75 | Jedi-7B w/ gpt-4o (100 steps) | 29.3% | — | Imported | 2026-05-27 |
| 76 | opencua-7b (50 steps) | 28.166667% | — | Imported | 2026-05-27 |
| 77 | doubao-1-5-thinking-vision-pro-250428 (15 steps) | 27.8% | — | Imported | 2026-05-27 |
| 78 | uitars-1.5-7b (100 steps) | 27.4% | UI-TARS 7B (bytedance-ui-tars-1.5-7b) | Imported | 2026-05-27 |
| 79 | uitars-1.5-7b (50 steps) | 27.25% | UI-TARS 7B (bytedance-ui-tars-1.5-7b) | Imported | 2026-05-27 |
| 80 | claude-3-7-sonnet-20250219 (15 steps) | 27.1% | Claude 3.7 Sonnet (anthropic-claude-3.7-sonnet) | Imported | 2026-05-27 |
| 81 | uitars-72b-dpo (100 steps) | 27.1% | — | Imported | 2026-05-27 |
| 82 | Jedi-7B w/ gpt-4o (50 steps) | 27.0% | — | Imported | 2026-05-27 |
| 83 | Jedi-7B w/ gpt-4o (15 steps) | 26.8% | — | Imported | 2026-05-27 |
| 84 | opencua-7b (100 steps) | 26.633333% | — | Imported | 2026-05-27 |
| 85 | computer-use-preview (15 steps) | 26.0% | — | Imported | 2026-05-27 |
| 86 | uitars-72b-dpo (50 steps) | 25.8% | — | Imported | 2026-05-27 |
| 87 | uitars-1.5-7b (15 steps) | 24.5% | UI-TARS 7B (bytedance-ui-tars-1.5-7b) | Imported | 2026-05-27 |
| 88 | opencua-7b (15 steps) | 24.266667% | — | Imported | 2026-05-27 |
| 89 | uitars-72b-dpo (15 steps) | 24.0% | — | Imported | 2026-05-27 |
| 90 | opencua-qwen2-7b (100 steps) | 23.1% | — | Imported | 2026-05-27 |
| 91 | o3 (100 steps) | 23.0% | o3 (openai-o3) | Imported | 2026-05-27 |
| 92 | opencua-qwen2-7b (50 steps) | 20.6% | — | Imported | 2026-05-27 |
| 93 | opencua-a3b (50 steps) | 19.9% | — | Imported | 2026-05-27 |
| 94 | opencua-qwen2-7b (15 steps) | 19.9% | — | Imported | 2026-05-27 |
| 95 | opencua-a3b (100 steps) | 17.7% | — | Imported | 2026-05-27 |
| 96 | o3 (50 steps) | 17.17% | o3 (openai-o3) | Imported | 2026-05-27 |
| 97 | opencua-a3b (15 steps) | 16.9% | — | Imported | 2026-05-27 |
| 98 | kimi-vl-a3b (100 steps) | 10.3% | — | Imported | 2026-05-27 |
| 99 | kimi-vl-a3b (15 steps) | 9.7% | — | Imported | 2026-05-27 |
| 100 | o3 (15 steps) | 9.1% | o3 (openai-o3) | Imported | 2026-05-27 |
| 101 | qwen2.5-vl-72b-instruct (100 steps) | 5.0% | Qwen2.5 VL 72B Instruct (qwen-qwen2.5-vl-72b-instruct) | Imported | 2026-05-27 |
| 102 | qwen2.5-vl-72b-instruct (15 steps) | 4.43% | Qwen2.5 VL 72B Instruct (qwen-qwen2.5-vl-72b-instruct) | Imported | 2026-05-27 |
| 103 | qwen2.5-vl-32b-instruct (100 steps) | 3.88% | — | Imported | 2026-05-27 |
| 104 | qwen2.5-vl-32b-instruct (15 steps) | 3.04% | — | Imported | 2026-05-27 |
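The Rank column orders rows by success rate. Tied rows receive consecutive ranks rather than a shared one: the two 53.1% entries sit at ranks 35 and 36, and the two 39.0% entries at 59 and 60. Below is a minimal sketch of that ordering, which also splits the step budget out of the Subject label. The helper names are hypothetical. The tie-breaking rule (keep input order, as a stable sort does) is inferred from the table, not documented by the leaderboard.

```python
import re

# Subject labels end with the step budget, e.g. "CoACT-1 (150 steps)".
# The greedy name group keeps earlier parentheses such as "(N=10)" in the name.
SUBJECT = re.compile(r"^(?P<name>.*) \((?P<steps>\d+) steps\)$")


def parse_subject(subject):
    """Split a Subject label into (name, max_steps); max_steps is None if absent."""
    m = SUBJECT.match(subject)
    if m is None:
        return subject, None
    return m.group("name"), int(m.group("steps"))


def rank_rows(rows):
    """rows: list of (subject, success_rate_percent) -> list of (rank, name, steps, rate)."""
    # sorted() is stable even with reverse=True, so equal rates keep their
    # input order and receive distinct, consecutive ranks.
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return [
        (rank, *parse_subject(subject), rate)
        for rank, (subject, rate) in enumerate(ordered, start=1)
    ]


# A few rows copied from the table above, deliberately out of order.
rows = [
    ("CoACT-1 (15 steps)", 39.81),
    ("GTA1 w/ o3 (100 steps)", 53.1),
    ("UI-TARS-2-2509 (100 steps)", 53.1),
    ("CoACT-1 (150 steps)", 60.76),
]
for entry in rank_rows(rows):
    print(entry)
```

Separating the step budget makes it easy to compare one agent across budgets, such as the CoACT-1 rows at 15, 50, 100, and 150 steps.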