OmniACT
OmniACT measures browser, desktop, mobile, or GUI agents operating in interactive environments.
16 rows
Primary metric: action_score
Sampled: 2026-05-27
Metadata
Metrics
Action Score, Sequence Score, Click penalty (lower is better), Key penalty (lower is better), Write penalty (lower is better)
| Rank | Subject | Action Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Human Performance | 80.14 | — | Imported | 2026-05-27 |
| 2 | GPT-4V | 17.02 | GPT-4 openai-gpt-4 | Imported | 2026-05-27 |
| 3 | GPT-4 | 11.60 | GPT-4 openai-gpt-4 | Imported | 2026-05-27 |
| 4 | Gemini-Pro | 11.46 | — | Imported | 2026-05-27 |
| 5 | LLaVA-v1.5-13B | 8.19 | — | Imported | 2026-05-27 |
| 6 | GPT-3.5-turbo-0613 | 7.89 | GPT-3.5 Turbo openai-gpt-3.5-turbo | Imported | 2026-05-27 |
| 7 | LLaVA-v1.5-7B | 5.82 | — | Imported | 2026-05-27 |
| 8 | CodeLLaMA-34B | 3.72 | — | Imported | 2026-05-27 |
| 9 | Palmyra-X 43B | 2.94 | — | Imported | 2026-05-27 |
| 10 | Vicuna-13B FT | 2.72 | — | Imported | 2026-05-27 |
| 11 | LLaMA-13B FT | 2.14 | — | Imported | 2026-05-27 |
| 12 | Vicuna-13B | 1.78 | — | Imported | 2026-05-27 |
| 13 | LLaMA-13B | 1.62 | — | Imported | 2026-05-27 |
| 14 | Palmyra-Instruct-30B | 1.31 | — | Imported | 2026-05-27 |
| 15 | Vicuna-7B | 0.77 | — | Imported | 2026-05-27 |
| 16 | LLaMA-7B | 0.48 | — | Imported | 2026-05-27 |
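The ranking above can be reproduced by sorting on the primary metric. The following is a minimal sketch, not the leaderboard's own code: it assumes Action Score is higher-is-better (only the penalty metrics are marked "lower is better"), uses a four-row subset of the table, and the helper name `rank` and its `higher_is_better` flag are hypothetical.

```python
# Subset of rows copied from the table above: (subject, action score).
rows = [
    ("GPT-4", 11.60),
    ("Human Performance", 80.14),
    ("GPT-4V", 17.02),
    ("Gemini-Pro", 11.46),
]

def rank(rows, higher_is_better=True):
    """Return rows sorted best-first for the given metric direction.

    Hypothetical helper: pass higher_is_better=False for the penalty
    metrics, which the page marks as "lower is better".
    """
    return sorted(rows, key=lambda r: r[1], reverse=higher_is_better)

ranked = rank(rows)

# Gap between the human baseline and the best-scoring model in this subset.
human = dict(rows)["Human Performance"]
best_model = next(r for r in ranked if r[0] != "Human Performance")
gap = round(human - best_model[1], 2)

for position, (subject, score) in enumerate(ranked, start=1):
    print(f"{position}. {subject}: {score:.2f}")
print(f"Human baseline leads {best_model[0]} by {gap:.2f} points")
```

Keeping the sort direction as a parameter lets the same helper rank the Click, Key, and Write penalties, where a smaller value is better.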