OmniACT

OmniACT measures the performance of browser, desktop, mobile, and other GUI agents operating in interactive environments.

Metadata

Rows: 16
Primary metric: action_score
Sampled: 2026-05-27

Metrics

Action Score, Sequence Score, Click penalty (lower is better), Key penalty (lower is better), Write penalty (lower is better)
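The page mixes higher-is-better scores (Action Score, Sequence Score) with lower-is-better penalties, so any tool that sorts or compares these metrics has to know each metric's direction. The following is a minimal sketch under stated assumptions: only `action_score` appears on this page as an identifier, and the other four snake_case keys and the `rank` helper are hypothetical names chosen for illustration.

```python
# Direction of each metric, following the Metrics list above: the three
# penalties are marked "lower is better". Only "action_score" appears on the
# page; the other keys are hypothetical identifiers.
HIGHER_IS_BETTER: dict[str, bool] = {
    "action_score": True,
    "sequence_score": True,
    "click_penalty": False,
    "key_penalty": False,
    "write_penalty": False,
}


def rank(metric: str, scores: dict[str, float]) -> list[str]:
    """Return subject names ordered best-first for `metric`.

    Higher-is-better metrics sort descending; penalties sort ascending.
    Raises KeyError for a metric name not listed in HIGHER_IS_BETTER.
    """
    return sorted(scores, key=scores.get, reverse=HIGHER_IS_BETTER[metric])
```

For example, `rank("click_penalty", {"a": 0.4, "b": 0.1})` puts `"b"` first because its penalty is smaller, while the same values under `"action_score"` put `"a"` first. Keeping the direction in one lookup table avoids scattering `reverse=True` flags through comparison code.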

Latest Results

Rows are transcribed from Table 4 of the public OmniACT paper (ECCV 2024). The primary score is Action Score.

Rank  Subject               Action Score  Provenance  Sampled     Model Match
1     Human Performance     80.14         Imported    2026-05-27
2     GPT-4V                17.02         Imported    2026-05-27  GPT-4 (openai-gpt-4)
3     GPT-4                 11.6          Imported    2026-05-27  GPT-4 (openai-gpt-4)
4     Gemini-Pro            11.46         Imported    2026-05-27
5     LLaVA-v1.5-13B        8.19          Imported    2026-05-27
6     GPT-3.5-turbo-0613    7.89          Imported    2026-05-27  GPT-3.5 Turbo (openai-gpt-3.5-turbo)
7     LLaVA-v1.5-7B         5.82          Imported    2026-05-27
8     CodeLLaMA-34B         3.72          Imported    2026-05-27
9     Palmyra-X 43B         2.94          Imported    2026-05-27
10    Vicuna-13B FT         2.72          Imported    2026-05-27
11    LLaMA-13B FT          2.14          Imported    2026-05-27
12    Vicuna-13B            1.78          Imported    2026-05-27
13    LLaMA-13B             1.62          Imported    2026-05-27
14    Palmyra-Instruct-30B  1.31          Imported    2026-05-27
15    Vicuna-7B             0.77          Imported    2026-05-27
16    LLaMA-7B              0.48          Imported    2026-05-27
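One way to sanity-check a transcription like this is to re-derive the Rank column from the Action Score values. The sketch below is a hypothetical helper, not part of any OmniACT tooling; the `ACTION_SCORES` dictionary and the `leaderboard` function are illustrative names. It sorts the 16 rows from this page and prints each subject's gap to Human Performance in Action Score points.

```python
# Action Score values copied from the results table on this page
# (OmniACT paper, Table 4). Names and numbers are unchanged.
ACTION_SCORES: dict[str, float] = {
    "Human Performance": 80.14,
    "GPT-4V": 17.02,
    "GPT-4": 11.6,
    "Gemini-Pro": 11.46,
    "LLaVA-v1.5-13B": 8.19,
    "GPT-3.5-turbo-0613": 7.89,
    "LLaVA-v1.5-7B": 5.82,
    "CodeLLaMA-34B": 3.72,
    "Palmyra-X 43B": 2.94,
    "Vicuna-13B FT": 2.72,
    "LLaMA-13B FT": 2.14,
    "Vicuna-13B": 1.78,
    "LLaMA-13B": 1.62,
    "Palmyra-Instruct-30B": 1.31,
    "Vicuna-7B": 0.77,
    "LLaMA-7B": 0.48,
}


def leaderboard(scores: dict[str, float]) -> list[tuple[int, str, float]]:
    """Return (rank, subject, score) tuples, highest Action Score first.

    Ranks are assigned by position; ties (none occur here) would get
    consecutive ranks rather than a shared one.
    """
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(i, name, score) for i, (name, score) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    human = ACTION_SCORES["Human Performance"]
    for rank, name, score in leaderboard(ACTION_SCORES):
        # Gap to human performance, in Action Score points.
        print(f"{rank:>2}  {name:<22} {score:6.2f}  gap to human: {human - score:6.2f}")
```

The derived order matches the Rank column, and the arithmetic makes the headline gap explicit: the best model row, GPT-4V at 17.02, trails Human Performance (80.14) by 63.12 points.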