OpenClaw Arena Model Leaderboard

Personal AI agent benchmark evaluating frontier models across real-world OpenClaw-style tasks.

Rows: 13
Primary metric: average_score
Sampled: 2026-05-06

Metadata

Metrics

Avg Score, Tasks Completed

Latest Results

Rows are parsed from the public OpenClaw Arena static model leaderboard. Per-task heatmap scores are included as additional metrics where available.

| Rank | Subject | Avg Score | Model Match | Provenance | Sampled |
|------|---------|-----------|-------------|------------|---------|
| 1 | claude-opus-4.5 | 0.67 | | Imported | 2026-05-06 |
| 2 | gpt-5.2 | 0.64 | | Imported | 2026-05-06 |
| 3 | gemini-2.5-pro | 0.64 | | Imported | 2026-05-06 |
| 4 | gpt-5.1 | 0.63 | | Imported | 2026-05-06 |
| 5 | claude-sonnet-4.6 | 0.62 | | Imported | 2026-05-06 |
| 6 | claude-opus-4.6 | 0.62 | | Imported | 2026-05-06 |
| 7 | gemini-3.1-pro-preview | 0.61 | | Imported | 2026-05-06 |
| 8 | gpt-5-mini | 0.61 | | Imported | 2026-05-06 |
| 9 | gpt-4.1 | 0.60 | | Imported | 2026-05-06 |
| 10 | claude-sonnet-4 | 0.58 | | Imported | 2026-05-06 |
| 11 | claude-sonnet-4.5 | 0.57 | | Imported | 2026-05-06 |
| 12 | gpt-4o | 0.52 | | Imported | 2026-05-06 |
| 13 | gpt-4o-mini | 0.42 | | Imported | 2026-05-06 |
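
For readers who want to work with these results programmatically, the following is a minimal sketch of how rows like the ones above might be loaded. It assumes each row is available as whitespace-separated text in the order rank, subject, average score, provenance, sampled date, with the empty Model Match column omitted. The `LeaderboardRow` class and `parse_row` function are hypothetical names chosen for illustration and are not part of any OpenClaw Arena tooling.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LeaderboardRow:
    """One leaderboard entry (hypothetical structure, for illustration only)."""
    rank: int
    subject: str
    avg_score: float
    provenance: str
    sampled: date


def parse_row(line: str) -> LeaderboardRow:
    """Parse 'rank subject score provenance date' from one text line.

    Assumes exactly five whitespace-separated fields; a line with any
    other number of fields raises ValueError.
    """
    rank, subject, score, provenance, sampled = line.split()
    return LeaderboardRow(
        rank=int(rank),
        subject=subject,
        avg_score=float(score),
        provenance=provenance,
        sampled=date.fromisoformat(sampled),
    )


# Three rows copied from the leaderboard: first, second, and last place.
SAMPLE = """\
1 claude-opus-4.5 0.67 Imported 2026-05-06
2 gpt-5.2 0.64 Imported 2026-05-06
13 gpt-4o-mini 0.42 Imported 2026-05-06
"""

rows = [parse_row(line) for line in SAMPLE.splitlines() if line.strip()]

# Gap between the top-ranked and bottom-ranked average scores.
spread = round(rows[0].avg_score - rows[-1].avg_score, 2)
print(rows[0].subject, spread)
```

Because the displayed scores are rounded to two decimals, entries that show the same value (for example ranks 2 and 3) cannot be ordered from this data alone; the rank column is the only reliable tiebreaker here.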