OpenClaw Arena Model Leaderboard
Personal AI agent benchmark evaluating frontier models across real-world OpenClaw-style tasks.
Rows: 13
Primary metric: average_score
Sampled: 2026-05-06
Metadata
Metrics: Avg Score, Tasks Completed
| Rank | Subject | Avg Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | claude-opus-4.5 | 0.67 | — | Imported | 2026-05-06 |
| 2 | gpt-5.2 | 0.64 | — | Imported | 2026-05-06 |
| 3 | gemini-2.5-pro | 0.64 | — | Imported | 2026-05-06 |
| 4 | gpt-5.1 | 0.63 | — | Imported | 2026-05-06 |
| 5 | claude-sonnet-4.6 | 0.62 | — | Imported | 2026-05-06 |
| 6 | claude-opus-4.6 | 0.62 | — | Imported | 2026-05-06 |
| 7 | gemini-3.1-pro-preview | 0.61 | — | Imported | 2026-05-06 |
| 8 | gpt-5-mini | 0.61 | — | Imported | 2026-05-06 |
| 9 | gpt-4.1 | 0.60 | — | Imported | 2026-05-06 |
| 10 | claude-sonnet-4 | 0.58 | — | Imported | 2026-05-06 |
| 11 | claude-sonnet-4.5 | 0.57 | — | Imported | 2026-05-06 |
| 12 | gpt-4o | 0.52 | — | Imported | 2026-05-06 |
| 13 | gpt-4o-mini | 0.42 | — | Imported | 2026-05-06 |