ALFWorld
ALFWorld: measures task success for embodied agents on navigation, manipulation, and simulated robotics tasks.
Rows: 10
Primary metric: alfworld_normalized_score
Sampled: 2026-05-27
Metrics: ALFWorld normalized score, TALES overall, TextWorld, TextWorld Express, ScienceWorld, Jericho
| Rank | Subject | ALFWorld normalized score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | claude-opus-4.5 (high) | 1.0 | Claude Opus 4.5 anthropic-claude-opus-4.5 | Imported | 2026-05-27 |
| 2 | claude-opus-4.6 (high) | 1.0 | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-27 |
| 3 | claude-sonnet-4.6 (high) | 1.0 | Claude Sonnet 4.6 anthropic-claude-sonnet-4.6 | Imported | 2026-05-27 |
| 4 | gpt-5 (high) | 0.933 | GPT-5 openai-gpt-5 | Imported | 2026-05-27 |
| 5 | claude-4-sonnet | 0.917 | — | Imported | 2026-05-27 |
| 6 | gpt-5.1 (high) | 0.917 | GPT-5.1 openai-gpt-5.1 | Imported | 2026-05-27 |
| 7 | o3 (medium) | 0.883 | o3 openai-o3 | Imported | 2026-05-27 |
| 8 | claude-3.7-sonnet (1024) | 0.833 | Claude 3.7 Sonnet anthropic-claude-3.7-sonnet | Imported | 2026-05-27 |
| 9 | o3 (high) | 0.817 | o3 openai-o3 | Imported | 2026-05-27 |
| 10 | o3 (low) | 0.7 | o3 openai-o3 | Imported | 2026-05-27 |