ALFWorld

ALFWorld measures task success for embodied agents on simulated navigation and object-manipulation tasks.

Rows: 10
Primary metric: alfworld_normalized_score
Sampled: 2026-05-27

Metadata

Metrics

ALFWorld normalized score, TALES overall, TextWorld, TextWorld Express, ScienceWorld, Jericho

Latest Results

Rows are parsed from the leaderboard in the README of the public PEARLS-Lab/TALES-Trajectories repository on Hugging Face. The primary score is the ALFWorld column, which the source defines as the average best normalized score per game. The other TALES framework columns are retained as contextual metrics.
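As a rough illustration of "average best normalized score per game", the sketch below takes one plausible reading of that phrase: for each game, keep the highest normalized score recorded, then average those best scores across games. This reading is an assumption, not the source's confirmed procedure. The function name `average_best_normalized_score`, the game names, and the score values are all hypothetical.

```python
from statistics import mean


def average_best_normalized_score(scores_by_game):
    """Return the mean, over games, of the best normalized score per game.

    Assumed interpretation (not confirmed by the source): each game maps to
    a list of normalized scores in [0, 1], and the best one counts.
    """
    if not scores_by_game:
        raise ValueError("no games provided")

    best_per_game = []
    for game, scores in scores_by_game.items():
        if not scores:
            raise ValueError(f"no scores recorded for game {game!r}")
        # Keep only the highest normalized score seen for this game.
        best_per_game.append(max(scores))

    # Average the per-game bests so that every game carries equal weight.
    return mean(best_per_game)


# Hypothetical data: made-up games and scores, for illustration only.
example = {
    "game_a": [0.0, 0.5, 1.0],
    "game_b": [0.25, 0.5],
    "game_c": [1.0],
    "game_d": [0.5, 0.0],
}

result = average_best_normalized_score(example)
print(result)  # 0.75
```

Averaging per-game bests, rather than pooling all scores, keeps a game with many recorded attempts from dominating the result.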

| Rank | Subject | ALFWorld normalized score | Model Match | Provenance | Sampled |
| --- | --- | --- | --- | --- | --- |
| 1 | claude-opus-4.5 (high) | 1.0 | Claude Opus 4.5 (anthropic-claude-opus-4.5) | Imported | 2026-05-27 |
| 2 | claude-opus-4.6 (high) | 1.0 | Claude Opus 4.6 (anthropic-claude-opus-4.6) | Imported | 2026-05-27 |
| 3 | claude-sonnet-4.6 (high) | 1.0 | Claude Sonnet 4.6 (anthropic-claude-sonnet-4.6) | Imported | 2026-05-27 |
| 4 | gpt-5 (high) | 0.933 | GPT-5 (openai-gpt-5) | Imported | 2026-05-27 |
| 5 | claude-4-sonnet | 0.917 | | Imported | 2026-05-27 |
| 6 | gpt-5.1 (high) | 0.917 | GPT-5.1 (openai-gpt-5.1) | Imported | 2026-05-27 |
| 7 | o3 (medium) | 0.883 | o3 (openai-o3) | Imported | 2026-05-27 |
| 8 | claude-3.7-sonnet (1024) | 0.833 | Claude 3.7 Sonnet (anthropic-claude-3.7-sonnet) | Imported | 2026-05-27 |
| 9 | o3 (high) | 0.817 | o3 (openai-o3) | Imported | 2026-05-27 |
| 10 | o3 (low) | 0.7 | o3 (openai-o3) | Imported | 2026-05-27 |