ALFRED

Embodied instruction-following benchmark in AI2-THOR for mapping natural language goals and instructions to household actions.

Rows: 88
Primary metric: unseen_success_rate
Sampled: 2026-05-27

Metadata

Metrics

Unseen success rate, Seen success rate, Seen PLWSR, Unseen PLWSR, Seen goal-condition success rate, Unseen goal-condition success rate, Seen PLW goal-condition success rate, Unseen PLW goal-condition success rate
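
The "PLW" metrics above are path-length-weighted variants of the success and goal-condition scores. A minimal sketch of the per-episode weighting, assuming the definition given in the ALFRED paper (the score multiplied by the expert path length divided by the larger of the expert and agent path lengths). The function and parameter names are hypothetical, and the leaderboard's aggregation across episodes may differ from a plain average.

```python
def path_length_weighted(score: float, expert_len: int, agent_len: int) -> float:
    """Weight an episode score by path efficiency (assumed ALFRED definition).

    score      -- 1.0/0.0 for task success, or the fraction of goal
                  conditions completed for the goal-condition variant
    expert_len -- number of actions in the expert demonstration
    agent_len  -- number of actions the agent actually took
    """
    if expert_len <= 0 or agent_len <= 0:
        raise ValueError("path lengths must be positive")
    # An agent that takes longer than the expert is penalized proportionally;
    # an agent that is faster than the expert is not rewarded beyond the raw score.
    return score * expert_len / max(expert_len, agent_len)


# A successful episode that took twice as many steps as the expert.
print(path_length_weighted(1.0, expert_len=50, agent_len=100))
```

Under this definition a PLW score can never exceed its unweighted counterpart.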

Latest Results

Rows are parsed from the CSV embedded in the official ALFRED leaderboard page. The score is the unseen success rate, and rows are ranked by it in descending order.
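
The parse-and-rank step described above can be sketched as follows. This is an illustration only: the column names (`subject`, `unseen_success_rate`) and the inline sample are assumptions, not the leaderboard's actual CSV schema, and the three sample values are copied from the table on this page.

```python
import csv
import io

# Hypothetical stand-in for the CSV embedded in the leaderboard page.
SAMPLE_CSV = """subject,unseen_success_rate
EPO,62.35
Human Performance,91
GRL,68.52
"""


def rank_rows(csv_text: str, metric: str = "unseen_success_rate") -> list[dict]:
    """Parse CSV text and rank rows by `metric`, highest first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [{"subject": r["subject"], metric: float(r[metric])} for r in reader]
    # sorted() is stable, so tied scores keep their original CSV order.
    ranked = sorted(rows, key=lambda r: r[metric], reverse=True)
    return [{"rank": i, **r} for i, r in enumerate(ranked, start=1)]


for row in rank_rows(SAMPLE_CSV):
    print(row["rank"], row["subject"], f'{row["unseen_success_rate"]}%')
```

Because the sort is stable, entries with identical scores receive consecutive ranks in source order rather than a shared rank, which matches how tied rows appear in the table.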

| Rank | Subject | Unseen success rate | Model Match | Provenance | Sampled |
| --- | --- | --- | --- | --- | --- |
| 1 | Human Performance | 91% | | Imported | 2026-05-27 |
| 2 | GRL | 68.52% | | Imported | 2026-05-27 |
| 3 | EPO | 62.35% | | Imported | 2026-05-27 |
| 4 | RoboGPT | 62.0% | | Imported | 2026-05-27 |
| 5 | SCOUT | 60.79% | | Imported | 2026-05-27 |
| 6 | RLEF | 60.69% | | Imported | 2026-05-27 |
| 7 | CSL-HDP(version1) | 59.65% | | Imported | 2026-05-27 |
| 8 | LOAT | 58.14% | | Imported | 2026-05-27 |
| 9 | ThinkBot-unseen | 57.82% | | Imported | 2026-05-27 |
| 10 | HDP | 57.55% | | Imported | 2026-05-27 |
| 11 | DISCO | 56.55% | | Imported | 2026-05-27 |
| 12 | RLEF | 56.34% | | Imported | 2026-05-27 |
| 13 | DISCO | 54.77% | | Imported | 2026-05-27 |
| 14 | REIF | 50.83% | | Imported | 2026-05-27 |
| 15 | SDPA | 50.75% | | Imported | 2026-05-27 |
| 16 | [EAI23] ECLAIR | 50.36% | | Imported | 2026-05-27 |
| 17 | ESP - 2 | 48.59% | | Imported | 2026-05-27 |
| 18 | ESP | 48.53% | | Imported | 2026-05-27 |
| 19 | Container | 47.5% | | Imported | 2026-05-27 |
| 20 | DRL | 47.22% | | Imported | 2026-05-27 |
| 21 | Prompter | 45.72% | | Imported | 2026-05-27 |
| 22 | HD-Agent | 45.52% | | Imported | 2026-05-27 |
| 23 | Prompter, no slice replay | 45.32% | | Imported | 2026-05-27 |
| 24 | high level only | 43.69% | | Imported | 2026-05-27 |
| 25 | FLARE | 40.88% | | Imported | 2026-05-27 |
| 26 | HD-Agent | 39.18% | | Imported | 2026-05-27 |
| 27 | HIA-High-Goal-Only | 38.52% | | Imported | 2026-05-27 |
| 28 | EI2 | 38.19% | | Imported | 2026-05-27 |
| 29 | [EAI22] EPA | 36.07% | | Imported | 2026-05-27 |
| 30 | Obstacle_film | 35.83% | | Imported | 2026-05-27 |
| 31 | zero-shot-LLM | 35.64% | | Imported | 2026-05-27 |
| 32 | [EAI22] LGS-RPA | 35.41% | | Imported | 2026-05-27 |
| 33 | LGS-RPA | 34.07% | | Imported | 2026-05-27 |
| 34 | alfred-bot | 32.24% | | Imported | 2026-05-27 |
| 35 | 726 | 29.17% | | Imported | 2026-05-27 |
| 36 | [EAI22] Sudoer-SRCB-RMU | 28.3% | | Imported | 2026-05-27 |
| 37 | FILM - a new semantic policy instance | 27.8% | | Imported | 2026-05-27 |
| 38 | FILM | 26.49% | | Imported | 2026-05-27 |
| 39 | ABP | 26.16% | | Imported | 2026-05-27 |
| 40 | LEBP | 24.26% | | Imported | 2026-05-27 |
| 41 | AMSLAM | 23.48% | | Imported | 2026-05-27 |
| 42 | FIQA | 22.18% | | Imported | 2026-05-27 |
| 43 | HLSM-MAT | 21.84% | | Imported | 2026-05-27 |
| 44 | HLSM | 20.27% | | Imported | 2026-05-27 |
| 45 | ECL | 17.92% | | Imported | 2026-05-27 |
| 46 | FILM_smart | 17.72% | | Imported | 2026-05-27 |
| 47 | CLET | 17.24% | | Imported | 2026-05-27 |
| 48 | VLNBERT-L + M-Track | 16.29% | | Imported | 2026-05-27 |
| 49 | [EAI21] - HLSM | 16.29% | | Imported | 2026-05-27 |
| 50 | [EAI21] ABP | 15.43% | | Imported | 2026-05-27 |
| 51 | [EAI21] HiTUT | 13.87% | | Imported | 2026-05-27 |
| 52 | LSTM-L + M-Track | 13.28% | | Imported | 2026-05-27 |
| 53 | [EAI21] LWIT | 9.42% | | Imported | 2026-05-27 |
| 54 | Episodic Transformer (E.T.) | 8.57% | | Imported | 2026-05-27 |
| 55 | EmBERT [36_18_18_18-horizon0] + nav_receptacle | 7.52% | | Imported | 2026-05-27 |
| 56 | ORL | 6.56% | | Imported | 2026-05-27 |
| 57 | LAV | 6.38% | | Imported | 2026-05-27 |
| 58 | EmBERT [36_18_18_18-horizon0] | 6.06% | | Imported | 2026-05-27 |
| 59 | [EAI 21] SRCB-sudoer | 5.62% | | Imported | 2026-05-27 |
| 60 | holiday | 5.36% | | Imported | 2026-05-27 |
| 61 | MOCA | 5.3% | | Imported | 2026-05-27 |
| 62 | SRCB-sudoer | 5.3% | | Imported | 2026-05-27 |
| 63 | [EAI 2021] EmBERT | 5.05% | | Imported | 2026-05-27 |
| 64 | ECCV 2020 Winner | 4.45% | | Imported | 2026-05-27 |
| 65 | EmBERT | 3.14% | | Imported | 2026-05-27 |
| 66 | Baseline Seq2Seq + Progress Monitoring + Two stage mask prediction | 1.5% | | Imported | 2026-05-27 |
| 67 | ALFRED Speaks - Augmented Agent | 0.85% | | Imported | 2026-05-27 |
| 68 | Baseline + ImprovedMask | 0.66% | | Imported | 2026-05-27 |
| 69 | baseline | 0.59% | | Imported | 2026-05-27 |
| 70 | ALFS2S | 0.53% | | Imported | 2026-05-27 |
| 71 | Baseline Seq2Seq+PM (both) | 0.39% | | Imported | 2026-05-27 |
| 72 | baseline | 0.39% | | Imported | 2026-05-27 |
| 73 | baseline v2 | 0.33% | | Imported | 2026-05-27 |
| 74 | Baseline_v2 | 0.26% | | Imported | 2026-05-27 |
| 75 | mytest_raw | 0.26% | | Imported | 2026-05-27 |
| 76 | Seq2Seq | 0.21% | | Imported | 2026-05-27 |
| 77 | Baseline | 0.2% | | Imported | 2026-05-27 |
| 78 | Baseline_Test_Unseen | 0.2% | | Imported | 2026-05-27 |
| 79 | DDDDDDD | 0.2% | | Imported | 2026-05-27 |
| 80 | baseline | 0.2% | | Imported | 2026-05-27 |
| 81 | baseline_bronze | 0.2% | | Imported | 2026-05-27 |
| 82 | baseline_ggg | 0.2% | | Imported | 2026-05-27 |
| 83 | submission_GRN | 0.2% | | Imported | 2026-05-27 |
| 84 | submission_GRN2 | 0.2% | | Imported | 2026-05-27 |
| 85 | test | 0.2% | | Imported | 2026-05-27 |
| 86 | test-SLAM | 0.2% | | Imported | 2026-05-27 |
| 87 | test_baseline_smr | 0.2% | | Imported | 2026-05-27 |
| 88 | ThinkBot_seen | 0.0% | | Imported | 2026-05-27 |