ALFRED
Embodied instruction-following benchmark in AI2-THOR for mapping natural language goals and instructions to household actions.
Rows: 88
Primary metric: unseen_success_rate
Sampled: 2026-05-27
Metadata
Metrics
Unseen success rate, Seen success rate, Seen PLWSR, Unseen PLWSR, Seen goal-condition success rate, Unseen goal-condition success rate, Seen PLW goal-condition success rate, Unseen PLW goal-condition success rate
| Rank | Subject | Unseen success rate | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Human Performance | 91% | — | Imported | 2026-05-27 |
| 2 | GRL | 68.52% | — | Imported | 2026-05-27 |
| 3 | EPO | 62.35% | — | Imported | 2026-05-27 |
| 4 | RoboGPT | 62.0% | — | Imported | 2026-05-27 |
| 5 | SCOUT | 60.79% | — | Imported | 2026-05-27 |
| 6 | RLEF | 60.69% | — | Imported | 2026-05-27 |
| 7 | CSL-HDP(version1) | 59.65% | — | Imported | 2026-05-27 |
| 8 | LOAT | 58.14% | — | Imported | 2026-05-27 |
| 9 | ThinkBot-unseen | 57.82% | — | Imported | 2026-05-27 |
| 10 | HDP | 57.55% | — | Imported | 2026-05-27 |
| 11 | DISCO | 56.55% | — | Imported | 2026-05-27 |
| 12 | RLEF | 56.34% | — | Imported | 2026-05-27 |
| 13 | DISCO | 54.77% | — | Imported | 2026-05-27 |
| 14 | REIF | 50.83% | — | Imported | 2026-05-27 |
| 15 | SDPA | 50.75% | — | Imported | 2026-05-27 |
| 16 | [EAI23] ECLAIR | 50.36% | — | Imported | 2026-05-27 |
| 17 | ESP - 2 | 48.59% | — | Imported | 2026-05-27 |
| 18 | ESP | 48.53% | — | Imported | 2026-05-27 |
| 19 | Container | 47.5% | — | Imported | 2026-05-27 |
| 20 | DRL | 47.22% | — | Imported | 2026-05-27 |
| 21 | Prompter | 45.72% | — | Imported | 2026-05-27 |
| 22 | HD-Agent | 45.52% | — | Imported | 2026-05-27 |
| 23 | Prompter, no slice replay | 45.32% | — | Imported | 2026-05-27 |
| 24 | high level only | 43.69% | — | Imported | 2026-05-27 |
| 25 | FLARE | 40.88% | — | Imported | 2026-05-27 |
| 26 | HD-Agent | 39.18% | — | Imported | 2026-05-27 |
| 27 | HIA-High-Goal-Only | 38.52% | — | Imported | 2026-05-27 |
| 28 | EI2 | 38.19% | — | Imported | 2026-05-27 |
| 29 | [EAI22] EPA | 36.07% | — | Imported | 2026-05-27 |
| 30 | Obstacle_film | 35.83% | — | Imported | 2026-05-27 |
| 31 | zero-shot-LLM | 35.64% | — | Imported | 2026-05-27 |
| 32 | [EAI22] LGS-RPA | 35.41% | — | Imported | 2026-05-27 |
| 33 | LGS-RPA | 34.07% | — | Imported | 2026-05-27 |
| 34 | alfred-bot | 32.24% | — | Imported | 2026-05-27 |
| 35 | 726 | 29.17% | — | Imported | 2026-05-27 |
| 36 | [EAI22] Sudoer-SRCB-RMU | 28.3% | — | Imported | 2026-05-27 |
| 37 | FILM - a new semantic policy instance | 27.8% | — | Imported | 2026-05-27 |
| 38 | FILM | 26.49% | — | Imported | 2026-05-27 |
| 39 | ABP | 26.16% | — | Imported | 2026-05-27 |
| 40 | LEBP | 24.26% | — | Imported | 2026-05-27 |
| 41 | AMSLAM | 23.48% | — | Imported | 2026-05-27 |
| 42 | FIQA | 22.18% | — | Imported | 2026-05-27 |
| 43 | HLSM-MAT | 21.84% | — | Imported | 2026-05-27 |
| 44 | HLSM | 20.27% | — | Imported | 2026-05-27 |
| 45 | ECL | 17.92% | — | Imported | 2026-05-27 |
| 46 | FILM_smart | 17.72% | — | Imported | 2026-05-27 |
| 47 | CLET | 17.24% | — | Imported | 2026-05-27 |
| 48 | VLNBERT-L + M-Track | 16.29% | — | Imported | 2026-05-27 |
| 49 | [EAI21] - HLSM | 16.29% | — | Imported | 2026-05-27 |
| 50 | [EAI21] ABP | 15.43% | — | Imported | 2026-05-27 |
| 51 | [EAI21] HiTUT | 13.87% | — | Imported | 2026-05-27 |
| 52 | LSTM-L + M-Track | 13.28% | — | Imported | 2026-05-27 |
| 53 | [EAI21] LWIT | 9.42% | — | Imported | 2026-05-27 |
| 54 | Episodic Transformer (E.T.) | 8.57% | — | Imported | 2026-05-27 |
| 55 | EmBERT [36_18_18_18-horizon0] + nav_receptacle | 7.52% | — | Imported | 2026-05-27 |
| 56 | ORL | 6.56% | — | Imported | 2026-05-27 |
| 57 | LAV | 6.38% | — | Imported | 2026-05-27 |
| 58 | EmBERT [36_18_18_18-horizon0] | 6.06% | — | Imported | 2026-05-27 |
| 59 | [EAI 21] SRCB-sudoer | 5.62% | — | Imported | 2026-05-27 |
| 60 | holiday | 5.36% | — | Imported | 2026-05-27 |
| 61 | MOCA | 5.3% | — | Imported | 2026-05-27 |
| 62 | SRCB-sudoer | 5.3% | — | Imported | 2026-05-27 |
| 63 | [EAI 2021] EmBERT | 5.05% | — | Imported | 2026-05-27 |
| 64 | ECCV 2020 Winner | 4.45% | — | Imported | 2026-05-27 |
| 65 | EmBERT | 3.14% | — | Imported | 2026-05-27 |
| 66 | Baseline Seq2Seq + Progress Monitoring + Two stage mask prediction | 1.5% | — | Imported | 2026-05-27 |
| 67 | ALFRED Speaks - Augmented Agent | 0.85% | — | Imported | 2026-05-27 |
| 68 | Baseline + ImprovedMask | 0.66% | — | Imported | 2026-05-27 |
| 69 | baseline | 0.59% | — | Imported | 2026-05-27 |
| 70 | ALFS2S | 0.53% | — | Imported | 2026-05-27 |
| 71 | Baseline Seq2Seq+PM (both) | 0.39% | — | Imported | 2026-05-27 |
| 72 | baseline | 0.39% | — | Imported | 2026-05-27 |
| 73 | baseline v2 | 0.33% | — | Imported | 2026-05-27 |
| 74 | Baseline_v2 | 0.26% | — | Imported | 2026-05-27 |
| 75 | mytest_raw | 0.26% | — | Imported | 2026-05-27 |
| 76 | Seq2Seq | 0.21% | — | Imported | 2026-05-27 |
| 77 | Baseline | 0.2% | — | Imported | 2026-05-27 |
| 78 | Baseline_Test_Unseen | 0.2% | — | Imported | 2026-05-27 |
| 79 | DDDDDDD | 0.2% | — | Imported | 2026-05-27 |
| 80 | baseline | 0.2% | — | Imported | 2026-05-27 |
| 81 | baseline_bronze | 0.2% | — | Imported | 2026-05-27 |
| 82 | baseline_ggg | 0.2% | — | Imported | 2026-05-27 |
| 83 | submission_GRN | 0.2% | — | Imported | 2026-05-27 |
| 84 | submission_GRN2 | 0.2% | — | Imported | 2026-05-27 |
| 85 | test | 0.2% | — | Imported | 2026-05-27 |
| 86 | test-SLAM | 0.2% | — | Imported | 2026-05-27 |
| 87 | test_baseline_smr | 0.2% | — | Imported | 2026-05-27 |
| 88 | ThinkBot_seen | 0.0% | — | Imported | 2026-05-27 |