CALVIN
Language-conditioned robot manipulation benchmark for long-horizon sequences and multitask learning in tabletop environments.
Rows: 46
Primary metric: lh_mtlc_avg_len
Sampled: 2026-05-27
Metrics: MTLC success rate, LH-MTLC 1 instruction, LH-MTLC 2 instructions, LH-MTLC 3 instructions, LH-MTLC 4 instructions, LH-MTLC 5 instructions, LH-MTLC average length
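The primary metric can be related to the per-instruction metrics listed above. The sketch below assumes the usual CALVIN long-horizon protocol: each evaluation chain contains five instructions, "LH-MTLC k instructions" is the fraction of chains in which at least the first k instructions were completed in a row, and "LH-MTLC average length" is the mean number of instructions completed in a row. Under that assumption the average length equals the sum of the five success rates. The function names and the example numbers are hypothetical and are not taken from the leaderboard.

```python
from typing import Sequence


def chain_success_rates(completed: Sequence[int], chain_len: int = 5) -> list[float]:
    """Fraction of rollouts that completed at least k instructions in a row, for k = 1..chain_len.

    `completed` holds, for each evaluation chain, how many instructions were
    solved consecutively before the first failure (0..chain_len).
    """
    if not completed:
        raise ValueError("need at least one rollout")
    if any(c < 0 or c > chain_len for c in completed):
        raise ValueError("completed counts must lie in [0, chain_len]")
    n = len(completed)
    return [sum(1 for c in completed if c >= k) / n for k in range(1, chain_len + 1)]


def average_length(rates: Sequence[float]) -> float:
    """Average chain length as the sum of the 'at least k in a row' success rates."""
    # The rates must be non-increasing: completing k+1 tasks implies completing k.
    if any(later > earlier for earlier, later in zip(rates, rates[1:])):
        raise ValueError("success rates must be non-increasing in k")
    return sum(rates)


# Hypothetical rollouts (illustrative only, not benchmark data).
rollouts = [5, 3, 0, 1]
rates = chain_success_rates(rollouts)
avg_len = average_length(rates)
mean_completed = sum(rollouts) / len(rollouts)
```

Both `avg_len` and `mean_completed` describe the same quantity, which is why a table that reports the five per-instruction success rates also determines the average length, up to rounding.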
| Rank | Subject | LH-MTLC average length | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | FLOWER (Train A, B, C, D -> Test D) | 4.67 | — | Imported | 2026-05-27 |
| 2 | UniVLA (Train A, B, C, D -> Test D) | 4.63 | — | Imported | 2026-05-27 |
| 3 | FLOWER (Train A, B, C -> Test D) | 4.53 | — | Imported | 2026-05-27 |
| 4 | MDT (Train A, B, C, D -> Test D) | 4.52 | — | Imported | 2026-05-27 |
| 5 | UniVLA (Train A, B, C -> Test D) | 4.41 | — | Imported | 2026-05-27 |
| 6 | MoDE (Train A, B, C, D -> Test D) | 4.39 | — | Imported | 2026-05-27 |
| 7 | FLOWER (Train D -> Test D) | 4.35 | — | Imported | 2026-05-27 |
| 8 | SeeR-Large (Train A, B, C -> Test D) | 4.28 | — | Imported | 2026-05-27 |
| 9 | GR-1 (Train A, B, C, D -> Test D) | 4.21 | — | Imported | 2026-05-27 |
| 10 | DeeR (Train A, B, C, D -> Test D) | 4.13 | — | Imported | 2026-05-27 |
| 11 | RoboFlamingo (Train A, B, C, D -> Test D) | 4.08 | — | Imported | 2026-05-27 |
| 12 | GR-MG (Train A, B, C -> Test D) | 4.04 | — | Imported | 2026-05-27 |
| 13 | MoDE (Train A, B, C -> Test D) | 4.01 | — | Imported | 2026-05-27 |
| 14 | RoboUniView (Train D -> Test D) | 3.85 | — | Imported | 2026-05-27 |
| 15 | MDT (Train D -> Test D) | 3.72 | — | Imported | 2026-05-27 |
| 16 | GHIL-Glue (Train A, B, C -> Test D) | 3.69 | — | Imported | 2026-05-27 |
| 17 | RoboUniView (Train A, B, C -> Test D) | 3.64 | — | Imported | 2026-05-27 |
| 18 | Diffusion Transformer Policy (Train A, B, C -> Test D) | 3.61 | — | Imported | 2026-05-27 |
| 19 | CLOVER (Train A, B, C -> Test D) | 3.53 | — | Imported | 2026-05-27 |
| 20 | HULC++ (Train D -> Test D) | 3.3 | — | Imported | 2026-05-27 |
| 21 | 3D Diffuser Actor (Train A, B, C -> Test D) | 3.27 | — | Imported | 2026-05-27 |
| 22 | TaKSIE (Train D -> Test D) | 3.18 | — | Imported | 2026-05-27 |
| 23 | GR-1 (Train A, B, C -> Test D) | 3.06 | — | Imported | 2026-05-27 |
| 24 | HULC (Train A, B, C, D -> Test D) | 3.06 | — | Imported | 2026-05-27 |
| 25 | LCD (Train D -> Test D) | 2.88 | — | Imported | 2026-05-27 |
| 26 | DeeR (Train D -> Test D) | 2.83 | — | Imported | 2026-05-27 |
| 27 | DeeR (Train A, B, C -> Test D) | 2.82 | — | Imported | 2026-05-27 |
| 28 | SuSIE (Train A, B, C -> Test D) | 2.69 | — | Imported | 2026-05-27 |
| 29 | SPIL (Train D -> Test D) | 2.67 | — | Imported | 2026-05-27 |
| 30 | HULC (Train D -> Test D) | 2.64 | — | Imported | 2026-05-27 |
| 31 | RoboFlamingo (Train A, B, C -> Test D) | 2.47 | — | Imported | 2026-05-27 |
| 32 | Baseline + delta actions (Train D -> Test D) | 1.82 | — | Imported | 2026-05-27 |
| 33 | SPIL (Train A, B, C -> Test D) | 1.71 | — | Imported | 2026-05-27 |
| 34 | HULC (Train A, B, C -> Test D) | 0.67 | — | Imported | 2026-05-27 |
| 35 | Baseline (Train D -> Test D) | 0.64 | — | Imported | 2026-05-27 |
| 36 | Baseline (Train D -> Test D) | 0.41 | — | Imported | 2026-05-27 |
| 37 | Baseline (Train A, B, C, D -> Test D) | 0.4 | — | Imported | 2026-05-27 |
| 38 | Baseline (Train D -> Test D) | 0.33 | — | Imported | 2026-05-27 |
| 39 | Baseline (Train A, B, C -> Test D) | 0.31 | — | Imported | 2026-05-27 |
| 40 | Baseline (Train D -> Test D) | 0.31 | — | Imported | 2026-05-27 |
| 41 | Baseline (Train A, B, C, D -> Test D) | 0.28 | — | Imported | 2026-05-27 |
| 42 | Baseline (Train A, B, C -> Test D) | 0.26 | — | Imported | 2026-05-27 |
| 43 | Baseline (Train A, B, C, D -> Test D) | 0.25 | — | Imported | 2026-05-27 |
| 44 | Baseline (Train A, B, C -> Test D) | 0.22 | — | Imported | 2026-05-27 |
| 45 | Baseline (Train A, B, C -> Test D) | 0.2 | — | Imported | 2026-05-27 |
| 46 | Baseline (Train A, B, C, D -> Test D) | 0.16 | — | Imported | 2026-05-27 |