CALVIN

Language-conditioned robot manipulation benchmark for long-horizon sequences and multitask learning in tabletop environments.

46 rows
Primary metric: lh_mtlc_avg_len
Sampled: 2026-05-27

Metadata

Metrics

MTLC success rate, LH-MTLC 1 instruction, LH-MTLC 2 instructions, LH-MTLC 3 instructions, LH-MTLC 4 instructions, LH-MTLC 5 instructions, LH-MTLC average length
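The long-horizon metrics listed above are related to one another. As a minimal sketch, assuming the usual CALVIN protocol in which each evaluation chain contains five instructions and stops at the first failure, the "LH-MTLC k instructions" value is the fraction of chains that completed at least k instructions, and the average length is the mean number of instructions completed per chain. Under that assumption the average length equals the sum of the five success rates. The function name `lh_mtlc_metrics` and the sample outcomes are hypothetical and are not taken from the benchmark code or from any leaderboard entry.

```python
def lh_mtlc_metrics(completed_per_chain, chain_len=5):
    """Return (success_rates, average_length) for a list of per-chain counts.

    completed_per_chain[i] is the number of consecutive instructions completed
    in evaluation chain i before the first failure (0..chain_len).
    """
    n = len(completed_per_chain)
    if n == 0:
        raise ValueError("need at least one evaluation chain")
    # success_rates[k-1]: fraction of chains that completed at least k instructions
    success_rates = [
        sum(1 for c in completed_per_chain if c >= k) / n
        for k in range(1, chain_len + 1)
    ]
    # Average length: mean number of instructions completed per chain
    average_length = sum(completed_per_chain) / n
    return success_rates, average_length


# Hypothetical outcomes for 10 chains (illustrative only, not benchmark data)
chains = [5, 5, 4, 3, 5, 0, 2, 5, 1, 5]
rates, avg_len = lh_mtlc_metrics(chains)
print(rates)
print(avg_len, sum(rates))
```

Because the average length is a sum of five rates, it ranges from 0 to 5, which matches the range of the scores reported in the results table.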

Latest Results

Rows are parsed from CALVIN's public homepage DataTables for Train D -> Test D, Train A/B/C/D -> Test D, and Train A/B/C -> Test D. Score is LH-MTLC average length.

| Rank | Subject | LH-MTLC average length | Model Match | Provenance | Sampled |
| --- | --- | --- | --- | --- | --- |
| 1 | FLOWER (Train A, B, C, D -> Test D) | 4.67 | | Imported | 2026-05-27 |
| 2 | UniVLA (Train A, B, C, D -> Test D) | 4.63 | | Imported | 2026-05-27 |
| 3 | FLOWER (Train A, B, C -> Test D) | 4.53 | | Imported | 2026-05-27 |
| 4 | MDT (Train A, B, C, D -> Test D) | 4.52 | | Imported | 2026-05-27 |
| 5 | UniVLA (Train A, B, C -> Test D) | 4.41 | | Imported | 2026-05-27 |
| 6 | MoDE (Train A, B, C, D -> Test D) | 4.39 | | Imported | 2026-05-27 |
| 7 | FLOWER (Train D -> Test D) | 4.35 | | Imported | 2026-05-27 |
| 8 | SeeR-Large (Train A, B, C -> Test D) | 4.28 | | Imported | 2026-05-27 |
| 9 | GR-1 (Train A, B, C, D -> Test D) | 4.21 | | Imported | 2026-05-27 |
| 10 | DeeR (Train A, B, C, D -> Test D) | 4.13 | | Imported | 2026-05-27 |
| 11 | RoboFlamingo (Train A, B, C, D -> Test D) | 4.08 | | Imported | 2026-05-27 |
| 12 | GR-MG (Train A, B, C -> Test D) | 4.04 | | Imported | 2026-05-27 |
| 13 | MoDE (Train A, B, C -> Test D) | 4.01 | | Imported | 2026-05-27 |
| 14 | RoboUniView (Train D -> Test D) | 3.85 | | Imported | 2026-05-27 |
| 15 | MDT (Train D -> Test D) | 3.72 | | Imported | 2026-05-27 |
| 16 | GHIL-Glue (Train A, B, C -> Test D) | 3.69 | | Imported | 2026-05-27 |
| 17 | RoboUniView (Train A, B, C -> Test D) | 3.64 | | Imported | 2026-05-27 |
| 18 | Diffusion Transformer Policy (Train A, B, C -> Test D) | 3.61 | | Imported | 2026-05-27 |
| 19 | CLOVER (Train A, B, C -> Test D) | 3.53 | | Imported | 2026-05-27 |
| 20 | HULC++ (Train D -> Test D) | 3.3 | | Imported | 2026-05-27 |
| 21 | 3D Diffuser Actor (Train A, B, C -> Test D) | 3.27 | | Imported | 2026-05-27 |
| 22 | TaKSIE (Train D -> Test D) | 3.18 | | Imported | 2026-05-27 |
| 23 | GR-1 (Train A, B, C -> Test D) | 3.06 | | Imported | 2026-05-27 |
| 24 | HULC (Train A, B, C, D -> Test D) | 3.06 | | Imported | 2026-05-27 |
| 25 | LCD (Train D -> Test D) | 2.88 | | Imported | 2026-05-27 |
| 26 | DeeR (Train D -> Test D) | 2.83 | | Imported | 2026-05-27 |
| 27 | DeeR (Train A, B, C -> Test D) | 2.82 | | Imported | 2026-05-27 |
| 28 | SuSIE (Train A, B, C -> Test D) | 2.69 | | Imported | 2026-05-27 |
| 29 | SPIL (Train D -> Test D) | 2.67 | | Imported | 2026-05-27 |
| 30 | HULC (Train D -> Test D) | 2.64 | | Imported | 2026-05-27 |
| 31 | RoboFlamingo (Train A, B, C -> Test D) | 2.47 | | Imported | 2026-05-27 |
| 32 | Baseline + delta actions (Train D -> Test D) | 1.82 | | Imported | 2026-05-27 |
| 33 | SPIL (Train A, B, C -> Test D) | 1.71 | | Imported | 2026-05-27 |
| 34 | HULC (Train A, B, C -> Test D) | 0.67 | | Imported | 2026-05-27 |
| 35 | Baseline (Train D -> Test D) | 0.64 | | Imported | 2026-05-27 |
| 36 | Baseline (Train D -> Test D) | 0.41 | | Imported | 2026-05-27 |
| 37 | Baseline (Train A, B, C, D -> Test D) | 0.4 | | Imported | 2026-05-27 |
| 38 | Baseline (Train D -> Test D) | 0.33 | | Imported | 2026-05-27 |
| 39 | Baseline (Train A, B, C -> Test D) | 0.31 | | Imported | 2026-05-27 |
| 40 | Baseline (Train D -> Test D) | 0.31 | | Imported | 2026-05-27 |
| 41 | Baseline (Train A, B, C, D -> Test D) | 0.28 | | Imported | 2026-05-27 |
| 42 | Baseline (Train A, B, C -> Test D) | 0.26 | | Imported | 2026-05-27 |
| 43 | Baseline (Train A, B, C, D -> Test D) | 0.25 | | Imported | 2026-05-27 |
| 44 | Baseline (Train A, B, C -> Test D) | 0.22 | | Imported | 2026-05-27 |
| 45 | Baseline (Train A, B, C -> Test D) | 0.2 | | Imported | 2026-05-27 |
| 46 | Baseline (Train A, B, C, D -> Test D) | 0.16 | | Imported | 2026-05-27 |
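
The table above ranks entries from three different training settings in a single list. Since those settings use different training data, a per-setting view can be easier to compare. The sketch below regroups a few rows by training setting and re-ranks them within each group. The scores are copied from the first eight rows of the table. The tuple layout and the helper name `rank_within_setting` are assumptions made for illustration and are not part of the leaderboard's own tooling.

```python
from collections import defaultdict

# (subject, training setting, LH-MTLC average length), copied from the
# first eight rows of the table; the tuple layout is an illustrative choice.
rows = [
    ("FLOWER", "A, B, C, D", 4.67),
    ("UniVLA", "A, B, C, D", 4.63),
    ("FLOWER", "A, B, C", 4.53),
    ("MDT", "A, B, C, D", 4.52),
    ("UniVLA", "A, B, C", 4.41),
    ("MoDE", "A, B, C, D", 4.39),
    ("FLOWER", "D", 4.35),
    ("SeeR-Large", "A, B, C", 4.28),
]


def rank_within_setting(rows):
    """Group rows by training setting and sort each group by score, best first."""
    by_setting = defaultdict(list)
    for subject, setting, score in rows:
        by_setting[setting].append((subject, score))
    return {
        setting: sorted(entries, key=lambda entry: entry[1], reverse=True)
        for setting, entries in by_setting.items()
    }


per_setting = rank_within_setting(rows)
for setting, entries in per_setting.items():
    print(f"Train {setting} -> Test D")
    for rank, (subject, score) in enumerate(entries, start=1):
        print(f"  {rank}. {subject}: {score:.2f}")
```

The same grouping applies to the full table if all 46 rows are entered in the same tuple form.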