LAMBADA

Language modeling benchmark for broad context understanding and last-word prediction.

Rows: 5
Primary metric: score
Sampled: 2026-05-06

Metadata

Metrics

Score (higher is better), Standard error (lower is better)
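
The page does not say how the two metrics are computed. As a minimal sketch, assume the score is exact-match last-word accuracy in percent and the standard error is the binomial standard error of that proportion. The function name `accuracy_and_stderr` and the toy data are hypothetical and are not taken from the leaderboard.

```python
import math


def accuracy_and_stderr(predictions, targets):
    """Return (accuracy %, standard error %) for exact-match last-word prediction."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("predictions and targets must be non-empty and equal in length")
    n = len(targets)
    # Count passages whose predicted final word matches the reference word exactly.
    correct = sum(p == t for p, t in zip(predictions, targets))
    p = correct / n
    # Assumption: binomial standard error of a proportion, sqrt(p * (1 - p) / n).
    stderr = math.sqrt(p * (1 - p) / n)
    return 100 * p, 100 * stderr


# Hypothetical toy data, not real LAMBADA passages.
preds = ["door", "river", "Anna", "key"]
golds = ["door", "river", "Anna", "lock"]
score, se = accuracy_and_stderr(preds, golds)
print(f"score={score:.2f} stderr={se:.2f}")
```

Under this assumption the standard error shrinks as the number of evaluated passages grows, so two scores are only comparable alongside their sample sizes.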

Latest Results

Rows parsed from the public leaderboard table.

Rank  Subject         Score  Model Match                             Provenance  Sampled
1     falcon-180B     79.80  -                                       Imported    2026-05-06
2     Llama-2-70b-hf  78.90  -                                       Imported    2026-05-06
3     Llama 3.1 405B  77.70  -                                       Imported    2026-05-06
4     Llama-2-7b      76.50  -                                       Imported    2026-05-06
5     Qwen 3 235B     71.10  Qwen3 235B A22B (qwen-qwen3-235b-a22b)  Imported    2026-05-06