BABILong

BABILong evaluates long-context language models on synthetic QA tasks over increasingly long contexts, reporting accuracy by context length.

Rows: 30
Primary metric: mean_le_128k
Sampled: 2026-05-06

Metadata

Metrics

Mean <=128k, Mean <=32k, 0k, 1k, 2k, 4k, 8k, 16k, 32k, 64k, 128k, 512k, 1M, 10M
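As a rough illustration of how the two mean columns relate to the per-length columns, the sketch below averages accuracy over every reported context length up to a cap. The function name `mean_up_to`, the `LENGTHS_K` mapping, and the example scores are hypothetical. Averaging only over the lengths a model actually reports is also an assumption; the leaderboard may treat missing lengths differently.

```python
# Context lengths from the metric list, expressed in thousands of tokens.
# The mapping itself is illustrative; only the labels come from the page.
LENGTHS_K = {
    "0k": 0, "1k": 1, "2k": 2, "4k": 4, "8k": 8, "16k": 16, "32k": 32,
    "64k": 64, "128k": 128, "512k": 512, "1M": 1000, "10M": 10000,
}


def mean_up_to(scores, cap_k):
    """Average the accuracies reported at context lengths <= cap_k.

    Assumption: lengths missing from `scores` are skipped rather than
    counted as zero. Returns None when nothing falls under the cap.
    """
    values = [acc for label, acc in scores.items() if LENGTHS_K[label] <= cap_k]
    if not values:
        return None
    return round(sum(values) / len(values), 2)


# Made-up per-length accuracies for a single hypothetical model.
example = {"0k": 90.0, "1k": 80.0, "32k": 40.0, "128k": 30.0, "1M": 10.0}

mean_le_128k = mean_up_to(example, 128)  # averages 0k, 1k, 32k, 128k
mean_le_32k = mean_up_to(example, 32)    # averages 0k, 1k, 32k
```

Keeping the cap as a parameter lets the same helper produce both "Mean <=128k" and "Mean <=32k" from one row of per-length scores.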

Latest Results

This snapshot uses the average-over-tasks rows from the public BABILong leaderboard. Rank follows the sorting intent of the leaderboard Space: longest evaluated context first, then mean score over context lengths up to 128k tokens.
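The ranking rule described above can be sketched as a two-key sort. This is a minimal illustration, not the Space's actual code. The `Row` type, its `max_context_k` field, and the sample rows are hypothetical.

```python
from typing import NamedTuple


class Row(NamedTuple):
    subject: str
    max_context_k: int   # hypothetical: longest evaluated context, in thousands of tokens
    mean_le_128k: float  # mean score over context lengths up to 128k


def rank(rows):
    """Sort by longest evaluated context first, then by mean score, both descending."""
    return sorted(rows, key=lambda r: (-r.max_context_k, -r.mean_le_128k))


# Made-up rows: model-b is evaluated at a longer context, so it ranks first
# even though its mean score is lower than the other two.
rows = [
    Row("model-a", 128, 70.0),
    Row("model-b", 10000, 50.0),
    Row("model-c", 128, 80.0),
]

ranked_names = [r.subject for r in rank(rows)]
```

Negating both keys gives a descending order on each while keeping the sort stable for ties.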

| Rank | Subject | Mean <=128k | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | ~ ARMT (137M) fine-tune | 97.78 | | Imported | 2026-05-06 |
| 2 | ~ Mamba (130M) fine-tune | 97.67 | | Imported | 2026-05-06 |
| 3 | ~ RMT (137M) fine-tune | 85.06 | | Imported | 2026-05-06 |
| 4 | Meta-Llama-3.1-70B-Instruct | 67.78 | | Imported | 2026-05-06 |
| 5 | gpt-4-0125-preview | 65.11 | GPT-4 Turbo Preview (openai-gpt-4-turbo-preview) | Imported | 2026-05-06 |
| 6 | Meta-Llama-3.1-8B-Instruct | 59.22 | | Imported | 2026-05-06 |
| 7 | Phi-3-medium-128k-instruct | 57.33 | | Imported | 2026-05-06 |
| 8 | c4ai-command-r-v01 | 55.33 | | Imported | 2026-05-06 |
| 9 | Mixtral-8x22B-Instruct-v0.1 | 52.22 | | Imported | 2026-05-06 |
| 10 | 01-ai/Yi-34B-200k | 48 | | Imported | 2026-05-06 |
| 11 | ai21labs/Jamba-v0.1 | 46.89 | | Imported | 2026-05-06 |
| 12 | Llama3-ChatQA-1.5-8B + RAG | 45.56 | | Imported | 2026-05-06 |
| 13 | Phi-3-mini-128k-instruct | 45.44 | | Imported | 2026-05-06 |
| 14 | Mixtral-8x7B-Instruct-v0.1 | 42.11 | Mistral: Mixtral 8x7B Instruct (mistralai-mixtral-8x7b-instruct) | Imported | 2026-05-06 |
| 15 | 01-ai/Yi-9B-200k | 41.33 | | Imported | 2026-05-06 |
| 16 | activation-beacon-mistral-7b | 41.22 | | Imported | 2026-05-06 |
| 17 | chatglm3-6b-128k | 40.78 | | Imported | 2026-05-06 |
| 18 | Mistral-7b-Instruct-v0.2 | 37.89 | | Imported | 2026-05-06 |
| 19 | Yarn-Mistral-7b-128k | 32.11 | | Imported | 2026-05-06 |
| 20 | activation-beacon-llama2-7b-chat | 31.78 | | Imported | 2026-05-06 |
| 21 | 01-ai/Yi-34B | 30.78 | | Imported | 2026-05-06 |
| 22 | Meta-Llama-3-8B-Instruct | 30.67 | Llama 3 8B Instruct (meta-llama-llama-3-8b-instruct) | Imported | 2026-05-06 |
| 23 | Llama-2-7B-32K-Instruct | 30.33 | | Imported | 2026-05-06 |
| 24 | LongAlpaca-13B | 29.33 | | Imported | 2026-05-06 |
| 25 | longchat-7b-v1.5-32k | 28.33 | | Imported | 2026-05-06 |
| 26 | LLaMA-2-7B-32K | 28.11 | | Imported | 2026-05-06 |
| 27 | v5-Eagle-7B-HF | 23 | | Imported | 2026-05-06 |
| 28 | rwkv-6-world-7b | 22.33 | | Imported | 2026-05-06 |
| 29 | mamba-2.8b-hf | 18.44 | | Imported | 2026-05-06 |
| 30 | GPT-2 (137M) | 4.67 | | Imported | 2026-05-06 |