BABILong
BABILong evaluates long-context language models on synthetic QA tasks over increasingly long contexts, reporting accuracy by context length.
Metadata
Rows: 30
Primary metric: mean_le_128k
Sampled: 2026-05-06
Metrics
Mean <=128k, Mean <=32k, 0k, 1k, 2k, 4k, 8k, 16k, 32k, 64k, 128k, 512k, 1M, 10M
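The two aggregate metrics can be read as averages of the per-length accuracies up to a cutoff. The sketch below assumes an unweighted mean over the listed context lengths from 0k to the cutoff; the page itself does not state the exact formula. The helper `mean_up_to`, the `LENGTH_TOKENS` mapping, and the example scores are hypothetical and are not taken from the leaderboard.

```python
# Nominal size of each context-length bucket named in the metric list.
LENGTH_TOKENS = {
    "0k": 0, "1k": 1_000, "2k": 2_000, "4k": 4_000, "8k": 8_000,
    "16k": 16_000, "32k": 32_000, "64k": 64_000, "128k": 128_000,
    "512k": 512_000, "1M": 1_000_000, "10M": 10_000_000,
}


def mean_up_to(scores, limit):
    """Unweighted mean of per-length accuracies with length <= limit.

    Assumption: every available bucket counts equally. Buckets that are
    absent from `scores` are simply left out of the average.
    """
    cutoff = LENGTH_TOKENS[limit]
    values = [acc for label, acc in scores.items()
              if LENGTH_TOKENS[label] <= cutoff]
    if not values:
        raise ValueError(f"no scores at or below {limit}")
    return round(sum(values) / len(values), 2)


# Hypothetical per-length accuracies in percent (illustration only).
example_scores = {
    "0k": 90, "1k": 85, "2k": 80, "4k": 75, "8k": 70,
    "16k": 60, "32k": 50, "64k": 40, "128k": 30, "512k": 10,
}

mean_le_128k = mean_up_to(example_scores, "128k")  # averages 9 buckets
mean_le_32k = mean_up_to(example_scores, "32k")    # averages 7 buckets
print(mean_le_128k, mean_le_32k)
```

Under this reading, scores at 512k and beyond do not affect the primary metric, so a model with a short context window and one evaluated up to 10M are ranked on the same nine buckets.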
| Rank | Subject | Mean <=128k | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | ~ ARMT (137M) fine-tune | 97.78 | — | Imported | 2026-05-06 |
| 2 | ~ Mamba (130M) fine-tune | 97.67 | — | Imported | 2026-05-06 |
| 3 | ~ RMT (137M) fine-tune | 85.06 | — | Imported | 2026-05-06 |
| 4 | Meta-Llama-3.1-70B-Instruct | 67.78 | — | Imported | 2026-05-06 |
| 5 | gpt-4-0125-preview | 65.11 | GPT-4 Turbo Preview (`openai-gpt-4-turbo-preview`) | Imported | 2026-05-06 |
| 6 | Meta-Llama-3.1-8B-Instruct | 59.22 | — | Imported | 2026-05-06 |
| 7 | Phi-3-medium-128k-instruct | 57.33 | — | Imported | 2026-05-06 |
| 8 | c4ai-command-r-v01 | 55.33 | — | Imported | 2026-05-06 |
| 9 | Mixtral-8x22B-Instruct-v0.1 | 52.22 | — | Imported | 2026-05-06 |
| 10 | 01-ai/Yi-34B-200k | 48 | — | Imported | 2026-05-06 |
| 11 | ai21labs/Jamba-v0.1 | 46.89 | — | Imported | 2026-05-06 |
| 12 | Llama3-ChatQA-1.5-8B + RAG | 45.56 | — | Imported | 2026-05-06 |
| 13 | Phi-3-mini-128k-instruct | 45.44 | — | Imported | 2026-05-06 |
| 14 | Mixtral-8x7B-Instruct-v0.1 | 42.11 | Mistral: Mixtral 8x7B Instruct (`mistralai-mixtral-8x7b-instruct`) | Imported | 2026-05-06 |
| 15 | 01-ai/Yi-9B-200k | 41.33 | — | Imported | 2026-05-06 |
| 16 | activation-beacon-mistral-7b | 41.22 | — | Imported | 2026-05-06 |
| 17 | chatglm3-6b-128k | 40.78 | — | Imported | 2026-05-06 |
| 18 | Mistral-7b-Instruct-v0.2 | 37.89 | — | Imported | 2026-05-06 |
| 19 | Yarn-Mistral-7b-128k | 32.11 | — | Imported | 2026-05-06 |
| 20 | activation-beacon-llama2-7b-chat | 31.78 | — | Imported | 2026-05-06 |
| 21 | 01-ai/Yi-34B | 30.78 | — | Imported | 2026-05-06 |
| 22 | Meta-Llama-3-8B-Instruct | 30.67 | Llama 3 8B Instruct (`meta-llama-llama-3-8b-instruct`) | Imported | 2026-05-06 |
| 23 | Llama-2-7B-32K-Instruct | 30.33 | — | Imported | 2026-05-06 |
| 24 | LongAlpaca-13B | 29.33 | — | Imported | 2026-05-06 |
| 25 | longchat-7b-v1.5-32k | 28.33 | — | Imported | 2026-05-06 |
| 26 | LLaMA-2-7B-32K | 28.11 | — | Imported | 2026-05-06 |
| 27 | v5-Eagle-7B-HF | 23 | — | Imported | 2026-05-06 |
| 28 | rwkv-6-world-7b | 22.33 | — | Imported | 2026-05-06 |
| 29 | mamba-2.8b-hf | 18.44 | — | Imported | 2026-05-06 |
| 30 | GPT-2 (137M) | 4.67 | — | Imported | 2026-05-06 |