LongBench v2

LongBench v2 measures long-context retrieval, needle finding, summarization, factual grounding, or retrieval-augmented generation quality.

Rows: 38
Primary metric: overall_cot_accuracy
Sampled: 2026-05-27

Metadata

Metrics

Overall w/ CoT, Overall, Easy w/ CoT, Easy, Hard w/ CoT, Hard, Short w/ CoT, Short, Medium w/ CoT, Medium, Long w/ CoT, Long

Latest Results

Rows are parsed from the public LongBench v2 static HTML leaderboard. Primary score is Overall w/ CoT, matching the default leaderboard sorting note.

| Rank | Subject | Overall w/ CoT | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Gemini-2.5-Pro | 63.3% | Gemini 2.5 Pro (google-gemini-2.5-pro) | Imported | 2026-05-27 |
| 2 | Gemini-2.5-Flash | 62.1% | Gemini 2.5 Flash (google-gemini-2.5-flash) | Imported | 2026-05-27 |
| 3 | Qwen3-235B-A22B-Thinking-2507 (Alibaba) | 60.6% | - | Imported | 2026-05-27 |
| 4 | DeepSeek-R1 | 58.3% | R1 (deepseek-r1) | Imported | 2026-05-27 |
| 5 | Qwen3-235B-A22B-Instruct-2507 (Alibaba) | 58.3% | - | Imported | 2026-05-27 |
| 6 | o1-preview | 57.7% | o1-preview (openai-o1-preview) | Imported | 2026-05-27 |
| 7 | DeepSeek-R1-0528 | 56.7% | R1 0528 (deepseek-deepseek-r1-0528) | Imported | 2026-05-27 |
| 8 | MiniMax-Text-01 | 56.5% | - | Imported | 2026-05-27 |
| 9 | Gemini-2.0-Flash-Thinking | 56% | - | Imported | 2026-05-27 |
| 10 | Human | 53.7% | - | Imported | 2026-05-27 |
| 11 | Gemini-Exp-1206 | 52.5% | - | Imported | 2026-05-27 |
| 12 | GPT-4o | 51.4% | GPT-4o (openai-gpt-4o) | Imported | 2026-05-27 |
| 13 | GPT-4o | 51.2% | GPT-4o (openai-gpt-4o) | Imported | 2026-05-27 |
| 14 | Gemini-2.0-Flash | 51.1% | Gemini 2.0 Flash (google-gemini-2.0-flash) | Imported | 2026-05-27 |
| 15 | GLM-4.5 (Z.ai & Tsinghua) | 50.3% | - | Imported | 2026-05-27 |
| 16 | Qwen3-235B-A22B (Alibaba) | 50.1% | - | Imported | 2026-05-27 |
| 17 | Qwen3-30B-A3B-Thinking-2507 (Alibaba) | 50.1% | - | Imported | 2026-05-27 |
| 18 | Qwen3-32B (Alibaba) | 49.2% | - | Imported | 2026-05-27 |
| 19 | QwQ-32B (Alibaba) | 48.9% | - | Imported | 2026-05-27 |
| 20 | GLM-4.5-Air (Z.ai & Tsinghua) | 48.6% | - | Imported | 2026-05-27 |
| 21 | Claude 3.5 Sonnet (Anthropic) | 46.7% | - | Imported | 2026-05-27 |
| 22 | GLM-4-Plus (Z.ai & Tsinghua) | 46.1% | - | Imported | 2026-05-27 |
| 23 | Kimi-K2-Instruct (Moonshot AI) | 44.3% | - | Imported | 2026-05-27 |
| 24 | Qwen2.5-72B (Alibaba) | 43.5% | - | Imported | 2026-05-27 |
| 25 | Qwen3-30B-A3B (Alibaba) | 42.5% | - | Imported | 2026-05-27 |
| 26 | Mistral Large 24.11 (Mistral AI) | 39.6% | - | Imported | 2026-05-27 |
| 27 | o1-mini (OpenAI) | 38.9% | - | Imported | 2026-05-27 |
| 28 | Llama 3.1 70B (Meta) | 36.2% | - | Imported | 2026-05-27 |
| 29 | Llama 3.3 70B (Meta) | 36.2% | - | Imported | 2026-05-27 |
| 30 | Qwen2.5-7B (Alibaba) | 35.6% | - | Imported | 2026-05-27 |
| 31 | Nemotron 70B (Nvidia) | 35.2% | - | Imported | 2026-05-27 |
| 32 | Mistral Large 2 (Mistral AI) | 33.6% | - | Imported | 2026-05-27 |
| 33 | GPT-4o mini (OpenAI) | 32.4% | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-27 |
| 34 | NExtLong 8B (CAS) | 32% | - | Imported | 2026-05-27 |
| 35 | Command R+ (Cohere) | 31.6% | - | Imported | 2026-05-27 |
| 36 | GLM-4-9B (Z.ai & Tsinghua) | 30.8% | - | Imported | 2026-05-27 |
| 37 | Llama 3.1 8B (Meta) | 30.4% | - | Imported | 2026-05-27 |
| 38 | Random | 25% | - | Imported | 2026-05-27 |
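
The ranking above orders subjects by Overall w/ CoT and includes two reference rows, Human and Random. As a minimal sketch of how such rows could be ranked and compared against the Human baseline, the snippet below parses the percentage strings and sorts in descending order. It is not the importer used for this page: `Row`, `parse_percent`, and `rank` are hypothetical names, and only a handful of rows from the listing are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    subject: str
    overall_cot: float  # Overall w/ CoT accuracy, in percent


def parse_percent(text: str) -> float:
    """Convert a string such as '56%' or '63.3%' to a float."""
    return float(text.strip().rstrip("%"))


def rank(rows: list[Row]) -> list[Row]:
    """Sort by the primary metric, highest first.

    sorted() is stable, so subjects with equal scores keep their
    input order (the listing contains ties at 58.3%, 50.1%, and 36.2%).
    """
    return sorted(rows, key=lambda r: r.overall_cot, reverse=True)


# A few (subject, score) pairs copied from the listing, deliberately unordered.
RAW = [
    ("Human", "53.7%"),
    ("Gemini-2.5-Pro", "63.3%"),
    ("Random", "25%"),
    ("Gemini-2.0-Flash-Thinking", "56%"),
    ("Gemini-Exp-1206", "52.5%"),
]

rows = rank([Row(subject, parse_percent(score)) for subject, score in RAW])
human = next(r for r in rows if r.subject == "Human")
above_human = [r.subject for r in rows if r.overall_cot > human.overall_cot]

if __name__ == "__main__":
    for position, row in enumerate(rows, start=1):
        print(position, row.subject, f"{row.overall_cot}%")
    print("Above the Human baseline:", above_human)
```

Keeping Human and Random as ordinary rows, as the listing does, means they sort alongside the models with no special handling.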