LiveBench

A contamination-limited benchmark with frequently updated questions spanning math, coding, reasoning, language, instruction following, and data analysis.

50 rows
livebench_average primary metric
2026-05-05 sampled

Metadata

Metrics

LiveBench average, AMPS_Hard, code_completion, code_generation, connections, consecutive_events, integrals_with_game, javascript, logic_with_navigation, math_comp, olympiad, paraphrase, plot_unscrambling, python, simplify, spatial, story_generation, summarize, tablejoin, tablereformat, theory_of_mind, typescript, typos, zebra_puzzle

Latest Results

Average is computed as the unweighted mean of task columns in the official LiveBench CSV. Model display names are preserved from the source CSV.
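The averaging rule above can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual import code: the `model` identifier column, the helper names `livebench_average` and `rank_models`, and the sample values are all assumptions made up for the example, and rows with missing task scores are not handled.

```python
import csv
import io
from statistics import mean


def livebench_average(row, task_columns):
    """Unweighted mean of the given task columns for one CSV row, rounded to 2 places."""
    return round(mean(float(row[col]) for col in task_columns), 2)


def rank_models(csv_text, id_column="model"):
    """Return (model, average) pairs sorted from highest to lowest average.

    Assumes every column other than `id_column` is a task score column.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    task_columns = [c for c in reader.fieldnames if c != id_column]
    scored = [(row[id_column], livebench_average(row, task_columns)) for row in reader]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical excerpt: two models, three task columns, invented scores.
SAMPLE_CSV = """model,olympiad,typos,zebra_puzzle
model-a,60.0,80.0,70.0
model-b,50.0,90.0,40.0
"""

ranking = rank_models(SAMPLE_CSV)
for rank, (model, avg) in enumerate(ranking, start=1):
    print(rank, model, f"{avg:.2f}")
```

Because the mean is unweighted, every task column counts equally regardless of how many questions it contains or which category it belongs to.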

| Rank | Subject | LiveBench average | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | gpt-5.5-xhigh | 81.28 | GPT-5.5 (openai-gpt-5.5) | Imported | 2026-05-05 |
| 2 | gpt-5.4-xhigh | 80.91 | GPT-5.4 (openai-gpt-5.4) | Imported | 2026-05-05 |
| 3 | gemini-3.1-pro-preview-high | 80.71 | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Imported | 2026-05-05 |
| 4 | claude-opus-4-7-xhigh-effort | 77.10 | | Imported | 2026-05-05 |
| 5 | gpt-5.5-high | 77.07 | GPT-5.5 (openai-gpt-5.5) | Imported | 2026-05-05 |
| 6 | claude-opus-4-6-thinking-auto-high-effort | 76.79 | | Imported | 2026-05-05 |
| 7 | claude-opus-4-5-20251101-thinking-64k-high-effort | 76.02 | | Imported | 2026-05-05 |
| 8 | claude-sonnet-4-6-thinking-auto-medium-effort | 75.68 | | Imported | 2026-05-05 |
| 9 | gpt-5.4-high | 75.60 | GPT-5.4 (openai-gpt-5.4) | Imported | 2026-05-05 |
| 10 | claude-sonnet-4-6-thinking-auto-high-effort | 75.59 | | Imported | 2026-05-05 |
| 11 | gpt-5.2-2025-12-11-high | 75.38 | GPT-5.2 (openai-gpt-5.2) | Imported | 2026-05-05 |
| 12 | claude-opus-4-7-high-effort | 74.66 | | Imported | 2026-05-05 |
| 13 | deepseek-v4-pro | 74.39 | DeepSeek V4 Pro (deepseek-deepseek-v4-pro) | Imported | 2026-05-05 |
| 14 | gpt-5.1-codex-max-high | 74.36 | GPT-5.1-Codex-Max (openai-gpt-5.1-codex-max) | Imported | 2026-05-05 |
| 15 | gpt-5.2-codex | 74.33 | GPT-5.2-Codex (openai-gpt-5.2-codex) | Imported | 2026-05-05 |
| 16 | claude-opus-4-5-20251101-thinking-64k-medium-effort | 73.91 | | Imported | 2026-05-05 |
| 17 | gemini-3-pro-preview-11-2025-high | 73.55 | | Imported | 2026-05-05 |
| 18 | gpt-5.3-codex-high | 73.18 | GPT-5.3-Codex (openai-gpt-5.3-codex) | Imported | 2026-05-05 |
| 19 | gemini-3-flash-preview-high | 73.05 | Gemini 3 Flash Preview (google-gemini-3-flash-preview) | Imported | 2026-05-05 |
| 20 | gpt-5.2-2025-12-11-medium | 72.62 | GPT-5.2 (openai-gpt-5.2) | Imported | 2026-05-05 |
| 21 | gpt-5.1-2025-11-13-high | 72.61 | GPT-5.1 (openai-gpt-5.1) | Imported | 2026-05-05 |
| 22 | gpt-5.1-codex-max | 72.39 | GPT-5.1-Codex-Max (openai-gpt-5.1-codex-max) | Imported | 2026-05-05 |
| 23 | kimi-k2.6-thinking | 72.39 | KIMI MoonshotAI: Kimi K2.6 (moonshotai-kimi-k2.6) | Imported | 2026-05-05 |
| 24 | claude-opus-4-7-medium-effort | 72.00 | | Imported | 2026-05-05 |
| 25 | gpt-5.3-codex-xhigh | 71.97 | GPT-5.3-Codex (openai-gpt-5.3-codex) | Imported | 2026-05-05 |
| 26 | gpt-5.4-nano-xhigh | 71.31 | GPT-5.4 Nano (openai-gpt-5.4-nano) | Imported | 2026-05-05 |
| 27 | gpt-5-pro-2025-10-06 | 71.29 | GPT-5 Pro (openai-gpt-5-pro) | Imported | 2026-05-05 |
| 28 | qwen3.6-plus | 70.77 | Qwen3.6 Plus (qwen-qwen3.6-plus) | Imported | 2026-05-05 |
| 29 | glm-5.1 | 70.62 | GLM GLM 5.1 (z-ai-glm-5.1) | Imported | 2026-05-05 |
| 30 | claude-sonnet-4-6-thinking-auto-low-effort | 70.19 | | Imported | 2026-05-05 |
| 31 | gpt-5.1-codex | 69.31 | GPT-5.1-Codex (openai-gpt-5.1-codex) | Imported | 2026-05-05 |
| 32 | kimi-k2.5-thinking | 69.16 | KIMI MoonshotAI: Kimi K2.5 (moonshotai-kimi-k2.5) | Imported | 2026-05-05 |
| 33 | gpt-5.1-2025-11-13-medium | 69.14 | GPT-5.1 (openai-gpt-5.1) | Imported | 2026-05-05 |
| 34 | grok-4.20-beta-0309-reasoning | 68.99 | GROK Grok 4.20 (x-ai-grok-4.20) | Imported | 2026-05-05 |
| 35 | gpt-5.5-medium | 68.96 | GPT-5.5 (openai-gpt-5.5) | Imported | 2026-05-05 |
| 36 | glm-5 | 68.70 | GLM GLM 5 (z-ai-glm-5) | Imported | 2026-05-05 |
| 37 | claude-opus-4-7-low-effort | 68.37 | | Imported | 2026-05-05 |
| 38 | claude-sonnet-4-5-20250929-thinking-64k | 67.91 | | Imported | 2026-05-05 |
| 39 | gpt-5.4-mini-xhigh | 67.74 | GPT-5.4 Mini (openai-gpt-5.4-mini) | Imported | 2026-05-05 |
| 40 | deepseek-v4-flash | 67.67 | DeepSeek V4 Flash (deepseek-deepseek-v4-flash) | Imported | 2026-05-05 |
| 41 | grok-4.3 | 67.37 | GROK Grok 4.3 (x-ai-grok-4.3) | Imported | 2026-05-05 |
| 42 | gpt-5-mini-high | 66.60 | GPT-5 Mini (openai-gpt-5-mini) | Imported | 2026-05-05 |
| 43 | gpt-5.2-2025-12-11-low | 65.59 | GPT-5.2 (openai-gpt-5.2) | Imported | 2026-05-05 |
| 44 | claude-opus-4-5-20251101-thinking-64k-low-effort | 65.13 | | Imported | 2026-05-05 |
| 45 | minimax-m2.7 | 65.00 | MiniMax M2.7 (minimax-minimax-m2.7) | Imported | 2026-05-05 |
| 46 | gpt-5.4-mini-high | 63.65 | GPT-5.4 Mini (openai-gpt-5.4-mini) | Imported | 2026-05-05 |
| 47 | gpt-5.4-nano-high | 63.64 | GPT-5.4 Nano (openai-gpt-5.4-nano) | Imported | 2026-05-05 |
| 48 | deepseek-v3.2-thinking | 63.13 | DeepSeek V3.2 (deepseek-deepseek-v3.2) | Imported | 2026-05-05 |
| 49 | gemini-3-pro-preview-11-2025-low | 62.89 | | Imported | 2026-05-05 |
| 50 | gemma-4-31b-it | 62.38 | Gemma 4 31B (google-gemma-4-31b-it) | Imported | 2026-05-05 |