LiveBench | BenchmarkList

Metadata

ID: livebench
Category: Intelligence
Release: 2024-06-27

Metrics

LiveBench average, AMPS_Hard, code_completion, code_generation, connections, consecutive_events, integrals_with_game, javascript, logic_with_navigation, math_comp, olympiad, paraphrase, plot_unscrambling, python, simplify, spatial, story_generation, summarize, tablejoin, tablereformat, theory_of_mind, typescript, typos, zebra_puzzle

Rank	Subject	LiveBench average	Model Match	Provenance	Sampled
1	gpt-5.5-xhigh	81.28	GPT-5.5 openai-gpt-5.5	Imported	2026-05-05
2	gpt-5.4-xhigh	80.91	GPT-5.4 openai-gpt-5.4	Imported	2026-05-05
3	gemini-3.1-pro-preview-high	80.71	Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview	Imported	2026-05-05
4	claude-opus-4-7-xhigh-effort	77.10	—	Imported	2026-05-05
5	gpt-5.5-high	77.07	GPT-5.5 openai-gpt-5.5	Imported	2026-05-05
6	claude-opus-4-6-thinking-auto-high-effort	76.79	—	Imported	2026-05-05
7	claude-opus-4-5-20251101-thinking-64k-high-effort	76.02	—	Imported	2026-05-05
8	claude-sonnet-4-6-thinking-auto-medium-effort	75.68	—	Imported	2026-05-05
9	gpt-5.4-high	75.60	GPT-5.4 openai-gpt-5.4	Imported	2026-05-05
10	claude-sonnet-4-6-thinking-auto-high-effort	75.59	—	Imported	2026-05-05
11	gpt-5.2-2025-12-11-high	75.38	GPT-5.2 openai-gpt-5.2	Imported	2026-05-05
12	claude-opus-4-7-high-effort	74.66	—	Imported	2026-05-05
13	deepseek-v4-pro	74.39	DeepSeek V4 Pro deepseek-deepseek-v4-pro	Imported	2026-05-05
14	gpt-5.1-codex-max-high	74.36	GPT-5.1-Codex-Max openai-gpt-5.1-codex-max	Imported	2026-05-05
15	gpt-5.2-codex	74.33	GPT-5.2-Codex openai-gpt-5.2-codex	Imported	2026-05-05
16	claude-opus-4-5-20251101-thinking-64k-medium-effort	73.91	—	Imported	2026-05-05
17	gemini-3-pro-preview-11-2025-high	73.55	—	Imported	2026-05-05
18	gpt-5.3-codex-high	73.18	GPT-5.3-Codex openai-gpt-5.3-codex	Imported	2026-05-05
19	gemini-3-flash-preview-high	73.05	Gemini 3 Flash Preview google-gemini-3-flash-preview	Imported	2026-05-05
20	gpt-5.2-2025-12-11-medium	72.62	GPT-5.2 openai-gpt-5.2	Imported	2026-05-05
21	gpt-5.1-2025-11-13-high	72.61	GPT-5.1 openai-gpt-5.1	Imported	2026-05-05
22	gpt-5.1-codex-max	72.39	GPT-5.1-Codex-Max openai-gpt-5.1-codex-max	Imported	2026-05-05
23	kimi-k2.6-thinking	72.39	KIMI MoonshotAI: Kimi K2.6 moonshotai-kimi-k2.6	Imported	2026-05-05
24	claude-opus-4-7-medium-effort	72.00	—	Imported	2026-05-05
25	gpt-5.3-codex-xhigh	71.97	GPT-5.3-Codex openai-gpt-5.3-codex	Imported	2026-05-05
26	gpt-5.4-nano-xhigh	71.31	GPT-5.4 Nano openai-gpt-5.4-nano	Imported	2026-05-05
27	gpt-5-pro-2025-10-06	71.29	GPT-5 Pro openai-gpt-5-pro	Imported	2026-05-05
28	qwen3.6-plus	70.77	Qwen3.6 Plus qwen-qwen3.6-plus	Imported	2026-05-05
29	glm-5.1	70.62	GLM GLM 5.1 z-ai-glm-5.1	Imported	2026-05-05
30	claude-sonnet-4-6-thinking-auto-low-effort	70.19	—	Imported	2026-05-05
31	gpt-5.1-codex	69.31	GPT-5.1-Codex openai-gpt-5.1-codex	Imported	2026-05-05
32	kimi-k2.5-thinking	69.16	KIMI MoonshotAI: Kimi K2.5 moonshotai-kimi-k2.5	Imported	2026-05-05
33	gpt-5.1-2025-11-13-medium	69.14	GPT-5.1 openai-gpt-5.1	Imported	2026-05-05
34	grok-4.20-beta-0309-reasoning	68.99	GROK Grok 4.20 x-ai-grok-4.20	Imported	2026-05-05
35	gpt-5.5-medium	68.96	GPT-5.5 openai-gpt-5.5	Imported	2026-05-05
36	glm-5	68.70	GLM GLM 5 z-ai-glm-5	Imported	2026-05-05
37	claude-opus-4-7-low-effort	68.37	—	Imported	2026-05-05
38	claude-sonnet-4-5-20250929-thinking-64k	67.91	—	Imported	2026-05-05
39	gpt-5.4-mini-xhigh	67.74	GPT-5.4 Mini openai-gpt-5.4-mini	Imported	2026-05-05
40	deepseek-v4-flash	67.67	DeepSeek V4 Flash deepseek-deepseek-v4-flash	Imported	2026-05-05
41	grok-4.3	67.37	GROK Grok 4.3 x-ai-grok-4.3	Imported	2026-05-05
42	gpt-5-mini-high	66.60	GPT-5 Mini openai-gpt-5-mini	Imported	2026-05-05
43	gpt-5.2-2025-12-11-low	65.59	GPT-5.2 openai-gpt-5.2	Imported	2026-05-05
44	claude-opus-4-5-20251101-thinking-64k-low-effort	65.13	—	Imported	2026-05-05
45	minimax-m2.7	65.00	MiniMax M2.7 minimax-minimax-m2.7	Imported	2026-05-05
46	gpt-5.4-mini-high	63.65	GPT-5.4 Mini openai-gpt-5.4-mini	Imported	2026-05-05
47	gpt-5.4-nano-high	63.64	GPT-5.4 Nano openai-gpt-5.4-nano	Imported	2026-05-05
48	deepseek-v3.2-thinking	63.13	DeepSeek V3.2 deepseek-deepseek-v3.2	Imported	2026-05-05
49	gemini-3-pro-preview-11-2025-low	62.89	—	Imported	2026-05-05
50	gemma-4-31b-it	62.38	Gemma 4 31B google-gemma-4-31b-it	Imported	2026-05-05
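
The table above is tab-separated, so it can be loaded into records with Python's standard library. The following is a minimal sketch, not part of the original page: the function name `parse_leaderboard`, the output keys, and the reading of "—" as "no matched model" are assumptions made for illustration. Only two rows are copied in as sample input.

```python
import csv
import io

# Two rows copied from the table above; the remaining rows use the same layout.
SAMPLE = (
    "Rank\tSubject\tLiveBench average\tModel Match\tProvenance\tSampled\n"
    "1\tgpt-5.5-xhigh\t81.28\tGPT-5.5 openai-gpt-5.5\tImported\t2026-05-05\n"
    "4\tclaude-opus-4-7-xhigh-effort\t77.10\t—\tImported\t2026-05-05\n"
)


def parse_leaderboard(text):
    """Parse tab-separated leaderboard text into a list of dicts (hypothetical helper)."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        rows.append({
            "rank": int(row["Rank"]),
            "subject": row["Subject"],
            "score": float(row["LiveBench average"]),
            # Assumption: "—" marks a subject with no matched model entry.
            "model_match": None if row["Model Match"] == "—" else row["Model Match"],
            "provenance": row["Provenance"],
            "sampled": row["Sampled"],
        })
    return rows


rows = parse_leaderboard(SAMPLE)
```

Converting the score to `float` makes entries such as `72` and `72.00` compare equal, which avoids depending on how the page formats trailing zeros.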
