LiveBench
A contamination-limited benchmark with frequently updated questions spanning math, coding, reasoning, language, instruction following, and data analysis.
50 rows
Primary metric: livebench_average
Sampled: 2026-05-05
Metadata
Metrics: LiveBench average, AMPS_Hard, code_completion, code_generation, connections, consecutive_events, integrals_with_game, javascript, logic_with_navigation, math_comp, olympiad, paraphrase, plot_unscrambling, python, simplify, spatial, story_generation, summarize, tablejoin, tablereformat, theory_of_mind, typescript, typos, zebra_puzzle
| Rank | Subject | LiveBench average | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | gpt-5.5-xhigh | 81.28 | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-05 |
| 2 | gpt-5.4-xhigh | 80.91 | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-05 |
| 3 | gemini-3.1-pro-preview-high | 80.71 | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Imported | 2026-05-05 |
| 4 | claude-opus-4-7-xhigh-effort | 77.10 | — | Imported | 2026-05-05 |
| 5 | gpt-5.5-high | 77.07 | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-05 |
| 6 | claude-opus-4-6-thinking-auto-high-effort | 76.79 | — | Imported | 2026-05-05 |
| 7 | claude-opus-4-5-20251101-thinking-64k-high-effort | 76.02 | — | Imported | 2026-05-05 |
| 8 | claude-sonnet-4-6-thinking-auto-medium-effort | 75.68 | — | Imported | 2026-05-05 |
| 9 | gpt-5.4-high | 75.60 | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-05 |
| 10 | claude-sonnet-4-6-thinking-auto-high-effort | 75.59 | — | Imported | 2026-05-05 |
| 11 | gpt-5.2-2025-12-11-high | 75.38 | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-05 |
| 12 | claude-opus-4-7-high-effort | 74.66 | — | Imported | 2026-05-05 |
| 13 | deepseek-v4-pro | 74.39 | DeepSeek V4 Pro deepseek-deepseek-v4-pro | Imported | 2026-05-05 |
| 14 | gpt-5.1-codex-max-high | 74.36 | GPT-5.1-Codex-Max openai-gpt-5.1-codex-max | Imported | 2026-05-05 |
| 15 | gpt-5.2-codex | 74.33 | GPT-5.2-Codex openai-gpt-5.2-codex | Imported | 2026-05-05 |
| 16 | claude-opus-4-5-20251101-thinking-64k-medium-effort | 73.91 | — | Imported | 2026-05-05 |
| 17 | gemini-3-pro-preview-11-2025-high | 73.55 | — | Imported | 2026-05-05 |
| 18 | gpt-5.3-codex-high | 73.18 | GPT-5.3-Codex openai-gpt-5.3-codex | Imported | 2026-05-05 |
| 19 | gemini-3-flash-preview-high | 73.05 | Gemini 3 Flash Preview google-gemini-3-flash-preview | Imported | 2026-05-05 |
| 20 | gpt-5.2-2025-12-11-medium | 72.62 | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-05 |
| 21 | gpt-5.1-2025-11-13-high | 72.61 | GPT-5.1 openai-gpt-5.1 | Imported | 2026-05-05 |
| 22 | gpt-5.1-codex-max | 72.39 | GPT-5.1-Codex-Max openai-gpt-5.1-codex-max | Imported | 2026-05-05 |
| 23 | kimi-k2.6-thinking | 72.39 | MoonshotAI: Kimi K2.6 moonshotai-kimi-k2.6 | Imported | 2026-05-05 |
| 24 | claude-opus-4-7-medium-effort | 72.00 | — | Imported | 2026-05-05 |
| 25 | gpt-5.3-codex-xhigh | 71.97 | GPT-5.3-Codex openai-gpt-5.3-codex | Imported | 2026-05-05 |
| 26 | gpt-5.4-nano-xhigh | 71.31 | GPT-5.4 Nano openai-gpt-5.4-nano | Imported | 2026-05-05 |
| 27 | gpt-5-pro-2025-10-06 | 71.29 | GPT-5 Pro openai-gpt-5-pro | Imported | 2026-05-05 |
| 28 | qwen3.6-plus | 70.77 | Qwen3.6 Plus qwen-qwen3.6-plus | Imported | 2026-05-05 |
| 29 | glm-5.1 | 70.62 | GLM 5.1 z-ai-glm-5.1 | Imported | 2026-05-05 |
| 30 | claude-sonnet-4-6-thinking-auto-low-effort | 70.19 | — | Imported | 2026-05-05 |
| 31 | gpt-5.1-codex | 69.31 | GPT-5.1-Codex openai-gpt-5.1-codex | Imported | 2026-05-05 |
| 32 | kimi-k2.5-thinking | 69.16 | MoonshotAI: Kimi K2.5 moonshotai-kimi-k2.5 | Imported | 2026-05-05 |
| 33 | gpt-5.1-2025-11-13-medium | 69.14 | GPT-5.1 openai-gpt-5.1 | Imported | 2026-05-05 |
| 34 | grok-4.20-beta-0309-reasoning | 68.99 | Grok 4.20 x-ai-grok-4.20 | Imported | 2026-05-05 |
| 35 | gpt-5.5-medium | 68.96 | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-05 |
| 36 | glm-5 | 68.70 | GLM 5 z-ai-glm-5 | Imported | 2026-05-05 |
| 37 | claude-opus-4-7-low-effort | 68.37 | — | Imported | 2026-05-05 |
| 38 | claude-sonnet-4-5-20250929-thinking-64k | 67.91 | — | Imported | 2026-05-05 |
| 39 | gpt-5.4-mini-xhigh | 67.74 | GPT-5.4 Mini openai-gpt-5.4-mini | Imported | 2026-05-05 |
| 40 | deepseek-v4-flash | 67.67 | DeepSeek V4 Flash deepseek-deepseek-v4-flash | Imported | 2026-05-05 |
| 41 | grok-4.3 | 67.37 | Grok 4.3 x-ai-grok-4.3 | Imported | 2026-05-05 |
| 42 | gpt-5-mini-high | 66.60 | GPT-5 Mini openai-gpt-5-mini | Imported | 2026-05-05 |
| 43 | gpt-5.2-2025-12-11-low | 65.59 | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-05 |
| 44 | claude-opus-4-5-20251101-thinking-64k-low-effort | 65.13 | — | Imported | 2026-05-05 |
| 45 | minimax-m2.7 | 65.00 | MiniMax M2.7 minimax-minimax-m2.7 | Imported | 2026-05-05 |
| 46 | gpt-5.4-mini-high | 63.65 | GPT-5.4 Mini openai-gpt-5.4-mini | Imported | 2026-05-05 |
| 47 | gpt-5.4-nano-high | 63.64 | GPT-5.4 Nano openai-gpt-5.4-nano | Imported | 2026-05-05 |
| 48 | deepseek-v3.2-thinking | 63.13 | DeepSeek V3.2 deepseek-deepseek-v3.2 | Imported | 2026-05-05 |
| 49 | gemini-3-pro-preview-11-2025-low | 62.89 | — | Imported | 2026-05-05 |
| 50 | gemma-4-31b-it | 62.38 | Gemma 4 31B google-gemma-4-31b-it | Imported | 2026-05-05 |
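
For readers who want to work with the table programmatically, the following is a minimal sketch of parsing leaderboard rows into typed records. It assumes the six-column pipe-delimited layout shown above and treats a dash in the Model Match column as "no match"; the `Row` type and `parse_row` helper are hypothetical names introduced for illustration, not part of any LiveBench tooling.

```python
from typing import NamedTuple, Optional

NO_MATCH = "\u2014"  # the dash the table uses when no model match is listed


class Row(NamedTuple):
    rank: int
    subject: str
    score: float                # LiveBench average (the primary metric)
    model_match: Optional[str]  # None when the table shows a dash
    provenance: str
    sampled: str                # kept as an ISO date string


def parse_row(line: str) -> Row:
    """Parse one pipe-delimited table row (hypothetical helper)."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    rank, subject, score, match, provenance, sampled = cells
    return Row(
        rank=int(rank),
        subject=subject,
        score=float(score),  # accepts both "72" and "77.10"
        model_match=None if match == NO_MATCH else match,
        provenance=provenance,
        sampled=sampled,
    )


# Three rows copied from the table above.
SAMPLE = """\
| 1 | gpt-5.5-xhigh | 81.28 | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-05 |
| 4 | claude-opus-4-7-xhigh-effort | 77.10 | — | Imported | 2026-05-05 |
| 24 | claude-opus-4-7-medium-effort | 72 | — | Imported | 2026-05-05 |
"""

rows = [parse_row(line) for line in SAMPLE.splitlines()]
unmatched = [r.subject for r in rows if r.model_match is None]
```

Parsing the score as a float makes rows comparable regardless of how many decimals the page printed, and `unmatched` collects the subjects that have no entry in the Model Match column.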