LiveCodeBench
Our implementation of the LiveCodeBench benchmark
113 rows
Primary metric: score
Sampled: 2026-05-28
Metrics: Score, Std. error (lower is better), Latency (lower is better), Cost per test (lower is better)
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Gemini 3.1 Pro Preview | 88.485% | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Imported | 2026-05-28 |
| 2 | GPT 5.2 Codex | 87.993% | GPT-5.2-Codex openai-gpt-5.2-codex | Imported | 2026-05-28 |
| 3 | Claude Opus 4.8 | 87.819% | Claude Opus 4.8 anthropic-claude-opus-4.8 | Imported | 2026-05-28 |
| 4 | Gemini 3.5 Flash | 87.604% | Gemini 3.5 Flash google-gemini-3.5-flash | Imported | 2026-05-28 |
| 5 | DeepSeek V4 Pro | 87.484% | DeepSeek V4 Pro deepseek-deepseek-v4-pro | Imported | 2026-05-28 |
| 6 | GPT 5.3 Codex | 87.313% | GPT-5.3-Codex openai-gpt-5.3-codex | Imported | 2026-05-28 |
| 7 | Qwen 3.7 Max | 87.057% | Qwen3.7 Max qwen-qwen3.7-max | Imported | 2026-05-28 |
| 8 | Kimi K2.6 Thinking | 86.771% | MoonshotAI: Kimi K2.6 moonshotai-kimi-k2.6 | Imported | 2026-05-28 |
| 9 | GPT 5 Mini 2025-08-07 | 86.605% | GPT-5 Mini openai-gpt-5-mini | Imported | 2026-05-28 |
| 10 | GPT 5.1 2025-11-13 | 86.486% | GPT-5.1 openai-gpt-5.1 | Imported | 2026-05-28 |
| 11 | Gemini 3 Pro Preview | 86.407% | Gemini 3 google-gemini-3 | Imported | 2026-05-28 |
| 12 | Qwen 3.6 Plus | 85.952% | Qwen3.6 Plus qwen-qwen3.6-plus | Imported | 2026-05-28 |
| 13 | GPT 5 2025-08-07 | 85.911% | GPT-5 openai-gpt-5 | Imported | 2026-05-28 |
| 14 | Gemini 3 Flash Preview | 85.591% | Gemini 3 Flash Preview google-gemini-3-flash-preview | Imported | 2026-05-28 |
| 15 | GPT 5.1 Codex | 85.55% | GPT-5.1-Codex openai-gpt-5.1-codex | Imported | 2026-05-28 |
| 16 | GPT 5.2 2025-12-11 | 85.361% | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-28 |
| 17 | Qwen 3.5 Plus Thinking | 85.326% | — | Imported | 2026-05-28 |
| 18 | GPT 5.5 | 85.296% | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-28 |
| 19 | Claude Opus 4.7 | 85.073% | Claude Opus 4.7 anthropic-claude-opus-4.7 | Imported | 2026-05-28 |
| 20 | GPT 5 Codex | 84.725% | GPT-5 Codex openai-gpt-5-codex | Imported | 2026-05-28 |
| 21 | Claude Opus 4.6 Thinking | 84.676% | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-28 |
| 22 | Grok 4.3 | 84.494% | Grok 4.3 x-ai-grok-4.3 | Imported | 2026-05-28 |
| 23 | Grok 4.20 0309 Reasoning | 84.265% | Grok 4.20 x-ai-grok-4.20 | Imported | 2026-05-28 |
| 24 | GPT 5.4 2026-03-05 | 84.141% | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-28 |
| 25 | GPT 5.4 Nano 2026-03-17 | 84.009% | GPT-5.4 Nano openai-gpt-5.4-nano | Imported | 2026-05-28 |
| 26 | O3 2025-04-16 | 83.914% | o3 openai-o3 | Imported | 2026-05-28 |
| 27 | Kimi K2.5 Thinking | 83.868% | MoonshotAI: Kimi K2.5 moonshotai-kimi-k2.5 | Imported | 2026-05-28 |
| 28 | Claude Opus 4.5 20251101 Thinking | 83.67% | Claude Opus 4.5 anthropic-claude-opus-4.5 | Imported | 2026-05-28 |
| 29 | GPT 5.1 Codex Max | 83.558% | GPT-5.1-Codex-Max openai-gpt-5.1-codex-max | Imported | 2026-05-28 |
| 30 | Qwen 3.5 Flash | 83.28% | Qwen3.5-Flash qwen-qwen3.5-flash-02-23 | Imported | 2026-05-28 |
| 31 | Grok 4 0709 | 83.247% | Grok 4 x-ai-grok-4 | Imported | 2026-05-28 |
| 32 | GPT Oss 120B | 83.234% | gpt-oss-120b openai-gpt-oss-120b | Imported | 2026-05-28 |
| 33 | GLM 4.7 | 82.234% | GLM 4.7 z-ai-glm-4.7 | Imported | 2026-05-28 |
| 34 | O4 Mini 2025-04-16 | 82.208% | o4 Mini openai-o4-mini | Imported | 2026-05-28 |
| 35 | Claude Sonnet 4.6 | 82.091% | Claude Sonnet 4.6 anthropic-claude-sonnet-4.6 | Imported | 2026-05-28 |
| 36 | GLM 5 Thinking | 81.868% | GLM 5 z-ai-glm-5 | Imported | 2026-05-28 |
| 37 | MiniMax M2.1 | 81.756% | MiniMax M2.1 minimax-minimax-m2.1 | Imported | 2026-05-28 |
| 38 | GPT 5.4 Mini 2026-03-17 | 81.465% | GPT-5.4 Mini openai-gpt-5.4-mini | Imported | 2026-05-28 |
| 39 | GLM 5.1 Thinking | 81.38% | GLM 5.1 z-ai-glm-5.1 | Imported | 2026-05-28 |
| 40 | GLM 4.6 | 81.036% | GLM 4.6 z-ai-glm-4.6 | Imported | 2026-05-28 |
| 41 | DeepSeek V3P2 Thinking | 80.695% | — | Imported | 2026-05-28 |
| 42 | Grok 4.1 Fast Reasoning | 80.641% | Grok 4.1 Fast x-ai-grok-4.1-fast | Imported | 2026-05-28 |
| 43 | GPT Oss 20B | 80.387% | gpt-oss-20b openai-gpt-oss-20b | Imported | 2026-05-28 |
| 44 | Gemini 3.1 Flash Lite Preview | 80.116% | Gemini 3.1 Flash Lite Preview google-gemini-3.1-flash-lite-preview | Imported | 2026-05-28 |
| 45 | MiniMax M2.7 | 79.926% | MiniMax M2.7 minimax-minimax-m2.7 | Imported | 2026-05-28 |
| 46 | Command A Plus 05 2026 | 79.382% | — | Imported | 2026-05-28 |
| 47 | MiniMax M2.5 Lightning | 79.208% | — | Imported | 2026-05-28 |
| 48 | Gemini 2.5 Pro Preview 03 25 | 79.164% | Gemini 2.5 Pro Preview 05-06 google-gemini-2.5-pro-preview-05-06 | Imported | 2026-05-28 |
| 49 | Grok 4 Fast Reasoning | 78.973% | Grok 4 Fast x-ai-grok-4-fast | Imported | 2026-05-28 |
| 50 | Qwen 3 Max | 78.215% | Qwen3 Max qwen-qwen3-max | Imported | 2026-05-28 |
| 51 | Grok 3 Mini Fast High Reasoning | 76.22% | — | Imported | 2026-05-28 |
| 52 | Gemini 2.5 Flash Preview 09 2025 Thinking | 76.214% | — | Imported | 2026-05-28 |
| 53 | Gemini 2.5 Flash Preview 09 2025 | 75.063% | — | Imported | 2026-05-28 |
| 54 | Claude Opus 4.5 20251101 | 75.034% | Claude Opus 4.5 anthropic-claude-opus-4.5 | Imported | 2026-05-28 |
| 55 | Magistral Medium 2509 | 74.86% | — | Imported | 2026-05-28 |
| 56 | Claude Sonnet 4.5 20250929 Thinking | 72.996% | Claude Sonnet 4.5 anthropic-claude-sonnet-4.5 | Imported | 2026-05-28 |
| 57 | Magistral Small 2509 | 72.131% | — | Imported | 2026-05-28 |
| 58 | O3 Mini 2025-01-31 | 71.484% | o3-mini openai-o3-mini | Imported | 2026-05-28 |
| 59 | Gemini 2.5 Flash Lite Preview 09 2025 Thinking | 71.385% | — | Imported | 2026-05-28 |
| 60 | Qwen 3 235B A22b | 70.62% | Qwen3 235B A22B qwen-qwen3-235b-a22b | Imported | 2026-05-28 |
| 61 | Kimi K2 Instruct | 70.449% | MoonshotAI: Kimi K2 0711 moonshotai-kimi-k2 | Imported | 2026-05-28 |
| 62 | DeepSeek R1 | 70.221% | R1 deepseek-r1 | Imported | 2026-05-28 |
| 63 | GPT 5 Nano 2025-08-07 | 70.216% | GPT-5 Nano openai-gpt-5-nano | Imported | 2026-05-28 |
| 64 | Claude Opus 4 20250514 Thinking | 70.188% | — | Imported | 2026-05-28 |
| 65 | DeepSeek V3P2 | 69.856% | — | Imported | 2026-05-28 |
| 66 | Gemini 2.5 Flash Lite Preview 09 2025 | 67.669% | Gemini 2.5 Flash Lite Preview 09-2025 google-gemini-2.5-flash-lite-preview-09-2025 | Imported | 2026-05-28 |
| 67 | GLM 4.5 | 67.446% | GLM 4.5 z-ai-glm-4.5 | Imported | 2026-05-28 |
| 68 | Qwen 3 Max Preview | 66.91% | — | Imported | 2026-05-28 |
| 69 | Claude Opus 4.1 20250805 Thinking | 66.456% | Claude Opus 4.1 anthropic-claude-opus-4.1 | Imported | 2026-05-28 |
| 70 | Grok 3 Mini Fast Low Reasoning | 66.265% | — | Imported | 2026-05-28 |
| 71 | DeepSeek V3 0324 | 65.478% | DeepSeek V3 0324 deepseek-deepseek-chat-v3-0324 | Imported | 2026-05-28 |
| 72 | Claude Opus 4.1 20250805 | 64.559% | Claude Opus 4.1 anthropic-claude-opus-4.1 | Imported | 2026-05-28 |
| 73 | Kimi K2 Thinking | 63.145% | MoonshotAI: Kimi K2 Thinking moonshotai-kimi-k2-thinking | Imported | 2026-05-28 |
| 74 | Claude Opus 4 20250514 | 62.629% | Claude Opus 4 anthropic-claude-opus-4 | Imported | 2026-05-28 |
| 75 | Claude Sonnet 4 20250514 Thinking | 62.392% | — | Imported | 2026-05-28 |
| 76 | Grok Code Fast 1 | 61.969% | Grok Code Fast 1 x-ai-grok-code-fast-1 | Imported | 2026-05-28 |
| 77 | Claude 3 7 Sonnet 20250219 Thinking | 60.436% | — | Imported | 2026-05-28 |
| 78 | Claude Sonnet 4 20250514 | 59.673% | Claude Sonnet 4 anthropic-claude-sonnet-4 | Imported | 2026-05-28 |
| 79 | Llama 3.3 Nemotron Super 49B V1 42e84561 Thinking | 58.369% | — | Imported | 2026-05-28 |
| 80 | GPT 4.1 Mini 2025-04-14 | 58.158% | GPT-4.1 Mini openai-gpt-4.1-mini | Imported | 2026-05-28 |
| 81 | Gemini 2.5 Flash Preview 04 17 | 56.936% | — | Imported | 2026-05-28 |
| 82 | Claude 3 7 Sonnet 20250219 | 56.662% | Claude 3.7 Sonnet anthropic-claude-3.7-sonnet | Imported | 2026-05-28 |
| 83 | Mistral Large 2512 | 55.337% | Mistral: Mistral Large 3 2512 mistralai-mistral-large-2512 | Imported | 2026-05-28 |
| 84 | GPT 4.1 2025-04-14 | 54.666% | GPT-4.1 openai-gpt-4.1 | Imported | 2026-05-28 |
| 85 | Grok 3 | 52.901% | Grok 3 x-ai-grok-3 | Imported | 2026-05-28 |
| 86 | Devstral 2512 | 51.841% | Mistral: Devstral 2 2512 mistralai-devstral-2512 | Imported | 2026-05-28 |
| 87 | O1 2024-12-17 | 50.264% | o1 openai-o1 | Imported | 2026-05-28 |
| 88 | Claude 3 5 Sonnet 20241022 | 49.628% | Claude 3.5 Sonnet anthropic-claude-3.5-sonnet | Imported | 2026-05-28 |
| 89 | Llama4 Maverick Instruct Basic | 47.251% | — | Imported | 2026-05-28 |
| 90 | Gemini 2.5 Flash Preview 04 17 Thinking | 46.871% | — | Imported | 2026-05-28 |
| 91 | Grok 4 Fast Non Reasoning | 46.095% | Grok 4 Fast x-ai-grok-4-fast | Imported | 2026-05-28 |
| 92 | Mistral Medium 2505 | 44.845% | — | Imported | 2026-05-28 |
| 93 | Gemini 2.0 Flash 001 | 43.608% | Gemini 2.0 Flash google-gemini-2.0-flash | Imported | 2026-05-28 |
| 94 | GPT 4O 2024-11-20 | 43.444% | GPT-4o (2024-11-20) openai-gpt-4o-2024-11-20 | Imported | 2026-05-28 |
| 95 | Labs Devstral Small 2512 | 43.178% | — | Imported | 2026-05-28 |
| 96 | GPT 4.1 Nano 2025-04-14 | 42.718% | GPT-4.1 Nano openai-gpt-4.1-nano | Imported | 2026-05-28 |
| 97 | Grok 4.1 Fast Non Reasoning | 42.622% | Grok 4.1 Fast x-ai-grok-4.1-fast | Imported | 2026-05-28 |
| 98 | Claude 3 5 Haiku 20241022 | 41.918% | — | Imported | 2026-05-28 |
| 99 | Gemini 1.5 Pro 002 | 41.719% | — | Imported | 2026-05-28 |
| 100 | Claude Haiku 4.5 20251001 Thinking | 41.175% | Claude Haiku 4.5 anthropic-claude-haiku-4.5 | Imported | 2026-05-28 |
| 101 | Grok 2 1212 | 38.679% | — | Imported | 2026-05-28 |
| 102 | Llama 4 Scout 17B 16E Instruct | 38.541% | Llama 4 Scout meta-llama-llama-4-scout | Imported | 2026-05-28 |
| 103 | Mistral Large 2411 | 37.088% | Mistral Large 2411 mistralai-mistral-large-2411 | Imported | 2026-05-28 |
| 104 | Gemini 1.5 Flash 002 | 36.91% | — | Imported | 2026-05-28 |
| 105 | Llama 3.3 70B Instruct Turbo | 36.341% | — | Imported | 2026-05-28 |
| 106 | Llama 3.3 Nemotron Super 49B V1 42e84561 | 36.308% | — | Imported | 2026-05-28 |
| 107 | Command A 03 2025 | 35.071% | Command A cohere-command-a | Imported | 2026-05-28 |
| 108 | Mistral Small 2503 | 31.815% | — | Imported | 2026-05-28 |
| 109 | GPT 4O Mini 2024-07-18 | 26.423% | GPT-4o-mini (2024-07-18) openai-gpt-4o-mini-2024-07-18 | Imported | 2026-05-28 |
| 110 | Jamba Large 1.6 | 22.325% | — | Imported | 2026-05-28 |
| 111 | Command R Plus | 18.238% | — | Imported | 2026-05-28 |
| 112 | Mistral Small 2402 | 15.781% | — | Imported | 2026-05-28 |
| 113 | Jamba Mini 1.6 | 9.918% | — | Imported | 2026-05-28 |