Vibe Code Bench v1.1

Can models build web applications from scratch?

47 rows
Primary metric: score
Sampled: 2026-05-28

Metrics

Score (higher is better), Std. error (lower is better), Latency (lower is better), Cost per test (lower is better)
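Three of these metrics are lower-is-better. Score is treated as higher-is-better, which matches the leaderboard ranking: rank 1 has the highest score. When comparing two models on any of the four metrics, that direction has to be applied. The sketch below shows one minimal way to do it in Python. The metric keys and the `is_better` helper are hypothetical names for illustration only. They are not part of any Vals.ai API.

```python
# Hypothetical direction map: True means a higher value is better.
# The Score direction is inferred from the leaderboard ranking; the other
# three directions are stated on the page as "lower is better".
HIGHER_IS_BETTER = {
    "score": True,
    "std_error": False,
    "latency": False,
    "cost_per_test": False,
}


def is_better(metric: str, a: float, b: float) -> bool:
    """Return True if value `a` beats value `b` on the given metric."""
    if HIGHER_IS_BETTER[metric]:
        return a > b
    return a < b


if __name__ == "__main__":
    # Scores copied from the leaderboard (ranks 1 and 2).
    print(is_better("score", 82.725, 71.003))
```

Keeping the direction in one lookup table means sorting or comparison code never needs to special-case individual metrics.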

Latest Results

Full leaderboard rows, decoded from the Vals.ai benchmark detail page. The primary score is the Overall accuracy, expressed as a percentage.

Rank | Subject | Score | Model Match | Provenance | Sampled
1 | Claude Opus 4.8 | 82.725% | Claude Opus 4.8 (anthropic-claude-opus-4.8) | Imported | 2026-05-28
2 | Claude Opus 4.7 | 71.003% | Claude Opus 4.7 (anthropic-claude-opus-4.7) | Imported | 2026-05-28
3 | GPT 5.5 | 69.847% | GPT-5.5 (openai-gpt-5.5) | Imported | 2026-05-28
4 | GPT 5.4 2026-03-05 | 67.421% | GPT-5.4 (openai-gpt-5.4) | Imported | 2026-05-28
5 | GPT 5.3 Codex | 61.767% | GPT-5.3-Codex (openai-gpt-5.3-codex) | Imported | 2026-05-28
6 | Claude Opus 4.6 | 57.573% | Claude Opus 4.6 (anthropic-claude-opus-4.6) | Imported | 2026-05-28
7 | GPT 5.2 2025-12-11 | 53.499% | GPT-5.2 (openai-gpt-5.2) | Imported | 2026-05-28
8 | Claude Opus 4.6 Thinking | 53.498% | Claude Opus 4.6 (anthropic-claude-opus-4.6) | Imported | 2026-05-28
9 | Claude Sonnet 4.6 | 51.476% | Claude Sonnet 4.6 (anthropic-claude-sonnet-4.6) | Imported | 2026-05-28
10 | DeepSeek V4 Pro | 49.931% | DeepSeek V4 Pro (deepseek-deepseek-v4-pro) | Imported | 2026-05-28
11 | Gemini 3.5 Flash | 48.683% | Gemini 3.5 Flash (google-gemini-3.5-flash) | Imported | 2026-05-28
12 | GPT 5.4 Mini 2026-03-17 | 47.969% | GPT-5.4 Mini (openai-gpt-5.4-mini) | Imported | 2026-05-28
13 | GPT 5.2 Codex | 37.912% | GPT-5.2-Codex (openai-gpt-5.2-codex) | Imported | 2026-05-28
14 | Kimi K2.6 Thinking | 37.891% | MoonshotAI: Kimi K2.6 (moonshotai-kimi-k2.6) | Imported | 2026-05-28
15 | Gemini 3.1 Pro Preview | 32.034% | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Imported | 2026-05-28
16 | GLM 5.1 Thinking | 31.456% | GLM 5.1 (z-ai-glm-5.1) | Imported | 2026-05-28
17 | GPT 5.4 Nano 2026-03-17 | 26.097% | GPT-5.4 Nano (openai-gpt-5.4-nano) | Imported | 2026-05-28
18 | Qwen 3.6 Plus | 25.565% | Qwen3.6 Plus (qwen-qwen3.6-plus) | Imported | 2026-05-28
19 | GPT 5.1 2025-11-13 | 24.606% | GPT-5.1 (openai-gpt-5.1) | Imported | 2026-05-28
20 | GLM 5 Thinking | 23.359% | GLM 5 (z-ai-glm-5) | Imported | 2026-05-28
21 | Claude Sonnet 4.5 20250929 Thinking | 22.621% | Claude Sonnet 4.5 (anthropic-claude-sonnet-4.5) | Imported | 2026-05-28
22 | GPT 5.1 Codex Max | 22.168% | GPT-5.1-Codex-Max (openai-gpt-5.1-codex-max) | Imported | 2026-05-28
23 | Claude Opus 4.5 20251101 Thinking | 20.63% | Claude Opus 4.5 (anthropic-claude-opus-4.5) | Imported | 2026-05-28
24 | Gemini 3 Flash Preview | 20.204% | Gemini 3 Flash Preview (google-gemini-3-flash-preview) | Imported | 2026-05-28
25 | GPT 5 2025-08-07 | 20.088% | GPT-5 (openai-gpt-5) | Imported | 2026-05-28
26 | Muse Spark | 19.674% |  | Imported | 2026-05-28
27 | Grok 4.3 | 19.403% | Grok 4.3 (x-ai-grok-4.3) | Imported | 2026-05-28
28 | Kimi K2.5 Thinking | 17.536% | MoonshotAI: Kimi K2.5 (moonshotai-kimi-k2.5) | Imported | 2026-05-28
29 | Qwen 3.5 Plus Thinking | 15.738% |  | Imported | 2026-05-28
30 | MiniMax M2.5 | 14.853% | MiniMax M2.5 (minimax-minimax-m2.5) | Imported | 2026-05-28
31 | Gemini 3 Pro Preview | 14.3% | Gemini 3 (google-gemini-3) | Imported | 2026-05-28
32 | GPT 5 Mini 2025-08-07 | 14.171% | GPT-5 Mini (openai-gpt-5-mini) | Imported | 2026-05-28
33 | GPT 5.1 Codex | 13.115% | GPT-5.1-Codex (openai-gpt-5.1-codex) | Imported | 2026-05-28
34 | Qwen 3.6 27B | 11.941% | Qwen3.6 27B (qwen-qwen3.6-27b) | Imported | 2026-05-28
35 | MiniMax M2.7 | 11.926% | MiniMax M2.7 (minimax-minimax-m2.7) | Imported | 2026-05-28
36 | Qwen 3.7 Max | 11.418% | Qwen3.7 Max (qwen-qwen3.7-max) | Imported | 2026-05-28
37 | Claude Haiku 4.5 20251001 Thinking | 11.393% | Claude Haiku 4.5 (anthropic-claude-haiku-4.5) | Imported | 2026-05-28
38 | DeepSeek V3P2 Thinking | 5.108% |  | Imported | 2026-05-28
39 | Grok 4.20 0309 Reasoning | 4.063% | Grok 4.20 (x-ai-grok-4.20) | Imported | 2026-05-28
40 | Qwen 3 Max | 3.506% | Qwen3 Max (qwen-qwen3-max) | Imported | 2026-05-28
41 | GLM 4.6 | 3.09% | GLM 4.6 (z-ai-glm-4.6) | Imported | 2026-05-28
42 | Grok 4.1 Fast Reasoning | 1.2% | Grok 4.1 Fast (x-ai-grok-4.1-fast) | Imported | 2026-05-28
43 | Gemini 2.5 Pro | 0.4% | Gemini 2.5 Pro (google-gemini-2.5-pro) | Imported | 2026-05-28
44 | Command A Plus 05 2026 | 0% |  | Imported | 2026-05-28
45 | Gemini 3.1 Flash Lite Preview | 0% | Gemini 3.1 Flash Lite Preview (google-gemini-3.1-flash-lite-preview) | Imported | 2026-05-28
46 | Grok 4 Fast Reasoning | 0% | Grok 4 Fast (x-ai-grok-4-fast) | Imported | 2026-05-28
47 | Mistral Small 2603 | 0% | Mistral: Mistral Small 4 (mistralai-mistral-small-2603) | Imported | 2026-05-28
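Rank follows Score in descending order. The Python sketch below reproduces that ordering for a shuffled handful of rows copied from the table above. Breaking ties by subject name is an assumption on my part. It does match the order of the four rows tied at 0% (ranks 44 to 47). The `ROWS` list and the `rank_rows` helper are hypothetical names used only for illustration.

```python
from typing import List, Tuple

# (subject, score in percent) pairs copied from the leaderboard, deliberately shuffled.
ROWS: List[Tuple[str, float]] = [
    ("Mistral Small 2603", 0.0),
    ("Claude Opus 4.6 Thinking", 53.498),
    ("Grok 4 Fast Reasoning", 0.0),
    ("GPT 5.2 2025-12-11", 53.499),
    ("Command A Plus 05 2026", 0.0),
    ("Gemini 3.1 Flash Lite Preview", 0.0),
]


def rank_rows(rows: List[Tuple[str, float]]) -> List[Tuple[int, str, float]]:
    """Sort by score (highest first), then by subject name (assumed tie-break)."""
    ordered = sorted(rows, key=lambda r: (-r[1], r[0]))
    # Ranks are 1-based positions within this subset, not the full leaderboard.
    return [(i, name, score) for i, (name, score) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    for rank, name, score in rank_rows(ROWS):
        print(f"{rank} {name} {score}%")
```

Tied rows receive consecutive, distinct ranks instead of a shared rank. The page numbers its four 0% rows the same way (44, 45, 46, 47). The sketch also keeps 53.499% ahead of 53.498%, as ranks 7 and 8 do.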