Vibe Code Bench v1.1
Can models build web applications from scratch?
Rows: 47
Primary metric: Score
Sampled: 2026-05-28
Metrics: Score (higher is better), Std. error (lower is better), Latency (lower is better), Cost per test (lower is better). The table reports Score only.
| Rank | Model | Score | Matched model | Model ID | Provenance | Sampled |
|---|---|---|---|---|---|---|
| 1 | Claude Opus 4.8 | 82.725% | Claude Opus 4.8 | anthropic-claude-opus-4.8 | Imported | 2026-05-28 |
| 2 | Claude Opus 4.7 | 71.003% | Claude Opus 4.7 | anthropic-claude-opus-4.7 | Imported | 2026-05-28 |
| 3 | GPT 5.5 | 69.847% | GPT-5.5 | openai-gpt-5.5 | Imported | 2026-05-28 |
| 4 | GPT 5.4 2026-03-05 | 67.421% | GPT-5.4 | openai-gpt-5.4 | Imported | 2026-05-28 |
| 5 | GPT 5.3 Codex | 61.767% | GPT-5.3-Codex | openai-gpt-5.3-codex | Imported | 2026-05-28 |
| 6 | Claude Opus 4.6 | 57.573% | Claude Opus 4.6 | anthropic-claude-opus-4.6 | Imported | 2026-05-28 |
| 7 | GPT 5.2 2025-12-11 | 53.499% | GPT-5.2 | openai-gpt-5.2 | Imported | 2026-05-28 |
| 8 | Claude Opus 4.6 Thinking | 53.498% | Claude Opus 4.6 | anthropic-claude-opus-4.6 | Imported | 2026-05-28 |
| 9 | Claude Sonnet 4.6 | 51.476% | Claude Sonnet 4.6 | anthropic-claude-sonnet-4.6 | Imported | 2026-05-28 |
| 10 | DeepSeek V4 Pro | 49.931% | DeepSeek V4 Pro | deepseek-deepseek-v4-pro | Imported | 2026-05-28 |
| 11 | Gemini 3.5 Flash | 48.683% | Gemini 3.5 Flash | google-gemini-3.5-flash | Imported | 2026-05-28 |
| 12 | GPT 5.4 Mini 2026-03-17 | 47.969% | GPT-5.4 Mini | openai-gpt-5.4-mini | Imported | 2026-05-28 |
| 13 | GPT 5.2 Codex | 37.912% | GPT-5.2-Codex | openai-gpt-5.2-codex | Imported | 2026-05-28 |
| 14 | Kimi K2.6 Thinking | 37.891% | MoonshotAI: Kimi K2.6 | moonshotai-kimi-k2.6 | Imported | 2026-05-28 |
| 15 | Gemini 3.1 Pro Preview | 32.034% | Gemini 3.1 Pro Preview | google-gemini-3.1-pro-preview | Imported | 2026-05-28 |
| 16 | GLM 5.1 Thinking | 31.456% | GLM 5.1 | z-ai-glm-5.1 | Imported | 2026-05-28 |
| 17 | GPT 5.4 Nano 2026-03-17 | 26.097% | GPT-5.4 Nano | openai-gpt-5.4-nano | Imported | 2026-05-28 |
| 18 | Qwen 3.6 Plus | 25.565% | Qwen3.6 Plus | qwen-qwen3.6-plus | Imported | 2026-05-28 |
| 19 | GPT 5.1 2025-11-13 | 24.606% | GPT-5.1 | openai-gpt-5.1 | Imported | 2026-05-28 |
| 20 | GLM 5 Thinking | 23.359% | GLM 5 | z-ai-glm-5 | Imported | 2026-05-28 |
| 21 | Claude Sonnet 4.5 20250929 Thinking | 22.621% | Claude Sonnet 4.5 | anthropic-claude-sonnet-4.5 | Imported | 2026-05-28 |
| 22 | GPT 5.1 Codex Max | 22.168% | GPT-5.1-Codex-Max | openai-gpt-5.1-codex-max | Imported | 2026-05-28 |
| 23 | Claude Opus 4.5 20251101 Thinking | 20.63% | Claude Opus 4.5 | anthropic-claude-opus-4.5 | Imported | 2026-05-28 |
| 24 | Gemini 3 Flash Preview | 20.204% | Gemini 3 Flash Preview | google-gemini-3-flash-preview | Imported | 2026-05-28 |
| 25 | GPT 5 2025-08-07 | 20.088% | GPT-5 | openai-gpt-5 | Imported | 2026-05-28 |
| 26 | Muse Spark | 19.674% | — | — | Imported | 2026-05-28 |
| 27 | Grok 4.3 | 19.403% | Grok 4.3 | x-ai-grok-4.3 | Imported | 2026-05-28 |
| 28 | Kimi K2.5 Thinking | 17.536% | MoonshotAI: Kimi K2.5 | moonshotai-kimi-k2.5 | Imported | 2026-05-28 |
| 29 | Qwen 3.5 Plus Thinking | 15.738% | — | — | Imported | 2026-05-28 |
| 30 | MiniMax M2.5 | 14.853% | MiniMax M2.5 | minimax-minimax-m2.5 | Imported | 2026-05-28 |
| 31 | Gemini 3 Pro Preview | 14.3% | Gemini 3 | google-gemini-3 | Imported | 2026-05-28 |
| 32 | GPT 5 Mini 2025-08-07 | 14.171% | GPT-5 Mini | openai-gpt-5-mini | Imported | 2026-05-28 |
| 33 | GPT 5.1 Codex | 13.115% | GPT-5.1-Codex | openai-gpt-5.1-codex | Imported | 2026-05-28 |
| 34 | Qwen 3.6 27B | 11.941% | Qwen3.6 27B | qwen-qwen3.6-27b | Imported | 2026-05-28 |
| 35 | MiniMax M2.7 | 11.926% | MiniMax M2.7 | minimax-minimax-m2.7 | Imported | 2026-05-28 |
| 36 | Qwen 3.7 Max | 11.418% | Qwen3.7 Max | qwen-qwen3.7-max | Imported | 2026-05-28 |
| 37 | Claude Haiku 4.5 20251001 Thinking | 11.393% | Claude Haiku 4.5 | anthropic-claude-haiku-4.5 | Imported | 2026-05-28 |
| 38 | DeepSeek V3P2 Thinking | 5.108% | — | — | Imported | 2026-05-28 |
| 39 | Grok 4.20 0309 Reasoning | 4.063% | Grok 4.20 | x-ai-grok-4.20 | Imported | 2026-05-28 |
| 40 | Qwen 3 Max | 3.506% | Qwen3 Max | qwen-qwen3-max | Imported | 2026-05-28 |
| 41 | GLM 4.6 | 3.09% | GLM 4.6 | z-ai-glm-4.6 | Imported | 2026-05-28 |
| 42 | Grok 4.1 Fast Reasoning | 1.2% | Grok 4.1 Fast | x-ai-grok-4.1-fast | Imported | 2026-05-28 |
| 43 | Gemini 2.5 Pro | 0.4% | Gemini 2.5 Pro | google-gemini-2.5-pro | Imported | 2026-05-28 |
| 44 | Command A Plus 05 2026 | 0% | — | — | Imported | 2026-05-28 |
| 45 | Gemini 3.1 Flash Lite Preview | 0% | Gemini 3.1 Flash Lite Preview | google-gemini-3.1-flash-lite-preview | Imported | 2026-05-28 |
| 46 | Grok 4 Fast Reasoning | 0% | Grok 4 Fast | x-ai-grok-4-fast | Imported | 2026-05-28 |
| 47 | Mistral Small 2603 | 0% | Mistral: Mistral Small 4 | mistralai-mistral-small-2603 | Imported | 2026-05-28 |