VibeCodingBench

Production-oriented coding benchmark evaluating AI coding agents across functional correctness, visual fidelity, code quality, security, cost, and speed on representative developer tasks.

Rows: 15
Primary metric: avg_score
Sampled: 2026-05-06


Metrics

Avg Score, Pass Rate, Tasks Completed, Functional, Visual, Quality, Security, Cost Score, Speed Score, Total Cost (lower is better), Avg Time (lower is better), Total Tokens (lower is better)
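The metric list above marks only Total Cost, Avg Time, and Total Tokens as lower-is-better, so any tool that sorts by an arbitrary metric has to flip direction for those three. The following is a minimal Python sketch of that idea. It assumes that every unmarked metric is higher-is-better, which the page does not state explicitly. The function name `sort_for_metric` and the example values are hypothetical placeholders, not benchmark data.

```python
# Only these three metrics are marked "(lower is better)" on this page.
# Treating every other metric as higher-is-better is an assumption.
LOWER_IS_BETTER = frozenset({"Total Cost", "Avg Time", "Total Tokens"})


def sort_for_metric(metric, values):
    """Return (model, value) pairs ordered best-first for the given metric.

    `values` maps a model name to its value on `metric` (hypothetical input shape).
    """
    # Ascending order for lower-is-better metrics, descending for all others.
    descending = metric not in LOWER_IS_BETTER
    return sorted(values.items(), key=lambda kv: kv[1], reverse=descending)


# Hypothetical values for two placeholder models; not taken from the benchmark.
costs = {"model_a": 1.20, "model_b": 0.80}
scores = {"model_a": 85.0, "model_b": 87.5}
print(sort_for_metric("Total Cost", costs))  # [('model_b', 0.8), ('model_a', 1.2)]
print(sort_for_metric("Avg Score", scores))  # [('model_b', 87.5), ('model_a', 85.0)]
```

Keeping the direction in one set means a new lower-is-better metric only needs to be added there. The sorting code itself does not change.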

Latest Results

Rows are imported from the public VibeCodingBench dashboard JSON files. Source model display names are preserved.

Rank  Subject               Avg Score  Model Match             Model ID                       Provenance  Sampled
1     Claude Opus 4.5       89.15      Claude Opus 4.5         anthropic-claude-opus-4.5      Imported    2026-05-06
2     Claude Haiku 4.5      88.97      Claude Haiku 4.5        anthropic-claude-haiku-4.5     Imported    2026-05-06
3     Grok 4 Fast           88.80      Grok 4 Fast             x-ai-grok-4-fast               Imported    2026-05-06
4     OpenAI GPT-5.2        88.75      GPT-5.2                 openai-gpt-5.2                 Imported    2026-05-06
5     Qwen3 Max             88.60      Qwen3 Max               qwen-qwen3-max                 Imported    2026-05-06
6     Claude Sonnet 4.5     88.56      Claude Sonnet 4.5       anthropic-claude-sonnet-4.5    Imported    2026-05-06
7     GLM 4-Plus            88.20      (no match)              (none)                         Imported    2026-05-06
8     DeepSeek v3.2         88.19      DeepSeek V3.2           deepseek-deepseek-v3.2         Imported    2026-05-06
9     Grok 4                88.00      Grok 4                  x-ai-grok-4                    Imported    2026-05-06
10    MiniMax M2.1          87.42      MiniMax M2.1            minimax-minimax-m2.1           Imported    2026-05-06
11    Grok 4.1 Fast         86.80      Grok 4.1 Fast           x-ai-grok-4.1-fast             Imported    2026-05-06
12    Gemini 3 Pro Preview  85.80      Gemini 3                google-gemini-3                Imported    2026-05-06
13    GLM-4.7               83.90      GLM 4.7                 z-ai-glm-4.7                   Imported    2026-05-06
14    GLM 4.7 Flash         83.83      GLM 4.7 Flash           z-ai-glm-4.7-flash             Imported    2026-05-06
15    Gemini 3 Flash        83.44      Gemini 3 Flash Preview  google-gemini-3-flash-preview  Imported    2026-05-06