CadEval

CAD and technical-code evaluation tasks for model coding capability tracking.

10rows
scoreprimary metric
2026-05-06sampled

Metadata

Metrics

Score, Standard error (lower is better)

Latest Results

Rows parsed from the public leaderboard table.

Rank Subject Score Model Match Provenance Sampled
1 o3 74 o3
openai-o3
Imported 2026-05-06
2 Gemini 2.5 Pro (Jun 2025) 64 Gemini 2.5 Pro
google-gemini-2.5-pro
Imported 2026-05-06
3 o4-mini-2025-04-16 medium 62 o4 Mini
openai-o4-mini
Imported 2026-05-06
4 o1 56 o1
openai-o1
Imported 2026-05-06
5 Claude 3.7 Sonnet 54 Claude 3.7 Sonnet
anthropic-claude-3.7-sonnet
Imported 2026-05-06
6 GPT-4.1 42 GPT-4.1
openai-gpt-4.1
Imported 2026-05-06
7 Gemini 1.5 Flash 34 Imported 2026-05-06
8 Claude 3.5 Haiku 32 Claude 3.5 Haiku
anthropic-claude-3.5-haiku
Imported 2026-05-06
9 GPT-4o 26 GPT-4o
openai-gpt-4o
Imported 2026-05-06
10 GPT-4.1 mini 16 GPT-4.1 Mini
openai-gpt-4.1-mini
Imported 2026-05-06