Vals Index

A benchmark that reports a weighted performance score across finance and coding tasks, showing the potential impact that LLMs can have on the economy.
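As a rough illustration of what a weighted score across tasks can look like, the sketch below computes a weighted mean of per-task accuracy percentages. The task names, scores, and weights are hypothetical placeholders. This page does not state the actual tasks or weights used by the Vals Index, so this is not the benchmark's own formula.

```python
def weighted_index(scores, weights):
    """Return the weighted mean of per-task accuracy percentages.

    Both arguments map a task name to a number. The weights do not need
    to sum to 1; they are normalized by their total.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same tasks")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[task] * weights[task] for task in scores) / total_weight


# Hypothetical task names, scores, and weights (assumptions, not Vals.ai data).
example_scores = {"finance_task": 60.0, "coding_task": 80.0}
example_weights = {"finance_task": 0.5, "coding_task": 0.5}

index = weighted_index(example_scores, example_weights)
print(f"{index:.1f}")  # prints 70.0
```

Normalizing by the total weight keeps the result on the same 0 to 100 scale as the inputs, whichever weights are chosen.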

Rows: 20
Primary metric: score
Sampled: 2026-05-28

Metadata

Metrics

Score (higher is better), Std. error (lower is better), Latency (lower is better), Cost per test (lower is better)

Latest Results

Full leaderboard rows decoded from the Vals.ai benchmark detail page. Primary score is the Overall accuracy percentage.

Rank | Subject | Score | Model Match | Provenance | Sampled
1 | Claude Opus 4.8 | 70.166% | Claude Opus 4.8 (anthropic-claude-opus-4.8) | Imported | 2026-05-28
2 | GPT 5.5 | 67.622% | GPT-5.5 (openai-gpt-5.5) | Imported | 2026-05-28
3 | Claude Opus 4.7 | 66.099% | Claude Opus 4.7 (anthropic-claude-opus-4.7) | Imported | 2026-05-28
4 | Gemini 3.5 Flash | 62.054% | Gemini 3.5 Flash (google-gemini-3.5-flash) | Imported | 2026-05-28
5 | Claude Sonnet 4.6 | 60.296% | Claude Sonnet 4.6 (anthropic-claude-sonnet-4.6) | Imported | 2026-05-28
6 | Qwen 3.7 Max | 57.294% | Qwen3.7 Max (qwen-qwen3.7-max) | Imported | 2026-05-28
7 | DeepSeek V4 Pro | 56.231% | DeepSeek V4 Pro (deepseek-deepseek-v4-pro) | Imported | 2026-05-28
8 | Kimi K2.6 Thinking | 55.551% | KIMI MoonshotAI: Kimi K2.6 (moonshotai-kimi-k2.6) | Imported | 2026-05-28
9 | Gemini 3.1 Pro Preview | 53.423% | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Imported | 2026-05-28
10 | GLM 5.1 Thinking | 52.144% | GLM GLM 5.1 (z-ai-glm-5.1) | Imported | 2026-05-28
11 | GPT 5.4 Mini 2026-03-17 | 51.422% | GPT-5.4 Mini (openai-gpt-5.4-mini) | Imported | 2026-05-28
12 | Gemini 3 Flash Preview | 49.314% | Gemini 3 Flash Preview (google-gemini-3-flash-preview) | Imported | 2026-05-28
13 | Qwen 3.6 Plus | 48.039% | Qwen3.6 Plus (qwen-qwen3.6-plus) | Imported | 2026-05-28
14 | Grok 4.3 | 46.635% | GROK Grok 4.3 (x-ai-grok-4.3) | Imported | 2026-05-28
15 | GPT 5.4 Nano 2026-03-17 | 46.461% | GPT-5.4 Nano (openai-gpt-5.4-nano) | Imported | 2026-05-28
16 | MiniMax M2.7 | 41.406% | MiniMax M2.7 (minimax-minimax-m2.7) | Imported | 2026-05-28
17 | Claude Haiku 4.5 20251001 Thinking | 40.325% | Claude Haiku 4.5 (anthropic-claude-haiku-4.5) | Imported | 2026-05-28
18 | Grok 4.20 0309 Reasoning | 39.11% | GROK Grok 4.20 (x-ai-grok-4.20) | Imported | 2026-05-28
19 | Gemini 3.1 Flash Lite Preview | 35.236% | Gemini 3.1 Flash Lite Preview (google-gemini-3.1-flash-lite-preview) | Imported | 2026-05-28
20 | Command A Plus 05 2026 | 24.611% | | Imported | 2026-05-28
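
One way to read the leaderboard is by each subject's distance from the top score. The sketch below does this for a handful of rows copied from the results above. The `rows` layout and the `gaps_to_leader` helper are illustrative assumptions, not part of any Vals.ai tooling.

```python
# (rank, subject, score in percent) for a few rows taken from the leaderboard.
rows = [
    (1, "Claude Opus 4.8", 70.166),
    (2, "GPT 5.5", 67.622),
    (3, "Claude Opus 4.7", 66.099),
    (20, "Command A Plus 05 2026", 24.611),
]


def gaps_to_leader(rows):
    """Map each subject to its gap, in percentage points, from the top score."""
    leader_score = max(score for _, _, score in rows)
    return {name: round(leader_score - score, 3) for _, name, score in rows}


gaps = gaps_to_leader(rows)
for name, gap in gaps.items():
    print(f"{name}: {gap:.3f} points behind the leader")
```

The page names a standard error metric but does not list its values here, so these gaps alone say nothing about whether a difference between two subjects is statistically meaningful.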