Vals Multimodal Index

A benchmark that reports a weighted performance score across finance, coding, and education tasks, intended to show the potential impact that LLMs can have on the economy.
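As a rough illustration of what a weighted index of this kind computes, the sketch below combines per-domain accuracies into a single weighted mean. The actual weighting used by the index is not stated here, so the function name `weighted_index`, the equal weights, and the per-domain scores are all hypothetical placeholders.

```python
def weighted_index(scores, weights):
    """Return the weighted mean of per-domain accuracies (in percent).

    Both arguments map a domain name to a number. This is an assumed
    formula for illustration, not the published methodology.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same domains")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[d] * weights[d] for d in scores) / total_weight


# Hypothetical per-domain accuracies and weights, for illustration only.
example_scores = {"finance": 60.0, "coding": 75.0, "education": 90.0}
example_weights = {"finance": 1.0, "coding": 1.0, "education": 1.0}

index = weighted_index(example_scores, example_weights)
```

With equal weights this reduces to a plain average; giving one domain a larger weight pulls the index toward that domain's accuracy.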

16 rows
Primary metric: score
Sampled: 2026-05-28

Metadata

Metrics

Score, Std. error (lower is better), Latency (lower is better), Cost per test (lower is better)

Latest Results

Full leaderboard rows decoded from the Vals.ai benchmark detail page. Primary score is the Overall accuracy percentage.

Rank | Subject | Score | Model Match | Provenance | Sampled
1 | Claude Opus 4.8 | 70.712% | Claude Opus 4.8 (anthropic-claude-opus-4.8) | Imported | 2026-05-28
2 | GPT 5.5 | 67.768% | GPT-5.5 (openai-gpt-5.5) | Imported | 2026-05-28
3 | Claude Opus 4.7 | 67.361% | Claude Opus 4.7 (anthropic-claude-opus-4.7) | Imported | 2026-05-28
4 | Gemini 3.5 Flash | 62.291% | Gemini 3.5 Flash (google-gemini-3.5-flash) | Imported | 2026-05-28
5 | Claude Sonnet 4.6 | 60.783% | Claude Sonnet 4.6 (anthropic-claude-sonnet-4.6) | Imported | 2026-05-28
6 | Kimi K2.6 Thinking | 56.788% | MoonshotAI: Kimi K2.6 (moonshotai-kimi-k2.6) | Imported | 2026-05-28
7 | Gemini 3.1 Pro Preview | 55.749% | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Imported | 2026-05-28
8 | GPT 5.4 Mini 2026-03-17 | 53.298% | GPT-5.4 Mini (openai-gpt-5.4-mini) | Imported | 2026-05-28
9 | Gemini 3 Flash Preview | 51.975% | Gemini 3 Flash Preview (google-gemini-3-flash-preview) | Imported | 2026-05-28
10 | Qwen 3.6 Plus | 50.737% | Qwen3.6 Plus (qwen-qwen3.6-plus) | Imported | 2026-05-28
11 | GPT 5.4 Nano 2026-03-17 | 47.484% | GPT-5.4 Nano (openai-gpt-5.4-nano) | Imported | 2026-05-28
12 | Grok 4.3 | 43.435% | Grok 4.3 (x-ai-grok-4.3) | Imported | 2026-05-28
13 | Claude Haiku 4.5 20251001 Thinking | 42.352% | Claude Haiku 4.5 (anthropic-claude-haiku-4.5) | Imported | 2026-05-28
14 | Gemini 3.1 Flash Lite Preview | 40.466% | Gemini 3.1 Flash Lite Preview (google-gemini-3.1-flash-lite-preview) | Imported | 2026-05-28
15 | Grok 4.20 0309 Reasoning | 38.704% | Grok 4.20 (x-ai-grok-4.20) | Imported | 2026-05-28
16 | Command A Plus 05 2026 | 27.186% | (none) | Imported | 2026-05-28
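
For readers who want to work with these rows programmatically, here is a minimal sketch that parses the percentage strings and checks that rank order agrees with descending score. The three sample rows are copied from the top of the leaderboard above; the tuple layout and the helper names `parse_score` and `ranks_consistent` are assumptions made for this example, not part of any Vals.ai tooling.

```python
# (rank, subject, score) tuples copied from the first three leaderboard rows.
rows = [
    (1, "Claude Opus 4.8", "70.712%"),
    (2, "GPT 5.5", "67.768%"),
    (3, "Claude Opus 4.7", "67.361%"),
]


def parse_score(text):
    """Convert a string such as '70.712%' to a float percentage."""
    return float(text.strip().rstrip("%"))


def ranks_consistent(rows):
    """Return True if scores never increase as rank increases."""
    ordered = sorted(rows, key=lambda row: row[0])
    scores = [parse_score(row[2]) for row in ordered]
    return all(a >= b for a, b in zip(scores, scores[1:]))


# Gap in percentage points between the first- and second-ranked rows.
lead_gap = parse_score(rows[0][2]) - parse_score(rows[1][2])
```

The same check extends to all 16 rows by adding the remaining tuples to `rows`.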