GDPval-AA

GDPval-AA is an evaluation of AI model performance on economically valuable knowledge work tasks across professional domains including finance, legal, and other sectors. Run independently by Artificial Analysis, it uses Elo scoring to rank models on real-world work task performance.
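To make the Elo scale easier to read, here is a minimal sketch of how a rating gap translates into an expected head-to-head score. It assumes the conventional logistic Elo formula with a 400-point scale; the page does not state which Elo variant or scale Artificial Analysis uses, so the function name `elo_expected_score`, the `scale` parameter, and the resulting probability are illustrative assumptions rather than the benchmark's actual method. The two ratings are the self-reported figures from the results listed on this page.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the conventional logistic Elo model.

    Assumption: a 400-point scale, as in classic Elo. GDPval-AA's actual
    scale is not stated in the source, so treat the result as illustrative.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Self-reported scores from the 2026-05-28 slice on this page.
claude_opus_4_8 = 1890
gpt_5_5 = 1769

p = elo_expected_score(claude_opus_4_8, gpt_5_5)
print(f"Expected score for the higher-rated model: {p:.3f}")
```

Under these assumptions, a 121-point gap corresponds to an expected score of roughly two thirds for the higher-rated model, and equal ratings give exactly 0.5. A different scale would change that mapping, which is why the scale is exposed as a parameter.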

13 rows
Primary metric: score
Sampled: 2026-05-28

Metadata

Metrics

Score, Normalized Score

Showing 2 latest source slices.

Latest Results

Provider-published system-card benchmark scores parsed from Anthropic's Claude Opus 4.8 capability evaluation tables. Rows are marked self-reported and should be interpreted as source claims unless independently reproduced.

| Rank | Subject | Score | Model Match | Provenance | Sampled |
| --- | --- | --- | --- | --- | --- |
| 1 | Claude Opus 4.8 | 1890 Elo | Claude Opus 4.8 (anthropic-claude-opus-4.8) | Self-reported | 2026-05-28 |
| 2 | GPT-5.5 | 1769 Elo | GPT-5.5 (openai-gpt-5.5) | Self-reported | 2026-05-28 |
| 3 | Claude Opus 4.7 | 1753 Elo | Claude Opus 4.7 (anthropic-claude-opus-4.7) | Self-reported | 2026-05-28 |
| 4 | Gemini 3.1 Pro Preview | 1314 Elo | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Self-reported | 2026-05-28 |
| 1 | Claude Sonnet 4.6 | 1633 | Claude Sonnet 4.6 (anthropic-claude-sonnet-4.6) | Self-reported | 2026-05-06 |
| 2 | Claude Opus 4.6 | 1606 | Claude Opus 4.6 (anthropic-claude-opus-4.6) | Self-reported | 2026-05-06 |
| 3 | DeepSeek-V4-Pro-Max | 1554 | DeepSeek V4 Pro (deepseek-deepseek-v4-pro) | Self-reported | 2026-05-06 |
| 4 | MiniMax M2.7 | 1494 | MiniMax M2.7 (minimax-minimax-m2.7) | Imported | 2026-05-06 |
| 5 | Muse Spark | 1444 | | Self-reported | 2026-05-06 |
| 6 | MiMo-V2-Pro | 1426 | MiMo-V2-Pro (xiaomi-mimo-v2-pro) | Self-reported | 2026-05-06 |
| 7 | MiMo-V2-Omni | 1410 | MiMo-V2-Omni (xiaomi-mimo-v2-omni) | Self-reported | 2026-05-06 |
| 8 | DeepSeek-V4-Flash-Max | 1395 | DeepSeek V4 Flash (deepseek-deepseek-v4-flash) | Self-reported | 2026-05-06 |
| 9 | Gemini 3.1 Pro | 1317 | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Self-reported | 2026-05-06 |