GDPval-AA
GDPval-AA is an evaluation of AI model performance on economically valuable knowledge work tasks across professional domains including finance, legal, and other sectors. Run independently by Artificial Analysis, it uses Elo scoring to rank models on real-world work task performance.
13 rows
Primary metric: score
Sampled: 2026-05-28
Metadata
Metrics: Score, Normalized Score
Showing the 2 latest source slices, sampled 2026-05-28 and 2026-05-06. Ranks restart at 1 within each slice.
| Rank | Subject | Score (Elo) | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Claude Opus 4.8 | 1890 | Claude Opus 4.8 (`anthropic-claude-opus-4.8`) | Self-reported | 2026-05-28 |
| 2 | GPT-5.5 | 1769 | GPT-5.5 (`openai-gpt-5.5`) | Self-reported | 2026-05-28 |
| 3 | Claude Opus 4.7 | 1753 | Claude Opus 4.7 (`anthropic-claude-opus-4.7`) | Self-reported | 2026-05-28 |
| 4 | Gemini 3.1 Pro Preview | 1314 | Gemini 3.1 Pro Preview (`google-gemini-3.1-pro-preview`) | Self-reported | 2026-05-28 |
| 1 | Claude Sonnet 4.6 | 1633 | Claude Sonnet 4.6 (`anthropic-claude-sonnet-4.6`) | Self-reported | 2026-05-06 |
| 2 | Claude Opus 4.6 | 1606 | Claude Opus 4.6 (`anthropic-claude-opus-4.6`) | Self-reported | 2026-05-06 |
| 3 | DeepSeek-V4-Pro-Max | 1554 | DeepSeek V4 Pro (`deepseek-deepseek-v4-pro`) | Self-reported | 2026-05-06 |
| 4 | MiniMax M2.7 | 1494 | MiniMax M2.7 (`minimax-minimax-m2.7`) | Imported | 2026-05-06 |
| 5 | Muse Spark | 1444 | — | Self-reported | 2026-05-06 |
| 6 | MiMo-V2-Pro | 1426 | MiMo-V2-Pro (`xiaomi-mimo-v2-pro`) | Self-reported | 2026-05-06 |
| 7 | MiMo-V2-Omni | 1410 | MiMo-V2-Omni (`xiaomi-mimo-v2-omni`) | Self-reported | 2026-05-06 |
| 8 | DeepSeek-V4-Flash-Max | 1395 | DeepSeek V4 Flash (`deepseek-deepseek-v4-flash`) | Self-reported | 2026-05-06 |
| 9 | Gemini 3.1 Pro | 1317 | Gemini 3.1 Pro Preview (`google-gemini-3.1-pro-preview`) | Self-reported | 2026-05-06 |
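
To give a rough sense of what a gap between two Elo scores in the table means, the sketch below applies the conventional logistic Elo formula to two scores from the 2026-05-28 slice. The base of 10 and scale of 400 are the standard chess-style parameters and are an assumption here: this page does not state which parameters Artificial Analysis uses. The function name `elo_expected_score` is illustrative. Scores from different slices may not be directly comparable, so the example stays within one slice.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the conventional logistic Elo model.

    Assumption: base 10 and scale 400, as in chess-style Elo. The benchmark's
    actual parameters are not given in the source.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Scores copied from the 2026-05-28 slice of the table.
claude_opus_4_8 = 1890
gpt_5_5 = 1769

# Probability-like expected score for the higher-rated model in a pairwise comparison.
p = elo_expected_score(claude_opus_4_8, gpt_5_5)
print(f"Expected score for a {claude_opus_4_8 - gpt_5_5}-point gap: {p:.2f}")
```

Under these assumed parameters, a 121-point gap corresponds to an expected pairwise score of about 0.67 for the higher-rated model, and equal ratings give exactly 0.5.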