APEX-Agents
The AI Productivity Index for Agents (APEX-Agents) measures whether frontier AI agents can execute long-horizon, cross-application tasks across three jobs in professional services.
Rows: 40
Primary metric: mean_score_react
Sampled: 2026-05-06
Metadata
Metrics
Mean Score (ReAct), Pass@1 (ReAct), Mean Score (Loop), Pass@1 (Loop)
| Rank | Subject | Mean Score (ReAct) | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT 5.5 (xHigh) | 53.90 | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-06 |
| 2 | GPT 5.4 (xHigh) | 52.70 | — | Imported | 2026-05-06 |
| 3 | Opus 4.7 (Max) | 50.60 | Claude Opus 4.7 anthropic-claude-opus-4.7 | Imported | 2026-05-06 |
| 4 | Opus 4.6 (Max) | 48.40 | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-06 |
| 5 | GPT 5.2 (xHigh) | 48.40 | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-06 |
| 6 | Gemini 3.1 Pro (High) | 48.20 | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Imported | 2026-05-06 |
| 7 | GPT 5.3 Codex (High) | 46.90 | GPT-5.3-Codex openai-gpt-5.3-codex | Imported | 2026-05-06 |
| 8 | Opus 4.6 (High) | 45.60 | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-06 |
| 9 | GPT 5.2 Codex (High) | 42.20 | GPT-5.2-Codex openai-gpt-5.2-codex | Imported | 2026-05-06 |
| 10 | Sonnet 4.6 (High) | 40.70 | — | Imported | 2026-05-06 |
| 11 | Gemini 3 Flash (High) | 39.50 | Gemini 3 Flash Preview google-gemini-3-flash-preview | Imported | 2026-05-06 |
| 12 | GPT 5.2 (High) | 38.70 | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-06 |
| 13 | GPT 5.4 mini (xHigh) | 37.50 | — | Imported | 2026-05-06 |
| 14 | GPT 5.1 Codex (High) | 34.90 | GPT-5.1-Codex openai-gpt-5.1-codex | Imported | 2026-05-06 |
| 15 | Opus 4.5 (High) | 34.80 | Claude Opus 4.5 anthropic-claude-opus-4.5 | Imported | 2026-05-06 |
| 16 | GPT 5 Codex (High) | 34.80 | GPT-5 Codex openai-gpt-5-codex | Imported | 2026-05-06 |
| 17 | Gemini 3 Pro (High) | 34.10 | Gemini 3 google-gemini-3 | Imported | 2026-05-06 |
| 18 | GPT 5 (High) | 33.00 | GPT-5 openai-gpt-5 | Imported | 2026-05-06 |
| 19 | GPT 5.1 (High) | 31.50 | GPT-5.1 openai-gpt-5.1 | Imported | 2026-05-06 |
| 20 | o3 (High) | 31.40 | — | Imported | 2026-05-06 |
| 21 | GLM 5 (Thinking) | 30.80 | GLM 5 z-ai-glm-5 | Imported | 2026-05-06 |
| 22 | Grok 4 | 30.30 | Grok 4 x-ai-grok-4 | Imported | 2026-05-06 |
| 23 | Kimi K2.5 | 29.20 | MoonshotAI: Kimi K2.5 moonshotai-kimi-k2.5 | Imported | 2026-05-06 |
| 24 | Qwen 3.5 (Thinking) | 27.70 | — | Imported | 2026-05-06 |
| 25 | GPT 5.4 nano (xHigh) | 25.50 | — | Imported | 2026-05-06 |
| 26 | Gemini 3.1 Flash Lite (High) | 25.00 | — | Imported | 2026-05-06 |
| 27 | Grok 4.1 (Fast) | 24.80 | Grok 4.1 Fast x-ai-grok-4.1-fast | Imported | 2026-05-06 |
| 28 | Sonnet 4 | 23.00 | Claude Sonnet 4 anthropic-claude-sonnet-4 | Imported | 2026-05-06 |
| 29 | Claude Haiku 4.5 (High) | 21.40 | Claude Haiku 4.5 anthropic-claude-haiku-4.5 | Imported | 2026-05-06 |
| 30 | DeepSeek v3.2 | 18.80 | DeepSeek V3.2 deepseek-deepseek-v3.2 | Imported | 2026-05-06 |
| 31 | Minimax-2.5 | 18.70 | MiniMax M2.5 minimax-minimax-m2.5 | Imported | 2026-05-06 |
| 32 | Gemini 2.5 Pro (On) | 17.00 | — | Imported | 2026-05-06 |
| 33 | GPT OSS 120B (High) | 14.50 | gpt-oss-120b openai-gpt-oss-120b | Imported | 2026-05-06 |
| 34 | GLM 4.6 | 11.80 | GLM 4.6 z-ai-glm-4.6 | Imported | 2026-05-06 |
| 35 | Kimi K2 Thinking | 11.50 | MoonshotAI: Kimi K2 Thinking moonshotai-kimi-k2-thinking | Imported | 2026-05-06 |
| 36 | GLM 4.7 | 8.40 | GLM 4.7 z-ai-glm-4.7 | Imported | 2026-05-06 |
| 37 | Grok 3 | 7.30 | Grok 3 xaigrok-3 | Imported | 2026-05-06 |
| 38 | Gemini 2.5 Flash (On) | 6.40 | — | Imported | 2026-05-06 |
| 39 | o1 (High) | 5.50 | — | Imported | 2026-05-06 |
| 40 | GPT 4o | 5.40 | GPT-4o openai-gpt-4o | Imported | 2026-05-06 |
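Some adjacent ranks in the table carry identical Mean Score (ReAct) values, so their relative order may not reflect a measured difference. The following is a minimal sketch of how a reader might detect such ties. It assumes the rows have been copied as pipe-delimited text with only the first three columns kept; `parse_rows` and `tied_groups` are hypothetical helper names and are not part of any APEX-Agents tooling.

```python
# Four rows copied from the table (rank, subject, Mean Score (ReAct)).
# The remaining rows follow the same shape.
ROWS = """\
| 3 | Opus 4.7 (Max) | 50.60 |
| 4 | Opus 4.6 (Max) | 48.40 |
| 5 | GPT 5.2 (xHigh) | 48.40 |
| 6 | Gemini 3.1 Pro (High) | 48.20 |
"""


def parse_rows(text):
    """Return (listed_rank, subject, score) tuples from pipe-delimited rows."""
    parsed = []
    for line in text.strip().splitlines():
        # Drop the outer pipes, then split the remaining cells.
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        parsed.append((int(cells[0]), cells[1], float(cells[2])))
    return parsed


def tied_groups(rows):
    """Map each score shared by more than one subject to those subjects."""
    by_score = {}
    for _, subject, score in rows:
        by_score.setdefault(score, []).append(subject)
    return {score: names for score, names in by_score.items() if len(names) > 1}


rows = parse_rows(ROWS)
ties = tied_groups(rows)
print(ties)
```

Run against the full table, the same approach would also flag ranks 15 and 16, which both list 34.80.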