APEX-Agents

The AI Productivity Index for Agents (APEX-Agents) measures whether frontier AI agents can execute long-horizon, cross-application tasks across three jobs in professional services.

Rows: 40
Primary metric: mean_score_react
Sampled: 2026-05-06

Metadata

Metrics

Mean Score (ReAct), Pass@1 (ReAct), Mean Score (Loop), Pass@1 (Loop)

Latest Results

Rows ranked by Mean Score (ReAct). Loop harness scores are included when the public payload provides them.
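
The ranking rule above can be sketched in a few lines. This is a minimal illustration, not the site's actual implementation: the field name `mean_score_react` comes from the primary-metric label on this page, while `mean_score_loop`, the `Row` type, and both helper functions are hypothetical names introduced here. The three sample rows reuse scores from the table; their Loop values are left as `None` because this page does not list them.

```python
from typing import Optional, TypedDict


class Row(TypedDict):
    subject: str
    mean_score_react: float
    # Hypothetical field: None when the public payload has no Loop result.
    mean_score_loop: Optional[float]


def rank_rows(rows: list[Row]) -> list[tuple[int, Row]]:
    """Sort by ReAct mean score, highest first, and attach 1-based ranks.

    sorted() is stable, so rows with equal scores keep their input order.
    """
    ordered = sorted(rows, key=lambda r: r["mean_score_react"], reverse=True)
    return list(enumerate(ordered, start=1))


def format_row(rank: int, row: Row) -> str:
    """Render one ranked row, showing 'n/a' when no Loop score is present."""
    loop = row["mean_score_loop"]
    loop_text = f"{loop:.2f}" if loop is not None else "n/a"
    return (
        f"{rank}. {row['subject']}: "
        f"ReAct {row['mean_score_react']:.2f}, Loop {loop_text}"
    )


# Sample rows taken from the table, deliberately out of order.
SAMPLE_ROWS: list[Row] = [
    {"subject": "GPT 5.2 (xHigh)", "mean_score_react": 48.40, "mean_score_loop": None},
    {"subject": "GPT 5.5 (xHigh)", "mean_score_react": 53.90, "mean_score_loop": None},
    {"subject": "Opus 4.7 (Max)", "mean_score_react": 50.60, "mean_score_loop": None},
]

if __name__ == "__main__":
    for rank, row in rank_rows(SAMPLE_ROWS):
        print(format_row(rank, row))
```

Keeping the Loop score optional means a row without Loop results still ranks normally, since only the ReAct score is used as the sort key.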

| Rank | Subject | Mean Score (ReAct) | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT 5.5 (xHigh) | 53.90 | GPT-5.5 (openai-gpt-5.5) | Imported | 2026-05-06 |
| 2 | GPT 5.4 (xHigh) | 52.70 | | Imported | 2026-05-06 |
| 3 | Opus 4.7 (Max) | 50.60 | Claude Opus 4.7 (anthropic-claude-opus-4.7) | Imported | 2026-05-06 |
| 4 | Opus 4.6 (Max) | 48.40 | Claude Opus 4.6 (anthropic-claude-opus-4.6) | Imported | 2026-05-06 |
| 5 | GPT 5.2 (xHigh) | 48.40 | GPT-5.2 (openai-gpt-5.2) | Imported | 2026-05-06 |
| 6 | Gemini 3.1 Pro (High) | 48.20 | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Imported | 2026-05-06 |
| 7 | GPT 5.3 Codex (High) | 46.90 | GPT-5.3-Codex (openai-gpt-5.3-codex) | Imported | 2026-05-06 |
| 8 | Opus 4.6 (High) | 45.60 | Claude Opus 4.6 (anthropic-claude-opus-4.6) | Imported | 2026-05-06 |
| 9 | GPT 5.2 Codex (High) | 42.20 | GPT-5.2-Codex (openai-gpt-5.2-codex) | Imported | 2026-05-06 |
| 10 | Sonnet 4.6 (High) | 40.70 | | Imported | 2026-05-06 |
| 11 | Gemini 3 Flash (High) | 39.50 | Gemini 3 Flash Preview (google-gemini-3-flash-preview) | Imported | 2026-05-06 |
| 12 | GPT 5.2 (High) | 38.70 | GPT-5.2 (openai-gpt-5.2) | Imported | 2026-05-06 |
| 13 | GPT 5.4 mini (xHigh) | 37.50 | | Imported | 2026-05-06 |
| 14 | GPT 5.1 Codex (High) | 34.90 | GPT-5.1-Codex (openai-gpt-5.1-codex) | Imported | 2026-05-06 |
| 15 | Opus 4.5 (High) | 34.80 | Claude Opus 4.5 (anthropic-claude-opus-4.5) | Imported | 2026-05-06 |
| 16 | GPT 5 Codex (High) | 34.80 | GPT-5 Codex (openai-gpt-5-codex) | Imported | 2026-05-06 |
| 17 | Gemini 3 Pro (High) | 34.10 | Gemini 3 (google-gemini-3) | Imported | 2026-05-06 |
| 18 | GPT 5 (High) | 33.00 | GPT-5 (openai-gpt-5) | Imported | 2026-05-06 |
| 19 | GPT 5.1 (High) | 31.50 | GPT-5.1 (openai-gpt-5.1) | Imported | 2026-05-06 |
| 20 | o3 (High) | 31.40 | | Imported | 2026-05-06 |
| 21 | GLM 5 (Thinking) | 30.80 | GLM 5 (z-ai-glm-5) | Imported | 2026-05-06 |
| 22 | Grok 4 | 30.30 | Grok 4 (x-ai-grok-4) | Imported | 2026-05-06 |
| 23 | Kimi K2.5 | 29.20 | MoonshotAI: Kimi K2.5 (moonshotai-kimi-k2.5) | Imported | 2026-05-06 |
| 24 | Qwen 3.5 (Thinking) | 27.70 | | Imported | 2026-05-06 |
| 25 | GPT 5.4 nano (xHigh) | 25.50 | | Imported | 2026-05-06 |
| 26 | Gemini 3.1 Flash Lite (High) | 25.00 | | Imported | 2026-05-06 |
| 27 | Grok 4.1 (Fast) | 24.80 | Grok 4.1 Fast (x-ai-grok-4.1-fast) | Imported | 2026-05-06 |
| 28 | Sonnet 4 | 23.00 | Claude Sonnet 4 (anthropic-claude-sonnet-4) | Imported | 2026-05-06 |
| 29 | Claude Haiku 4.5 (High) | 21.40 | Claude Haiku 4.5 (anthropic-claude-haiku-4.5) | Imported | 2026-05-06 |
| 30 | DeepSeek v3.2 | 18.80 | DeepSeek V3.2 (deepseek-deepseek-v3.2) | Imported | 2026-05-06 |
| 31 | Minimax-2.5 | 18.70 | MiniMax M2.5 (minimax-minimax-m2.5) | Imported | 2026-05-06 |
| 32 | Gemini 2.5 Pro (On) | 17.00 | | Imported | 2026-05-06 |
| 33 | GPT OSS 120B (High) | 14.50 | gpt-oss-120b (openai-gpt-oss-120b) | Imported | 2026-05-06 |
| 34 | GLM 4.6 | 11.80 | GLM 4.6 (z-ai-glm-4.6) | Imported | 2026-05-06 |
| 35 | Kimi K2 Thinking | 11.50 | MoonshotAI: Kimi K2 Thinking (moonshotai-kimi-k2-thinking) | Imported | 2026-05-06 |
| 36 | GLM 4.7 | 8.40 | GLM 4.7 (z-ai-glm-4.7) | Imported | 2026-05-06 |
| 37 | Grok 3 | 7.30 | Grok 3 (xaigrok-3) | Imported | 2026-05-06 |
| 38 | Gemini 2.5 Flash (On) | 6.40 | | Imported | 2026-05-06 |
| 39 | o1 (High) | 5.50 | | Imported | 2026-05-06 |
| 40 | GPT 4o | 5.40 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-06 |