APEX-Agents-AA

Artificial Analysis implementation of APEX-Agents using the Stirrup agent harness for long-horizon, cross-application professional-services tasks.

Rows: 18
Primary metric: score
Sampled: 2026-05-11

Metadata

Metrics

Pass@1

Latest Results

Rows are parsed from the public Artificial Analysis Next.js RSC defaultData payload and ranked by the configured primary metric.
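The ranking step described above can be sketched as follows. This is a minimal illustration, not the site's actual pipeline: the `Row` type, `parse_percent`, and `rank_rows` are hypothetical names, the input is assumed to be rows already extracted from the payload, and the tie-breaking rule (keep input order) is an assumption, since the page does not say how equal scores are ordered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """Hypothetical record for one leaderboard entry."""
    subject: str
    pass_at_1: str  # percentage string as displayed, e.g. "37.7%"


def parse_percent(text: str) -> float:
    """Convert a displayed percentage such as '33%' or '28.2%' to a float."""
    return float(text.strip().rstrip("%"))


def rank_rows(rows: list[Row]) -> list[tuple[int, Row]]:
    """Sort by the primary metric, highest first, and number ranks from 1.

    sorted() is stable, so rows with equal scores keep their input order
    (an assumed tie-break; the real rule is not documented on the page).
    """
    ordered = sorted(rows, key=lambda r: parse_percent(r.pass_at_1), reverse=True)
    return list(enumerate(ordered, start=1))


# A few rows taken from the table, deliberately out of order.
rows = [
    Row("DeepSeek V3.2 (Reasoning)", "14.5%"),
    Row("GPT-5.5 (xhigh)", "37.7%"),
    Row("GLM-5 (Reasoning)", "14.5%"),
    Row("Claude Opus 4.6 (Adaptive Reasoning, Max Effort)", "33%"),
]
ranked = rank_rows(rows)

for rank, row in ranked:
    print(rank, row.subject, row.pass_at_1)
```

Parsing the displayed string rather than a raw score means values rounded to the same figure (such as the two 14.5% entries) compare as equal, so their relative order depends entirely on the tie-break chosen.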

| Rank | Subject | Pass@1 | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-5.5 (xhigh) | 37.7% | GPT-5.5 (openai-gpt-5.5) | Imported | 2026-05-11 |
| 2 | GPT-5.4 (xhigh) | 33.3% | GPT-5.4 (openai-gpt-5.4) | Imported | 2026-05-11 |
| 3 | Claude Opus 4.6 (Adaptive Reasoning, Max Effort) | 33% | Claude Opus 4.6 (anthropic-claude-opus-4.6) | Imported | 2026-05-11 |
| 4 | Gemini 3.1 Pro Preview | 32% | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Imported | 2026-05-11 |
| 5 | GPT-5.4 mini (xhigh) | 28.2% | GPT-5.4 Mini (openai-gpt-5.4-mini) | Imported | 2026-05-11 |
| 6 | Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) | 28% | Claude Sonnet 4.6 (anthropic-claude-sonnet-4.6) | Imported | 2026-05-11 |
| 7 | Gemini 3 Flash Preview (Reasoning) | 27.7% | Gemini 3 Flash Preview (google-gemini-3-flash-preview) | Imported | 2026-05-11 |
| 8 | GPT-5.4 nano (xhigh) | 24.9% | GPT-5.4 Nano (openai-gpt-5.4-nano) | Imported | 2026-05-11 |
| 9 | Qwen3.5 397B A17B (Reasoning) | 15.3% | Qwen3.5 397B A17B (qwen-qwen3.5-397b-a17b) | Imported | 2026-05-11 |
| 10 | DeepSeek V3.2 (Reasoning) | 14.5% | DeepSeek V3.2 (deepseek-deepseek-v3.2) | Imported | 2026-05-11 |
| 11 | GLM-5 (Reasoning) | 14.5% | GLM 5 (z-ai-glm-5) | Imported | 2026-05-11 |
| 12 | Grok 4.20 0309 (Reasoning) | 14.2% | Grok 4.20 (x-ai-grok-4.20) | Imported | 2026-05-11 |
| 13 | Gemini 3.1 Flash-Lite Preview | 12.2% | Gemini 3.1 Flash Lite Preview (google-gemini-3.1-flash-lite-preview) | Imported | 2026-05-11 |
| 14 | Kimi K2.5 (Reasoning) | 11.5% | MoonshotAI: Kimi K2.5 (moonshotai-kimi-k2.5) | Imported | 2026-05-11 |
| 15 | MiniMax-M2.7 | 10.6% | MiniMax M2.7 (minimax-minimax-m2.7) | Imported | 2026-05-11 |
| 16 | gpt-oss-120B (high) | 3.1% | gpt-oss-120b (openai-gpt-oss-120b) | Imported | 2026-05-11 |
| 17 | NVIDIA Nemotron 3 Super 120B A12B (Reasoning) | 1.8% | Nemotron 3 Super (nvidia-nemotron-3-super-120b-a12b) | Imported | 2026-05-11 |
| 18 | gpt-oss-20B (high) | 0.7% | gpt-oss-20b (openai-gpt-oss-20b) | Imported | 2026-05-11 |