APEX-Agents-AA | BenchmarkList

Metadata

ID: apex_agents_aa
Category: Agentic
Release: 2026-01-21

Metrics

Pass@1

Rank	Subject	Pass@1	Model Match	Provenance	Sampled
1	GPT-5.5 (xhigh)	37.7%	GPT-5.5 openai-gpt-5.5	Imported	2026-05-11
2	GPT-5.4 (xhigh)	33.3%	GPT-5.4 openai-gpt-5.4	Imported	2026-05-11
3	Claude Opus 4.6 (Adaptive Reasoning, Max Effort)	33%	Claude Opus 4.6 anthropic-claude-opus-4.6	Imported	2026-05-11
4	Gemini 3.1 Pro Preview	32%	Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview	Imported	2026-05-11
5	GPT-5.4 mini (xhigh)	28.2%	GPT-5.4 Mini openai-gpt-5.4-mini	Imported	2026-05-11
6	Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort)	28%	Claude Sonnet 4.6 anthropic-claude-sonnet-4.6	Imported	2026-05-11
7	Gemini 3 Flash Preview (Reasoning)	27.7%	Gemini 3 Flash Preview google-gemini-3-flash-preview	Imported	2026-05-11
8	GPT-5.4 nano (xhigh)	24.9%	GPT-5.4 Nano openai-gpt-5.4-nano	Imported	2026-05-11
9	Qwen3.5 397B A17B (Reasoning)	15.3%	Qwen3.5 397B A17B qwen-qwen3.5-397b-a17b	Imported	2026-05-11
10	DeepSeek V3.2 (Reasoning)	14.5%	DeepSeek V3.2 deepseek-deepseek-v3.2	Imported	2026-05-11
11	GLM-5 (Reasoning)	14.5%	GLM 5 z-ai-glm-5	Imported	2026-05-11
12	Grok 4.20 0309 (Reasoning)	14.2%	Grok 4.20 x-ai-grok-4.20	Imported	2026-05-11
13	Gemini 3.1 Flash-Lite Preview	12.2%	Gemini 3.1 Flash Lite Preview google-gemini-3.1-flash-lite-preview	Imported	2026-05-11
14	Kimi K2.5 (Reasoning)	11.5%	MoonshotAI: Kimi K2.5 moonshotai-kimi-k2.5	Imported	2026-05-11
15	MiniMax-M2.7	10.6%	MiniMax M2.7 minimax-minimax-m2.7	Imported	2026-05-11
16	gpt-oss-120B (high)	3.1%	gpt-oss-120b openai-gpt-oss-120b	Imported	2026-05-11
17	NVIDIA Nemotron 3 Super 120B A12B (Reasoning)	1.8%	Nemotron 3 Super nvidia-nemotron-3-super-120b-a12b	Imported	2026-05-11
18	gpt-oss-20B (high)	0.7%	gpt-oss-20b openai-gpt-oss-20b	Imported	2026-05-11