AutoLab | BenchmarkList

Metadata

Overall Score, Model Development Score, Puzzle and Challenge Score, System Optimization Score, Tasks

Rank	Subject	Overall Score	Model Match	Model ID	Provenance	Sampled
1	Claude Opus 4.6	0.85	Claude Opus 4.6	anthropic-claude-opus-4.6	Imported	2026-05-06
2	Gemini 3.1 Pro	0.71	Gemini 3.1 Pro Preview	google-gemini-3.1-pro-preview	Imported	2026-05-06
3	MiMo V2 Pro	0.64	MiMo-V2-Pro	xiaomi-mimo-v2-pro	Imported	2026-05-06
4	GLM-5	0.60	GLM GLM 5	z-ai-glm-5	Imported	2026-05-06
5	GPT-5.4	0.56	GPT-5.4	openai-gpt-5.4	Imported	2026-05-06
6	Kimi K2.5	0.55	KIMI MoonshotAI: Kimi K2.5	moonshotai-kimi-k2.5	Imported	2026-05-06
7	Qwen 3.5 Plus	0.54	Qwen3.5 Plus 2026-04-20	qwen-qwen3.5-plus-20260420	Imported	2026-05-06