QwenWebDev | BenchmarkList

BT/Elo score

Rank	Subject	BT/Elo score	Model Match	Provenance	Sampled
1	Claude Opus 4.6 Max	1617	Claude Opus 4.6 anthropic-claude-opus-4.6	Self-reported	2026-05-28
2	DeepSeek V4 Pro Max	1570	DeepSeek V4 Pro deepseek-deepseek-v4-pro	Self-reported	2026-05-28
3	Qwen3.7 Max	1568	Qwen3.7 Max qwen-qwen3.7-max	Self-reported	2026-05-28
4	GLM-5.1 Thinking	1564	GLM 5.1 z-ai-glm-5.1	Self-reported	2026-05-28
5	Qwen3.6 Plus	1500	Qwen3.6 Plus qwen-qwen3.6-plus	Self-reported	2026-05-28