Qwen3.5-27B

Metadata

Qwen Open source

Aliases: qwen-qwen3.5-27b, qwen-qwen3.5-27b-20260224, qwen/qwen3.5-27b, qwen/qwen3.5-27b-20260224, qwen3.5-27b, qwen3.5-27b-20260224

Benchmark	Category	Rank	Score	Sampled
DeepPlanning	Agentic	6	0.23	2026-05-06
Gert Labs Rankings	Agentic	38	0.44	2026-05-11
ITBench-AA	Agentic	10	35.5%	2026-05-28
OSWorld-Verified	Agentic	10	0.56	2026-05-06
PinchBench	Agentic	4	0.90	2026-05-06
ScreenSpot-Pro	Agentic	5	70.30	2026-05-06
τ²-bench	Agentic	15	0.79	2026-05-06
Tau2-Bench Telecom	Agentic	33	93.9%	2026-05-11
Tau2-Bench Telecom	Agentic	72	87.1%	2026-05-11
Terminal-Bench Hard	Agentic	81	32.6%	2026-05-11
Terminal-Bench Hard	Agentic	88	31.8%	2026-05-11
TIR-Bench	Agentic	2	0.60	2026-05-06
Vending-Bench 2	Agentic	33	201.98	2026-05-28
OpenUGI	Alignment	1029	22.32	2026-05-06
OpenUGI	Alignment	1139	15.98	2026-05-06
ALE-Bench	Coding	75	349.45	2026-05-06
Arena AI Code	Coding	47	1352	2026-05-06
Codeforces	Coding	7	0.807	2026-05-28
FullStackBench en	Coding	2	0.60	2026-05-06
FullStackBench zh	Coding	2	0.57	2026-05-06
SciCode	Coding	111	39.5%	2026-05-11
SciCode	Coding	161	36.7%	2026-05-11
OmniDocBench 1.5	Document Understanding	6	0.89	2026-05-06
EmbSpatialBench	Embodied	2	0.84	2026-05-06
Vectara HHEM Hallucination Leaderboard	Factuality	77	87.90	2026-05-06
ALL Bench LLM	General Knowledge	32	20.96	2026-05-06
BenchLM	General Knowledge	45	63	2026-05-06
MAXIFE	General Knowledge	3	0.88	2026-05-06
MMLU-ProX	General Knowledge	3	0.82	2026-05-06
MMLU-Redux	General Knowledge	11	0.93	2026-05-06
NOVA-63	General Knowledge	3	0.58	2026-05-06
MedXpertQA	Healthcare	3	0.62	2026-05-06
PMC-VQA	Healthcare	2	0.62	2026-05-06
SlakeVQA	Healthcare	2	0.80	2026-05-06
IFBench	Instruction Following	2	0.77	2026-05-06
Artificial Analysis Intelligence Index	Intelligence	59	42.07	2026-05-11
Artificial Analysis Intelligence Index	Intelligence	94	37.18	2026-05-11
Humanity's Last Exam	Intelligence	62	22.2%	2026-05-11
Humanity's Last Exam	Intelligence	110	13.2%	2026-05-11
MathVision	Intelligence	10	86	2026-05-06
AA-LCR	Long Context	6	0.66	2026-05-06
AIME 2026	Mathematics	8	90.83	2026-05-06
DynaMath	Mathematics	2	0.88	2026-05-06
HMMT 2025	Mathematics	13	0.92	2026-05-06
HMMT February 2026	Mathematics	8	81.06	2026-05-06
PolyMATH	Mathematics	3	0.71	2026-05-06
ALL Bench Multimodal	Multimodal	26	25.86	2026-05-06
BabyVision	Multimodal	2	0.45	2026-05-06
CC-OCR	Multimodal	7	0.81	2026-05-06
CharXiv-R	Multimodal	10	0.80	2026-05-06
LingoQA	Multimodal	1	0.82	2026-05-06
LVBench	Multimodal	3	0.74	2026-05-06
MLVU	Multimodal	5	0.86	2026-05-06
MMVU	Multimodal	3	0.73	2026-05-06
Nuscene	Multimodal	2	0.15	2026-05-06
SimpleVQA	Multimodal	10	0.56	2026-05-06
VideoMME w sub.	Multimodal	3	0.87	2026-05-06
VideoMME w/o sub.	Multimodal	2	0.83	2026-05-06
VideoMMMU	Multimodal	12	0.82	2026-05-06
VLMsAreBlind	Multimodal	3	0.97	2026-05-06
ZEROBench	Multimodal	3	0.10	2026-05-06
ZEROBench-Sub	Multimodal	1	0.36	2026-05-06
Artificial Analysis Openness Index	Openness	137	38.89	2026-05-11
Artificial Analysis Openness Index	Openness	138	38.89	2026-05-11
Altered Riddles	Reasoning	12	0.4234	2026-05-27
ERQA	Reasoning	8	0.60	2026-05-06
Global PIQA	Reasoning	6	0.88	2026-05-06
GPQA Diamond	Reasoning	46	85.8%	2026-05-11
GPQA Diamond	Reasoning	63	84.2%	2026-05-11
OJBench	Reasoning	3	0.40	2026-05-06
CritPt	Science	100	0.9%	2026-05-11
CritPt	Science	150	0.3%	2026-05-11
BrowseComp-zh	Search	9	0.62	2026-05-06
Seal-0	Search	3	0.47	2026-05-06
WideSearch	Search	5	0.61	2026-05-06
CountBench	Spatial Reasoning	1	0.98	2026-05-06
Hypersim	Spatial Reasoning	2	0.13	2026-05-06
RefCOCO-avg	Spatial Reasoning	6	0.91	2026-05-06
RefSpatialBench	Spatial Reasoning	4	0.68	2026-05-06
SUNRGBD	Spatial Reasoning	2	0.35	2026-05-06
BFCL-V4	Tool Use	3	0.69	2026-05-06
WMT24++	Translation	6	0.78	2026-05-06
ODinW	Vision	13	0.41	2026-05-06
K-MetBench	Weather	4	83.0% accuracy	2026-05-28
K-MetBench	Weather	17	73.4% accuracy	2026-05-28
