AdvBench | BenchmarkList

Metadata

AdvBench harmful-instruction score, Task coverage

Rank	Subject	AdvBench harmful-instruction score	Model Match	Provenance	Sampled
1	meta-llama/Llama-2-7b-chat-hf	0.999099	—	Imported	2026-05-27
2	meta-llama/Llama-2-13b-chat-hf	0.99903	—	Imported	2026-05-27
3	meta-llama/Llama-2-70b-chat-hf	0.998804	—	Imported	2026-05-27
4	Qwen/Qwen1.5-72B-Chat	0.99769	—	Imported	2026-05-27
5	gpt-4-1106-preview	0.996174	—	Imported	2026-05-27
6	mistralai/Mistral-7B-Instruct-v0.3	0.99597	—	Imported	2026-05-27
7	Claude3Opus	0.993269	—	Imported	2026-05-27
8	mistralai/Mixtral-8x7B-Instruct-v0.1	0.993204	Mistral: Mixtral 8x7B Instruct (mistralai-mixtral-8x7b-instruct)	Imported	2026-05-27
9	01-ai/Yi-34B-Chat	0.99276	—	Imported	2026-05-27
10	mistralai/Mistral-7B-Instruct-v0.2	0.992403	—	Imported	2026-05-27
11	gemini-1.5-flash-001	0.991324	—	Imported	2026-05-27
12	gpt-3.5-turbo-0125	0.990525	—	Imported	2026-05-27
13	speakleash/Bielik-11B-v2.3-Instruct	0.990427	—	Imported	2026-05-27
14	mistralai/Mistral-7B-v0.3	0.975977	—	Imported	2026-05-27
15	google/gemma-2-9b	0.925444	—	Imported	2026-05-27