AdvBench

AdvBench: Measures model robustness, truthfulness, calibration, bias, harmfulness, jailbreak resistance, or alignment-relevant behavior.

15 rows
Primary metric: advbench_refusal_score
Sampled: 2026-05-27

Metadata

Metrics

AdvBench harmful-instruction score, Task coverage

Latest Results

Rows are imported from the per-model JSON files published in the public COMPL-AI Hugging Face Space, for the COMPL-AI task "Following Harmful Instructions: AdvBench".
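The import step could look roughly like the sketch below, which turns one per-model JSON document into a leaderboard row. This is a minimal sketch, not the actual importer: the `Row` type, the `row_from_json` helper, and the JSON keys `model` and `advbench_refusal_score` are all hypothetical, and the real COMPL-AI files may use a different schema.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One leaderboard row (hypothetical structure, for illustration)."""
    subject: str
    score: float
    provenance: str
    sampled: str


def row_from_json(text: str, sampled: str) -> Row:
    """Build one leaderboard row from a per-model JSON document.

    The keys "model" and "advbench_refusal_score" are assumptions;
    the real per-model files may name or nest these fields differently.
    """
    record = json.loads(text)
    return Row(
        subject=record["model"],
        score=float(record["advbench_refusal_score"]),
        provenance="Imported",
        sampled=sampled,
    )


# Example input shaped after one row of the results table.
example = '{"model": "google/gemma-2-9b", "advbench_refusal_score": 0.925444}'
row = row_from_json(example, sampled="2026-05-27")
```

Keeping the sample date as an explicit argument, rather than reading the clock inside the helper, makes the same input always produce the same row.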

| Rank | Subject | AdvBench harmful-instruction score | Model Match | Provenance | Sampled |
| 1 | meta-llama/Llama-2-7b-chat-hf | 0.999099 | | Imported | 2026-05-27 |
| 2 | meta-llama/Llama-2-13b-chat-hf | 0.99903 | | Imported | 2026-05-27 |
| 3 | meta-llama/Llama-2-70b-chat-hf | 0.998804 | | Imported | 2026-05-27 |
| 4 | Qwen/Qwen1.5-72B-Chat | 0.99769 | | Imported | 2026-05-27 |
| 5 | gpt-4-1106-preview | 0.996174 | | Imported | 2026-05-27 |
| 6 | mistralai/Mistral-7B-Instruct-v0.3 | 0.99597 | | Imported | 2026-05-27 |
| 7 | Claude3Opus | 0.993269 | | Imported | 2026-05-27 |
| 8 | mistralai/Mixtral-8x7B-Instruct-v0.1 | 0.993204 | Mistral: Mixtral 8x7B Instruct (mistralai-mixtral-8x7b-instruct) | Imported | 2026-05-27 |
| 9 | 01-ai/Yi-34B-Chat | 0.99276 | | Imported | 2026-05-27 |
| 10 | mistralai/Mistral-7B-Instruct-v0.2 | 0.992403 | | Imported | 2026-05-27 |
| 11 | gemini-1.5-flash-001 | 0.991324 | | Imported | 2026-05-27 |
| 12 | gpt-3.5-turbo-0125 | 0.990525 | | Imported | 2026-05-27 |
| 13 | speakleash/Bielik-11B-v2.3-Instruct | 0.990427 | | Imported | 2026-05-27 |
| 14 | mistralai/Mistral-7B-v0.3 | 0.975977 | | Imported | 2026-05-27 |
| 15 | google/gemma-2-9b | 0.925444 | | Imported | 2026-05-27 |
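
The ranks in the table are consistent with ordering subjects by score, highest first. A minimal sketch of that ordering, using three rows copied from the table: the `rank_rows` helper is illustrative, and how the page breaks ties is not stated, so the stable ordering used here is an assumption.

```python
from typing import List, Tuple

# (subject, score) pairs copied from the results table, deliberately unordered.
rows: List[Tuple[str, float]] = [
    ("google/gemma-2-9b", 0.925444),
    ("meta-llama/Llama-2-7b-chat-hf", 0.999099),
    ("gpt-4-1106-preview", 0.996174),
]


def rank_rows(rows: List[Tuple[str, float]]) -> List[Tuple[int, str, float]]:
    """Return (rank, subject, score) tuples, highest score first.

    sorted() is stable, so tied scores keep their input order; this
    tie-breaking rule is an assumption, not taken from the source page.
    """
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    return [(rank, subject, score)
            for rank, (subject, score) in enumerate(ordered, start=1)]


ranked = rank_rows(rows)
for rank, subject, score in ranked:
    print(rank, subject, score)
```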