AgentHarm

AgentHarm: A safety benchmark that measures the harmfulness and jailbreak resistance of LLM agents.

Rows: 39
Primary metric: harm_score
Sampled: 2026-05-27

Metadata

Metrics

Harm score (lower is better), Refusal rate, Non-refusal harm score (lower is better), Benign non-refusal score

Latest Results

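Rows are parsed from the LaTeX source of the results table in the AgentHarm paper on arXiv and cover the public test set. Harm score is the primary safety metric (lower is better); refusal rate, non-refusal harm score, and benign non-refusal score are also preserved. The parenthetical after each subject names the prompting condition under which it was evaluated (None, Template, or Forced tool call).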

Rank | Subject | Harm score | Model match | Provenance | Sampled
1 | Llama-3.1 8B (None) | 3.1% | - | Imported | 2026-05-27
2 | Llama-3.1 405B (None) | 4.3% | - | Imported | 2026-05-27
3 | Llama-3.1 405B (Template) | 4.3% | - | Imported | 2026-05-27
4 | Claude 3 Haiku (Template) | 6.6% | Claude 3 Haiku (anthropic-claude-3-haiku) | Imported | 2026-05-27
5 | Gemini 1.0 Pro (None) | 7.4% | - | Imported | 2026-05-27
6 | Claude 3 Haiku (None) | 11.1% | Claude 3 Haiku (anthropic-claude-3-haiku) | Imported | 2026-05-27
7 | Claude 3.5 Sonnet (None) | 13.5% | Claude 3.5 Sonnet (anthropic-claude-3.5-sonnet) | Imported | 2026-05-27
8 | Llama-3.1 70B (None) | 14.0% | - | Imported | 2026-05-27
9 | Claude 3 Opus (None) | 14.4% | - | Imported | 2026-05-27
10 | Llama-3.1 70B (Template) | 15.0% | - | Imported | 2026-05-27
11 | Gemini 1.5 Pro (None) | 15.7% | - | Imported | 2026-05-27
12 | Claude 3 Sonnet (None) | 20.7% | - | Imported | 2026-05-27
13 | Gemini 1.5 Flash (None) | 20.7% | - | Imported | 2026-05-27
14 | Gemini 1.0 Pro (Template) | 23.3% | - | Imported | 2026-05-27
15 | Claude 3.5 Sonnet (Forced tool call) | 26.9% | Claude 3.5 Sonnet (anthropic-claude-3.5-sonnet) | Imported | 2026-05-27
16 | Llama-3.1 8B (Template) | 27.5% | - | Imported | 2026-05-27
17 | Claude 3 Opus (Forced tool call) | 29.5% | - | Imported | 2026-05-27
18 | Claude 3 Haiku (Forced tool call) | 33.9% | Claude 3 Haiku (anthropic-claude-3-haiku) | Imported | 2026-05-27
19 | Claude 3 Sonnet (Forced tool call) | 42.8% | - | Imported | 2026-05-27
20 | Claude 3 Opus (Template) | 45.7% | - | Imported | 2026-05-27
21 | GPT-4o (None) | 48.4% | GPT-4o (openai-gpt-4o) | Imported | 2026-05-27
22 | Claude 3 Sonnet (Template) | 52.8% | - | Imported | 2026-05-27
23 | Gemini 1.5 Pro (Template) | 56.1% | - | Imported | 2026-05-27
24 | Gemini 1.5 Flash (Template) | 56.6% | - | Imported | 2026-05-27
25 | GPT-4o (Forced tool call) | 57.7% | GPT-4o (openai-gpt-4o) | Imported | 2026-05-27
26 | GPT-3.5 Turbo (Template) | 62.0% | GPT-3.5 Turbo (openai-gpt-3.5-turbo) | Imported | 2026-05-27
27 | GPT-3.5 Turbo (None) | 62.2% | GPT-3.5 Turbo (openai-gpt-3.5-turbo) | Imported | 2026-05-27
28 | GPT-4o mini (None) | 62.5% | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-27
29 | GPT-3.5 Turbo (Forced tool call) | 63.2% | GPT-3.5 Turbo (openai-gpt-3.5-turbo) | Imported | 2026-05-27
30 | GPT-4o mini (Forced tool call) | 68.4% | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-27
31 | Claude 3.5 Sonnet (Template) | 68.7% | Claude 3.5 Sonnet (anthropic-claude-3.5-sonnet) | Imported | 2026-05-27
32 | GPT-4o mini (Template) | 68.8% | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-27
33 | Mistral Small 2 (None) | 72.0% | - | Imported | 2026-05-27
34 | GPT-4o (Template) | 72.7% | GPT-4o (openai-gpt-4o) | Imported | 2026-05-27
35 | Mistral Small 2 (Template) | 72.7% | - | Imported | 2026-05-27
36 | Mistral Small 2 (Forced tool call) | 73.7% | - | Imported | 2026-05-27
37 | Mistral Large 2 (Template) | 80.5% | - | Imported | 2026-05-27
38 | Mistral Large 2 (Forced tool call) | 80.9% | - | Imported | 2026-05-27
39 | Mistral Large 2 (None) | 82.2% | - | Imported | 2026-05-27
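
The ordering in the table above follows from sorting on harm score in ascending order, since lower is better. The following is a minimal sketch of that ranking step, not the site's actual pipeline: the `Row` type and `rank_by_harm_score` function are hypothetical names introduced here for illustration, and the four sample rows are copied from the table. Tied scores (for example the two 4.3% entries) appear to receive consecutive ranks in the table, which a stable sort with sequential numbering reproduces.

```python
from typing import List, NamedTuple, Tuple


class Row(NamedTuple):
    """One leaderboard entry (hypothetical structure for illustration)."""
    subject: str
    harm_score: float  # percentage, lower is better


def rank_by_harm_score(rows: List[Row]) -> List[Tuple[int, str, float]]:
    """Sort rows by ascending harm score and assign 1-based ranks.

    sorted() is stable, so rows with equal scores keep their input order
    and simply receive consecutive rank numbers.
    """
    ordered = sorted(rows, key=lambda r: r.harm_score)
    return [(i, r.subject, r.harm_score) for i, r in enumerate(ordered, start=1)]


# A few rows taken from the table, deliberately out of order.
rows = [
    Row("GPT-4o (None)", 48.4),
    Row("Llama-3.1 8B (None)", 3.1),
    Row("Mistral Large 2 (None)", 82.2),
    Row("Claude 3.5 Sonnet (None)", 13.5),
]

ranked = rank_by_harm_score(rows)
for rank, subject, score in ranked:
    print(f"{rank} | {subject} | {score:.1f}%")
```

Within this four-row sample the ranks run from 1 to 4; they differ from the ranks in the full table because the other 35 rows are omitted.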