EvasionBench

Benchmark for evaluating model robustness against evasion-style safety and policy-circumvention prompts.

Rows: 5
Primary metric: score
Sampled: 2026-05-06

Metadata

Metrics

Score

Latest Results

Rows are ranked by the benchmark score column. Model display names are preserved from the OpenEvals source dataset.

Rank | Subject | Score | Model Match | Provenance | Sampled
1 | zai-org/GLM-4.7 | 82.91 | GLM GLM 4.7 (z-ai-glm-4.7) | Imported | 2026-05-06
2 | Qwen/Qwen3-Coder-480B-A35B-Instruct | 78.16 | Qwen3 Coder 480B A35B (qwen-qwen3-coder) | Imported | 2026-05-06
3 | MiniMaxAI/MiniMax-M2.1 | 71.31 | - | Imported | 2026-05-06
4 | deepseek-ai/DeepSeek-V3.2 | 66.88 | DeepSeek V3.2 (deepseek-deepseek-v3.2) | Imported | 2026-05-06
5 | moonshotai/Kimi-K2-Instruct-0905 | 66.68 | KIMI MoonshotAI: Kimi K2 0905 (moonshotai-kimi-k2-0905) | Imported | 2026-05-06
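
The ranking rule stated above (rows ordered by the score column) can be reproduced with a short sketch. The subjects and scores are copied from the table. The function name `rank_by_score` is hypothetical, and both the descending order and the handling of ties (input order is kept) are assumptions, since the source does not describe how the site sorts or breaks ties.

```python
# Subjects and scores copied from the results table, deliberately unordered.
rows = [
    ("deepseek-ai/DeepSeek-V3.2", 66.88),
    ("zai-org/GLM-4.7", 82.91),
    ("moonshotai/Kimi-K2-Instruct-0905", 66.68),
    ("MiniMaxAI/MiniMax-M2.1", 71.31),
    ("Qwen/Qwen3-Coder-480B-A35B-Instruct", 78.16),
]


def rank_by_score(rows):
    """Return (rank, subject, score) tuples, highest score first.

    Assumption: higher score ranks first. sorted() is stable, so rows with
    equal scores keep their input order.
    """
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return [
        (rank, subject, score)
        for rank, (subject, score) in enumerate(ordered, start=1)
    ]


ranked = rank_by_score(rows)
for rank, subject, score in ranked:
    print(f"{rank} {subject} {score:.2f}")
```

Sorting on the score alone is enough here because no two rows in the table share a score.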