EvasionBench
Benchmark for evaluating model robustness against evasion-style safety and policy-circumvention prompts.
Rows: 5
Primary metric: score
Sampled: 2026-05-06
Score
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | zai-org/GLM-4.7 | 82.91 | GLM 4.7 (`z-ai-glm-4.7`) | Imported | 2026-05-06 |
| 2 | Qwen/Qwen3-Coder-480B-A35B-Instruct | 78.16 | Qwen3 Coder 480B A35B (`qwen-qwen3-coder`) | Imported | 2026-05-06 |
| 3 | MiniMaxAI/MiniMax-M2.1 | 71.31 | — | Imported | 2026-05-06 |
| 4 | deepseek-ai/DeepSeek-V3.2 | 66.88 | DeepSeek V3.2 (`deepseek-deepseek-v3.2`) | Imported | 2026-05-06 |
| 5 | moonshotai/Kimi-K2-Instruct-0905 | 66.68 | MoonshotAI: Kimi K2 0905 (`moonshotai-kimi-k2-0905`) | Imported | 2026-05-06 |