XSTest

XSTest is a test suite designed to identify exaggerated safety behaviours in large language models. It comprises 450 prompts: 250 safe prompts across ten prompt types that well-calibrated models should not refuse to comply with, and 200 unsafe prompts as contrasts that models should refuse. The benchmark systematically evaluates whether models refuse to respond to clearly safe prompts due to overly cautious safety mechanisms.
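As a rough illustration of the safe/unsafe contrast described above, the sketch below computes refusal rates separately for the two groups of prompts. It is a minimal example under stated assumptions: the `Judgement` record, its field names, and the `refusal_rates` helper are hypothetical, and the made-up counts are not real results. It does not claim to reproduce how the Score or Normalized Score on this page is calculated.

```python
from dataclasses import dataclass


@dataclass
class Judgement:
    """Hypothetical record for one evaluated prompt (not an official schema)."""
    prompt_is_safe: bool  # True for a safe prompt, False for an unsafe contrast
    refused: bool         # assumed label: did the model refuse to comply?


def refusal_rates(judgements):
    """Return the fraction of refused prompts in the safe and unsafe groups."""
    safe = [j for j in judgements if j.prompt_is_safe]
    unsafe = [j for j in judgements if not j.prompt_is_safe]

    def rate(items):
        # Guard against an empty group to avoid division by zero.
        return sum(j.refused for j in items) / len(items) if items else 0.0

    return {
        "safe_refusal_rate": rate(safe),      # lower means less exaggerated safety
        "unsafe_refusal_rate": rate(unsafe),  # higher means more unsafe prompts refused
    }


# Illustrative, invented data: 250 safe prompts (5 refused), 200 unsafe (190 refused).
example = (
    [Judgement(True, i < 5) for i in range(250)]
    + [Judgement(False, i < 190) for i in range(200)]
)
rates = refusal_rates(example)
print(rates)
```

Reporting the two rates separately keeps the contrast visible: a model can only look well calibrated if it refuses few safe prompts while still refusing most unsafe ones.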

Rows: 3
Primary metric: score
Sampled: 2026-05-06

Metadata

Metrics

Score, Normalized Score

Latest Results

Rank  Subject              Score  Model Match  Provenance     Sampled
1     Gemini 1.5 Pro       0.99   -            Self-reported  2026-05-06
2     Gemini 1.5 Flash     0.97   -            Self-reported  2026-05-06
3     Gemini 1.5 Flash 8B  0.93   -            Self-reported  2026-05-06