LiveSecBench

Dynamic live safety benchmark for large language models across ethics, legality, privacy, factuality, and psychological health.

Rows: 43
Primary metric: overall_score
Sampled: 2026-05-27

Metadata

Metrics

Overall Score, Ethics, Legality, Privacy, Factuality, Psychological Health

Latest Results

Rows are parsed from the public LiveSecBench CSV linked from the benchmark site. Scores cover the benchmark's dynamic safety categories: ethics, legality, privacy, factuality, and psychological health.

Rank | Subject | Overall Score | Model Match | Provenance | Sampled
1 | Claude-Haiku-4.5 | 91.43 | Claude Haiku 4.5 (anthropic-claude-haiku-4.5) | Imported | 2026-05-27
2 | Claude-Sonnet-4.6 | 85.97 | Claude Sonnet 4.6 (anthropic-claude-sonnet-4.6) | Imported | 2026-05-27
3 | GPT-5.2 | 84.72 | GPT-5.2 (openai-gpt-5.2) | Imported | 2026-05-27
4 | Qwen3.5-Plus-2026-02-15 | 84.34 | Qwen3.5 Plus 2026-02-15 (qwen-qwen3.5-plus-02-15) | Imported | 2026-05-27
5 | Qwen3.5-397B-A17B | 81.52 | Qwen3.5 397B A17B (qwen-qwen3.5-397b-a17b) | Imported | 2026-05-27
6 | Spark X2 | 79.18 | - | Imported | 2026-05-27
7 | Kimi-K2.5 | 74.79 | MoonshotAI: Kimi K2.5 (moonshotai-kimi-k2.5) | Imported | 2026-05-27
8 | Doubao-Seed-1.6 | 70.83 | - | Imported | 2026-05-27
9 | Qwen3-235B-A22B | 69.23 | Qwen3 235B A22B (qwen-qwen3-235b-a22b) | Imported | 2026-05-27
10 | Minimax-M2 | 66.69 | MiniMax M2 (minimax-minimax-m2) | Imported | 2026-05-27
11 | GPT-OSS-120B | 66.63 | gpt-oss-120b (openai-gpt-oss-120b) | Imported | 2026-05-27
12 | Intern-S1-Pro | 63.63 | - | Imported | 2026-05-27
13 | Doubao-Seed-2.0-Pro | 63.04 | - | Imported | 2026-05-27
14 | Minimax-M2.5 | 61.65 | MiniMax M2.5 (minimax-minimax-m2.5) | Imported | 2026-05-27
15 | Gemini-3.1-Pro-Preview | 58.16 | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Imported | 2026-05-27
16 | MiMo-V2-Flash | 57.23 | MiMo-V2-Flash (xiaomi-mimo-v2-flash) | Imported | 2026-05-27
17 | Longcat-Flash-Chat | 57.1 | - | Imported | 2026-05-27
18 | GLM-5 | 56.73 | GLM 5 (z-ai-glm-5) | Imported | 2026-05-27
19 | DeepSeek-V3.2 | 56.2 | DeepSeek V3.2 (deepseek-deepseek-v3.2) | Imported | 2026-05-27
20 | DeepSeek-R1-0528 | 55.22 | R1 0528 (deepseek-deepseek-r1-0528) | Imported | 2026-05-27
21 | Step3.5-Flash | 52.1 | Step 3.5 Flash (stepfun-step-3.5-flash) | Imported | 2026-05-27
22 | GLM-4.6 | 44.87 | GLM 4.6 (z-ai-glm-4.6) | Imported | 2026-05-27
23 | Gemini-2.5-Flash | 42.38 | Gemini 2.5 Flash (google-gemini-2.5-flash) | Imported | 2026-05-27
24 | SenseChat-Turbo-1202 | 42.09 | - | Imported | 2026-05-27
25 | Ling-2.5-1T | 40.46 | - | Imported | 2026-05-27
26 | Ernie-5.0-Preview-1022 | 40.05 | - | Imported | 2026-05-27
27 | Step3 | 38.94 | - | Imported | 2026-05-27
28 | Llama-3.3-70B-Instruct | 38.89 | Llama 3.3 70B Instruct (meta-llama-llama-3.3-70b-instruct) | Imported | 2026-05-27
29 | Kimi-K2-0711 | 35.58 | - | Imported | 2026-05-27
30 | Grok-4.1-Fast | 32.73 | Grok 4.1 Fast (x-ai-grok-4.1-fast) | Imported | 2026-05-27
31 | Hunyuan-T1-20250822 | 32.64 | - | Imported | 2026-05-27
32 | Intern-S1 | 31.46 | - | Imported | 2026-05-27
33 | Seed-OSS-36B-Instruct | 30.3 | - | Imported | 2026-05-27
34 | Mistral-Large-2411 | 29.72 | Mistral Large 2411 (mistralai-mistral-large-2411) | Imported | 2026-05-27
35 | Mistral-Large-3-2512 | 28.43 | - | Imported | 2026-05-27
36 | Grok-3-Mini | 28.37 | Grok 3 Mini (x-ai-grok-3-mini) | Imported | 2026-05-27
37 | Llama-4-Maverick | 28.18 | Llama 4 Maverick (meta-llama-4-maverick) | Imported | 2026-05-27
38 | SenseNova-V6-5-Turbo | 24.0 | - | Imported | 2026-05-27
39 | Ernie-4.5-21B-A3B-Thinking | 23.66 | ERNIE 4.5 21B A3B Thinking (baidu-ernie-4.5-21b-a3b-thinking) | Imported | 2026-05-27
40 | GPT-4.1-Mini | 22.99 | GPT-4.1 Mini (openai-gpt-4.1-mini) | Imported | 2026-05-27
41 | DeepSeek-V3-0324 | 18.08 | DeepSeek V3 0324 (deepseek-deepseek-chat-v3-0324) | Imported | 2026-05-27
42 | Hunyuan-A13B-Instruct | 16.94 | Hunyuan A13B Instruct (tencent-hunyuan-a13b-instruct) | Imported | 2026-05-27
43 | Pangu-Pro-MoE-72B-A16B | 14.32 | - | Imported | 2026-05-27
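
The results are described as rows parsed from a public CSV and ordered by the primary metric, overall_score. A minimal sketch of that parse-and-rank step might look like the following. The actual CSV layout is not shown on this page, so the `subject` column name, the `SAMPLE_CSV` excerpt, and the `rank_rows` helper are all assumptions made for illustration; only the `overall_score` name and the three sample scores come from the listing itself.

```python
import csv
import io

# Hypothetical excerpt: the real LiveSecBench CSV header is not shown here,
# so "subject" is an assumed column name. The scores are taken from the listing.
SAMPLE_CSV = """subject,overall_score
GPT-5.2,84.72
Claude-Haiku-4.5,91.43
Claude-Sonnet-4.6,85.97
"""


def rank_rows(csv_text, metric="overall_score"):
    """Parse CSV text and return rows sorted by `metric`, highest first.

    Each returned row is a dict with an added 1-based "rank" key.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        # DictReader yields strings, so convert the metric before sorting.
        row[metric] = float(row[metric])
    rows.sort(key=lambda r: r[metric], reverse=True)
    return [{"rank": i, **row} for i, row in enumerate(rows, start=1)]


ranked = rank_rows(SAMPLE_CSV)
for row in ranked:
    print(row["rank"], row["subject"], row["overall_score"])
```

Converting the score to a float before sorting matters here: sorted as strings, a value such as 9.5 would rank above 84.72.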