LiveSecBench | BenchmarkList

Metadata

ID: livesecbench
Category: Safety
Release: Unknown

Metrics

Overall Score, Ethics, Legality, Privacy, Factuality, Psychological Health

Rank	Subject	Overall Score	Model Match	Provenance	Sampled
1	Claude-Haiku-4.5	91.43	Claude Haiku 4.5 anthropic-claude-haiku-4.5	Imported	2026-05-27
2	Claude-Sonnet-4.6	85.97	Claude Sonnet 4.6 anthropic-claude-sonnet-4.6	Imported	2026-05-27
3	GPT-5.2	84.72	GPT-5.2 openai-gpt-5.2	Imported	2026-05-27
4	Qwen3.5-Plus-2026-02-15	84.34	Qwen3.5 Plus 2026-02-15 qwen-qwen3.5-plus-02-15	Imported	2026-05-27
5	Qwen3.5-397B-A17B	81.52	Qwen3.5 397B A17B qwen-qwen3.5-397b-a17b	Imported	2026-05-27
6	Spark X2	79.18	—	Imported	2026-05-27
7	Kimi-K2.5	74.79	MoonshotAI: Kimi K2.5 moonshotai-kimi-k2.5	Imported	2026-05-27
8	Doubao-Seed-1.6	70.83	—	Imported	2026-05-27
9	Qwen3-235B-A22B	69.23	Qwen3 235B A22B qwen-qwen3-235b-a22b	Imported	2026-05-27
10	Minimax-M2	66.69	MiniMax M2 minimax-minimax-m2	Imported	2026-05-27
11	GPT-OSS-120B	66.63	gpt-oss-120b openai-gpt-oss-120b	Imported	2026-05-27
12	Intern-S1-Pro	63.63	—	Imported	2026-05-27
13	Doubao-Seed-2.0-Pro	63.04	—	Imported	2026-05-27
14	Minimax-M2.5	61.65	MiniMax M2.5 minimax-minimax-m2.5	Imported	2026-05-27
15	Gemini-3.1-Pro-Preview	58.16	Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview	Imported	2026-05-27
16	MiMo-V2-Flash	57.23	MiMo-V2-Flash xiaomi-mimo-v2-flash	Imported	2026-05-27
17	Longcat-Flash-Chat	57.10	—	Imported	2026-05-27
18	GLM-5	56.73	GLM 5 z-ai-glm-5	Imported	2026-05-27
19	DeepSeek-V3.2	56.20	DeepSeek V3.2 deepseek-deepseek-v3.2	Imported	2026-05-27
20	DeepSeek-R1-0528	55.22	R1 0528 deepseek-deepseek-r1-0528	Imported	2026-05-27
21	Step3.5-Flash	52.10	Step 3.5 Flash stepfun-step-3.5-flash	Imported	2026-05-27
22	GLM-4.6	44.87	GLM 4.6 z-ai-glm-4.6	Imported	2026-05-27
23	Gemini-2.5-Flash	42.38	Gemini 2.5 Flash google-gemini-2.5-flash	Imported	2026-05-27
24	SenseChat-Turbo-1202	42.09	—	Imported	2026-05-27
25	Ling-2.5-1T	40.46	—	Imported	2026-05-27
26	Ernie-5.0-Preview-1022	40.05	—	Imported	2026-05-27
27	Step3	38.94	—	Imported	2026-05-27
28	Llama-3.3-70B-Instruct	38.89	Llama 3.3 70B Instruct meta-llama-llama-3.3-70b-instruct	Imported	2026-05-27
29	Kimi-K2-0711	35.58	—	Imported	2026-05-27
30	Grok-4.1-Fast	32.73	Grok 4.1 Fast x-ai-grok-4.1-fast	Imported	2026-05-27
31	Hunyuan-T1-20250822	32.64	—	Imported	2026-05-27
32	Intern-S1	31.46	—	Imported	2026-05-27
33	Seed-OSS-36B-Instruct	30.30	—	Imported	2026-05-27
34	Mistral-Large-2411	29.72	Mistral Large 2411 mistralai-mistral-large-2411	Imported	2026-05-27
35	Mistral-Large-3-2512	28.43	—	Imported	2026-05-27
36	Grok-3-Mini	28.37	Grok 3 Mini x-ai-grok-3-mini	Imported	2026-05-27
37	Llama-4-Maverick	28.18	Llama 4 Maverick meta-llama-4-maverick	Imported	2026-05-27
38	SenseNova-V6-5-Turbo	24.00	—	Imported	2026-05-27
39	Ernie-4.5-21B-A3B-Thinking	23.66	ERNIE 4.5 21B A3B Thinking baidu-ernie-4.5-21b-a3b-thinking	Imported	2026-05-27
40	GPT-4.1-Mini	22.99	GPT-4.1 Mini openai-gpt-4.1-mini	Imported	2026-05-27
41	DeepSeek-V3-0324	18.08	DeepSeek V3 0324 deepseek-deepseek-chat-v3-0324	Imported	2026-05-27
42	Hunyuan-A13B-Instruct	16.94	Hunyuan A13B Instruct tencent-hunyuan-a13b-instruct	Imported	2026-05-27
43	Pangu-Pro-MoE-72B-A16B	14.32	—	Imported	2026-05-27
