LiveSecBench
Dynamic live safety benchmark for large language models across ethics, legality, privacy, factuality, and psychological health.
Rows: 43
Primary metric: overall_score
Sampled: 2026-05-27
Metadata
Metrics: Overall Score, Ethics, Legality, Privacy, Factuality, Psychological Health
| Rank | Subject | Overall Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Claude-Haiku-4.5 | 91.43 | Claude Haiku 4.5 anthropic-claude-haiku-4.5 | Imported | 2026-05-27 |
| 2 | Claude-Sonnet-4.6 | 85.97 | Claude Sonnet 4.6 anthropic-claude-sonnet-4.6 | Imported | 2026-05-27 |
| 3 | GPT-5.2 | 84.72 | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-27 |
| 4 | Qwen3.5-Plus-2026-02-15 | 84.34 | Qwen3.5 Plus 2026-02-15 qwen-qwen3.5-plus-02-15 | Imported | 2026-05-27 |
| 5 | Qwen3.5-397B-A17B | 81.52 | Qwen3.5 397B A17B qwen-qwen3.5-397b-a17b | Imported | 2026-05-27 |
| 6 | Spark X2 | 79.18 | — | Imported | 2026-05-27 |
| 7 | Kimi-K2.5 | 74.79 | MoonshotAI: Kimi K2.5 moonshotai-kimi-k2.5 | Imported | 2026-05-27 |
| 8 | Doubao-Seed-1.6 | 70.83 | — | Imported | 2026-05-27 |
| 9 | Qwen3-235B-A22B | 69.23 | Qwen3 235B A22B qwen-qwen3-235b-a22b | Imported | 2026-05-27 |
| 10 | Minimax-M2 | 66.69 | MiniMax M2 minimax-minimax-m2 | Imported | 2026-05-27 |
| 11 | GPT-OSS-120B | 66.63 | gpt-oss-120b openai-gpt-oss-120b | Imported | 2026-05-27 |
| 12 | Intern-S1-Pro | 63.63 | — | Imported | 2026-05-27 |
| 13 | Doubao-Seed-2.0-Pro | 63.04 | — | Imported | 2026-05-27 |
| 14 | Minimax-M2.5 | 61.65 | MiniMax M2.5 minimax-minimax-m2.5 | Imported | 2026-05-27 |
| 15 | Gemini-3.1-Pro-Preview | 58.16 | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Imported | 2026-05-27 |
| 16 | MiMo-V2-Flash | 57.23 | MiMo-V2-Flash xiaomi-mimo-v2-flash | Imported | 2026-05-27 |
| 17 | Longcat-Flash-Chat | 57.1 | — | Imported | 2026-05-27 |
| 18 | GLM-5 | 56.73 | GLM 5 z-ai-glm-5 | Imported | 2026-05-27 |
| 19 | DeepSeek-V3.2 | 56.2 | DeepSeek V3.2 deepseek-deepseek-v3.2 | Imported | 2026-05-27 |
| 20 | DeepSeek-R1-0528 | 55.22 | R1 0528 deepseek-deepseek-r1-0528 | Imported | 2026-05-27 |
| 21 | Step3.5-Flash | 52.1 | Step 3.5 Flash stepfun-step-3.5-flash | Imported | 2026-05-27 |
| 22 | GLM-4.6 | 44.87 | GLM 4.6 z-ai-glm-4.6 | Imported | 2026-05-27 |
| 23 | Gemini-2.5-Flash | 42.38 | Gemini 2.5 Flash google-gemini-2.5-flash | Imported | 2026-05-27 |
| 24 | SenseChat-Turbo-1202 | 42.09 | — | Imported | 2026-05-27 |
| 25 | Ling-2.5-1T | 40.46 | — | Imported | 2026-05-27 |
| 26 | Ernie-5.0-Preview-1022 | 40.05 | — | Imported | 2026-05-27 |
| 27 | Step3 | 38.94 | — | Imported | 2026-05-27 |
| 28 | Llama-3.3-70B-Instruct | 38.89 | Llama 3.3 70B Instruct meta-llama-llama-3.3-70b-instruct | Imported | 2026-05-27 |
| 29 | Kimi-K2-0711 | 35.58 | — | Imported | 2026-05-27 |
| 30 | Grok-4.1-Fast | 32.73 | Grok 4.1 Fast x-ai-grok-4.1-fast | Imported | 2026-05-27 |
| 31 | Hunyuan-T1-20250822 | 32.64 | — | Imported | 2026-05-27 |
| 32 | Intern-S1 | 31.46 | — | Imported | 2026-05-27 |
| 33 | Seed-OSS-36B-Instruct | 30.3 | — | Imported | 2026-05-27 |
| 34 | Mistral-Large-2411 | 29.72 | Mistral Large 2411 mistralai-mistral-large-2411 | Imported | 2026-05-27 |
| 35 | Mistral-Large-3-2512 | 28.43 | — | Imported | 2026-05-27 |
| 36 | Grok-3-Mini | 28.37 | Grok 3 Mini x-ai-grok-3-mini | Imported | 2026-05-27 |
| 37 | Llama-4-Maverick | 28.18 | Llama 4 Maverick meta-llama-4-maverick | Imported | 2026-05-27 |
| 38 | SenseNova-V6-5-Turbo | 24.0 | — | Imported | 2026-05-27 |
| 39 | Ernie-4.5-21B-A3B-Thinking | 23.66 | ERNIE 4.5 21B A3B Thinking baidu-ernie-4.5-21b-a3b-thinking | Imported | 2026-05-27 |
| 40 | GPT-4.1-Mini | 22.99 | GPT-4.1 Mini openai-gpt-4.1-mini | Imported | 2026-05-27 |
| 41 | DeepSeek-V3-0324 | 18.08 | DeepSeek V3 0324 deepseek-deepseek-chat-v3-0324 | Imported | 2026-05-27 |
| 42 | Hunyuan-A13B-Instruct | 16.94 | Hunyuan A13B Instruct tencent-hunyuan-a13b-instruct | Imported | 2026-05-27 |
| 43 | Pangu-Pro-MoE-72B-A16B | 14.32 | — | Imported | 2026-05-27 |