RedSage-Bench | BenchmarkList

Metadata

ID: redsage_bench
Category: Cybersecurity
Release: Unknown

Metrics

Macro Accuracy, Knowledge General Accuracy, Knowledge Frameworks Accuracy, Offensive Skills Accuracy, Command-Line Tools Accuracy, Kali Tools Accuracy
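
The page does not state how Macro Accuracy is derived from the other five metrics. A minimal sketch, assuming the usual definition (the unweighted mean of the per-category accuracies), is shown below. The function name `macro_accuracy`, the dictionary keys, and the scores are hypothetical placeholders, not values from this leaderboard.

```python
from statistics import mean


def macro_accuracy(per_category: dict[str, float]) -> float:
    """Unweighted mean of per-category accuracies, in percent.

    Assumption: each category counts equally, regardless of how many
    questions it contains. The benchmark may define this differently.
    """
    if not per_category:
        raise ValueError("need at least one category")
    return mean(per_category.values())


# Hypothetical placeholder scores, NOT taken from the leaderboard.
example = {
    "knowledge_general": 80.0,
    "knowledge_frameworks": 90.0,
    "offensive_skills": 70.0,
    "command_line_tools": 85.0,
    "kali_tools": 75.0,
}

print(f"{macro_accuracy(example):.2f}%")  # prints 80.00%
```

Under this assumption a small category moves the macro score as much as a large one, which differs from pooling all questions into a single accuracy figure.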

Rank	Subject	Macro Accuracy	Model Match	Provenance	Sampled
1	GPT-5	88.68%	GPT-5 openai-gpt-5	Imported	2026-05-28
2	RedSage-8B-Ins	85.73%	—	Imported	2026-05-28
3	Qwen3-32B	85.40%	Qwen3 32B qwen-qwen3-32b	Imported	2026-05-28
4	RedSage-8B-Seed	85.21%	—	Imported	2026-05-28
5	RedSage-8B-Base	85.05%	—	Imported	2026-05-28
6	RedSage-8B-CFW	84.86%	—	Imported	2026-05-28
7	RedSage-8B-DPO	84.83%	—	Imported	2026-05-28
8	Qwen3-8B-Base	84.24%	—	Imported	2026-05-28
9	Qwen3-8B	81.85%	Qwen3 8B qwen-qwen3-8b	Imported	2026-05-28
10	DeepHat-V1-7B	80.18%	—	Imported	2026-05-28
11	Foundation-Sec-8B	78.51%	—	Imported	2026-05-28
12	Llama-3.1-8B	78.02%	—	Imported	2026-05-28
13	Llama-3.1-8B-Instruct	77.05%	Llama 3.1 8B Instruct meta-llama-llama-3.1-8b-instruct	Imported	2026-05-28
14	Llama-Primus-Base	77.02%	—	Imported	2026-05-28
15	Foundation-Sec-8B-Instruct	76.12%	—	Imported	2026-05-28
16	Llama-Primus-Merged	74.81%	—	Imported	2026-05-28
17	Lily-Cybersecurity-7B-v0.2	71.19%	—	Imported	2026-05-28
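
The rows above are tab-separated, so they can be loaded with the standard library for quick comparisons. The sketch below is only an illustration: it copies three rows from the table, keeps the first three columns, and uses a hypothetical helper named `load_scores`.

```python
import csv
import io

# Three rows copied from the table above (first three columns only).
TABLE = (
    "Rank\tSubject\tMacro Accuracy\n"
    "1\tGPT-5\t88.68%\n"
    "2\tRedSage-8B-Ins\t85.73%\n"
    "9\tQwen3-8B\t81.85%\n"
)


def load_scores(text: str) -> dict[str, float]:
    """Map each subject to its macro accuracy as a float (percent)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {
        row["Subject"]: float(row["Macro Accuracy"].rstrip("%"))
        for row in reader
    }


scores = load_scores(TABLE)

# Gap in percentage points between the top two entries.
gap = scores["GPT-5"] - scores["RedSage-8B-Ins"]
print(f"{gap:.2f}")  # prints 2.95
```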