EduGuardBench

Education-specific safety benchmark for teaching harm, adversarial safety, pedagogical fidelity, refusal, and safe tutoring behavior.

Rows: 14
Primary metric: RFS
Sampled: 2026-05-27

Metadata

Metrics

RFS, Accuracy, Omission Rate (lower is better), Inclusion Rate (lower is better)

Latest Results

Rows are transcribed from the public EduGuardBench AAAI paper Table 1. Primary score is RFS.

| Rank | Subject | RFS | Model Match | Provenance | Sampled |
|------|---------|-----|-------------|------------|---------|
| 1 | Claude-3.7 | 0.77 | | Imported | 2026-05-27 |
| 2 | Deepseek-R1 | 0.75 | R1 (deepseek-r1) | Imported | 2026-05-27 |
| 3 | Qwen3-32B-R | 0.75 | | Imported | 2026-05-27 |
| 4 | Deepseek-V3 | 0.73 | DeepSeek V3 (deepseek-deepseek-chat) | Imported | 2026-05-27 |
| 5 | R1-Distill-70B | 0.73 | | Imported | 2026-05-27 |
| 6 | Qwen3-32B | 0.72 | Qwen3 32B (qwen-qwen3-32b) | Imported | 2026-05-27 |
| 7 | Educhat-r1 | 0.71 | | Imported | 2026-05-27 |
| 8 | GLM-Z1-9B | 0.69 | | Imported | 2026-05-27 |
| 9 | GPT-4o | 0.69 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-27 |
| 10 | Qwen3-235B-R | 0.69 | | Imported | 2026-05-27 |
| 11 | Qwen3-8B-R | 0.69 | | Imported | 2026-05-27 |
| 12 | Qwen3-235B | 0.67 | Qwen3 235B A22B (qwen-qwen3-235b-a22b) | Imported | 2026-05-27 |
| 13 | Qwen3-8B | 0.61 | Qwen3 8B (qwen-qwen3-8b) | Imported | 2026-05-27 |
| 14 | Qwen2.5-72B | 0.56 | Qwen2.5 72B Instruct (qwen-qwen-2.5-72b-instruct) | Imported | 2026-05-27 |
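
Several subjects above share the same RFS at two decimal places (0.75, 0.73, and 0.69), so the sequential ranks within those groups do not reflect a difference in the reported score. The sketch below is one way to make the ties explicit using standard competition ranking, where tied scores share a rank and the next rank is skipped accordingly. It is an illustration only, not the method used to produce the ranks listed here; the names `ROWS` and `competition_ranks` are hypothetical, and the scores are copied from the rows above.

```python
from itertools import groupby

# (subject, RFS) pairs copied from the results above.
ROWS = [
    ("Claude-3.7", 0.77), ("Deepseek-R1", 0.75), ("Qwen3-32B-R", 0.75),
    ("Deepseek-V3", 0.73), ("R1-Distill-70B", 0.73), ("Qwen3-32B", 0.72),
    ("Educhat-r1", 0.71), ("GLM-Z1-9B", 0.69), ("GPT-4o", 0.69),
    ("Qwen3-235B-R", 0.69), ("Qwen3-8B-R", 0.69), ("Qwen3-235B", 0.67),
    ("Qwen3-8B", 0.61), ("Qwen2.5-72B", 0.56),
]


def competition_ranks(rows):
    """Return (rank, subject, score) tuples; tied scores share a rank."""
    # sorted() is stable, so subjects keep their listed order within a tie.
    ordered = sorted(rows, key=lambda row: -row[1])
    ranked, position = [], 1
    for score, group in groupby(ordered, key=lambda row: row[1]):
        members = list(group)
        ranked.extend((position, subject, score) for subject, _ in members)
        # Skip as many ranks as there were tied members.
        position += len(members)
    return ranked


if __name__ == "__main__":
    for rank, subject, score in competition_ranks(ROWS):
        print(f"{rank:>2}  {subject:<16} {score:.2f}")
```

Under this scheme the four subjects at 0.69 would all carry rank 8 and Qwen3-235B would follow at rank 12. Whether the underlying scores differ beyond two decimal places is not recoverable from the transcribed values.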