EduGuardBench
Education-specific safety benchmark for teaching harm, adversarial safety, pedagogical fidelity, refusal, and safe tutoring behavior.
Rows: 14
Primary metric: RFS
Sampled: 2026-05-27
Metadata
Metrics
RFS, Accuracy, Omission Rate (lower is better), Inclusion Rate (lower is better)
| Rank | Subject | RFS | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Claude-3.7 | 0.77 | — | Imported | 2026-05-27 |
| 2 | Deepseek-R1 | 0.75 | R1 (`deepseek-r1`) | Imported | 2026-05-27 |
| 3 | Qwen3-32B-R | 0.75 | — | Imported | 2026-05-27 |
| 4 | Deepseek-V3 | 0.73 | DeepSeek V3 (`deepseek-deepseek-chat`) | Imported | 2026-05-27 |
| 5 | R1-Distill-70B | 0.73 | — | Imported | 2026-05-27 |
| 6 | Qwen3-32B | 0.72 | Qwen3 32B (`qwen-qwen3-32b`) | Imported | 2026-05-27 |
| 7 | Educhat-r1 | 0.71 | — | Imported | 2026-05-27 |
| 8 | GLM-Z1-9B | 0.69 | — | Imported | 2026-05-27 |
| 9 | GPT-4o | 0.69 | GPT-4o (`openai-gpt-4o`) | Imported | 2026-05-27 |
| 10 | Qwen3-235B-R | 0.69 | — | Imported | 2026-05-27 |
| 11 | Qwen3-8B-R | 0.69 | — | Imported | 2026-05-27 |
| 12 | Qwen3-235B | 0.67 | Qwen3 235B A22B (`qwen-qwen3-235b-a22b`) | Imported | 2026-05-27 |
| 13 | Qwen3-8B | 0.61 | Qwen3 8B (`qwen-qwen3-8b`) | Imported | 2026-05-27 |
| 14 | Qwen2.5-72B | 0.56 | Qwen2.5 72B Instruct (`qwen-qwen-2.5-72b-instruct`) | Imported | 2026-05-27 |
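
Several subjects share an RFS value (for example, four rows at 0.69), yet each row gets its own rank. The minimal Python sketch below reproduces the displayed order by sorting on RFS descending and breaking ties alphabetically by subject name. That tie-break rule is an assumption inferred from the table, not something the page states, and the names `rows` and `ranked` are hypothetical.

```python
# (subject, RFS) pairs copied from the table above, deliberately out of order.
rows = [
    ("Qwen3-8B", 0.61), ("GPT-4o", 0.69), ("Claude-3.7", 0.77),
    ("Qwen3-32B-R", 0.75), ("Deepseek-R1", 0.75), ("R1-Distill-70B", 0.73),
    ("Deepseek-V3", 0.73), ("Qwen3-32B", 0.72), ("Educhat-r1", 0.71),
    ("Qwen3-8B-R", 0.69), ("Qwen3-235B-R", 0.69), ("GLM-Z1-9B", 0.69),
    ("Qwen3-235B", 0.67), ("Qwen2.5-72B", 0.56),
]

# Assumed rule: higher RFS first; ties broken by subject name (ascending).
ranked = sorted(rows, key=lambda r: (-r[1], r[0]))

for rank, (subject, rfs) in enumerate(ranked, start=1):
    print(f"{rank:>2}  {subject:<15} {rfs:.2f}")
```

Because tied rows still receive distinct ranks, a one-place rank gap between subjects with equal RFS (such as ranks 8 through 11) does not indicate a score difference.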