HH-RLHF
HH-RLHF: Measures model robustness, truthfulness, calibration, bias, harmfulness, jailbreak resistance, or alignment-relevant behavior.
0rows
scoreprimary metric
—sampled
Metadata
Metrics
Score
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|
No matching rows.