RealToxicityPrompts
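RealToxicityPrompts measures how often a language model generates toxic text when continuing a prompt. It belongs to the benchmark category covering robustness, truthfulness, calibration, bias, harmfulness, jailbreak resistance, and alignment-relevant behavior.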
Rows: 42
Primary metric: toxic_fraction
Sampled: 2026-05-27
Metadata
Metrics
Toxic fraction (lower is better), Stereotypes (race) (lower is better), Stereotypes (gender) (lower is better), Representation (race) (lower is better), Representation (gender) (lower is better), Denoised inference time (lower is better), # eval, # prompt tokens (lower is better), # output tokens (lower is better), # trials
| Rank | Subject | Toxic fraction | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Palmyra X (43B) | 0.00558 | — | Imported | 2026-05-27 |
| 2 | Cohere medium v20220720 (6.1B) | 0.008333 | — | Imported | 2026-05-27 |
| 3 | T5 (11B) | 0.009234 | — | Imported | 2026-05-27 |
| 4 | Cohere Command beta (52.4B) | 0.009874 | — | Imported | 2026-05-27 |
| 5 | Cohere medium v20221108 (6.1B) | 0.009874 | — | Imported | 2026-05-27 |
| 6 | Cohere xlarge v20220609 (52.4B) | 0.010061 | — | Imported | 2026-05-27 |
| 7 | Cohere small v20220720 (410M) | 0.010065 | — | Imported | 2026-05-27 |
| 8 | Cohere Command beta (6.1B) | 0.010127 | — | Imported | 2026-05-27 |
| 9 | Cohere large v20220720 (13.1B) | 0.010513 | — | Imported | 2026-05-27 |
| 10 | Cohere xlarge v20221108 (52.4B) | 0.012435 | — | Imported | 2026-05-27 |
| 11 | Luminous Base (13B) | 0.025905 | — | Imported | 2026-05-27 |
| 12 | Luminous Extended (30B) | 0.028088 | — | Imported | 2026-05-27 |
| 13 | text-davinci-003 | 0.038652 | — | Imported | 2026-05-27 |
| 14 | text-ada-001 | 0.03988 | — | Imported | 2026-05-27 |
| 15 | UL2 (20B) | 0.043144 | — | Imported | 2026-05-27 |
| 16 | InstructPalmyra (30B) | 0.048678 | — | Imported | 2026-05-27 |
| 17 | GPT-NeoX (20B) | 0.049953 | — | Imported | 2026-05-27 |
| 18 | J1-Large v1 (7.5B) | 0.051672 | — | Imported | 2026-05-27 |
| 19 | GPT-J (6B) | 0.052125 | — | Imported | 2026-05-27 |
| 20 | OPT (175B) | 0.052457 | — | Imported | 2026-05-27 |
| 21 | babbage (1.3B) | 0.053345 | — | Imported | 2026-05-27 |
| 22 | ada (350M) | 0.053856 | — | Imported | 2026-05-27 |
| 23 | J1-Grande v1 (17B) | 0.055837 | — | Imported | 2026-05-27 |
| 24 | Luminous Supreme (70B) | 0.056243 | — | Imported | 2026-05-27 |
| 25 | J1-Jumbo v1 (178B) | 0.056293 | — | Imported | 2026-05-27 |
| 26 | curie (6.7B) | 0.056297 | — | Imported | 2026-05-27 |
| 27 | BLOOM (176B) | 0.057057 | — | Imported | 2026-05-27 |
| 28 | OPT (66B) | 0.058029 | — | Imported | 2026-05-27 |
| 29 | TNLG v2 (530B) | 0.058353 | — | Imported | 2026-05-27 |
| 30 | GLM (130B) | 0.058473 | — | Imported | 2026-05-27 |
| 31 | TNLG v2 (6.7B) | 0.058664 | — | Imported | 2026-05-27 |
| 32 | Anthropic-LM v4-s3 (52B) | 0.058863 | — | Imported | 2026-05-27 |
| 33 | Jurassic-2 Large (7.5B) | 0.059623 | — | Imported | 2026-05-27 |
| 34 | Jurassic-2 Jumbo (178B) | 0.061874 | — | Imported | 2026-05-27 |
| 35 | text-babbage-001 | 0.061895 | — | Imported | 2026-05-27 |
| 36 | text-davinci-002 | 0.063028 | — | Imported | 2026-05-27 |
| 37 | T0pp (11B) | 0.063826 | — | Imported | 2026-05-27 |
| 38 | J1-Grande v2 beta (17B) | 0.064178 | — | Imported | 2026-05-27 |
| 39 | text-curie-001 | 0.064431 | — | Imported | 2026-05-27 |
| 40 | Jurassic-2 Grande (17B) | 0.067321 | — | Imported | 2026-05-27 |
| 41 | davinci (175B) | 0.069712 | — | Imported | 2026-05-27 |
| 42 | YaLM (100B) | 0.092246 | — | Imported | 2026-05-27 |
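
This page does not state how the Toxic fraction column is computed. As a minimal sketch, assuming it is the share of model completions whose toxicity score (from some external classifier) reaches a cutoff, the calculation could look like the following. The function name `toxic_fraction`, the 0.5 threshold, and the example scores are all illustrative assumptions, not values taken from this leaderboard.

```python
from typing import Sequence


def toxic_fraction(scores: Sequence[float], threshold: float = 0.5) -> float:
    """Return the share of completions whose toxicity score is >= threshold.

    `scores` holds one toxicity score in [0, 1] per model completion.
    The default threshold of 0.5 is an assumption for illustration.
    """
    if not scores:
        raise ValueError("need at least one scored completion")
    toxic = sum(1 for s in scores if s >= threshold)
    return toxic / len(scores)


# Hypothetical scores for five completions (not real benchmark data).
example_scores = [0.02, 0.10, 0.71, 0.05, 0.49]
print(toxic_fraction(example_scores))  # one of five scores is >= 0.5, so 0.2
```

Under this reading, a value such as 0.00558 would mean that roughly 0.56% of scored completions were flagged as toxic, which is why lower is better.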