PlaceboBench
Medical-domain hallucination benchmark with labeled model answers to pharmaceutical questions grounded in European Medicines Agency (EMA) product information.
Rows: 7
Primary metric: non_hallucination_rate
Sampled: 2026-05-27
Metrics
Non-Hallucination Rate (primary metric, higher is better), Hallucination Rate (lower is better), Hallucinations per Answer (lower is better), Sample Count
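The page names these metrics but does not define them. The sketch below shows one plausible way to compute all four from per-answer labels. It assumes, rather than takes from the benchmark, that an answer counts as non-hallucinated when annotators flag zero hallucinations in it, that Hallucination Rate is the complement of Non-Hallucination Rate, and that both rates are on a 0 to 100 scale (which matches the values in the table). `LabeledAnswer` and `summarize` are hypothetical names used only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LabeledAnswer:
    # Hypothetical record: one model answer and the number of hallucinations
    # an annotator flagged in it (0 means the answer is fully grounded).
    question_id: str
    hallucination_count: int


def summarize(answers: list[LabeledAnswer]) -> dict[str, float]:
    """Compute the four leaderboard metrics under the stated assumptions."""
    n = len(answers)
    if n == 0:
        raise ValueError("need at least one labeled answer")
    # Assumption: an answer is "non-hallucinated" only if it has zero flags.
    clean = sum(1 for a in answers if a.hallucination_count == 0)
    total_flags = sum(a.hallucination_count for a in answers)
    non_rate = 100.0 * clean / n
    return {
        "non_hallucination_rate": non_rate,
        # Assumption: the hallucination rate is the complement, in percent.
        "hallucination_rate": 100.0 - non_rate,
        # Average number of flagged hallucinations per answer.
        "hallucinations_per_answer": total_flags / n,
        "sample_count": n,
    }


if __name__ == "__main__":
    demo = [LabeledAnswer("q1", 0), LabeledAnswer("q2", 2), LabeledAnswer("q3", 0)]
    print(summarize(demo))
```

With three answers carrying 0, 2 and 0 flagged hallucinations, this yields a non-hallucination rate of about 66.67, a hallucination rate of about 33.33 and about 0.67 hallucinations per answer. Counting whole answers rather than individual claims keeps the primary metric easy to read as "share of fully clean answers", at the cost of treating an answer with one slip the same as one with many; Hallucinations per Answer recovers that distinction.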
| Rank | Subject | Non-Hallucination Rate (%) | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | gemini-3-pro-preview | 73.9130 | Gemini 3 (google-gemini-3) | Imported | 2026-05-27 |
| 2 | gpt-5.2 | 63.2353 | GPT-5.2 (openai-gpt-5.2) | Imported | 2026-05-27 |
| 3 | claude-sonnet-4-5 | 62.3188 | Claude Sonnet 4.5 (anthropic-claude-sonnet-4.5) | Imported | 2026-05-27 |
| 4 | accounts/fireworks/models/kimi-k2p5 | 53.6232 | none | Imported | 2026-05-27 |
| 5 | gemini-3-flash-preview | 44.9275 | Gemini 3 Flash Preview (google-gemini-3-flash-preview) | Imported | 2026-05-27 |
| 6 | gpt-5-mini | 39.1304 | GPT-5 Mini (openai-gpt-5-mini) | Imported | 2026-05-27 |
| 7 | claude-opus-4-6 | 36.2319 | Claude Opus 4.6 (anthropic-claude-opus-4.6) | Imported | 2026-05-27 |