BOLD
BOLD: Measures model robustness, truthfulness, calibration, bias, harmfulness, jailbreak resistance, and other alignment-relevant behavior.
Rows: 15
Primary metric: `bold_score`
Sampled: 2026-05-27
Metrics: BOLD score, Task coverage
| Rank | Subject | BOLD score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Claude3Opus | 0.757401 | — | Imported | 2026-05-27 |
| 2 | mistralai/Mistral-7B-v0.3 | 0.742951 | — | Imported | 2026-05-27 |
| 3 | gemini-1.5-flash-001 | 0.740392 | — | Imported | 2026-05-27 |
| 4 | gpt-4-1106-preview | 0.7386 | — | Imported | 2026-05-27 |
| 5 | google/gemma-2-9b | 0.737053 | — | Imported | 2026-05-27 |
| 6 | mistralai/Mixtral-8x7B-Instruct-v0.1 | 0.734902 | Mistral: Mixtral 8x7B Instruct (`mistralai-mixtral-8x7b-instruct`) | Imported | 2026-05-27 |
| 7 | gpt-3.5-turbo-0125 | 0.732026 | — | Imported | 2026-05-27 |
| 8 | speakleash/Bielik-11B-v2.3-Instruct | 0.72906 | — | Imported | 2026-05-27 |
| 9 | meta-llama/Llama-2-70b-chat-hf | 0.725245 | — | Imported | 2026-05-27 |
| 10 | Qwen/Qwen1.5-72B-Chat | 0.720061 | — | Imported | 2026-05-27 |
| 11 | meta-llama/Llama-2-13b-chat-hf | 0.719008 | — | Imported | 2026-05-27 |
| 12 | mistralai/Mistral-7B-Instruct-v0.2 | 0.716837 | — | Imported | 2026-05-27 |
| 13 | mistralai/Mistral-7B-Instruct-v0.3 | 0.710874 | — | Imported | 2026-05-27 |
| 14 | 01-ai/Yi-34B-Chat | 0.683472 | — | Imported | 2026-05-27 |
| 15 | meta-llama/Llama-2-7b-chat-hf | 0.679847 | — | Imported | 2026-05-27 |
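The Rank column is consistent with a plain descending sort on `bold_score`. The sketch below reproduces that ordering from the table's own values and computes the gaps between adjacent entries. It assumes that rank is simply position after sorting by score. The leaderboard does not say how ties are handled, so the sketch keeps tied rows in input order. The helper names (`rank_by_score`, `adjacent_gaps`, `score_spread`) are hypothetical.

```python
# Hypothetical sketch: rebuild the BOLD ranking from (subject, bold_score) pairs.
# Assumption: rank = 1-based position after sorting by bold_score, highest first.
# Python's sort is stable (also with reverse=True), so ties would keep input order.

ROWS = [
    ("Claude3Opus", 0.757401),
    ("mistralai/Mistral-7B-v0.3", 0.742951),
    ("gemini-1.5-flash-001", 0.740392),
    ("gpt-4-1106-preview", 0.7386),
    ("google/gemma-2-9b", 0.737053),
    ("mistralai/Mixtral-8x7B-Instruct-v0.1", 0.734902),
    ("gpt-3.5-turbo-0125", 0.732026),
    ("speakleash/Bielik-11B-v2.3-Instruct", 0.72906),
    ("meta-llama/Llama-2-70b-chat-hf", 0.725245),
    ("Qwen/Qwen1.5-72B-Chat", 0.720061),
    ("meta-llama/Llama-2-13b-chat-hf", 0.719008),
    ("mistralai/Mistral-7B-Instruct-v0.2", 0.716837),
    ("mistralai/Mistral-7B-Instruct-v0.3", 0.710874),
    ("01-ai/Yi-34B-Chat", 0.683472),
    ("meta-llama/Llama-2-7b-chat-hf", 0.679847),
]


def rank_by_score(rows):
    """Return (rank, subject, score) tuples, best score first."""
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return [(i + 1, name, score) for i, (name, score) in enumerate(ordered)]


def adjacent_gaps(rows):
    """Score difference between each ranked entry and the one just below it."""
    ranked = rank_by_score(rows)
    return [ranked[i][2] - ranked[i + 1][2] for i in range(len(ranked) - 1)]


def score_spread(rows):
    """Difference between the highest and lowest bold_score."""
    scores = [score for _, score in rows]
    return max(scores) - min(scores)


if __name__ == "__main__":
    # Feed the rows in reverse to show the ranking comes from the sort, not input order.
    for rank, name, score in rank_by_score(list(reversed(ROWS))):
        print(f"{rank:>2}  {name:<40} {score:.6f}")
    print(f"spread (best - worst): {score_spread(ROWS):.6f}")
    print("adjacent gaps:", [round(g, 6) for g in adjacent_gaps(ROWS)])
```

On these values, 12 of the 14 adjacent gaps are below 0.01. The exceptions are ranks 1 to 2 and ranks 13 to 14. The table shows no confidence intervals, so differences that small may not separate the models in any meaningful way.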