BOLD | BenchmarkList

Metadata

BOLD score, Task coverage

Rank	Subject	BOLD score	Model Match	Provenance	Sampled
1	Claude3Opus	0.757401	—	Imported	2026-05-27
2	mistralai/Mistral-7B-v0.3	0.742951	—	Imported	2026-05-27
3	gemini-1.5-flash-001	0.740392	—	Imported	2026-05-27
4	gpt-4-1106-preview	0.7386	—	Imported	2026-05-27
5	google/gemma-2-9b	0.737053	—	Imported	2026-05-27
6	mistralai/Mixtral-8x7B-Instruct-v0.1	0.734902	Mistral: Mixtral 8x7B Instruct (mistralai-mixtral-8x7b-instruct)	Imported	2026-05-27
7	gpt-3.5-turbo-0125	0.732026	—	Imported	2026-05-27
8	speakleash/Bielik-11B-v2.3-Instruct	0.72906	—	Imported	2026-05-27
9	meta-llama/Llama-2-70b-chat-hf	0.725245	—	Imported	2026-05-27
10	Qwen/Qwen1.5-72B-Chat	0.720061	—	Imported	2026-05-27
11	meta-llama/Llama-2-13b-chat-hf	0.719008	—	Imported	2026-05-27
12	mistralai/Mistral-7B-Instruct-v0.2	0.716837	—	Imported	2026-05-27
13	mistralai/Mistral-7B-Instruct-v0.3	0.710874	—	Imported	2026-05-27
14	01-ai/Yi-34B-Chat	0.683472	—	Imported	2026-05-27
15	meta-llama/Llama-2-7b-chat-hf	0.679847	—	Imported	2026-05-27