ChatRAG Bench | BenchmarkList

Metadata

Metrics: Average (all), Average (exclude HybriDial), Unanswerable Avg-Both

Rank	Subject	Average (all)	Model Match	Provenance	Sampled
1	ChatQA-1.5-70B	58.25	—	Imported	2026-05-06
2	ChatQA-1.5-8B	55.17	—	Imported	2026-05-06
3	ChatQA-1.0-70B	54.14	—	Imported	2026-05-06
4	GPT-4-Turbo	54.03	GPT-4 Turbo (openai-gpt-4-turbo)	Imported	2026-05-06
5	GPT-4-0613	53.90	GPT-4 (openai-gpt-4)	Imported	2026-05-06
6	Llama3-instruct-70b	52.52	—	Imported	2026-05-06
7	Command-R-Plus	50.93	—	Imported	2026-05-06
8	ChatQA-1.0-7B	47.71	—	Imported	2026-05-06