ChatRAG Bench
NVIDIA ChatRAG Bench evaluates conversational question answering over documents or retrieved context across ten derived datasets, including long-context, table reasoning, arithmetic, and unanswerable-question scenarios.
8 rows
Primary metric: average_all
Sampled: 2026-05-06
Metadata
Metrics: Average (all), Average (exclude HybriDial), Unanswerable Avg-Both
| Rank | Subject | Average (all) | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | ChatQA-1.5-70B | 58.25 | — | Imported | 2026-05-06 |
| 2 | ChatQA-1.5-8B | 55.17 | — | Imported | 2026-05-06 |
| 3 | ChatQA-1.0-70B | 54.14 | — | Imported | 2026-05-06 |
| 4 | GPT-4-Turbo | 54.03 | GPT-4 Turbo (openai-gpt-4-turbo) | Imported | 2026-05-06 |
| 5 | GPT-4-0613 | 53.90 | GPT-4 (openai-gpt-4) | Imported | 2026-05-06 |
| 6 | Llama3-instruct-70b | 52.52 | — | Imported | 2026-05-06 |
| 7 | Command-R-Plus | 50.93 | — | Imported | 2026-05-06 |
| 8 | ChatQA-1.0-7B | 47.71 | — | Imported | 2026-05-06 |
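
The Rank column appears to follow the primary metric, Average (all), in descending order. The following is a minimal sketch of that ordering, assuming rank is simply the position after sorting by score; the `rank_by_score` helper is hypothetical and not part of any published ChatRAG Bench tooling. The scores are copied from the table above.

```python
# Scores copied from the "Average (all)" column of the leaderboard table.
scores = {
    "ChatQA-1.5-70B": 58.25,
    "ChatQA-1.5-8B": 55.17,
    "ChatQA-1.0-70B": 54.14,
    "GPT-4-Turbo": 54.03,
    "GPT-4-0613": 53.90,
    "Llama3-instruct-70b": 52.52,
    "Command-R-Plus": 50.93,
    "ChatQA-1.0-7B": 47.71,
}


def rank_by_score(scores):
    """Return (rank, subject, score) tuples, highest score first.

    Assumption: rank is the 1-based position after a descending sort
    on the primary metric. Ties are not handled specially.
    """
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [
        (rank, subject, score)
        for rank, (subject, score) in enumerate(ordered, start=1)
    ]


ranking = rank_by_score(scores)
for rank, subject, score in ranking:
    print(f"{rank}. {subject}: {score:.2f}")
```

Under this assumption the computed order matches the Rank column for all eight rows.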