RGB

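RGB: Measures long-context retrieval, needle finding, summarization, factual grounding, or retrieval-augmented generation quality. The scores tracked here are noise-robustness accuracies for English and Chinese at noise ratios from 0 to 0.8.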

Rows: 6
Primary metric: mean_noise_robustness_accuracy
Sampled: 2026-05-27

Metadata

Metrics

Mean noise robustness accuracy, English noise ratio 0 accuracy, English noise ratio 0.2 accuracy, English noise ratio 0.4 accuracy, English noise ratio 0.6 accuracy, English noise ratio 0.8 accuracy, Chinese noise ratio 0 accuracy, Chinese noise ratio 0.2 accuracy, Chinese noise ratio 0.4 accuracy, Chinese noise ratio 0.6 accuracy, Chinese noise ratio 0.8 accuracy

Latest Results

Rows are transcribed from the public RGB paper Table 1. Primary score is a BenchmarkList-derived mean over all reported noise-robustness accuracy columns.

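| Rank | Subject | Mean noise robustness accuracy | Model Match | Provenance | Sampled |
|------|---------|--------------------------------|-------------|------------|---------|
| 1 | ChatGPT | 89.068% | | Imported | 2026-05-27 |
| 2 | Qwen-7B-Chat | 86.567% | | Imported | 2026-05-27 |
| 3 | ChatGLM-6B | 85.434% | | Imported | 2026-05-27 |
| 4 | BELLE-7B-2M | 79.134% | | Imported | 2026-05-27 |
| 5 | ChatGLM2-6B | 77.066% | | Imported | 2026-05-27 |
| 6 | Vicuna-7B-v1.3 | 76.4% | | Imported | 2026-05-27 |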
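
The primary score is described as a mean over all reported noise-robustness accuracy columns. A minimal sketch of that derivation follows, assuming an unweighted arithmetic mean over the ten language and noise-ratio columns named under Metrics. The function name, the dictionary layout, and the sample accuracies are hypothetical placeholders for illustration; they are not values from the RGB paper and not BenchmarkList's actual code.

```python
from statistics import mean

# Column layout taken from the Metrics list: two languages x five noise ratios.
LANGUAGES = ("English", "Chinese")
NOISE_RATIOS = (0, 0.2, 0.4, 0.6, 0.8)


def mean_noise_robustness(accuracies):
    """Return the unweighted mean of all ten (language, noise ratio) accuracies.

    `accuracies` maps (language, noise_ratio) -> accuracy in percent.
    Assumption: every column carries equal weight.
    """
    expected = {(lang, ratio) for lang in LANGUAGES for ratio in NOISE_RATIOS}
    missing = expected - set(accuracies)
    if missing:
        # Refuse to average a partial row, which would skew the score.
        raise ValueError(f"missing accuracy columns: {sorted(missing)}")
    return mean(accuracies[key] for key in sorted(expected))


# Placeholder accuracies for a hypothetical model (NOT from the RGB paper).
example = {
    ("English", 0): 90.0, ("English", 0.2): 88.0, ("English", 0.4): 86.0,
    ("English", 0.6): 84.0, ("English", 0.8): 80.0,
    ("Chinese", 0): 80.0, ("Chinese", 0.2): 78.0, ("Chinese", 0.4): 76.0,
    ("Chinese", 0.6): 74.0, ("Chinese", 0.8): 70.0,
}

score = mean_noise_robustness(example)
print(f"{score:.3f}%")  # 80.600%
```

Raising an error on a missing column, rather than averaging whatever is present, keeps scores comparable across subjects; whether the actual derivation does the same is not stated in the source.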