RGB
RGB: Measures retrieval-augmented generation (RAG) quality. It is listed in the category covering long-context retrieval, needle finding, summarization, and factual grounding.
6 rows
Primary metric: mean_noise_robustness_accuracy
Sampled: 2026-05-27
Metrics
Mean noise robustness accuracy, plus accuracy at noise ratios 0, 0.2, 0.4, 0.6, and 0.8, reported separately for English and for Chinese (10 per-condition metrics).
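The page does not say how the mean is computed. A minimal sketch, assuming it is the unweighted average of the ten per-condition accuracies (two languages × five noise ratios); the constant and function names are hypothetical.

```python
from statistics import fmean

# Hypothetical names; the ten conditions are the ones listed in the metrics above.
LANGUAGES = ("English", "Chinese")
NOISE_RATIOS = (0.0, 0.2, 0.4, 0.6, 0.8)


def mean_noise_robustness(acc: dict[tuple[str, float], float]) -> float:
    """Average accuracy over every (language, noise ratio) condition.

    Assumption: a plain unweighted mean; the leaderboard does not document this.
    """
    conditions = [(lang, ratio) for lang in LANGUAGES for ratio in NOISE_RATIOS]
    missing = [c for c in conditions if c not in acc]
    if missing:
        # Refuse to average an incomplete set, which would bias the mean.
        raise ValueError(f"missing conditions: {missing}")
    return fmean(acc[c] for c in conditions)
```

Requiring all ten conditions keeps a model with a skipped language or noise ratio from being averaged over fewer, possibly easier, settings.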
| Rank | Subject | Mean noise robustness accuracy | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | ChatGPT | 89.068% | — | Imported | 2026-05-27 |
| 2 | Qwen-7B-Chat | 86.567% | — | Imported | 2026-05-27 |
| 3 | ChatGLM-6B | 85.434% | — | Imported | 2026-05-27 |
| 4 | BELLE-7B-2M | 79.134% | — | Imported | 2026-05-27 |
| 5 | ChatGLM2-6B | 77.066% | — | Imported | 2026-05-27 |
| 6 | Vicuna-7B-v1.3 | 76.400% | — | Imported | 2026-05-27 |
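The Rank column follows the primary metric in descending order. A small sketch of that ordering, using the six scores from the table; the function name is hypothetical, and how the site itself breaks ties is not stated.

```python
# Mean noise robustness accuracy (%) per subject, copied from the table above.
ROWS = [
    ("ChatGPT", 89.068),
    ("Qwen-7B-Chat", 86.567),
    ("ChatGLM-6B", 85.434),
    ("BELLE-7B-2M", 79.134),
    ("ChatGLM2-6B", 77.066),
    ("Vicuna-7B-v1.3", 76.4),
]


def rank_by_primary_metric(rows):
    """Return (rank, subject, score) tuples, highest score first.

    sorted() is stable even with reverse=True, so tied scores keep their
    input order; ties are not merged into a shared rank.
    """
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return [(rank, name, score) for rank, (name, score) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    for rank, name, score in rank_by_primary_metric(ROWS):
        print(f"{rank}. {name}: {score:.3f}%")
```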