EnterpriseRAG-Bench
Leaderboard for RAG systems on EnterpriseRAG-Bench, a benchmark of company-internal knowledge retrieval and answer generation across 500 enterprise questions.
Rows: 15
Primary metric: overall_score
Sampled: 2026-05-06
Metrics: Overall Score, Correctness, Completeness, Recall, Invalid Extra Docs (lower is better)

| Rank | Subject | Overall Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | OpenClaw | 68.22 | — | Imported | 2026-05-06 |
| 2 | OpenAI File Search | 61.03 | — | Imported | 2026-05-06 |
| 3 | Bash Agent (GPT-5.4) + GPT-5.4 | 52.63 | — | Imported | 2026-05-06 |
| 4 | BM25 + GPT-5.4 | 50.60 | — | Imported | 2026-05-06 |
| 5 | RAGFlow | 50.24 | — | Imported | 2026-05-06 |
| 6 | Amazon Q (Kendra) | 48.96 | — | Imported | 2026-05-06 |
| 7 | Azure AI Search | 48.42 | — | Imported | 2026-05-06 |
| 8 | Vertex AI Search | 41.87 | — | Imported | 2026-05-06 |
| 9 | NVIDIA AI Blueprints | 37.73 | — | Imported | 2026-05-06 |
| 10 | Vector (text-embedding-3-large) + GPT-5.4 | 37.72 | — | Imported | 2026-05-06 |
| 11 | AnythingLLM | 35.58 | — | Imported | 2026-05-06 |
| 12 | Weaviate Verba | 34.48 | — | Imported | 2026-05-06 |
| 13 | LlamaIndex (default configs) | 27.20 | — | Imported | 2026-05-06 |
| 14 | LangChain (default configs) | 24.98 | — | Imported | 2026-05-06 |
| 15 | Open WebUI + Chroma | 24.89 | — | Imported | 2026-05-06 |
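
The ranking above can be sanity-checked with a short script. This is a minimal sketch, not part of the benchmark's tooling: the `ROWS` list is copied by hand from the table, and the helper names `is_ranked_by_score` and `gap_to_leader` are hypothetical. It only checks that rank order follows descending Overall Score and reports how far each system trails the top entry. It does not reproduce how Overall Score itself is computed, which the page does not state.

```python
# Rows copied from the leaderboard table: (rank, subject, overall_score).
ROWS = [
    (1, "OpenClaw", 68.22),
    (2, "OpenAI File Search", 61.03),
    (3, "Bash Agent (GPT-5.4) + GPT-5.4", 52.63),
    (4, "BM25 + GPT-5.4", 50.60),
    (5, "RAGFlow", 50.24),
    (6, "Amazon Q (Kendra)", 48.96),
    (7, "Azure AI Search", 48.42),
    (8, "Vertex AI Search", 41.87),
    (9, "NVIDIA AI Blueprints", 37.73),
    (10, "Vector (text-embedding-3-large) + GPT-5.4", 37.72),
    (11, "AnythingLLM", 35.58),
    (12, "Weaviate Verba", 34.48),
    (13, "LlamaIndex (default configs)", 27.20),
    (14, "LangChain (default configs)", 24.98),
    (15, "Open WebUI + Chroma", 24.89),
]


def is_ranked_by_score(rows):
    """Return True if ascending rank corresponds to non-increasing score."""
    ordered = sorted(rows, key=lambda row: row[0])
    scores = [score for _, _, score in ordered]
    return all(a >= b for a, b in zip(scores, scores[1:]))


def gap_to_leader(rows):
    """Map each subject to its Overall Score deficit versus the top entry."""
    leader = max(score for _, _, score in rows)
    return {subject: round(leader - score, 2) for _, subject, score in rows}


if __name__ == "__main__":
    print("ranked by score:", is_ranked_by_score(ROWS))
    for subject, gap in gap_to_leader(ROWS).items():
        print(f"{subject}: {gap:.2f} points behind the leader")
```

Keeping the rows as plain tuples avoids any dependency on a table parser; if the leaderboard is exported in another format, only the `ROWS` literal needs to change.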