EnterpriseRAG-Bench

Leaderboard for RAG systems on EnterpriseRAG-Bench, a benchmark of company-internal knowledge retrieval and answer generation across 500 enterprise questions.

15 rows
Primary metric: overall_score
Sampled: 2026-05-06

Metadata

Metrics

Overall Score, Correctness, Completeness, Recall, Invalid Extra Docs (lower is better)

Latest Results

Rank | Subject                                   | Overall Score | Model Match | Provenance | Sampled
1    | OpenClaw                                  | 68.22         |             | Imported   | 2026-05-06
2    | OpenAI File Search                        | 61.03         |             | Imported   | 2026-05-06
3    | Bash Agent (GPT-5.4) + GPT-5.4            | 52.63         |             | Imported   | 2026-05-06
4    | BM25 + GPT-5.4                            | 50.60         |             | Imported   | 2026-05-06
5    | RAGFlow                                   | 50.24         |             | Imported   | 2026-05-06
6    | Amazon Q (Kendra)                         | 48.96         |             | Imported   | 2026-05-06
7    | Azure AI Search                           | 48.42         |             | Imported   | 2026-05-06
8    | Vertex AI Search                          | 41.87         |             | Imported   | 2026-05-06
9    | NVIDIA AI Blueprints                      | 37.73         |             | Imported   | 2026-05-06
10   | Vector (text-embedding-3-large) + GPT-5.4 | 37.72         |             | Imported   | 2026-05-06
11   | AnythingLLM                               | 35.58         |             | Imported   | 2026-05-06
12   | Weaviate Verba                            | 34.48         |             | Imported   | 2026-05-06
13   | LlamaIndex (default configs)              | 27.20         |             | Imported   | 2026-05-06
14   | LangChain (default configs)               | 24.98         |             | Imported   | 2026-05-06
15   | Open WebUI + Chroma                       | 24.89         |             | Imported   | 2026-05-06
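
To compare entries, it can help to express each score relative to a baseline row. The following is a minimal Python sketch, not part of the benchmark tooling: the scores are copied from the results above, while the `ROWS` layout, the function names `ranked` and `gap_to`, and the choice of "BM25 + GPT-5.4" as the baseline are assumptions made only for illustration.

```python
# Illustrative sketch only: scores are copied from the leaderboard above.
# ROWS, ranked() and gap_to() are hypothetical names, not benchmark tooling.
ROWS = [
    ("OpenClaw", 68.22),
    ("OpenAI File Search", 61.03),
    ("Bash Agent (GPT-5.4) + GPT-5.4", 52.63),
    ("BM25 + GPT-5.4", 50.60),
    ("RAGFlow", 50.24),
    ("Amazon Q (Kendra)", 48.96),
    ("Azure AI Search", 48.42),
    ("Vertex AI Search", 41.87),
    ("NVIDIA AI Blueprints", 37.73),
    ("Vector (text-embedding-3-large) + GPT-5.4", 37.72),
    ("AnythingLLM", 35.58),
    ("Weaviate Verba", 34.48),
    ("LlamaIndex (default configs)", 27.20),
    ("LangChain (default configs)", 24.98),
    ("Open WebUI + Chroma", 24.89),
]


def ranked(rows):
    """Return (rank, subject, score) tuples sorted by descending score."""
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return [(i + 1, name, score) for i, (name, score) in enumerate(ordered)]


def gap_to(rows, baseline):
    """Return each subject's score minus the baseline's score, rounded to 2 places."""
    base = dict(rows)[baseline]
    return {name: round(score - base, 2) for name, score in rows}


if __name__ == "__main__":
    gaps = gap_to(ROWS, "BM25 + GPT-5.4")  # baseline choice is an assumption
    for rank, name, score in ranked(ROWS):
        print(f"{rank:>2}  {name:<42} {score:6.2f}  {gaps[name]:+6.2f}")
```

Re-sorting by score reproduces the listed rank order, which is a quick consistency check on the imported rows. The gap column covers only the Overall Score, since the per-metric values (Correctness, Completeness, Recall, Invalid Extra Docs) are not shown in this view.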