Generate README Eval

Benchmark for generating structured README files from entire GitHub repositories, evaluating long-context codebase summarization with BLEU, ROUGE, semantic similarity, structure, information retrieval, code consistency, and readability metrics.

14 rows
Primary metric: Score
Sampled: 2026-05-06

Metadata

Metrics

Score, BLEU, ROUGE-1, ROUGE-2, ROUGE-L, Cosine Similarity, Structural Similarity, Information Retrieval, Code Consistency, Readability
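The leaderboard does not document how these metrics are implemented or how they are combined into Score. The sketch below illustrates what the n-gram metrics measure by computing ROUGE-1 F1, the clipped unigram overlap between a reference README and a generated one. The function name `rouge1_f1` and the lowercase whitespace tokenization are assumptions made for illustration. Standard ROUGE implementations use their own tokenizers and may apply stemming, so this will not reproduce the benchmark's numbers.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """ROUGE-1 F1 using naive lowercase whitespace tokenization (an illustrative assumption)."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    # Clipped overlap: each unigram counts at most min(count in ref, count in cand) times.
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    ref = "install the package with pip"
    gen = "install the package using pip"
    print(round(rouge1_f1(ref, gen), 3))
```

In this example, 4 of the 5 unigrams match in each direction, so precision and recall are both 0.8 and F1 is 0.8. ROUGE-2 applies the same idea to bigrams. ROUGE-L instead uses the longest common subsequence.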

Latest Results

Rows are parsed from public Markdown leaderboard tables in the Hugging Face dataset card. Both zero-shot and few-shot settings are retained as separate source configurations; source display names and log URLs are preserved.

| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | oracle-score | 56.79 | | Imported | 2026-05-06 |
| 2 | 1-shot-gemini-1.5-flash-exp-0827 | 35.40 | | Imported | 2026-05-06 |
| 3 | 5-shot-gemini-1.5-flash-exp-0827 | 33.97 | | Imported | 2026-05-06 |
| 4 | 0-shot-gemini-1.5-flash-exp-0827 | 33.43 | | Imported | 2026-05-06 |
| 5 | gemini-1.5-flash-exp-0827 | 33.43 | | Imported | 2026-05-06 |
| 6 | gpt-4o-2024-08-06 | 33.13 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-06 |
| 7 | 3-shot-gemini-1.5-flash-exp-0827 | 33.10 | | Imported | 2026-05-06 |
| 8 | o1-mini-2024-09-12 | 33.05 | | Imported | 2026-05-06 |
| 9 | 7-shot-gemini-1.5-flash-exp-0827 | 33.00 | | Imported | 2026-05-06 |
| 10 | gemini-1.5-pro-exp-0827 | 32.51 | | Imported | 2026-05-06 |
| 11 | gpt-4o-mini-2024-07-18 | 32.16 | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-06 |
| 12 | gemini-1.5-flash-8b-exp-0827 | 32.12 | | Imported | 2026-05-06 |
| 13 | mistral-nemo-instruct-2407 | 25.62 | Mistral: Mistral Nemo (mistralai-mistral-nemo) | Imported | 2026-05-06 |
| 14 | llama3.1-8b-instruct | 24.43 | Llama 3.1 8B Instruct (meta-llama-llama-3.1-8b-instruct) | Imported | 2026-05-06 |