Generate README Eval
Benchmark for generating structured README files from entire GitHub repositories, evaluating long-context codebase summarization with BLEU, ROUGE, semantic similarity, structure, information retrieval, code consistency, and readability metrics.
14 rows
Primary metric: Score
Sampled: 2026-05-06
Metadata
Metrics
Score, BLEU, ROUGE-1, ROUGE-2, ROUGE-L, Cosine Similarity, Structural Similarity, Information Retrieval, Code Consistency, Readability
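The page lists the metrics but does not say how any of them is computed. To show what the ROUGE columns measure, here is a minimal sketch of ROUGE-1, ROUGE-2, and ROUGE-L as F1 scores over whitespace tokens. It assumes plain whitespace tokenization, no stemming, and F1 aggregation. These are illustrative choices, not the benchmark's documented scorer, which may tokenize, stem, or aggregate differently and so produce different numbers.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, n_candidate, n_reference):
    """Harmonic mean of precision and recall; 0.0 when undefined or no overlap."""
    if n_candidate == 0 or n_reference == 0 or overlap == 0:
        return 0.0
    precision = overlap / n_candidate
    recall = overlap / n_reference
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N F1 from clipped n-gram overlap (assumed whitespace tokens)."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # & keeps the minimum count per n-gram
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 computed from the LCS length."""
    return f1(lcs_length(candidate, reference), len(candidate), len(reference))


if __name__ == "__main__":
    # Hypothetical one-line README fragments, for illustration only.
    reference = "run pip install to install the package".split()
    candidate = "use pip install to install this package".split()
    for name, score in [
        ("ROUGE-1", rouge_n(candidate, reference, 1)),
        ("ROUGE-2", rouge_n(candidate, reference, 2)),
        ("ROUGE-L", rouge_l(candidate, reference)),
    ]:
        print(f"{name}: {score:.3f}")
```

ROUGE-1 and ROUGE-2 reward matching words and word pairs regardless of order, while ROUGE-L rewards words that appear in the same relative order. A generated README can therefore score differently on each column even when it covers the same content.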
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | oracle-score | 56.79 | — | Imported | 2026-05-06 |
| 2 | 1-shot-gemini-1.5-flash-exp-0827 | 35.40 | — | Imported | 2026-05-06 |
| 3 | 5-shot-gemini-1.5-flash-exp-0827 | 33.97 | — | Imported | 2026-05-06 |
| 4 | 0-shot-gemini-1.5-flash-exp-0827 | 33.43 | — | Imported | 2026-05-06 |
| 5 | gemini-1.5-flash-exp-0827 | 33.43 | — | Imported | 2026-05-06 |
| 6 | gpt-4o-2024-08-06 | 33.13 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-06 |
| 7 | 3-shot-gemini-1.5-flash-exp-0827 | 33.10 | — | Imported | 2026-05-06 |
| 8 | o1-mini-2024-09-12 | 33.05 | — | Imported | 2026-05-06 |
| 9 | 7-shot-gemini-1.5-flash-exp-0827 | 33.00 | — | Imported | 2026-05-06 |
| 10 | gemini-1.5-pro-exp-0827 | 32.51 | — | Imported | 2026-05-06 |
| 11 | gpt-4o-mini-2024-07-18 | 32.16 | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-06 |
| 12 | gemini-1.5-flash-8b-exp-0827 | 32.12 | — | Imported | 2026-05-06 |
| 13 | mistral-nemo-instruct-2407 | 25.62 | Mistral: Mistral Nemo (mistralai-mistral-nemo) | Imported | 2026-05-06 |
| 14 | llama3.1-8b-instruct | 24.43 | Llama 3.1 8B Instruct (meta-llama-llama-3.1-8b-instruct) | Imported | 2026-05-06 |