Generate README Eval

Metrics

Score, BLEU, ROUGE-1, ROUGE-2, ROUGE-L, Cosine Similarity, Structural Similarity, Information Retrieval, Code Consistency, Readability. The ranking below is by Score.

Rank	Subject	Score	Model Match	Provenance	Date Sampled
1	oracle-score	56.79	—	Imported	2026-05-06
2	1-shot-gemini-1.5-flash-exp-0827	35.40	—	Imported	2026-05-06
3	5-shot-gemini-1.5-flash-exp-0827	33.97	—	Imported	2026-05-06
4	0-shot-gemini-1.5-flash-exp-0827	33.43	—	Imported	2026-05-06
5	gemini-1.5-flash-exp-0827	33.43	—	Imported	2026-05-06
6	gpt-4o-2024-08-06	33.13	GPT-4o (openai-gpt-4o)	Imported	2026-05-06
7	3-shot-gemini-1.5-flash-exp-0827	33.10	—	Imported	2026-05-06
8	o1-mini-2024-09-12	33.05	—	Imported	2026-05-06
9	7-shot-gemini-1.5-flash-exp-0827	33.00	—	Imported	2026-05-06
10	gemini-1.5-pro-exp-0827	32.51	—	Imported	2026-05-06
11	gpt-4o-mini-2024-07-18	32.16	GPT-4o-mini (openai-gpt-4o-mini)	Imported	2026-05-06
12	gemini-1.5-flash-8b-exp-0827	32.12	—	Imported	2026-05-06
13	mistral-nemo-instruct-2407	25.62	Mistral: Mistral Nemo (mistralai-mistral-nemo)	Imported	2026-05-06
14	llama3.1-8b-instruct	24.43	Llama 3.1 8B Instruct (meta-llama-llama-3.1-8b-instruct)	Imported	2026-05-06