Gorilla Benchmark API Bench

Score, Normalized Score

Rank	Subject	Score	Model Match	Provenance	Sampled
1	Llama 3.1 405B Instruct	0.35	—	Self-reported	2026-05-06
2	Llama 3.1 70B Instruct	0.30	Llama 3.1 70B Instruct (meta-llama-llama-3.1-70b-instruct)	Self-reported	2026-05-06
3	Llama 3.1 8B Instruct	0.08	Llama 3.1 8B Instruct (meta-llama-llama-3.1-8b-instruct)	Self-reported	2026-05-06