API-Bank
A comprehensive benchmark for tool-augmented LLMs that evaluates API planning, retrieval, and calling capabilities. It contains 314 tool-use dialogues with 753 API calls across 73 API tools, and is designed to assess how effectively LLMs can use external tools and overcome the obstacles they encounter when doing so.
Rows: 3
Primary metric: Score
Sampled: 2026-05-06
Metadata
Metrics: Score, Normalized Score
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Llama 3.1 405B Instruct | 0.92 | — | Self-reported | 2026-05-06 |
| 2 | Llama 3.1 70B Instruct | 0.90 | Llama 3.1 70B Instruct (`meta-llama-llama-3.1-70b-instruct`) | Self-reported | 2026-05-06 |
| 3 | Llama 3.1 8B Instruct | 0.83 | Llama 3.1 8B Instruct (`meta-llama-llama-3.1-8b-instruct`) | Self-reported | 2026-05-06 |