ToolAlpaca

ToolAlpaca: Evaluates tool calling, API use, function selection, structured arguments, and multi-step tool workflows.

5 rows
Primary metric: real_world_api_overall
Sampled: 2026-05-27

Metadata

Metrics

Real-world API Overall, Real-world API Procedure, Real-world API Response, Simulated Tools Overall, Simulated Tools Procedure, Simulated Tools Response, Simulated Tools Human Accept

Latest Results

Rows are transcribed from Table 3 of the public ToolAlpaca paper. The primary score is real-world API overall accuracy, and rows are ranked by it in descending order.

Rank  Subject         Real-world API Overall  Model Match  Provenance  Sampled
1     GPT-3.5         72.8%                                Imported    2026-05-27
2     ToolAlpaca-13B  61.4%                                Imported    2026-05-27
3     ToolAlpaca-7B   55.3%                                Imported    2026-05-27
4     Vicuna-13B      12.3%                                Imported    2026-05-27
5     Vicuna-7B       7.9%                                 Imported    2026-05-27
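
The ranking in the table above can be reproduced with a short sketch. It assumes the rank is simply the descending order of the real-world API overall score. The function name `rank_by_primary` and the tuple layout are illustrative choices, not part of any published ToolAlpaca tooling. The scores are copied from the table.

```python
# Scores copied from the table above (real-world API overall accuracy, percent).
# Deliberately listed out of order to show that sorting recovers the ranking.
rows = [
    ("Vicuna-7B", 7.9),
    ("ToolAlpaca-13B", 61.4),
    ("GPT-3.5", 72.8),
    ("Vicuna-13B", 12.3),
    ("ToolAlpaca-7B", 55.3),
]


def rank_by_primary(rows):
    """Hypothetical helper: return (rank, subject, score), highest score first."""
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return [(rank, name, score) for rank, (name, score) in enumerate(ordered, start=1)]


ranked = rank_by_primary(rows)
for rank, subject, score in ranked:
    print(f"{rank}  {subject:<15} {score:.1f}%")
```

Ties are not handled specially here. Python's sort is stable, so subjects with equal scores would keep their input order.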