Berkeley Function-Calling Leaderboard

Measures AI models' ability to correctly call and use functions in various contexts

109 rows
Primary metric: overall_accuracy
Sampled: 2026-05-27

Metadata

Metrics

Overall Accuracy, Non-Live AST Accuracy, Live Accuracy, Multi Turn Accuracy, Web Search Accuracy, Memory Accuracy, Total Cost (lower is better), Latency Mean (lower is better)

Latest Results

Rows are imported from the current Berkeley Function-Calling Leaderboard data_overall.csv. Legacy non-snapshot rows were replaced with the current BenchmarkList snapshot schema.
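As an illustration of the import step described above, here is a minimal Python sketch that reads leaderboard rows from CSV text and ranks them by overall accuracy, highest first. The header names `Model` and `Overall Acc` are assumptions made for this example and may not match the real data_overall.csv; the three sample rows are copied from the table in this section.

```python
import csv
import io


def rank_rows(csv_text, name_col="Model", acc_col="Overall Acc"):
    """Return (rank, name, accuracy) tuples sorted by accuracy, highest first.

    name_col and acc_col are assumed header names, not confirmed ones.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for record in reader:
        # Accept values written either as "77.47%" or as "77.47".
        accuracy = float(record[acc_col].strip().rstrip("%"))
        rows.append((record[name_col].strip(), accuracy))
    rows.sort(key=lambda item: item[1], reverse=True)
    return [(rank, name, acc) for rank, (name, acc) in enumerate(rows, start=1)]


# Three rows copied from the table in this section, deliberately out of order.
SAMPLE = """Model,Overall Acc
o3-2025-04-16 (Prompt),63.05%
Claude-Opus-4-5-20251101 (FC),77.47%
Claude-Sonnet-4-5-20250929 (FC),73.24%
"""

ranked = rank_rows(SAMPLE)
for rank, name, acc in ranked:
    print(rank, name, f"{acc:.2f}%")
```

For the cost and latency metrics, which are marked "lower is better", the same sketch would need `reverse=False` in the sort.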

Rank | Subject | Overall Accuracy | Model Match | Provenance | Sampled
1 | Claude-Opus-4-5-20251101 (FC) | 77.47% | Claude Opus 4.5 [anthropic-claude-opus-4.5] | Imported | 2026-05-27
2 | Claude-Sonnet-4-5-20250929 (FC) | 73.24% | Claude Sonnet 4.5 [anthropic-claude-sonnet-4.5] | Imported | 2026-05-27
3 | Gemini-3-Pro-Preview (Prompt) | 72.51% | Gemini 3 [google-gemini-3] | Imported | 2026-05-27
4 | GLM-4.6 (FC thinking) | 72.38% | GLM 4.6 [z-ai-glm-4.6] | Imported | 2026-05-27
5 | Grok-4-1-fast-reasoning (FC) | 69.57% | Grok 4.1 Fast [x-ai-grok-4.1-fast] | Imported | 2026-05-27
6 | Claude-Haiku-4-5-20251001 (FC) | 68.7% | Claude Haiku 4.5 [anthropic-claude-haiku-4.5] | Imported | 2026-05-27
7 | Gemini-3-Pro-Preview (FC) | 68.14% | Gemini 3 [google-gemini-3] | Imported | 2026-05-27
8 | o3-2025-04-16 (Prompt) | 63.05% | o3 [openai-o3] | Imported | 2026-05-27
9 | Grok-4-0709 (Prompt) | 62.97% | Grok 4 [x-ai-grok-4] | Imported | 2026-05-27
10 | Grok-4-0709 (FC) | 61.38% | Grok 4 [x-ai-grok-4] | Imported | 2026-05-27
11 | Moonshotai-Kimi-K2-Instruct (FC) | 59.06% | MoonshotAI: Kimi K2 0711 [moonshotai-kimi-k2] | Imported | 2026-05-27
12 | Grok-4-1-fast-non-reasoning (FC) | 58.29% | Grok 4.1 Fast [x-ai-grok-4.1-fast] | Imported | 2026-05-27
13 | Command A Reasoning (FC) | 57.06% | - | Imported | 2026-05-27
14 | DeepSeek-V3.2-Exp (Prompt + Thinking) | 56.73% | DeepSeek V3.2 Exp [deepseek-deepseek-v3.2-exp] | Imported | 2026-05-27
15 | Gemini-2.5-Flash (FC) | 56.24% | Gemini 2.5 Flash [google-gemini-2.5-flash] | Imported | 2026-05-27
16 | GPT-5.2-2025-12-11 (FC) | 55.87% | GPT-5.2 [openai-gpt-5.2] | Imported | 2026-05-27
17 | GPT-5-mini-2025-08-07 (FC) | 55.46% | GPT-5 Mini [openai-gpt-5-mini] | Imported | 2026-05-27
18 | xLAM-2-32b-fc-r (FC) | 54.66% | - | Imported | 2026-05-27
19 | DeepSeek-V3.2-Exp (FC) | 54.12% | DeepSeek V3.2 Exp [deepseek-deepseek-v3.2-exp] | Imported | 2026-05-27
20 | GPT-4.1-2025-04-14 (FC) | 53.96% | GPT-4.1 [openai-gpt-4.1] | Imported | 2026-05-27
21 | o4-mini-2025-04-16 (FC) | 53.24% | o4 Mini [openai-o4-mini] | Imported | 2026-05-27
22 | xLAM-2-70b-fc-r (FC) | 53.07% | - | Imported | 2026-05-27
23 | Qwen3-235B-A22B-Instruct-2507 (Prompt) | 52.15% | Qwen3 235B A22B Instruct 2507 [qwen-qwen3-235b-a22b-2507] | Imported | 2026-05-27
24 | GPT-5-nano-2025-08-07 (FC) | 51.45% | GPT-5 Nano [openai-gpt-5-nano] | Imported | 2026-05-27
25 | Nanbeige4-3B-Thinking-2511 (FC) | 51.4% | - | Imported | 2026-05-27
26 | Gemini-2.5-Flash (Prompt) | 50.9% | Gemini 2.5 Flash [google-gemini-2.5-flash] | Imported | 2026-05-27
27 | GPT-4.1-mini-2025-04-14 (FC) | 50.45% | GPT-4.1 Mini [openai-gpt-4.1-mini] | Imported | 2026-05-27
28 | o4-mini-2025-04-16 (Prompt) | 50.26% | o4 Mini [openai-o4-mini] | Imported | 2026-05-27
29 | Qwen3-32B (FC) | 48.71% | Qwen3 32B [qwen-qwen3-32b] | Imported | 2026-05-27
30 | o3-2025-04-16 (FC) | 48.56% | o3 [openai-o3] | Imported | 2026-05-27
31 | Qwen3-235B-A22B-Instruct-2507 (FC) | 47.99% | Qwen3 235B A22B Instruct 2507 [qwen-qwen3-235b-a22b-2507] | Imported | 2026-05-27
32 | Nanbeige3.5-Pro-Thinking (FC) | 47.68% | - | Imported | 2026-05-27
33 | Qwen3-32B (Prompt) | 46.78% | Qwen3 32B [qwen-qwen3-32b] | Imported | 2026-05-27
34 | xLAM-2-8b-fc-r (FC) | 46.68% | - | Imported | 2026-05-27
35 | Command A (FC) | 46.49% | Command A [cohere-command-a] | Imported | 2026-05-27
36 | BitAgent-Bounty-8B | 46.23% | - | Imported | 2026-05-27
37 | Arch-Agent-32B | 45.37% | - | Imported | 2026-05-27
38 | GPT-5.2-2025-12-11 (Prompt) | 45.27% | GPT-5.2 [openai-gpt-5.2] | Imported | 2026-05-27
39 | Qwen3-8B (FC) | 42.57% | Qwen3 8B [qwen-qwen3-8b] | Imported | 2026-05-27
40 | ToolACE-2-8B (FC) | 42.44% | - | Imported | 2026-05-27
41 | Qwen3-30B-A3B-Instruct-2507 (FC) | 41.39% | Qwen3 30B A3B Instruct 2507 [qwen-qwen3-30b-a3b-instruct-2507] | Imported | 2026-05-27
42 | xLAM-2-3b-fc-r (FC) | 41.22% | - | Imported | 2026-05-27
43 | Qwen3-14B (FC) | 41.03% | Qwen3 14B [qwen-qwen3-14b] | Imported | 2026-05-27
44 | Qwen3-8B (Prompt) | 40.43% | Qwen3 8B [qwen-qwen3-8b] | Imported | 2026-05-27
45 | GPT-4.1-2025-04-14 (Prompt) | 39.38% | GPT-4.1 [openai-gpt-4.1] | Imported | 2026-05-27
46 | mistral-large-2411 (FC) | 38.37% | Mistral Large 2411 [mistralai-mistral-large-2411] | Imported | 2026-05-27
47 | Qwen3-14B (Prompt) | 37.77% | Qwen3 14B [qwen-qwen3-14b] | Imported | 2026-05-27
48 | Mistral-Medium-2505 | 37.69% | - | Imported | 2026-05-27
49 | Mistral-Medium-2505 (FC) | 37.56% | - | Imported | 2026-05-27
50 | Llama-4-Maverick-17B-128E-Instruct-FP8 (FC) | 37.29% | - | Imported | 2026-05-27
51 | Mistral-small-2506 (FC) | 37.15% | - | Imported | 2026-05-27
52 | Gemini-2.5-Flash-Lite (FC) | 36.87% | Gemini 2.5 Flash Lite [google-gemini-2.5-flash-lite] | Imported | 2026-05-27
53 | Qwen3-30B-A3B-Instruct-2507 (Prompt) | 36.7% | Qwen3 30B A3B Instruct 2507 [qwen-qwen3-30b-a3b-instruct-2507] | Imported | 2026-05-27
54 | Qwen3-4B-Instruct-2507 (FC) | 35.68% | - | Imported | 2026-05-27
55 | Qwen3-4B-Instruct-2507 (Prompt) | 35.52% | - | Imported | 2026-05-27
56 | Arch-Agent-3B | 35.36% | - | Imported | 2026-05-27
57 | Claude-Opus-4-5-20251101 (Prompt) | 33.47% | Claude Opus 4.5 [anthropic-claude-opus-4.5] | Imported | 2026-05-27
58 | GPT-4.1-nano-2025-04-14 (FC) | 33.05% | GPT-4.1 Nano [openai-gpt-4.1-nano] | Imported | 2026-05-27
59 | Mistral-Small-2506 (Prompt) | 32.38% | - | Imported | 2026-05-27
60 | Arch-Agent-1.5B | 32.14% | - | Imported | 2026-05-27
61 | Command R7B (FC) | 32.07% | Command R7B (12-2024) [cohere-command-r7b-12-2024] | Imported | 2026-05-27
62 | Llama-3.3-70B-Instruct (FC) | 31.9% | Llama 3.3 70B Instruct [meta-llama-llama-3.3-70b-instruct] | Imported | 2026-05-27
63 | mistral-large-2411 (Prompt) | 31.84% | Mistral Large 2411 [mistralai-mistral-large-2411] | Imported | 2026-05-27
64 | Hammer2.1-7b (FC) | 31.67% | - | Imported | 2026-05-27
65 | xLAM-2-1b-fc-r (FC) | 30.44% | - | Imported | 2026-05-27
66 | Gemma-3-12b-it (Prompt) | 30.43% | Gemma 3 12B [google-gemma-3-12b-it] | Imported | 2026-05-27
67 | GPT-4.1-mini-2025-04-14 (Prompt) | 29.73% | GPT-4.1 Mini [openai-gpt-4.1-mini] | Imported | 2026-05-27
68 | Hammer2.1-3b (FC) | 29.71% | - | Imported | 2026-05-27
69 | Gemma-3-27b-it (Prompt) | 29.47% | Gemma 3 27B [google-gemma-3-27b-it] | Imported | 2026-05-27
70 | Phi-4 (Prompt) | 28.79% | Phi 4 [microsoft-phi-4] | Imported | 2026-05-27
71 | Qwen3-1.7B (FC) | 28.41% | - | Imported | 2026-05-27
72 | Llama-4-Scout-17B-16E-Instruct (FC) | 28.13% | Llama 4 Scout [meta-llama-llama-4-scout] | Imported | 2026-05-27
73 | Gemini-2.5-Flash-Lite (Prompt) | 28.03% | Gemini 2.5 Flash Lite [google-gemini-2.5-flash-lite] | Imported | 2026-05-27
74 | CoALM-70B | 27.99% | - | Imported | 2026-05-27
75 | Hammer2.1-1.5b (FC) | 27.88% | - | Imported | 2026-05-27
76 | palmyra-x-004 (FC) | 27.87% | - | Imported | 2026-05-27
77 | GPT-5-mini-2025-08-07 (Prompt) | 27.83% | GPT-5 Mini [openai-gpt-5-mini] | Imported | 2026-05-27
78 | Open-Mistral-Nemo-2407 (FC) | 27.63% | - | Imported | 2026-05-27
79 | GPT-5-nano-2025-08-07 (Prompt) | 27.55% | GPT-5 Nano [openai-gpt-5-nano] | Imported | 2026-05-27
80 | Amazon-Nova-2-Lite-v1:0 (FC) | 27.1% | - | Imported | 2026-05-27
81 | Granite-3.1-8B-Instruct (FC) | 27.1% | - | Imported | 2026-05-27
82 | Falcon3-10B-Instruct (FC) | 27.01% | - | Imported | 2026-05-27
83 | Granite-3.2-8B-Instruct (FC) | 26.87% | - | Imported | 2026-05-27
84 | CoALM-8B | 26.81% | - | Imported | 2026-05-27
85 | Llama-3.1-8B-Instruct (Prompt) | 25.83% | Llama 3.1 8B Instruct [meta-llama-llama-3.1-8b-instruct] | Imported | 2026-05-27
86 | MiniCPM3-4B-FC (FC) | 25.55% | - | Imported | 2026-05-27
87 | Claude-Haiku-4-5-20251001 (Prompt) | 25.26% | Claude Haiku 4.5 [anthropic-claude-haiku-4.5] | Imported | 2026-05-27
88 | Amazon-Nova-Pro-v1:0 (FC) | 24.97% | - | Imported | 2026-05-27
89 | Claude-Sonnet-4-5-20250929 (Prompt) | 24.9% | Claude Sonnet 4.5 [anthropic-claude-sonnet-4.5] | Imported | 2026-05-27
90 | GPT-4.1-nano-2025-04-14 (Prompt) | 24.88% | GPT-4.1 Nano [openai-gpt-4.1-nano] | Imported | 2026-05-27
91 | Falcon3-7B-Instruct (FC) | 24.03% | - | Imported | 2026-05-27
92 | Qwen3-0.6B (FC) | 23.93% | - | Imported | 2026-05-27
93 | Granite-20b-FunctionCalling (FC) | 23.23% | - | Imported | 2026-05-27
94 | Qwen3-0.6B (Prompt) | 22.38% | - | Imported | 2026-05-27
95 | Amazon-Nova-Micro-v1:0 (FC) | 22.29% | - | Imported | 2026-05-27
96 | RZN-T (Prompt) | 22.25% | - | Imported | 2026-05-27
97 | MiniCPM3-4B (Prompt) | 22.08% | - | Imported | 2026-05-27
98 | Llama-3.2-3B-Instruct (FC) | 21.95% | Llama 3.2 3B Instruct [meta-llama-llama-3.2-3b-instruct] | Imported | 2026-05-27
99 | Bielik-11B-v2.3-Instruct (Prompt) | 21.9% | - | Imported | 2026-05-27
100 | Hammer2.1-0.5b (FC) | 21.22% | - | Imported | 2026-05-27
101 | Gemma-3-4b-it (Prompt) | 19.62% | Gemma 3 4B [google-gemma-3-4b-it] | Imported | 2026-05-27
102 | Open-Mistral-Nemo-2407 (Prompt) | 19.31% | - | Imported | 2026-05-27
103 | Granite-4.0-350m (FC) | 18.98% | - | Imported | 2026-05-27
104 | Falcon3-3B-Instruct (FC) | 16.25% | - | Imported | 2026-05-27
105 | Ministral-8B-Instruct-2410 (FC) | 11.1% | - | Imported | 2026-05-27
106 | Falcon3-1B-Instruct (FC) | 11.08% | - | Imported | 2026-05-27
107 | Llama-3.2-1B-Instruct (FC) | 10.82% | Llama 3.2 1B Instruct [meta-llama-llama-3.2-1b-instruct] | Imported | 2026-05-27
108 | Llama-3.1-Nemotron-Ultra-253B-v1 (FC) | 10% | - | Imported | 2026-05-27
109 | Gemma-3-1b-it (Prompt) | 7.17% | - | Imported | 2026-05-27
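
Many Subject labels in the table end in a parenthesised tag such as (FC) or (Prompt), and several models appear once under each tag. The Python sketch below shows one possible way to split a label into a base name and a tag, then compute the FC score minus the Prompt score for models listed under both. Treating the tag as an evaluation mode is an assumption drawn from the labels, and the helper names `split_subject` and `fc_minus_prompt` are hypothetical. The sample rows are copied from the table.

```python
import re

# Matches a trailing parenthesised tag such as "(FC)" or "(Prompt + Thinking)".
MODE_SUFFIX = re.compile(r"^(?P<model>.*?)\s*\((?P<mode>[^()]*)\)$")


def split_subject(subject):
    """Split 'Name (Tag)' into (name, tag); tag is None when absent."""
    match = MODE_SUFFIX.match(subject.strip())
    if match is None:
        return subject.strip(), None
    return match.group("model"), match.group("mode")


def fc_minus_prompt(rows):
    """Map base model name to FC accuracy minus Prompt accuracy.

    Only models with both a plain "FC" and a plain "Prompt" row are included.
    """
    by_model = {}
    for subject, accuracy in rows:
        model, mode = split_subject(subject)
        if mode in ("FC", "Prompt"):
            by_model.setdefault(model, {})[mode] = accuracy
    return {
        model: round(modes["FC"] - modes["Prompt"], 2)
        for model, modes in by_model.items()
        if "FC" in modes and "Prompt" in modes
    }


# Rows copied from the table (Subject, Overall Accuracy in percent).
ROWS = [
    ("Claude-Opus-4-5-20251101 (FC)", 77.47),
    ("Claude-Opus-4-5-20251101 (Prompt)", 33.47),
    ("Gemini-3-Pro-Preview (Prompt)", 72.51),
    ("Gemini-3-Pro-Preview (FC)", 68.14),
    ("o3-2025-04-16 (Prompt)", 63.05),
    ("o3-2025-04-16 (FC)", 48.56),
    ("BitAgent-Bounty-8B", 46.23),  # no tag, so it is skipped
]

gaps = fc_minus_prompt(ROWS)
for model, gap in gaps.items():
    print(f"{model}: {gap:+.2f} points")
```

A positive value means the FC row scored higher than the Prompt row for that model; labels with other tags, such as (FC thinking), are left out of the comparison in this sketch.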