BFCL-v3
Berkeley Function Calling Leaderboard v3 (BFCL-v3) is a benchmark that evaluates the function-calling capabilities of large language models in multi-turn and multi-step interactions. Models must retain context across extended conversational exchanges and issue multiple internal function calls to satisfy complex user requests. The benchmark includes 1,000 test cases across domains such as vehicle control, trading bots, travel booking, and file system management. It uses state-based evaluation to verify both the resulting system state and the correctness of the execution path.
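To make "state-based evaluation" concrete, here is a minimal sketch, assuming a hypothetical in-memory file system as the environment. `ToyFileSystem`, `ALLOWED_TOOLS`, `run_turn`, and `evaluate` are illustrative names, not BFCL's actual code or tool set. The model's calls and the reference calls run in separate environments that persist across turns. After each turn the two states must match. As an assumed stand-in for the execution-path check, every reference result must also appear among the model's results, so redundant extra calls are tolerated.

```python
from dataclasses import dataclass, field


# Hypothetical toy environment standing in for a benchmark domain such as a
# file system; these method names are illustrative, not BFCL's actual tools.
@dataclass
class ToyFileSystem:
    files: dict = field(default_factory=dict)

    def touch(self, name: str) -> str:
        self.files.setdefault(name, "")
        return f"created {name}"

    def write(self, name: str, content: str) -> str:
        self.files[name] = content
        return f"wrote {len(content)} chars to {name}"

    def snapshot(self) -> dict:
        return dict(self.files)


ALLOWED_TOOLS = {"touch", "write"}


def run_turn(env: ToyFileSystem, calls: list) -> list:
    """Execute one turn's (tool_name, kwargs) calls and collect their results."""
    results = []
    for name, kwargs in calls:
        if name not in ALLOWED_TOOLS:
            results.append(f"error: unknown tool {name}")
            continue
        try:
            results.append(getattr(env, name)(**kwargs))
        except TypeError as exc:  # missing or unexpected arguments
            results.append(f"error: {exc}")
    return results


def evaluate(model_turns: list, truth_turns: list) -> bool:
    """Pass only if every turn matches on state and covers the reference path."""
    if len(model_turns) != len(truth_turns):
        return False
    # Separate environments persist across turns, so later turns depend on
    # the state left behind by earlier ones.
    model_env, truth_env = ToyFileSystem(), ToyFileSystem()
    for model_calls, truth_calls in zip(model_turns, truth_turns):
        model_results = run_turn(model_env, model_calls)
        truth_results = run_turn(truth_env, truth_calls)
        # State check: the environments must be identical after this turn.
        if model_env.snapshot() != truth_env.snapshot():
            return False
        # Path check (assumed): every reference result must also appear among
        # the model's results; extra calls by the model are tolerated.
        if any(r not in model_results for r in truth_results):
            return False
    return True


truth = [[("touch", {"name": "a.txt"})],
         [("write", {"name": "a.txt", "content": "hi"})]]
redundant = [[("touch", {"name": "a.txt"}), ("touch", {"name": "a.txt"})],
             [("write", {"name": "a.txt", "content": "hi"})]]
wrong_file = [[("touch", {"name": "a.txt"})],
              [("write", {"name": "b.txt", "content": "hi"})]]

print(evaluate(redundant, truth))   # True
print(evaluate(wrong_file, truth))  # False
```

The redundant trajectory passes because it reaches the same state and still produces every reference result. The `wrong_file` trajectory fails in turn two because the resulting state differs.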
Rows: 18
Primary metric: Score
Sampled: 2026-05-06
Metrics: Score, Normalized Score
| Rank | Model | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GLM-4.5 | 0.78 | GLM 4.5 (`z-ai-glm-4.5`) | Self-reported | 2026-05-06 |
| 2 | GLM-4.5-Air | 0.76 | GLM 4.5 Air (`z-ai-glm-4.5-air`) | Self-reported | 2026-05-06 |
| 3 | LongCat-Flash-Thinking | 0.74 | — | Self-reported | 2026-05-06 |
| 4 | Qwen3-Next-80B-A3B-Thinking | 0.72 | Qwen3 Next 80B A3B Thinking (`qwen-qwen3-next-80b-a3b-thinking`) | Self-reported | 2026-05-06 |
| 5 | Qwen3 VL 235B A22B Thinking | 0.72 | Qwen3 VL 235B A22B Thinking (`qwen-qwen3-vl-235b-a22b-thinking`) | Self-reported | 2026-05-06 |
| 5 | Qwen3-235B-A22B-Thinking-2507 | 0.72 | Qwen3 235B A22B Thinking 2507 (`qwen-qwen3-235b-a22b-thinking-2507`) | Self-reported | 2026-05-06 |
| 7 | Qwen3 VL 32B Thinking | 0.72 | — | Self-reported | 2026-05-06 |
| 8 | Qwen3-235B-A22B-Instruct-2507 | 0.71 | Qwen3 235B A22B Instruct 2507 (`qwen-qwen3-235b-a22b-2507`) | Self-reported | 2026-05-06 |
| 9 | Qwen3-Next-80B-A3B-Instruct | 0.70 | Qwen3 Next 80B A3B Instruct (`qwen-qwen3-next-80b-a3b-instruct`) | Self-reported | 2026-05-06 |
| 10 | Qwen3 VL 32B Instruct | 0.70 | Qwen3 VL 32B Instruct (`qwen-qwen3-vl-32b-instruct`) | Self-reported | 2026-05-06 |
| 11 | Qwen3-Coder 480B A35B Instruct | 0.69 | Qwen3 Coder 480B A35B (`qwen-qwen3-coder`) | Self-reported | 2026-05-06 |
| 12 | Qwen3 VL 30B A3B Thinking | 0.69 | Qwen3 VL 30B A3B Thinking (`qwen-qwen3-vl-30b-a3b-thinking`) | Self-reported | 2026-05-06 |
| 13 | Qwen3 VL 235B A22B Instruct | 0.68 | Qwen3 VL 235B A22B Instruct (`qwen-qwen3-vl-235b-a22b-instruct`) | Self-reported | 2026-05-06 |
| 14 | Qwen3 VL 4B Thinking | 0.67 | — | Self-reported | 2026-05-06 |
| 15 | Qwen3 VL 8B Instruct | 0.66 | Qwen3 VL 8B Instruct (`qwen-qwen3-vl-8b-instruct`) | Self-reported | 2026-05-06 |
| 15 | Qwen3 VL 30B A3B Instruct | 0.66 | Qwen3 VL 30B A3B Instruct (`qwen-qwen3-vl-30b-a3b-instruct`) | Self-reported | 2026-05-06 |
| 17 | Qwen3 VL 4B Instruct | 0.63 | — | Self-reported | 2026-05-06 |
| 18 | Qwen3 VL 8B Thinking | 0.63 | Qwen3 VL 8B Thinking (`qwen-qwen3-vl-8b-thinking`) | Self-reported | 2026-05-06 |