VAKRA
Executable enterprise-agent benchmark for multi-hop, multi-source tool-calling across APIs, document retrieval, and policy-constrained interactions.
Rows: 5
Primary metric: Overall
Sampled: 2026-05-06
Metrics: Overall, API Chaining, Tool Selection, Multihop Reasoning, Multihop Multisource Policy Adherence
| Rank | Subject | Overall | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | ReAct (Prompt) + Gemini-3-Flash-Preview | 38.06 | — | Imported | 2026-05-06 |
| 2 | ReAct (Prompt) + GPT-OSS-120B | 34.19 | — | Imported | 2026-05-06 |
| 3 | ReAct (Prompt) + Claude-Sonnet-4.5 | 32.80 | — | Imported | 2026-05-06 |
| 4 | ReAct (Prompt) + LLAMA-405B | 32.42 | — | Imported | 2026-05-06 |
| 5 | ReAct (Prompt) + Granite-4.0-h-small-32B | 28.54 | — | Imported | 2026-05-06 |