VAKRA

Executable enterprise-agent benchmark for multi-hop, multi-source tool-calling across APIs, document retrieval, and policy-constrained interactions.

5rows
overallprimary metric
2026-05-06sampled

Metadata

Metrics

Overall, API Chaining, Tool Selection, Multihop Reasoning, Multihop Multisource Policy Adherence

Latest Results

Rows are parsed from the public VAKRA leaderboard JSON endpoint. Source agent and model display names are preserved.

Rank Subject Overall Model Match Provenance Sampled
1 ReAct (Prompt) + Gemini-3-Flash-Preview 38.06 Imported 2026-05-06
2 ReAct (Prompt) + GPT-OSS-120B 34.19 Imported 2026-05-06
3 ReAct (Prompt) + Claude-Sonnet-4.5 32.80 Imported 2026-05-06
4 ReAct (Prompt) + LLAMA-405B 32.42 Imported 2026-05-06
5 ReAct (Prompt) + Granite-4.0-h-small-32B 28.54 Imported 2026-05-06