UAVBench
Physically grounded benchmark for autonomous and agentic AI UAV systems, with 50,000 validated flight scenarios and 50,000 multiple-choice UAV reasoning questions spanning navigation, safety, policy, cyber-physical security, ethics, energy, and hybrid reasoning.
39 rows
Primary metric: accuracy
Sampled: 2026-05-06
Metadata
Metrics: Accuracy, Correct Answers, Evaluated Questions
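The page does not state how the three metrics relate. A minimal sketch, assuming accuracy is the percentage of evaluated questions answered correctly and is rounded to two decimals, is shown below. The function name `accuracy_percent` and the rounding are illustrative assumptions, not part of the benchmark's tooling.

```python
def accuracy_percent(correct_answers: int, evaluated_questions: int) -> float:
    """Return accuracy as a percentage, rounded to two decimals.

    Assumption: accuracy = 100 * correct / evaluated. The leaderboard
    does not document its exact formula or rounding.
    """
    if evaluated_questions <= 0:
        raise ValueError("evaluated_questions must be positive")
    if not 0 <= correct_answers <= evaluated_questions:
        raise ValueError("correct_answers must be between 0 and evaluated_questions")
    return round(100.0 * correct_answers / evaluated_questions, 2)


# Hypothetical counts, for illustration only (not taken from the leaderboard).
print(accuracy_percent(3, 4))
```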
| Rank | Subject | Accuracy (%) | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | qwen/qwen3-235b-a22b-2507 | 83.55 | Qwen3 235B A22B Instruct 2507 qwen-qwen3-235b-a22b-2507 | Imported | 2026-05-06 |
| 2 | openai/chatgpt-4o-latest | 80.35 | — | Imported | 2026-05-06 |
| 3 | openai/gpt-5-chat | 80.15 | GPT-5 Chat openai-gpt-5-chat | Imported | 2026-05-06 |
| 4 | qwen/qwen3-max | 79.85 | Qwen3 Max qwen-qwen3-max | Imported | 2026-05-06 |
| 5 | openai/gpt-4.1 | 79.05 | GPT-4.1 openai-gpt-4.1 | Imported | 2026-05-06 |
| 6 | openai/gpt-4.1-mini | 78.10 | GPT-4.1 Mini openai-gpt-4.1-mini | Imported | 2026-05-06 |
| 7 | moonshotai/kimi-k2-0905 | 77.75 | MoonshotAI: Kimi K2 0905 moonshotai-kimi-k2-0905 | Imported | 2026-05-06 |
| 8 | opengvlab/internvl3-78b | 77.10 | — | Imported | 2026-05-06 |
| 9 | anthropic/claude-haiku-4.5 | 77.05 | Claude Haiku 4.5 anthropic-claude-haiku-4.5 | Imported | 2026-05-06 |
| 10 | mistralai/mistral-medium-3.1 | 76.85 | Mistral: Mistral Medium 3.1 mistralai-mistral-medium-3.1 | Imported | 2026-05-06 |
| 11 | google/gemini-2.5-flash | 76.75 | Gemini 2.5 Flash google-gemini-2.5-flash | Imported | 2026-05-06 |
| 12 | microsoft/phi-4-reasoning-plus | 76.75 | — | Imported | 2026-05-06 |
| 13 | qwen/qwen3-vl-8b-instruct | 75.95 | Qwen3 VL 8B Instruct qwen-qwen3-vl-8b-instruct | Imported | 2026-05-06 |
| 14 | deepseek/deepseek-chat-v3-0324 | 75.90 | DeepSeek V3 0324 deepseek-deepseek-chat-v3-0324 | Imported | 2026-05-06 |
| 15 | baidu/ernie-4.5-300b-a47b | 75.45 | ERNIE 4.5 300B A47B baidu-ernie-4.5-300b-a47b | Imported | 2026-05-06 |
| 16 | meta-llama/llama-4-scout | 75.10 | Llama 4 Scout meta-llama-llama-4-scout | Imported | 2026-05-06 |
| 17 | deepseek/deepseek-v3.2-exp | 73.55 | DeepSeek V3.2 Exp deepseek-deepseek-v3.2-exp | Imported | 2026-05-06 |
| 18 | google/gemma-3n-e4b-it | 73.25 | Gemma 3n 4B google-gemma-3n-e4b-it | Imported | 2026-05-06 |
| 19 | deepseek/deepseek-v3.1-terminus | 72.70 | DeepSeek V3.1 Terminus deepseek-deepseek-v3.1-terminus | Imported | 2026-05-06 |
| 20 | x-ai/grok-4-fast | 72.60 | Grok 4 Fast x-ai-grok-4-fast | Imported | 2026-05-06 |
| 21 | liquid/lfm-2.2-6b | 69.75 | — | Imported | 2026-05-06 |
| 22 | qwen/qwen-2.5-7b-instruct | 66.05 | Qwen2.5 7B Instruct qwen-qwen-2.5-7b-instruct | Imported | 2026-05-06 |
| 23 | liquid/lfm2-8b-a1b | 65.80 | — | Imported | 2026-05-06 |
| 24 | allenai/olmo-2-0325-32b-instruct | 65.55 | — | Imported | 2026-05-06 |
| 25 | meta-llama/llama-3.1-8b-instruct | 65.30 | Llama 3.1 8B Instruct meta-llama-llama-3.1-8b-instruct | Imported | 2026-05-06 |
| 26 | meta-llama/llama-3.2-3b-instruct | 62.00 | Llama 3.2 3B Instruct meta-llama-llama-3.2-3b-instruct | Imported | 2026-05-06 |
| 27 | ai21/jamba-mini-1.7 | 59.30 | — | Imported | 2026-05-06 |
| 28 | anthropic/claude-sonnet-4.5 | 58.40 | Claude Sonnet 4.5 anthropic-claude-sonnet-4.5 | Imported | 2026-05-06 |
| 29 | ibm-granite/granite-4.0-h-micro | 57.80 | Granite 4.0 Micro ibm-granite-granite-4.0-h-micro | Imported | 2026-05-06 |
| 30 | z-ai/glm-4.6 | 41.70 | GLM 4.6 z-ai-glm-4.6 | Imported | 2026-05-06 |
| 31 | qwen/qwen3-30b-a3b | 5.55 | Qwen3 30B A3B qwen-qwen3-30b-a3b | Imported | 2026-05-06 |
| 32 | nvidia/nemotron-nano-9b-v2 | 2.40 | Nemotron Nano 9B V2 nvidia-nemotron-nano-9b-v2 | Imported | 2026-05-06 |
| 33 | minimax/minimax-m1 | 1.75 | MiniMax M1 minimax-minimax-m1 | Imported | 2026-05-06 |
| 34 | baidu/ernie-4.5-21b-a3b-thinking | 0.00 | ERNIE 4.5 21B A3B Thinking baidu-ernie-4.5-21b-a3b-thinking | Imported | 2026-05-06 |
| 35 | deepseek/deepseek-r1-0528-qwen3-8b | 0.00 | — | Imported | 2026-05-06 |
| 36 | minimax/minimax-m2 | 0.00 | MiniMax M2 minimax-minimax-m2 | Imported | 2026-05-06 |
| 37 | minimax/minimax-m2:free | 0.00 | — | Imported | 2026-05-06 |
| 38 | nvidia/llama-3.3-nemotron-super-49b-v1.5 | 0.00 | Llama 3.3 Nemotron Super 49B V1.5 nvidia-llama-3.3-nemotron-super-49b-v1.5 | Imported | 2026-05-06 |
| 39 | openai/gpt-oss-safeguard-20b | 0.00 | gpt-oss-safeguard-20b openai-gpt-oss-safeguard-20b | Imported | 2026-05-06 |