UAVBench | BenchmarkList

Metadata

ID: uavbench
Category: Agentic
Release: 2025-11-14
Source: Source page
Snapshot: Snapshot source
Post: Announcement post

Metrics

Accuracy, Correct Answers, Evaluated Questions
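The metrics line lists Accuracy alongside Correct Answers and Evaluated Questions. This suggests, though the page does not state it, that Accuracy is the percentage of evaluated questions answered correctly. Below is a minimal Python sketch of that assumed relationship. The function name `accuracy` is hypothetical, and the two-decimal rounding is an assumption chosen to match how the table reports scores.

```python
def accuracy(correct: int, evaluated: int) -> float:
    """Assumed definition: percentage of evaluated questions answered correctly.

    Hypothetical helper; rounding to two decimals mirrors the table's display format.
    """
    if evaluated <= 0:
        raise ValueError("evaluated must be a positive number of questions")
    if not 0 <= correct <= evaluated:
        raise ValueError("correct must lie between 0 and evaluated")
    # Fraction of correct answers, expressed as a percentage.
    return round(100.0 * correct / evaluated, 2)
```

For example, a hypothetical run with 42 correct answers out of 50 evaluated questions would score `accuracy(42, 50) == 84.0` under this assumption.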

Rank	Subject	Accuracy (%)	Model Match	Provenance	Sampled
1	qwen/qwen3-235b-a22b-2507	83.55	Qwen3 235B A22B Instruct 2507 (qwen-qwen3-235b-a22b-2507)	Imported	2026-05-06
2	openai/chatgpt-4o-latest	80.35	—	Imported	2026-05-06
3	openai/gpt-5-chat	80.15	GPT-5 Chat (openai-gpt-5-chat)	Imported	2026-05-06
4	qwen/qwen3-max	79.85	Qwen3 Max (qwen-qwen3-max)	Imported	2026-05-06
5	openai/gpt-4.1	79.05	GPT-4.1 (openai-gpt-4.1)	Imported	2026-05-06
6	openai/gpt-4.1-mini	78.10	GPT-4.1 Mini (openai-gpt-4.1-mini)	Imported	2026-05-06
7	moonshotai/kimi-k2-0905	77.75	MoonshotAI: Kimi K2 0905 (moonshotai-kimi-k2-0905)	Imported	2026-05-06
8	opengvlab/internvl3-78b	77.10	—	Imported	2026-05-06
9	anthropic/claude-haiku-4.5	77.05	Claude Haiku 4.5 (anthropic-claude-haiku-4.5)	Imported	2026-05-06
10	mistralai/mistral-medium-3.1	76.85	Mistral: Mistral Medium 3.1 (mistralai-mistral-medium-3.1)	Imported	2026-05-06
11	google/gemini-2.5-flash	76.75	Gemini 2.5 Flash (google-gemini-2.5-flash)	Imported	2026-05-06
12	microsoft/phi-4-reasoning-plus	76.75	—	Imported	2026-05-06
13	qwen/qwen3-vl-8b-instruct	75.95	Qwen3 VL 8B Instruct (qwen-qwen3-vl-8b-instruct)	Imported	2026-05-06
14	deepseek/deepseek-chat-v3-0324	75.90	DeepSeek V3 0324 (deepseek-deepseek-chat-v3-0324)	Imported	2026-05-06
15	baidu/ernie-4.5-300b-a47b	75.45	ERNIE 4.5 300B A47B (baidu-ernie-4.5-300b-a47b)	Imported	2026-05-06
16	meta-llama/llama-4-scout	75.10	Llama 4 Scout (meta-llama-llama-4-scout)	Imported	2026-05-06
17	deepseek/deepseek-v3.2-exp	73.55	DeepSeek V3.2 Exp (deepseek-deepseek-v3.2-exp)	Imported	2026-05-06
18	google/gemma-3n-e4b-it	73.25	Gemma 3n 4B (google-gemma-3n-e4b-it)	Imported	2026-05-06
19	deepseek/deepseek-v3.1-terminus	72.70	DeepSeek V3.1 Terminus (deepseek-deepseek-v3.1-terminus)	Imported	2026-05-06
20	x-ai/grok-4-fast	72.60	Grok 4 Fast (x-ai-grok-4-fast)	Imported	2026-05-06
21	liquid/lfm-2.2-6b	69.75	—	Imported	2026-05-06
22	qwen/qwen-2.5-7b-instruct	66.05	Qwen2.5 7B Instruct (qwen-qwen-2.5-7b-instruct)	Imported	2026-05-06
23	liquid/lfm2-8b-a1b	65.80	—	Imported	2026-05-06
24	allenai/olmo-2-0325-32b-instruct	65.55	—	Imported	2026-05-06
25	meta-llama/llama-3.1-8b-instruct	65.30	Llama 3.1 8B Instruct (meta-llama-llama-3.1-8b-instruct)	Imported	2026-05-06
26	meta-llama/llama-3.2-3b-instruct	62.00	Llama 3.2 3B Instruct (meta-llama-llama-3.2-3b-instruct)	Imported	2026-05-06
27	ai21/jamba-mini-1.7	59.30	—	Imported	2026-05-06
28	anthropic/claude-sonnet-4.5	58.40	Claude Sonnet 4.5 (anthropic-claude-sonnet-4.5)	Imported	2026-05-06
29	ibm-granite/granite-4.0-h-micro	57.80	Granite 4.0 Micro (ibm-granite-granite-4.0-h-micro)	Imported	2026-05-06
30	z-ai/glm-4.6	41.70	GLM 4.6 (z-ai-glm-4.6)	Imported	2026-05-06
31	qwen/qwen3-30b-a3b	5.55	Qwen3 30B A3B (qwen-qwen3-30b-a3b)	Imported	2026-05-06
32	nvidia/nemotron-nano-9b-v2	2.40	Nemotron Nano 9B V2 (nvidia-nemotron-nano-9b-v2)	Imported	2026-05-06
33	minimax/minimax-m1	1.75	MiniMax M1 (minimax-minimax-m1)	Imported	2026-05-06
34	baidu/ernie-4.5-21b-a3b-thinking	0.00	ERNIE 4.5 21B A3B Thinking (baidu-ernie-4.5-21b-a3b-thinking)	Imported	2026-05-06
35	deepseek/deepseek-r1-0528-qwen3-8b	0.00	—	Imported	2026-05-06
36	minimax/minimax-m2	0.00	MiniMax M2 (minimax-minimax-m2)	Imported	2026-05-06
37	minimax/minimax-m2:free	0.00	—	Imported	2026-05-06
38	nvidia/llama-3.3-nemotron-super-49b-v1.5	0.00	Llama 3.3 Nemotron Super 49B V1.5 (nvidia-llama-3.3-nemotron-super-49b-v1.5)	Imported	2026-05-06
39	openai/gpt-oss-safeguard-20b	0.00	gpt-oss-safeguard-20b (openai-gpt-oss-safeguard-20b)	Imported	2026-05-06
