CHOICE

Remote-sensing vision-language benchmark for perception and reasoning over Earth-observation imagery across multiple task dimensions.

24 rows
Primary metric: overall_score
Sampled: 2026-05-27

Metadata

Metrics

Overall Score, Perception Score, Reasoning Score, ILC, SII, CID, AttR, AssR, CSR

Latest Results

Rows are parsed from the leaderboardData object embedded in the CHOICE project page. The overall score is the mean of the six published L2 capability scores (ILC, SII, CID, AttR, AssR, CSR).

Rank  Subject            Overall Score  Model Match                       Provenance  Sampled
1     Qwen2-VL-72B       0.7378         -                                 Imported    2026-05-27
2     InternVL2-40B      0.7263         -                                 Imported    2026-05-27
3     Ovis1.6-Gemma2-9B  0.6998         -                                 Imported    2026-05-27
4     Gemini-1.5-Pro     0.6942         -                                 Imported    2026-05-27
5     Qwen2-VL-7B        0.6922         -                                 Imported    2026-05-27
6     InternVL2-26B      0.6858         -                                 Imported    2026-05-27
7     InternVL2-8B       0.6772         -                                 Imported    2026-05-27
8     GLM-4V-9B          0.6435         -                                 Imported    2026-05-27
9     GPT-4o-2024-11-20  0.6275         GPT-4o (openai-gpt-4o)            Imported    2026-05-27
10    Molmo-7B-D         0.6215         -                                 Imported    2026-05-27
11    MiniCPM-V-2.5      0.6163         -                                 Imported    2026-05-27
12    GPT-4o-mini        0.6133         GPT-4o-mini (openai-gpt-4o-mini)  Imported    2026-05-27
13    DeepSeek-VL-7B     0.6102         -                                 Imported    2026-05-27
14    LLaVA-1.6-13B      0.5907         -                                 Imported    2026-05-27
15    Llama3.2-11B       0.5672         -                                 Imported    2026-05-27
16    mPLUG-Owl3-7B      0.5648         -                                 Imported    2026-05-27
17    VHM                0.5623         -                                 Imported    2026-05-27
18    LLaVA-1.6-7B       0.5592         -                                 Imported    2026-05-27
19    Phi3-Vision        0.5293         -                                 Imported    2026-05-27
20    GeoChat            0.5067         -                                 Imported    2026-05-27
21    LHRS-Bot-nova      0.4930         -                                 Imported    2026-05-27
22    RemoteCLIP         0.4868         -                                 Imported    2026-05-27
23    GeoRSCLIP          0.4532         -                                 Imported    2026-05-27
24    LHRS-Bot           0.4160         -                                 Imported    2026-05-27
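
The overall score used for ranking is described above as the mean of six L2 capability scores. A minimal Python sketch of that aggregation follows. It assumes the six L2 scores are the abbreviated metrics listed under Metrics (ILC, SII, CID, AttR, AssR, CSR) and that scores are rounded to four decimal places, as displayed in the table. The function names, the input layout, and the sample values are hypothetical and are not taken from the CHOICE project page.

```python
from statistics import mean

# Assumption: these six abbreviations are the L2 capability scores
# that the overall score averages.
L2_KEYS = ("ILC", "SII", "CID", "AttR", "AssR", "CSR")


def overall_score(l2_scores):
    """Return the unweighted mean of the six L2 scores, rounded to 4 places."""
    missing = [key for key in L2_KEYS if key not in l2_scores]
    if missing:
        raise ValueError(f"missing L2 scores: {missing}")
    return round(mean(l2_scores[key] for key in L2_KEYS), 4)


def rank_subjects(rows):
    """Sort subjects by overall score, highest first, and attach 1-based ranks."""
    scored = [(name, overall_score(scores)) for name, scores in rows.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(position + 1, name, score)
            for position, (name, score) in enumerate(scored)]


# Hypothetical per-capability values, for illustration only.
example_rows = {
    "model-a": {"ILC": 0.5, "SII": 0.6, "CID": 0.7,
                "AttR": 0.8, "AssR": 0.9, "CSR": 0.4},
    "model-b": {"ILC": 0.7, "SII": 0.7, "CID": 0.7,
                "AttR": 0.7, "AssR": 0.7, "CSR": 0.7},
}

for rank, name, score in rank_subjects(example_rows):
    print(f"{rank} {name} {score:.4f}")
```

An unweighted mean treats every capability equally, so a subject that is strong on perception metrics and weak on reasoning metrics can tie with a uniformly average one. The Perception Score and Reasoning Score columns are the place to look when that distinction matters.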