GeoRC

Geolocation reasoning benchmark for vision-language models that evaluates location inference, reporting F1 and country-level accuracy.

15 rows
Primary metric: F1
Sampled: 2026-05-27

Metadata

Metrics

F1, Precision, Recall, Country Accuracy
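F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below gives the standard definitions of the listed metrics. It is only an illustration: this page does not say how GeoRC decides what counts as a true positive or how it normalizes country names, so the count inputs and the plain string comparison are assumptions. The table appears to report scores on a 0-100 scale, which corresponds to these fractions multiplied by 100.

```python
"""Standard definitions of the metrics listed for GeoRC (sketch).

Assumption: precision/recall are computed from generic true-positive,
false-positive and false-negative counts; GeoRC's actual matching
rules are not described here.
"""


def precision(tp: int, fp: int) -> float:
    """Fraction of predicted items that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of reference items that were recovered."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    return 2 * p * r / (p + r) if p + r else 0.0


def country_accuracy(predicted: list[str], gold: list[str]) -> float:
    """Share of items whose predicted country exactly equals the gold country."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(p == g for p, g in zip(predicted, gold))
    return hits / len(gold)


if __name__ == "__main__":
    # Illustrative counts only, not GeoRC data.
    p, r = precision(tp=6, fp=2), recall(tp=6, fn=4)
    print(f"P={p:.2f} R={r:.2f} F1={f1(p, r):.3f}")
    print(country_accuracy(["FR", "JP", "BR"], ["FR", "JP", "AR"]))
```

The harmonic mean penalizes imbalance: a system with high precision but low recall (or the reverse) gets a low F1.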

Latest Results

Rows are parsed from the public GeoRC leaderboard JSON. Each row is scored by F1 when the entry reports it; rows that publish only country accuracy use that metric as the row score instead.
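The fallback rule above can be sketched as a small selection function. The record layout is hypothetical: the keys `f1` and `country_accuracy` are assumed for illustration and are not taken from the real GeoRC JSON, and the example values are made up.

```python
"""Pick a single row score from a leaderboard record (sketch).

Assumption: each record is a dict with optional "f1" and
"country_accuracy" keys; these key names are hypothetical.
"""
from typing import Optional, Tuple


def row_score(record: dict) -> Optional[Tuple[str, float]]:
    """Return (metric_name, value), preferring F1 over country accuracy."""
    for key in ("f1", "country_accuracy"):
        value = record.get(key)
        if value is not None:
            return key, float(value)
    return None  # the record publishes neither metric


if __name__ == "__main__":
    # Hypothetical records with illustrative values, not real results.
    print(row_score({"name": "model-a", "f1": 50.0, "country_accuracy": 60.0}))
    print(row_score({"name": "model-b", "country_accuracy": 60.0}))
    print(row_score({"name": "model-c"}))
```

Returning the metric name alongside the value keeps it visible which metric a row was ranked by, since F1 and country accuracy are not directly comparable.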

Rank  Subject                        F1     Provenance  Sampled     Model Match
1     Human Expert #3 (Best Expert)  97.33  Imported    2026-05-27  -
2     Human Expert #1                96.67  Imported    2026-05-27  -
3     Human Expert #2                90     Imported    2026-05-27  -
4     Human Expert Average           53.92  Imported    2026-05-27  -
5     GPT-4.1                        42.3   Imported    2026-05-27  GPT-4.1 (openai-gpt-4.1)
6     Gemini-2.5-Pro                 41.51  Imported    2026-05-27  Gemini 2.5 Pro (google-gemini-2.5-pro)
7     Gemini-2.5-Flash               41.3   Imported    2026-05-27  Gemini 2.5 Flash (google-gemini-2.5-flash)
8     Gemini-3-Pro                   40.98  Imported    2026-05-27  Gemini 3 (google-gemini-3)
9     GPT-5                          40.56  Imported    2026-05-27  GPT-5 (openai-gpt-5)
10    Qwen2.5-VL-7B-Instruct         31.63  Imported    2026-05-27  -
11    Gemma-3-12b-it                 31.21  Imported    2026-05-27  Gemma 3 12B (google-gemma-3-12b-it)
12    Llama-3.2-11B-Vision-Instruct  25.86  Imported    2026-05-27  Llama 3.2 11B Vision Instruct (meta-llama-llama-3.2-11b-vision-instruct)
13    Qwen3-VL-8B-Instruct           23.81  Imported    2026-05-27  Qwen3 VL 8B Instruct (qwen-qwen3-vl-8b-instruct)
14    Hallucinated                   18.48  Imported    2026-05-27  -
15    Random Hallucinated            2.42   Imported    2026-05-27  -
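The Rank column follows descending row score. The sketch below re-derives it from the F1 values in the table. The tie rule is an assumption: the leaderboard does not state one (no ties occur here), so the sketch uses standard competition ranking, where tied rows share a rank and the next rank is skipped.

```python
"""Re-derive the Rank column from the F1 scores in the table (sketch).

Assumption: rank = descending order of row score, ties share a rank
("1224" competition ranking); the leaderboard's tie rule is not stated.
"""

# (subject, F1) pairs copied from the table above.
ROWS = [
    ("Human Expert #3 (Best Expert)", 97.33),
    ("Human Expert #1", 96.67),
    ("Human Expert #2", 90.0),
    ("Human Expert Average", 53.92),
    ("GPT-4.1", 42.3),
    ("Gemini-2.5-Pro", 41.51),
    ("Gemini-2.5-Flash", 41.3),
    ("Gemini-3-Pro", 40.98),
    ("GPT-5", 40.56),
    ("Qwen2.5-VL-7B-Instruct", 31.63),
    ("Gemma-3-12b-it", 31.21),
    ("Llama-3.2-11B-Vision-Instruct", 25.86),
    ("Qwen3-VL-8B-Instruct", 23.81),
    ("Hallucinated", 18.48),
    ("Random Hallucinated", 2.42),
]


def rank_rows(rows):
    """Return (rank, subject, score) tuples in descending score order."""
    # sorted() is stable, so tied rows keep their input order.
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    ranked = []
    for i, (subject, score) in enumerate(ordered):
        if i > 0 and score == ordered[i - 1][1]:
            rank = ranked[-1][0]  # tie: reuse the previous row's rank
        else:
            rank = i + 1  # otherwise rank is the 1-based position
        ranked.append((rank, subject, score))
    return ranked


if __name__ == "__main__":
    for rank, subject, score in rank_rows(ROWS):
        print(f"{rank:>2}  {subject:<30} {score:6.2f}")
```

Human and baseline rows (Hallucinated, Random Hallucinated) are ranked on the same scale as models, which is why they are interleaved with model entries in the table.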