GeoRC
Geolocation reasoning benchmark for vision-language models. Location inference is scored with F1 (the primary metric), precision, recall, and country-level accuracy.
Rows: 15
Primary metric: F1
Sampled: 2026-05-27
Metadata
Metrics: F1, Precision, Recall, Country Accuracy
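As a reminder of how the listed metrics relate, the sketch below computes precision, recall, and F1 from match counts using the standard definitions. How GeoRC decides that a predicted item matches a reference item is not stated on this page, so the counts and the helper names (`precision_recall`, `f1_score`) are purely illustrative assumptions, as is the 0-100 scaling, which is inferred from the values in the table.

```python
def precision_recall(true_pos: int, false_pos: int, false_neg: int) -> tuple[float, float]:
    """Standard precision and recall from match counts (hypothetical inputs)."""
    predicted = true_pos + false_pos   # everything the subject claimed
    relevant = true_pos + false_neg    # everything in the reference
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / relevant if relevant else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up counts, not taken from the benchmark.
    p, r = precision_recall(true_pos=3, false_pos=1, false_neg=3)
    # Scale to 0-100 to mirror the table (an assumption about its units).
    print(f"precision={100 * p:.2f} recall={100 * r:.2f} f1={100 * f1_score(p, r):.2f}")
```

Because F1 is a harmonic mean, it is pulled toward the lower of precision and recall, so a subject cannot rank highly by maximizing only one of them.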
| Rank | Subject | F1 | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Human Expert #3 (Best Expert) | 97.33 | — | Imported | 2026-05-27 |
| 2 | Human Expert #1 | 96.67 | — | Imported | 2026-05-27 |
| 3 | Human Expert #2 | 90 | — | Imported | 2026-05-27 |
| 4 | Human Expert Average | 53.92 | — | Imported | 2026-05-27 |
| 5 | GPT-4.1 | 42.3 | GPT-4.1 (`openai-gpt-4.1`) | Imported | 2026-05-27 |
| 6 | Gemini-2.5-Pro | 41.51 | Gemini 2.5 Pro (`google-gemini-2.5-pro`) | Imported | 2026-05-27 |
| 7 | Gemini-2.5-Flash | 41.3 | Gemini 2.5 Flash (`google-gemini-2.5-flash`) | Imported | 2026-05-27 |
| 8 | Gemini-3-Pro | 40.98 | Gemini 3 (`google-gemini-3`) | Imported | 2026-05-27 |
| 9 | GPT-5 | 40.56 | GPT-5 (`openai-gpt-5`) | Imported | 2026-05-27 |
| 10 | Qwen2.5-VL-7B-Instruct | 31.63 | — | Imported | 2026-05-27 |
| 11 | Gemma-3-12b-it | 31.21 | Gemma 3 12B (`google-gemma-3-12b-it`) | Imported | 2026-05-27 |
| 12 | Llama-3.2-11B-Vision-Instruct | 25.86 | Llama 3.2 11B Vision Instruct (`meta-llama-llama-3.2-11b-vision-instruct`) | Imported | 2026-05-27 |
| 13 | Qwen3-VL-8B-Instruct | 23.81 | Qwen3 VL 8B Instruct (`qwen-qwen3-vl-8b-instruct`) | Imported | 2026-05-27 |
| 14 | Hallucinated | 18.48 | — | Imported | 2026-05-27 |
| 15 | Random Hallucinated | 2.42 | — | Imported | 2026-05-27 |