K-MetBench
Expert meteorology benchmark over 1,774 Korean National Meteorological Engineer Examination questions, including reasoning, geo-cultural, text-only, and multimodal subsets.
64 rows
Primary metric: accuracy_pct
Sampled: 2026-05-28
Metadata
Metrics
Accuracy, Reasoning Score, Geo-Cultural, Text-Only, Multimodal, P1 Weather Analysis and Forecast Theory, P2 Meteorological Observation Methods, P3 Atmospheric Dynamics, P4 Climatology, P5 Atmospheric Physics
| Rank | Subject | Accuracy | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | gemini-3-pro-preview (Thinking) | 93.7% accuracy | Gemini 3 google-gemini-3 | Imported | 2026-05-28 |
| 2 | gpt-5.2 (Thinking) | 87.8% accuracy | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-28 |
| 3 | Qwen3-VL-235B-A22B-Thinking | 84.4% accuracy | Qwen3 VL 235B A22B Thinking qwen-qwen3-vl-235b-a22b-thinking | Imported | 2026-05-28 |
| 4 | Qwen3.5-27B (Thinking) | 83.0% accuracy | Qwen3.5-27B qwen-qwen3.5-27b | Imported | 2026-05-28 |
| 5 | Qwen3.6-35B-A3B (Thinking) | 82.9% accuracy | Qwen3.6 35B A3B qwen-qwen3.6-35b-a3b | Imported | 2026-05-28 |
| 6 | Qwen3.5-35B-A3B (Thinking) | 81.8% accuracy | Qwen3.5-35B-A3B qwen-qwen3.5-35b-a3b | Imported | 2026-05-28 |
| 7 | Qwen3-VL-32B-Thinking | 78.6% accuracy | — | Imported | 2026-05-28 |
| 8 | command-a-reasoning-08-2025 | 77.8% accuracy | — | Imported | 2026-05-28 |
| 9 | gpt-5.2 | 77.6% accuracy | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-28 |
| 10 | gpt-oss-120b | 77.3% accuracy | gpt-oss-120b openai-gpt-oss-120b | Imported | 2026-05-28 |
| 11 | Qwen3-30B-A3B-Thinking-2507 | 76.7% accuracy | Qwen3 30B A3B Thinking 2507 qwen-qwen3-30b-a3b-thinking-2507 | Imported | 2026-05-28 |
| 12 | A.X-4.0 | 76.1% accuracy | — | Imported | 2026-05-28 |
| 13 | EXAONE-4.5-33B | 75.9% accuracy | — | Imported | 2026-05-28 |
| 14 | Qwen3-VL-30B-A3B-Thinking | 74.9% accuracy | Qwen3 VL 30B A3B Thinking qwen-qwen3-vl-30b-a3b-thinking | Imported | 2026-05-28 |
| 15 | Qwen3.5-9B (Thinking) | 74.9% accuracy | Qwen3.5-9B qwen-qwen3.5-9b | Imported | 2026-05-28 |
| 16 | Qwen3-14B | 73.7% accuracy | Qwen3 14B qwen-qwen3-14b | Imported | 2026-05-28 |
| 17 | Qwen3.5-27B (Non-Thinking) | 73.4% accuracy | Qwen3.5-27B qwen-qwen3.5-27b | Imported | 2026-05-28 |
| 18 | Qwen3-VL-235B-A22B-Instruct | 72.4% accuracy | Qwen3 VL 235B A22B Instruct qwen-qwen3-vl-235b-a22b-instruct | Imported | 2026-05-28 |
| 19 | Qwen3-VL-8B-Thinking | 71.7% accuracy | Qwen3 VL 8B Thinking qwen-qwen3-vl-8b-thinking | Imported | 2026-05-28 |
| 20 | gpt-oss-20b | 71.5% accuracy | gpt-oss-20b openai-gpt-oss-20b | Imported | 2026-05-28 |
| 21 | Qwen3-8B | 70.1% accuracy | Qwen3 8B qwen-qwen3-8b | Imported | 2026-05-28 |
| 22 | Qwen3.5-35B-A3B (Non-Thinking) | 69.3% accuracy | Qwen3.5-35B-A3B qwen-qwen3.5-35b-a3b | Imported | 2026-05-28 |
| 23 | Qwen3.6-35B-A3B (Non-Thinking) | 68.8% accuracy | Qwen3.6 35B A3B qwen-qwen3.6-35b-a3b | Imported | 2026-05-28 |
| 24 | Qwen3-4B-Thinking-2507 | 67.8% accuracy | — | Imported | 2026-05-28 |
| 25 | Qwen3-VL-32B-Instruct | 67.5% accuracy | Qwen3 VL 32B Instruct qwen-qwen3-vl-32b-instruct | Imported | 2026-05-28 |
| 26 | Qwen2.5-VL-72B-Instruct | 67.1% accuracy | Qwen2.5 VL 72B Instruct qwen-qwen2.5-vl-72b-instruct | Imported | 2026-05-28 |
| 27 | Qwen3-VL-4B-Thinking | 66.1% accuracy | — | Imported | 2026-05-28 |
| 28 | c4ai-command-a-03-2025 | 65.5% accuracy | — | Imported | 2026-05-28 |
| 29 | Qwen3-30B-A3B-Instruct-2507 | 64.7% accuracy | Qwen3 30B A3B Instruct 2507 qwen-qwen3-30b-a3b-instruct-2507 | Imported | 2026-05-28 |
| 30 | Qwen3-VL-30B-A3B-Instruct | 62.2% accuracy | Qwen3 VL 30B A3B Instruct qwen-qwen3-vl-30b-a3b-instruct | Imported | 2026-05-28 |
| 31 | Qwen3.5-9B (Non-Thinking) | 60.4% accuracy | Qwen3.5-9B qwen-qwen3.5-9b | Imported | 2026-05-28 |
| 32 | Qwen2.5-VL-32B-Instruct | 60.1% accuracy | — | Imported | 2026-05-28 |
| 33 | Llama-3.1-70B-Instruct | 59.9% accuracy | Llama 3.1 70B Instruct meta-llama-llama-3.1-70b-instruct | Imported | 2026-05-28 |
| 34 | EXAONE-4.0-32B | 59.9% accuracy | — | Imported | 2026-05-28 |
| 35 | VARCO-Vision-2.0-14B | 58.7% accuracy | — | Imported | 2026-05-28 |
| 36 | InternVL3.5-38B-Instruct | 57.3% accuracy | — | Imported | 2026-05-28 |
| 37 | Llama-3.2-90B-Vision-Instruct | 56.9% accuracy | — | Imported | 2026-05-28 |
| 38 | A.X-4.0-Light | 55.7% accuracy | — | Imported | 2026-05-28 |
| 39 | Qwen3-VL-8B-Instruct | 53.8% accuracy | Qwen3 VL 8B Instruct qwen-qwen3-vl-8b-instruct | Imported | 2026-05-28 |
| 40 | A.X-4.0-VL-Light | 52.5% accuracy | — | Imported | 2026-05-28 |
| 41 | Qwen3-4B-Instruct-2507 | 51.5% accuracy | — | Imported | 2026-05-28 |
| 42 | Phi-4 | 51.5% accuracy | Phi 4 microsoft-phi-4 | Imported | 2026-05-28 |
| 43 | Qwen3-VL-4B-Instruct | 51.0% accuracy | — | Imported | 2026-05-28 |
| 44 | HyperCLOVAX-SEED-Think-14B | 50.8% accuracy | — | Imported | 2026-05-28 |
| 45 | InternVL3.5-14B-Instruct | 47.9% accuracy | — | Imported | 2026-05-28 |
| 46 | Qwen3-32B | 47.5% accuracy | Qwen3 32B qwen-qwen3-32b | Imported | 2026-05-28 |
| 47 | Qwen3-1.7B | 46.8% accuracy | — | Imported | 2026-05-28 |
| 48 | InternVL3.5-8B-Instruct | 46.1% accuracy | — | Imported | 2026-05-28 |
| 49 | Qwen2.5-VL-7B-Instruct | 46.1% accuracy | — | Imported | 2026-05-28 |
| 50 | Llama-3.1-8B-Instruct | 41.8% accuracy | Llama 3.1 8B Instruct meta-llama-llama-3.1-8b-instruct | Imported | 2026-05-28 |
| 51 | InternVL3.5-4B-Instruct | 41.5% accuracy | — | Imported | 2026-05-28 |
| 52 | Qwen2.5-VL-3B-Instruct | 40.9% accuracy | — | Imported | 2026-05-28 |
| 53 | EXAONE-4.0-1.2B | 37.4% accuracy | — | Imported | 2026-05-28 |
| 54 | VARCO-Vision-2.0-1.7B | 35.2% accuracy | — | Imported | 2026-05-28 |
| 55 | Llama-3.2-3B-Instruct | 33.8% accuracy | Llama 3.2 3B Instruct meta-llama-llama-3.2-3b-instruct | Imported | 2026-05-28 |
| 56 | Qwen3-0.6B | 32.2% accuracy | — | Imported | 2026-05-28 |
| 57 | HyperCLOVAX-SEED-Vision-Instruct-3B | 32.0% accuracy | — | Imported | 2026-05-28 |
| 58 | InternVL3.5-2B-Instruct | 31.0% accuracy | — | Imported | 2026-05-28 |
| 59 | HyperCLOVAX-SEED-Text-Instruct-1.5B | 30.6% accuracy | — | Imported | 2026-05-28 |
| 60 | Phi-4-mini-instruct | 30.4% accuracy | — | Imported | 2026-05-28 |
| 61 | InternVL3.5-1B-Instruct | 23.8% accuracy | — | Imported | 2026-05-28 |
| 62 | HyperCLOVAX-SEED-Text-Instruct-0.5B | 13.2% accuracy | — | Imported | 2026-05-28 |
| 63 | Phi-4-mini-reasoning | 12.6% accuracy | — | Imported | 2026-05-28 |
| 64 | Llama-3.2-1B-Instruct | 3.5% accuracy | Llama 3.2 1B Instruct meta-llama-llama-3.2-1b-instruct | Imported | 2026-05-28 |