K-MetBench

Expert meteorology benchmark over 1,774 Korean National Meteorological Engineer Examination questions, including reasoning, geo-cultural, text-only, and multimodal subsets.

Rows: 64
Primary metric: accuracy_pct
Sampled: 2026-05-28

Metadata

Metrics

Accuracy, Reasoning Score, Geo-Cultural, Text-Only, Multimodal, P1 Weather Analysis and Forecast Theory, P2 Meteorological Observation Methods, P3 Atmospheric Dynamics, P4 Climatology, P5 Atmospheric Physics

Latest Results

Rows are imported from the official K-MetBench static leaderboard JSON used by the public site and ranked by overall accuracy.

| Rank | Subject | Accuracy | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | gemini-3-pro-preview (Thinking) | 93.7% | Gemini 3 (google-gemini-3) | Imported | 2026-05-28 |
| 2 | gpt-5.2 (Thinking) | 87.8% | GPT-5.2 (openai-gpt-5.2) | Imported | 2026-05-28 |
| 3 | Qwen3-VL-235B-A22B-Thinking | 84.4% | Qwen3 VL 235B A22B Thinking (qwen-qwen3-vl-235b-a22b-thinking) | Imported | 2026-05-28 |
| 4 | Qwen3.5-27B (Thinking) | 83.0% | Qwen3.5-27B (qwen-qwen3.5-27b) | Imported | 2026-05-28 |
| 5 | Qwen3.6-35B-A3B (Thinking) | 82.9% | Qwen3.6 35B A3B (qwen-qwen3.6-35b-a3b) | Imported | 2026-05-28 |
| 6 | Qwen3.5-35B-A3B (Thinking) | 81.8% | Qwen3.5-35B-A3B (qwen-qwen3.5-35b-a3b) | Imported | 2026-05-28 |
| 7 | Qwen3-VL-32B-Thinking | 78.6% | | Imported | 2026-05-28 |
| 8 | command-a-reasoning-08-2025 | 77.8% | | Imported | 2026-05-28 |
| 9 | gpt-5.2 | 77.6% | GPT-5.2 (openai-gpt-5.2) | Imported | 2026-05-28 |
| 10 | gpt-oss-120b | 77.3% | gpt-oss-120b (openai-gpt-oss-120b) | Imported | 2026-05-28 |
| 11 | Qwen3-30B-A3B-Thinking-2507 | 76.7% | Qwen3 30B A3B Thinking 2507 (qwen-qwen3-30b-a3b-thinking-2507) | Imported | 2026-05-28 |
| 12 | A.X-4.0 | 76.1% | | Imported | 2026-05-28 |
| 13 | EXAONE-4.5-33B | 75.9% | | Imported | 2026-05-28 |
| 14 | Qwen3-VL-30B-A3B-Thinking | 74.9% | Qwen3 VL 30B A3B Thinking (qwen-qwen3-vl-30b-a3b-thinking) | Imported | 2026-05-28 |
| 15 | Qwen3.5-9B (Thinking) | 74.9% | Qwen3.5-9B (qwen-qwen3.5-9b) | Imported | 2026-05-28 |
| 16 | Qwen3-14B | 73.7% | Qwen3 14B (qwen-qwen3-14b) | Imported | 2026-05-28 |
| 17 | Qwen3.5-27B (Non-Thinking) | 73.4% | Qwen3.5-27B (qwen-qwen3.5-27b) | Imported | 2026-05-28 |
| 18 | Qwen3-VL-235B-A22B-Instruct | 72.4% | Qwen3 VL 235B A22B Instruct (qwen-qwen3-vl-235b-a22b-instruct) | Imported | 2026-05-28 |
| 19 | Qwen3-VL-8B-Thinking | 71.7% | Qwen3 VL 8B Thinking (qwen-qwen3-vl-8b-thinking) | Imported | 2026-05-28 |
| 20 | gpt-oss-20b | 71.5% | gpt-oss-20b (openai-gpt-oss-20b) | Imported | 2026-05-28 |
| 21 | Qwen3-8B | 70.1% | Qwen3 8B (qwen-qwen3-8b) | Imported | 2026-05-28 |
| 22 | Qwen3.5-35B-A3B (Non-Thinking) | 69.3% | Qwen3.5-35B-A3B (qwen-qwen3.5-35b-a3b) | Imported | 2026-05-28 |
| 23 | Qwen3.6-35B-A3B (Non-Thinking) | 68.8% | Qwen3.6 35B A3B (qwen-qwen3.6-35b-a3b) | Imported | 2026-05-28 |
| 24 | Qwen3-4B-Thinking-2507 | 67.8% | | Imported | 2026-05-28 |
| 25 | Qwen3-VL-32B-Instruct | 67.5% | Qwen3 VL 32B Instruct (qwen-qwen3-vl-32b-instruct) | Imported | 2026-05-28 |
| 26 | Qwen2.5-VL-72B-Instruct | 67.1% | Qwen2.5 VL 72B Instruct (qwen-qwen2.5-vl-72b-instruct) | Imported | 2026-05-28 |
| 27 | Qwen3-VL-4B-Thinking | 66.1% | | Imported | 2026-05-28 |
| 28 | c4ai-command-a-03-2025 | 65.5% | | Imported | 2026-05-28 |
| 29 | Qwen3-30B-A3B-Instruct-2507 | 64.7% | Qwen3 30B A3B Instruct 2507 (qwen-qwen3-30b-a3b-instruct-2507) | Imported | 2026-05-28 |
| 30 | Qwen3-VL-30B-A3B-Instruct | 62.2% | Qwen3 VL 30B A3B Instruct (qwen-qwen3-vl-30b-a3b-instruct) | Imported | 2026-05-28 |
| 31 | Qwen3.5-9B (Non-Thinking) | 60.4% | Qwen3.5-9B (qwen-qwen3.5-9b) | Imported | 2026-05-28 |
| 32 | Qwen2.5-VL-32B-Instruct | 60.1% | | Imported | 2026-05-28 |
| 33 | Llama-3.1-70B-Instruct | 59.9% | Llama 3.1 70B Instruct (meta-llama-llama-3.1-70b-instruct) | Imported | 2026-05-28 |
| 34 | EXAONE-4.0-32B | 59.9% | | Imported | 2026-05-28 |
| 35 | VARCO-Vision-2.0-14B | 58.7% | | Imported | 2026-05-28 |
| 36 | InternVL3.5-38B-Instruct | 57.3% | | Imported | 2026-05-28 |
| 37 | Llama-3.2-90B-Vision-Instruct | 56.9% | | Imported | 2026-05-28 |
| 38 | A.X-4.0-Light | 55.7% | | Imported | 2026-05-28 |
| 39 | Qwen3-VL-8B-Instruct | 53.8% | Qwen3 VL 8B Instruct (qwen-qwen3-vl-8b-instruct) | Imported | 2026-05-28 |
| 40 | A.X-4.0-VL-Light | 52.5% | | Imported | 2026-05-28 |
| 41 | Qwen3-4B-Instruct-2507 | 51.5% | | Imported | 2026-05-28 |
| 42 | Phi-4 | 51.5% | Phi 4 (microsoft-phi-4) | Imported | 2026-05-28 |
| 43 | Qwen3-VL-4B-Instruct | 51.0% | | Imported | 2026-05-28 |
| 44 | HyperCLOVAX-SEED-Think-14B | 50.8% | | Imported | 2026-05-28 |
| 45 | InternVL3.5-14B-Instruct | 47.9% | | Imported | 2026-05-28 |
| 46 | Qwen3-32B | 47.5% | Qwen3 32B (qwen-qwen3-32b) | Imported | 2026-05-28 |
| 47 | Qwen3-1.7B | 46.8% | | Imported | 2026-05-28 |
| 48 | InternVL3.5-8B-Instruct | 46.1% | | Imported | 2026-05-28 |
| 49 | Qwen2.5-VL-7B-Instruct | 46.1% | | Imported | 2026-05-28 |
| 50 | Llama-3.1-8B-Instruct | 41.8% | Llama 3.1 8B Instruct (meta-llama-llama-3.1-8b-instruct) | Imported | 2026-05-28 |
| 51 | InternVL3.5-4B-Instruct | 41.5% | | Imported | 2026-05-28 |
| 52 | Qwen2.5-VL-3B-Instruct | 40.9% | | Imported | 2026-05-28 |
| 53 | EXAONE-4.0-1.2B | 37.4% | | Imported | 2026-05-28 |
| 54 | VARCO-Vision-2.0-1.7B | 35.2% | | Imported | 2026-05-28 |
| 55 | Llama-3.2-3B-Instruct | 33.8% | Llama 3.2 3B Instruct (meta-llama-llama-3.2-3b-instruct) | Imported | 2026-05-28 |
| 56 | Qwen3-0.6B | 32.2% | | Imported | 2026-05-28 |
| 57 | HyperCLOVAX-SEED-Vision-Instruct-3B | 32.0% | | Imported | 2026-05-28 |
| 58 | InternVL3.5-2B-Instruct | 31.0% | | Imported | 2026-05-28 |
| 59 | HyperCLOVAX-SEED-Text-Instruct-1.5B | 30.6% | | Imported | 2026-05-28 |
| 60 | Phi-4-mini-instruct | 30.4% | | Imported | 2026-05-28 |
| 61 | InternVL3.5-1B-Instruct | 23.8% | | Imported | 2026-05-28 |
| 62 | HyperCLOVAX-SEED-Text-Instruct-0.5B | 13.2% | | Imported | 2026-05-28 |
| 63 | Phi-4-mini-reasoning | 12.6% | | Imported | 2026-05-28 |
| 64 | Llama-3.2-1B-Instruct | 3.5% | Llama 3.2 1B Instruct (meta-llama-llama-3.2-1b-instruct) | Imported | 2026-05-28 |
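
The import-and-rank step described under Latest Results (rows loaded from a leaderboard JSON, then ordered by overall accuracy) might look roughly like the sketch below. The payload shape is an assumption: the real K-MetBench JSON may use different keys or nesting. Here `subject` and `accuracy_pct` are borrowed from this page's column header and primary-metric name, and `rank_rows` is a hypothetical helper, not part of any published tooling.

```python
import json

# Hypothetical payload shape: a JSON array of objects with "subject" and
# "accuracy_pct" keys. The real leaderboard JSON may be structured differently.
SAMPLE_JSON = """
[
  {"subject": "gpt-oss-120b", "accuracy_pct": 77.3},
  {"subject": "gemini-3-pro-preview (Thinking)", "accuracy_pct": 93.7},
  {"subject": "gpt-5.2 (Thinking)", "accuracy_pct": 87.8}
]
"""


def rank_rows(rows):
    """Return rows sorted by accuracy (highest first) with a 1-based rank added."""
    # sorted() is stable, so rows with equal accuracy keep their input order.
    ordered = sorted(rows, key=lambda row: row["accuracy_pct"], reverse=True)
    return [{"rank": i, **row} for i, row in enumerate(ordered, start=1)]


ranked = rank_rows(json.loads(SAMPLE_JSON))
for row in ranked:
    print(f'{row["rank"]} {row["subject"]} {row["accuracy_pct"]:.1f}%')
```

In this sketch, tied rows simply keep their input order and receive consecutive ranks. Whether the site breaks ties the same way is not stated in the source.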