LiveMedBench

Live medical benchmark with time-stamped real-world cases and after-cutoff scoring for measuring medical model robustness over time.

38 rows
Primary metric: overall_score
Sampled: 2026-05-27

Metadata

Metrics

Overall Score, After-Cutoff Score
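
One plausible reading of the two metrics, sketched under stated assumptions: the Overall Score averages per-case scores across all cases, while the After-Cutoff Score counts only cases time-stamped after a model's training cutoff. The `Case` type, the sample dates, the cutoff, and the use of a plain mean are all illustrative and are not taken from LiveMedBench's documented method.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Case:
    published: date  # time stamp of the real-world case (hypothetical field)
    score: float     # per-case score in [0, 1] (hypothetical field)


def overall_score(cases: list[Case]) -> Optional[float]:
    """Plain mean over all cases; None when there are no cases."""
    if not cases:
        return None
    return sum(c.score for c in cases) / len(cases)


def after_cutoff_score(cases: list[Case], cutoff: date) -> Optional[float]:
    """Mean over only the cases published strictly after the cutoff."""
    later = [c for c in cases if c.published > cutoff]
    return overall_score(later)


# Illustrative data, not real benchmark cases.
cases = [
    Case(date(2024, 3, 1), 1.0),
    Case(date(2025, 2, 1), 0.25),
    Case(date(2025, 9, 1), 0.75),
]
cutoff = date(2024, 12, 31)  # assumed training cutoff for some model

overall = overall_score(cases)
after = after_cutoff_score(cases, cutoff)
```

A gap between `overall` and `after` in a setup like this would indicate how much a model's score depends on cases it could have seen before its cutoff.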

Latest Results

Rows parsed from the LiveMedBench public leaderboard JSON. Source ratios are converted to percentage points to match the website display.
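
The conversion described above can be sketched as follows. This is a minimal example assuming each leaderboard row is a JSON object with `model` and `overall_score` fields; those field names and the sample payload are hypothetical, since the actual schema of the LiveMedBench JSON is not shown here.

```python
import json

# Hypothetical payload; the field names are assumptions, not the real schema.
SAMPLE_JSON = """
[
  {"model": "GPT-5.1", "overall_score": 0.3845},
  {"model": "GPT-5.2", "overall_score": 0.3923}
]
"""


def ratio_to_points(ratio: float) -> float:
    """Convert a 0-1 ratio to percentage points, rounded to 2 decimals."""
    return round(ratio * 100, 2)


def parse_rows(raw: str) -> list[dict]:
    """Parse rows, rank them by score (descending), and attach percentage points."""
    rows = json.loads(raw)
    ranked = sorted(rows, key=lambda r: r["overall_score"], reverse=True)
    return [
        {
            "rank": i,
            "model": r["model"],
            "overall_score": r["overall_score"],
            "overall_points": ratio_to_points(r["overall_score"]),
        }
        for i, r in enumerate(ranked, start=1)
    ]


rows = parse_rows(SAMPLE_JSON)
for row in rows:
    print(row["rank"], row["model"], row["overall_points"])
```

Under this sketch a source ratio of 0.3923 corresponds to 39.23 percentage points.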

| Rank | Subject | Overall Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-5.2 | 0.3923 | GPT-5.2 (openai-gpt-5.2) | Imported | 2026-05-27 |
| 2 | GPT-5.1 | 0.3845 | GPT-5.1 (openai-gpt-5.1) | Imported | 2026-05-27 |
| 3 | GPT-5 | 0.2858 | GPT-5 (openai-gpt-5) | Imported | 2026-05-27 |
| 4 | Grok-4.1 | 0.2828 | | Imported | 2026-05-27 |
| 5 | Baichuan-M3 | 0.2561 | | Imported | 2026-05-27 |
| 6 | GPT-OSS 120B | 0.2503 | gpt-oss-120b (openai-gpt-oss-120b) | Imported | 2026-05-27 |
| 7 | GLM-4.5 | 0.2246 | GLM GLM 4.5 (z-ai-glm-4.5) | Imported | 2026-05-27 |
| 8 | Gemini 3 Flash | 0.2167 | Gemini 3 Flash Preview (google-gemini-3-flash-preview) | Imported | 2026-05-27 |
| 9 | Gemini 3 Pro | 0.1829 | Gemini 3 (google-gemini-3) | Imported | 2026-05-27 |
| 10 | GLM-4.6 | 0.1759 | GLM GLM 4.6 (z-ai-glm-4.6) | Imported | 2026-05-27 |
| 11 | Claude 3.7 Sonnet | 0.1699 | Claude 3.7 Sonnet (anthropic-claude-3.7-sonnet) | Imported | 2026-05-27 |
| 12 | Gemini 2.5 Pro | 0.1606 | Gemini 2.5 Pro (google-gemini-2.5-pro) | Imported | 2026-05-27 |
| 13 | Qwen3-14B | 0.1545 | Qwen3 14B (qwen-qwen3-14b) | Imported | 2026-05-27 |
| 14 | GPT-4.1 | 0.1379 | GPT-4.1 (openai-gpt-4.1) | Imported | 2026-05-27 |
| 15 | QwQ-32B | 0.135 | | Imported | 2026-05-27 |
| 16 | GLM-4.7 Thinking | 0.1335 | GLM GLM 4.7 (z-ai-glm-4.7) | Imported | 2026-05-27 |
| 17 | DeepSeek-R1 | 0.1329 | R1 (deepseek-r1) | Imported | 2026-05-27 |
| 18 | Qwen2.5-72B-Ins | 0.1276 | | Imported | 2026-05-27 |
| 19 | GLM-4.5 Air | 0.1105 | GLM GLM 4.5 Air (z-ai-glm-4.5-air) | Imported | 2026-05-27 |
| 20 | Baichuan-M2 | 0.1078 | | Imported | 2026-05-27 |
| 21 | GPT-4.1 Mini | 0.1036 | GPT-4.1 Mini (openai-gpt-4.1-mini) | Imported | 2026-05-27 |
| 22 | DeepSeek-V3.2 | 0.1028 | DeepSeek V3.2 (deepseek-deepseek-v3.2) | Imported | 2026-05-27 |
| 23 | Claude 4 Sonnet | 0.1013 | | Imported | 2026-05-27 |
| 24 | DeepSeek-V3.1 | 0.0959 | DeepSeek V3.1 (deepseek-deepseek-chat-v3.1) | Imported | 2026-05-27 |
| 25 | HuatuoGPT-o1 | 0.0712 | | Imported | 2026-05-27 |
| 26 | Qwen2.5-32B-Ins | 0.0641 | | Imported | 2026-05-27 |
| 27 | Gemini 2.5 Flash | 0.064 | Gemini 2.5 Flash (google-gemini-2.5-flash) | Imported | 2026-05-27 |
| 28 | Med-Gemma 27B | 0.059 | | Imported | 2026-05-27 |
| 29 | Kimi K2 | 0.0585 | KIMI MoonshotAI: Kimi K2 0711 (moonshotai-kimi-k2) | Imported | 2026-05-27 |
| 30 | Lingshu-32B | 0.0577 | | Imported | 2026-05-27 |
| 31 | Qwen3-30B | 0.0559 | | Imported | 2026-05-27 |
| 32 | Med-Gemma 1.5 | 0.0537 | | Imported | 2026-05-27 |
| 33 | GLM-4 | 0.0522 | | Imported | 2026-05-27 |
| 34 | GPT-4o | 0.0506 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-27 |
| 35 | Qwen3-235B | 0.0505 | Qwen3 235B A22B (qwen-qwen3-235b-a22b) | Imported | 2026-05-27 |
| 36 | Lingshu-7B | 0.0377 | | Imported | 2026-05-27 |
| 37 | Gemini 2.0 Flash | 0.0271 | Gemini 2.0 Flash (google-gemini-2.0-flash) | Imported | 2026-05-27 |
| 38 | Med-Gemma 4B | 0 | | Imported | 2026-05-27 |