LiveMedBench
Live medical benchmark with time-stamped real-world cases and after-cutoff scoring for measuring medical model robustness over time.
Rows: 38
Primary metric: overall_score
Sampled: 2026-05-27
Metadata
Metrics: Overall Score, After-Cutoff Score
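The page does not spell out how the two metrics are computed. As a rough sketch only, assuming each case carries a publication date and a per-case score in [0, 1], and that each model has a known training cutoff date, the after-cutoff score could be read as the mean over cases dated strictly after that cutoff. The `CaseResult` type, its field names, and the sample values below are hypothetical and are not taken from LiveMedBench.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class CaseResult:
    """One scored case. Hypothetical structure, not the benchmark's schema."""
    case_date: date  # assumed: date the real-world case was published
    score: float     # assumed: per-case score in [0, 1]


def overall_score(results: list[CaseResult]) -> Optional[float]:
    """Mean score over all cases, or None when there are no cases."""
    scores = [r.score for r in results]
    return mean(scores) if scores else None


def after_cutoff_score(results: list[CaseResult], cutoff: date) -> Optional[float]:
    """Mean score over cases dated strictly after the model's training cutoff."""
    later = [r.score for r in results if r.case_date > cutoff]
    return mean(later) if later else None


# Illustrative data only; none of these numbers come from the leaderboard.
results = [
    CaseResult(date(2024, 3, 1), 1.0),
    CaseResult(date(2025, 1, 15), 0.25),
    CaseResult(date(2025, 6, 30), 0.75),
]
cutoff = date(2024, 12, 31)

overall = overall_score(results)
after = after_cutoff_score(results, cutoff)
```

Under these assumptions, a model that scores well on older cases but poorly on newer ones shows a gap between the two values, which is the kind of drift over time the benchmark description refers to.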
| Rank | Subject | Overall Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-5.2 | 0.3923 | GPT-5.2 (`openai-gpt-5.2`) | Imported | 2026-05-27 |
| 2 | GPT-5.1 | 0.3845 | GPT-5.1 (`openai-gpt-5.1`) | Imported | 2026-05-27 |
| 3 | GPT-5 | 0.2858 | GPT-5 (`openai-gpt-5`) | Imported | 2026-05-27 |
| 4 | Grok-4.1 | 0.2828 | — | Imported | 2026-05-27 |
| 5 | Baichuan-M3 | 0.2561 | — | Imported | 2026-05-27 |
| 6 | GPT-OSS 120B | 0.2503 | gpt-oss-120b (`openai-gpt-oss-120b`) | Imported | 2026-05-27 |
| 7 | GLM-4.5 | 0.2246 | GLM 4.5 (`z-ai-glm-4.5`) | Imported | 2026-05-27 |
| 8 | Gemini 3 Flash | 0.2167 | Gemini 3 Flash Preview (`google-gemini-3-flash-preview`) | Imported | 2026-05-27 |
| 9 | Gemini 3 Pro | 0.1829 | Gemini 3 (`google-gemini-3`) | Imported | 2026-05-27 |
| 10 | GLM-4.6 | 0.1759 | GLM 4.6 (`z-ai-glm-4.6`) | Imported | 2026-05-27 |
| 11 | Claude 3.7 Sonnet | 0.1699 | Claude 3.7 Sonnet (`anthropic-claude-3.7-sonnet`) | Imported | 2026-05-27 |
| 12 | Gemini 2.5 Pro | 0.1606 | Gemini 2.5 Pro (`google-gemini-2.5-pro`) | Imported | 2026-05-27 |
| 13 | Qwen3-14B | 0.1545 | Qwen3 14B (`qwen-qwen3-14b`) | Imported | 2026-05-27 |
| 14 | GPT-4.1 | 0.1379 | GPT-4.1 (`openai-gpt-4.1`) | Imported | 2026-05-27 |
| 15 | QwQ-32B | 0.135 | — | Imported | 2026-05-27 |
| 16 | GLM-4.7 Thinking | 0.1335 | GLM 4.7 (`z-ai-glm-4.7`) | Imported | 2026-05-27 |
| 17 | DeepSeek-R1 | 0.1329 | R1 (`deepseek-r1`) | Imported | 2026-05-27 |
| 18 | Qwen2.5-72B-Ins | 0.1276 | — | Imported | 2026-05-27 |
| 19 | GLM-4.5 Air | 0.1105 | GLM 4.5 Air (`z-ai-glm-4.5-air`) | Imported | 2026-05-27 |
| 20 | Baichuan-M2 | 0.1078 | — | Imported | 2026-05-27 |
| 21 | GPT-4.1 Mini | 0.1036 | GPT-4.1 Mini (`openai-gpt-4.1-mini`) | Imported | 2026-05-27 |
| 22 | DeepSeek-V3.2 | 0.1028 | DeepSeek V3.2 (`deepseek-deepseek-v3.2`) | Imported | 2026-05-27 |
| 23 | Claude 4 Sonnet | 0.1013 | — | Imported | 2026-05-27 |
| 24 | DeepSeek-V3.1 | 0.0959 | DeepSeek V3.1 (`deepseek-deepseek-chat-v3.1`) | Imported | 2026-05-27 |
| 25 | HuatuoGPT-o1 | 0.0712 | — | Imported | 2026-05-27 |
| 26 | Qwen2.5-32B-Ins | 0.0641 | — | Imported | 2026-05-27 |
| 27 | Gemini 2.5 Flash | 0.064 | Gemini 2.5 Flash (`google-gemini-2.5-flash`) | Imported | 2026-05-27 |
| 28 | Med-Gemma 27B | 0.059 | — | Imported | 2026-05-27 |
| 29 | Kimi K2 | 0.0585 | MoonshotAI: Kimi K2 0711 (`moonshotai-kimi-k2`) | Imported | 2026-05-27 |
| 30 | Lingshu-32B | 0.0577 | — | Imported | 2026-05-27 |
| 31 | Qwen3-30B | 0.0559 | — | Imported | 2026-05-27 |
| 32 | Med-Gemma 1.5 | 0.0537 | — | Imported | 2026-05-27 |
| 33 | GLM-4 | 0.0522 | — | Imported | 2026-05-27 |
| 34 | GPT-4o | 0.0506 | GPT-4o (`openai-gpt-4o`) | Imported | 2026-05-27 |
| 35 | Qwen3-235B | 0.0505 | Qwen3 235B A22B (`qwen-qwen3-235b-a22b`) | Imported | 2026-05-27 |
| 36 | Lingshu-7B | 0.0377 | — | Imported | 2026-05-27 |
| 37 | Gemini 2.0 Flash | 0.0271 | Gemini 2.0 Flash (`google-gemini-2.0-flash`) | Imported | 2026-05-27 |
| 38 | Med-Gemma 4B | 0 | — | Imported | 2026-05-27 |