MedScribe

Can models support doctors with their administrative work?

Rows: 60
Primary metric: score
Sampled: 2026-05-28


Metrics

Score, Std. error (lower is better), Latency (lower is better), Cost per test (lower is better)
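The page reports a standard error next to each score but does not say how it is computed. A minimal sketch follows, assuming the score is the mean of per-test scores in [0, 1] reported as a percentage, with the standard error estimated as the sample standard deviation divided by the square root of the number of tests. The helper name `score_with_stderr` is hypothetical and is not Vals.ai's implementation.

```python
import math
import statistics
from typing import Sequence


def score_with_stderr(per_test: Sequence[float]) -> tuple[float, float]:
    """Return (score %, standard error in percentage points).

    Assumption: each per-test score lies in [0, 1] and the overall score is
    their plain mean. The standard error is the sample standard deviation
    divided by sqrt(n), i.e. the usual standard error of a mean.
    """
    n = len(per_test)
    if n < 2:
        raise ValueError("need at least two per-test scores")
    mean = statistics.fmean(per_test)
    se = statistics.stdev(per_test) / math.sqrt(n)
    return 100 * mean, 100 * se


mean, se = score_with_stderr([1, 1, 0, 1])
print(f"{mean:.1f}% ± {se:.1f} pp")  # prints "75.0% ± 25.0 pp"
```

With only four tests the standard error is large; it shrinks roughly as 1/sqrt(n), which is why small score gaps between neighbouring ranks may not be meaningful.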

Latest Results

Full leaderboard rows decoded from the Vals.ai benchmark detail page. The primary score is overall accuracy, expressed as a percentage.

Rank | Subject | Score | Model match | Provenance | Sampled
1 | GPT 5.1 2025-11-13 | 88.09% | GPT-5.1 (openai-gpt-5.1) | Imported | 2026-05-28
2 | GPT 5.5 | 86.868% | GPT-5.5 (openai-gpt-5.5) | Imported | 2026-05-28
3 | Claude Opus 4.6 | 86.738% | Claude Opus 4.6 (anthropic-claude-opus-4.6) | Imported | 2026-05-28
4 | Claude Opus 4.6 Thinking | 86.13% | Claude Opus 4.6 (anthropic-claude-opus-4.6) | Imported | 2026-05-28
5 | Muse Spark | 85.902% | none | Imported | 2026-05-28
6 | Claude Opus 4.8 | 85.755% | Claude Opus 4.8 (anthropic-claude-opus-4.8) | Imported | 2026-05-28
7 | Claude Opus 4.5 20251101 Thinking | 85.321% | Claude Opus 4.5 (anthropic-claude-opus-4.5) | Imported | 2026-05-28
8 | Claude Haiku 4.5 20251001 Thinking | 85.23% | Claude Haiku 4.5 (anthropic-claude-haiku-4.5) | Imported | 2026-05-28
9 | Claude Sonnet 4.5 20250929 | 84.515% | Claude Sonnet 4.5 (anthropic-claude-sonnet-4.5) | Imported | 2026-05-28
10 | GPT 5.2 2025-12-11 | 84.387% | GPT-5.2 (openai-gpt-5.2) | Imported | 2026-05-28
11 | Claude Sonnet 4.5 20250929 Thinking | 84.101% | Claude Sonnet 4.5 (anthropic-claude-sonnet-4.5) | Imported | 2026-05-28
12 | GPT 5 2025-08-07 | 83.65% | GPT-5 (openai-gpt-5) | Imported | 2026-05-28
13 | Claude Opus 4.5 20251101 | 83.246% | Claude Opus 4.5 (anthropic-claude-opus-4.5) | Imported | 2026-05-28
14 | Gemini 2.5 Flash Thinking | 82.983% | none | Imported | 2026-05-28
15 | Claude Opus 4.7 | 82.953% | Claude Opus 4.7 (anthropic-claude-opus-4.7) | Imported | 2026-05-28
16 | Gemini 2.5 Flash | 82.869% | Gemini 2.5 Flash (google-gemini-2.5-flash) | Imported | 2026-05-28
17 | Grok 4 Fast Reasoning | 81.632% | Grok 4 Fast (x-ai-grok-4-fast) | Imported | 2026-05-28
18 | MiniMax M2.1 | 80.777% | MiniMax M2.1 (minimax-minimax-m2.1) | Imported | 2026-05-28
19 | GPT 5 Mini 2025-08-07 | 80.577% | GPT-5 Mini (openai-gpt-5-mini) | Imported | 2026-05-28
20 | MiniMax M2.7 | 79.867% | MiniMax M2.7 (minimax-minimax-m2.7) | Imported | 2026-05-28
21 | Grok 4 Fast Non Reasoning | 79.722% | Grok 4 Fast (x-ai-grok-4-fast) | Imported | 2026-05-28
22 | Qwen 3.7 Max | 79.396% | Qwen3.7 Max (qwen-qwen3.7-max) | Imported | 2026-05-28
23 | Grok 4.1 Fast Reasoning | 78.732% | Grok 4.1 Fast (x-ai-grok-4.1-fast) | Imported | 2026-05-28
24 | Gemini 2.5 Flash Preview 09 2025 Thinking | 78.497% | none | Imported | 2026-05-28
25 | Grok 4 0709 | 78.152% | Grok 4 (x-ai-grok-4) | Imported | 2026-05-28
26 | Kimi K2.6 Thinking | 78.149% | MoonshotAI: Kimi K2.6 (moonshotai-kimi-k2.6) | Imported | 2026-05-28
27 | Gemini 2.5 Flash Preview 09 2025 | 77.946% | none | Imported | 2026-05-28
28 | GPT 5.4 2026-03-05 | 77.549% | GPT-5.4 (openai-gpt-5.4) | Imported | 2026-05-28
29 | Grok 4.1 Fast Non Reasoning | 77.464% | Grok 4.1 Fast (x-ai-grok-4.1-fast) | Imported | 2026-05-28
30 | Qwen 3 Vl Plus 2025-09-23 | 77.129% | none | Imported | 2026-05-28
31 | GPT 5.4 Nano 2026-03-17 | 77.09% | GPT-5.4 Nano (openai-gpt-5.4-nano) | Imported | 2026-05-28
32 | Qwen 3.6 Plus | 76.963% | Qwen3.6 Plus (qwen-qwen3.6-plus) | Imported | 2026-05-28
33 | O3 2025-04-16 | 76.654% | o3 (openai-o3) | Imported | 2026-05-28
34 | Gemini 3.5 Flash | 76.574% | Gemini 3.5 Flash (google-gemini-3.5-flash) | Imported | 2026-05-28
35 | Kimi K2.5 Thinking | 76.442% | MoonshotAI: Kimi K2.5 (moonshotai-kimi-k2.5) | Imported | 2026-05-28
36 | Gemini 3.1 Pro Preview | 76.114% | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Imported | 2026-05-28
37 | Gemini 2.5 Flash Lite Preview 09 2025 | 75.824% | Gemini 2.5 Flash Lite Preview 09-2025 (google-gemini-2.5-flash-lite-preview-09-2025) | Imported | 2026-05-28
38 | DeepSeek V4 Pro | 75.144% | DeepSeek V4 Pro (deepseek-deepseek-v4-pro) | Imported | 2026-05-28
39 | Grok 4.3 | 74.399% | Grok 4.3 (x-ai-grok-4.3) | Imported | 2026-05-28
40 | Claude Opus 4.1 20250805 Thinking | 73.901% | Claude Opus 4.1 (anthropic-claude-opus-4.1) | Imported | 2026-05-28
41 | Gemini 2.5 Pro | 73.552% | Gemini 2.5 Pro (google-gemini-2.5-pro) | Imported | 2026-05-28
42 | GPT 5 Nano 2025-08-07 | 72.865% | GPT-5 Nano (openai-gpt-5-nano) | Imported | 2026-05-28
43 | Gemini 2.5 Flash Lite | 72.832% | Gemini 2.5 Flash Lite (google-gemini-2.5-flash-lite) | Imported | 2026-05-28
44 | Qwen 3 Max 2026-01-23 | 72.709% | none | Imported | 2026-05-28
45 | Claude Sonnet 4 20250514 | 72.411% | Claude Sonnet 4 (anthropic-claude-sonnet-4) | Imported | 2026-05-28
46 | GLM 5.1 Thinking | 72.27% | GLM 5.1 (z-ai-glm-5.1) | Imported | 2026-05-28
47 | Gemini 3 Pro Preview | 72.036% | Gemini 3 (google-gemini-3) | Imported | 2026-05-28
48 | Claude Opus 4.1 20250805 | 71.753% | Claude Opus 4.1 (anthropic-claude-opus-4.1) | Imported | 2026-05-28
49 | Qwen 3.5 Flash | 70.619% | Qwen3.5-Flash (qwen-qwen3.5-flash-02-23) | Imported | 2026-05-28
50 | Gemini 3 Flash Preview | 69.917% | Gemini 3 Flash Preview (google-gemini-3-flash-preview) | Imported | 2026-05-28
51 | Claude Sonnet 4 20250514 Thinking | 69.353% | none | Imported | 2026-05-28
52 | O4 Mini 2025-04-16 | 69.139% | o4 Mini (openai-o4-mini) | Imported | 2026-05-28
53 | GLM 4.7 | 68.629% | GLM 4.7 (z-ai-glm-4.7) | Imported | 2026-05-28
54 | Mistral Medium 3.5 | 67.728% | Mistral: Mistral Medium 3.5 (mistralai-mistral-medium-3-5) | Imported | 2026-05-28
55 | Gemini 2.5 Flash Lite Preview 09 2025 Thinking | 66.877% | none | Imported | 2026-05-28
56 | Gemini 3.1 Flash Lite Preview | 63.902% | Gemini 3.1 Flash Lite Preview (google-gemini-3.1-flash-lite-preview) | Imported | 2026-05-28
57 | Grok 4.20 0309 Reasoning | 63.412% | Grok 4.20 (x-ai-grok-4.20) | Imported | 2026-05-28
58 | Command A Plus 05 2026 | 55.682% | none | Imported | 2026-05-28
59 | Llama4 Maverick Instruct Basic | 54.219% | none | Imported | 2026-05-28
60 | Llama 4 Scout 17B 16E Instruct | 50.593% | Llama 4 Scout (meta-llama-llama-4-scout) | Imported | 2026-05-28
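Several leaderboard subjects map to the same matched model. For example, Claude Opus 4.6 and Claude Opus 4.6 Thinking both match anthropic-claude-opus-4.6, and Grok 4 Fast Reasoning and Grok 4 Fast Non Reasoning both match x-ai-grok-4-fast. To compare models rather than configurations, one option is to keep only the best-scoring row per match and skip rows with no match. The sketch below does this for a few rows copied from the leaderboard; the `Row` type and `best_per_match` helper are hypothetical names, and "keep the maximum score" is an assumed aggregation rule, not one the page specifies.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    rank: int
    subject: str
    score: float          # overall accuracy, in percent
    match: Optional[str]  # matched model slug, or None if the row has no match


def best_per_match(rows: list[Row]) -> dict[str, Row]:
    """Keep the highest-scoring row for each matched model slug.

    Rows without a model match are skipped (assumed aggregation rule).
    """
    best: dict[str, Row] = {}
    for row in rows:
        if row.match is None:
            continue
        current = best.get(row.match)
        if current is None or row.score > current.score:
            best[row.match] = row
    return best


# A few rows copied from the leaderboard.
rows = [
    Row(3, "Claude Opus 4.6", 86.738, "anthropic-claude-opus-4.6"),
    Row(4, "Claude Opus 4.6 Thinking", 86.13, "anthropic-claude-opus-4.6"),
    Row(5, "Muse Spark", 85.902, None),
    Row(17, "Grok 4 Fast Reasoning", 81.632, "x-ai-grok-4-fast"),
    Row(21, "Grok 4 Fast Non Reasoning", 79.722, "x-ai-grok-4-fast"),
]

# Print one line per matched model, best score first.
for slug, row in sorted(best_per_match(rows).items(), key=lambda kv: -kv[1].score):
    print(f"{slug}: {row.subject} ({row.score}%)")
```

Taking the maximum favours whichever configuration happened to score best; averaging variants or reporting each configuration separately are equally defensible choices depending on the comparison being made.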