MedCode

Can models support the medical billing process?

60 rows
Primary metric: score
Sampled: 2026-05-28

Metadata

Metrics

Score, Std. error (lower is better), Latency (lower is better), Cost per test (lower is better)

Latest Results

Full leaderboard rows decoded from the Vals.ai benchmark detail page. Primary score is the Overall accuracy percentage.

Rank | Subject | Score | Model Match | Provenance | Sampled
1 | Gemini 3.1 Pro Preview | 59.062% | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Imported | 2026-05-28
2 | Gemini 3 Flash Preview | 55.92% | Gemini 3 Flash Preview (google-gemini-3-flash-preview) | Imported | 2026-05-28
3 | Gemini 3.5 Flash | 55.825% | Gemini 3.5 Flash (google-gemini-3.5-flash) | Imported | 2026-05-28
4 | Claude Opus 4.7 | 54.858% | Claude Opus 4.7 (anthropic-claude-opus-4.7) | Imported | 2026-05-28
5 | Claude Opus 4.8 | 53.217% | Claude Opus 4.8 (anthropic-claude-opus-4.8) | Imported | 2026-05-28
6 | GPT 5.1 2025-11-13 | 52.732% | GPT-5.1 (openai-gpt-5.1) | Imported | 2026-05-28
7 | Gemini 3 Pro Preview | 52.198% | Gemini 3 (google-gemini-3) | Imported | 2026-05-28
8 | Muse Spark | 51.31% | - | Imported | 2026-05-28
9 | Gemini 2.5 Pro | 50.59% | Gemini 2.5 Pro (google-gemini-2.5-pro) | Imported | 2026-05-28
10 | GPT 5.2 2025-12-11 | 49.749% | GPT-5.2 (openai-gpt-5.2) | Imported | 2026-05-28
11 | GPT 5 2025-08-07 | 49.634% | GPT-5 (openai-gpt-5) | Imported | 2026-05-28
12 | Claude Opus 4.5 20251101 Thinking | 49.156% | Claude Opus 4.5 (anthropic-claude-opus-4.5) | Imported | 2026-05-28
13 | Claude Opus 4.6 Thinking | 49.129% | Claude Opus 4.6 (anthropic-claude-opus-4.6) | Imported | 2026-05-28
14 | GPT 5.5 | 49.1% | GPT-5.5 (openai-gpt-5.5) | Imported | 2026-05-28
15 | Claude Opus 4.6 | 48.244% | Claude Opus 4.6 (anthropic-claude-opus-4.6) | Imported | 2026-05-28
16 | Gemini 3.1 Flash Lite Preview | 47.602% | Gemini 3.1 Flash Lite Preview (google-gemini-3.1-flash-lite-preview) | Imported | 2026-05-28
17 | O3 2025-04-16 | 47.29% | o3 (openai-o3) | Imported | 2026-05-28
18 | Claude Opus 4.1 20250805 Thinking | 47.235% | Claude Opus 4.1 (anthropic-claude-opus-4.1) | Imported | 2026-05-28
19 | Claude Opus 4.5 20251101 | 45.174% | Claude Opus 4.5 (anthropic-claude-opus-4.5) | Imported | 2026-05-28
20 | Claude Sonnet 4.5 20250929 Thinking | 44.134% | Claude Sonnet 4.5 (anthropic-claude-sonnet-4.5) | Imported | 2026-05-28
21 | GPT 5 Mini 2025-08-07 | 43.045% | GPT-5 Mini (openai-gpt-5-mini) | Imported | 2026-05-28
22 | GLM 5.1 Thinking | 41.604% | GLM 5.1 (z-ai-glm-5.1) | Imported | 2026-05-28
23 | Claude Opus 4.1 20250805 | 41.372% | Claude Opus 4.1 (anthropic-claude-opus-4.1) | Imported | 2026-05-28
24 | GPT 5.4 2026-03-05 | 41.292% | GPT-5.4 (openai-gpt-5.4) | Imported | 2026-05-28
25 | GPT 5.4 Nano 2026-03-17 | 41.029% | GPT-5.4 Nano (openai-gpt-5.4-nano) | Imported | 2026-05-28
26 | Claude Sonnet 4.5 20250929 | 40.569% | Claude Sonnet 4.5 (anthropic-claude-sonnet-4.5) | Imported | 2026-05-28
27 | Gemini 2.5 Flash Preview 09 2025 | 40.538% | - | Imported | 2026-05-28
28 | DeepSeek V4 Pro | 40.455% | DeepSeek V4 Pro (deepseek-deepseek-v4-pro) | Imported | 2026-05-28
29 | Gemini 2.5 Flash Thinking | 40.357% | - | Imported | 2026-05-28
30 | Gemini 2.5 Flash Preview 09 2025 Thinking | 40.33% | - | Imported | 2026-05-28
31 | Kimi K2.6 Thinking | 40.142% | MoonshotAI: Kimi K2.6 (moonshotai-kimi-k2.6) | Imported | 2026-05-28
32 | Kimi K2.5 Thinking | 39.316% | MoonshotAI: Kimi K2.5 (moonshotai-kimi-k2.5) | Imported | 2026-05-28
33 | Qwen 3.7 Max | 38.751% | Qwen3.7 Max (qwen-qwen3.7-max) | Imported | 2026-05-28
34 | Gemini 2.5 Flash | 38.425% | Gemini 2.5 Flash (google-gemini-2.5-flash) | Imported | 2026-05-28
35 | Grok 4 0709 | 38.078% | Grok 4 (x-ai-grok-4) | Imported | 2026-05-28
36 | Grok 4.3 | 38.068% | Grok 4.3 (x-ai-grok-4.3) | Imported | 2026-05-28
37 | Grok 4 Fast Reasoning | 37.385% | Grok 4 Fast (x-ai-grok-4-fast) | Imported | 2026-05-28
38 | Qwen 3.6 Plus | 36.894% | Qwen3.6 Plus (qwen-qwen3.6-plus) | Imported | 2026-05-28
39 | Llama4 Maverick Instruct Basic | 36.514% | - | Imported | 2026-05-28
40 | Claude Sonnet 4 20250514 Thinking | 34.959% | - | Imported | 2026-05-28
41 | MiniMax M2.7 | 34.44% | MiniMax M2.7 (minimax-minimax-m2.7) | Imported | 2026-05-28
42 | Gemini 2.5 Flash Lite Preview 09 2025 Thinking | 34.191% | - | Imported | 2026-05-28
43 | MiniMax M2.1 | 34.083% | MiniMax M2.1 (minimax-minimax-m2.1) | Imported | 2026-05-28
44 | Claude Sonnet 4 20250514 | 33.943% | Claude Sonnet 4 (anthropic-claude-sonnet-4) | Imported | 2026-05-28
45 | O4 Mini 2025-04-16 | 33.791% | o4 Mini (openai-o4-mini) | Imported | 2026-05-28
46 | Mistral Medium 3.5 | 33.752% | Mistral: Mistral Medium 3.5 (mistralai-mistral-medium-3-5) | Imported | 2026-05-28
47 | Qwen 3.5 Flash | 32.997% | Qwen3.5-Flash (qwen-qwen3.5-flash-02-23) | Imported | 2026-05-28
48 | GLM 4.7 | 32.772% | GLM 4.7 (z-ai-glm-4.7) | Imported | 2026-05-28
49 | Claude Haiku 4.5 20251001 Thinking | 32.678% | Claude Haiku 4.5 (anthropic-claude-haiku-4.5) | Imported | 2026-05-28
50 | Grok 4.20 0309 Reasoning | 32.156% | Grok 4.20 (x-ai-grok-4.20) | Imported | 2026-05-28
51 | Qwen 3 Vl Plus 2025-09-23 | 31.651% | - | Imported | 2026-05-28
52 | Qwen 3 Max 2026-01-23 | 31.373% | - | Imported | 2026-05-28
53 | GPT 5 Nano 2025-08-07 | 30.441% | GPT-5 Nano (openai-gpt-5-nano) | Imported | 2026-05-28
54 | Grok 4 Fast Non Reasoning | 30.036% | Grok 4 Fast (x-ai-grok-4-fast) | Imported | 2026-05-28
55 | Grok 4.1 Fast Non Reasoning | 28.349% | Grok 4.1 Fast (x-ai-grok-4.1-fast) | Imported | 2026-05-28
56 | Grok 4.1 Fast Reasoning | 28.08% | Grok 4.1 Fast (x-ai-grok-4.1-fast) | Imported | 2026-05-28
57 | Gemini 2.5 Flash Lite | 27.115% | Gemini 2.5 Flash Lite (google-gemini-2.5-flash-lite) | Imported | 2026-05-28
58 | Gemini 2.5 Flash Lite Preview 09 2025 | 27.079% | Gemini 2.5 Flash Lite Preview 09-2025 (google-gemini-2.5-flash-lite-preview-09-2025) | Imported | 2026-05-28
59 | Llama 4 Scout 17B 16E Instruct | 23.311% | Llama 4 Scout (meta-llama-llama-4-scout) | Imported | 2026-05-28
60 | Command A Plus 05 2026 | 19.405% | - | Imported | 2026-05-28
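
Scores in the leaderboard are percentage strings, so they need converting to numbers before they can be compared or re-ranked. The following is a minimal Python sketch of that step, using four (subject, score) pairs copied from the rows above. The names Row, parse_score and rank_rows are hypothetical helpers introduced for illustration; they are not part of any Vals.ai tooling.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One leaderboard entry (hypothetical structure, for illustration only)."""
    subject: str
    score: float  # overall accuracy, in percent


def parse_score(text: str) -> float:
    """Convert a score string such as '59.062%' into a float (59.062)."""
    return float(text.strip().rstrip("%"))


# A few (subject, score) pairs copied from the leaderboard, deliberately unsorted.
RAW = [
    ("Gemini 3 Flash Preview", "55.92%"),
    ("Command A Plus 05 2026", "19.405%"),
    ("Gemini 3.1 Pro Preview", "59.062%"),
    ("Claude Opus 4.7", "54.858%"),
]


def rank_rows(raw):
    """Parse the score strings and sort entries from highest to lowest score."""
    rows = [Row(subject, parse_score(score)) for subject, score in raw]
    return sorted(rows, key=lambda r: r.score, reverse=True)


ranked = rank_rows(RAW)

# Gap, in percentage points, between the top two entries of this sample.
lead = ranked[0].score - ranked[1].score

for position, row in enumerate(ranked, start=1):
    print(f"{position}. {row.subject}: {row.score:.3f}%")
```

The gap is in percentage points of overall accuracy. Because the page also reports a standard error per model, a small gap between neighbouring rows is not necessarily a meaningful difference.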