MedCode | BenchmarkList

Metadata

ID: vals_medcode
Category: Healthcare
Release: Unknown

Metrics

Score, Std. error (lower is better), Latency (lower is better), Cost per test (lower is better)

Rank	Subject	Score	Model Match	Provenance	Sampled
1	Gemini 3.1 Pro Preview	59.062%	Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview	Imported	2026-05-28
2	Gemini 3 Flash Preview	55.92%	Gemini 3 Flash Preview google-gemini-3-flash-preview	Imported	2026-05-28
3	Gemini 3.5 Flash	55.825%	Gemini 3.5 Flash google-gemini-3.5-flash	Imported	2026-05-28
4	Claude Opus 4.7	54.858%	Claude Opus 4.7 anthropic-claude-opus-4.7	Imported	2026-05-28
5	Claude Opus 4.8	53.217%	Claude Opus 4.8 anthropic-claude-opus-4.8	Imported	2026-05-28
6	GPT 5.1 2025-11-13	52.732%	GPT-5.1 openai-gpt-5.1	Imported	2026-05-28
7	Gemini 3 Pro Preview	52.198%	Gemini 3 google-gemini-3	Imported	2026-05-28
8	Muse Spark	51.31%	—	Imported	2026-05-28
9	Gemini 2.5 Pro	50.59%	Gemini 2.5 Pro google-gemini-2.5-pro	Imported	2026-05-28
10	GPT 5.2 2025-12-11	49.749%	GPT-5.2 openai-gpt-5.2	Imported	2026-05-28
11	GPT 5 2025-08-07	49.634%	GPT-5 openai-gpt-5	Imported	2026-05-28
12	Claude Opus 4.5 20251101 Thinking	49.156%	Claude Opus 4.5 anthropic-claude-opus-4.5	Imported	2026-05-28
13	Claude Opus 4.6 Thinking	49.129%	Claude Opus 4.6 anthropic-claude-opus-4.6	Imported	2026-05-28
14	GPT 5.5	49.1%	GPT-5.5 openai-gpt-5.5	Imported	2026-05-28
15	Claude Opus 4.6	48.244%	Claude Opus 4.6 anthropic-claude-opus-4.6	Imported	2026-05-28
16	Gemini 3.1 Flash Lite Preview	47.602%	Gemini 3.1 Flash Lite Preview google-gemini-3.1-flash-lite-preview	Imported	2026-05-28
17	O3 2025-04-16	47.29%	o3 openai-o3	Imported	2026-05-28
18	Claude Opus 4.1 20250805 Thinking	47.235%	Claude Opus 4.1 anthropic-claude-opus-4.1	Imported	2026-05-28
19	Claude Opus 4.5 20251101	45.174%	Claude Opus 4.5 anthropic-claude-opus-4.5	Imported	2026-05-28
20	Claude Sonnet 4.5 20250929 Thinking	44.134%	Claude Sonnet 4.5 anthropic-claude-sonnet-4.5	Imported	2026-05-28
21	GPT 5 Mini 2025-08-07	43.045%	GPT-5 Mini openai-gpt-5-mini	Imported	2026-05-28
22	GLM 5.1 Thinking	41.604%	GLM GLM 5.1 z-ai-glm-5.1	Imported	2026-05-28
23	Claude Opus 4.1 20250805	41.372%	Claude Opus 4.1 anthropic-claude-opus-4.1	Imported	2026-05-28
24	GPT 5.4 2026-03-05	41.292%	GPT-5.4 openai-gpt-5.4	Imported	2026-05-28
25	GPT 5.4 Nano 2026-03-17	41.029%	GPT-5.4 Nano openai-gpt-5.4-nano	Imported	2026-05-28
26	Claude Sonnet 4.5 20250929	40.569%	Claude Sonnet 4.5 anthropic-claude-sonnet-4.5	Imported	2026-05-28
27	Gemini 2.5 Flash Preview 09 2025	40.538%	—	Imported	2026-05-28
28	DeepSeek V4 Pro	40.455%	DeepSeek V4 Pro deepseek-deepseek-v4-pro	Imported	2026-05-28
29	Gemini 2.5 Flash Thinking	40.357%	—	Imported	2026-05-28
30	Gemini 2.5 Flash Preview 09 2025 Thinking	40.33%	—	Imported	2026-05-28
31	Kimi K2.6 Thinking	40.142%	KIMI MoonshotAI: Kimi K2.6 moonshotai-kimi-k2.6	Imported	2026-05-28
32	Kimi K2.5 Thinking	39.316%	KIMI MoonshotAI: Kimi K2.5 moonshotai-kimi-k2.5	Imported	2026-05-28
33	Qwen 3.7 Max	38.751%	Qwen3.7 Max qwen-qwen3.7-max	Imported	2026-05-28
34	Gemini 2.5 Flash	38.425%	Gemini 2.5 Flash google-gemini-2.5-flash	Imported	2026-05-28
35	Grok 4 0709	38.078%	GROK Grok 4 x-ai-grok-4	Imported	2026-05-28
36	Grok 4.3	38.068%	GROK Grok 4.3 x-ai-grok-4.3	Imported	2026-05-28
37	Grok 4 Fast Reasoning	37.385%	GROK Grok 4 Fast x-ai-grok-4-fast	Imported	2026-05-28
38	Qwen 3.6 Plus	36.894%	Qwen3.6 Plus qwen-qwen3.6-plus	Imported	2026-05-28
39	Llama4 Maverick Instruct Basic	36.514%	—	Imported	2026-05-28
40	Claude Sonnet 4 20250514 Thinking	34.959%	—	Imported	2026-05-28
41	MiniMax M2.7	34.44%	MiniMax M2.7 minimax-minimax-m2.7	Imported	2026-05-28
42	Gemini 2.5 Flash Lite Preview 09 2025 Thinking	34.191%	—	Imported	2026-05-28
43	MiniMax M2.1	34.083%	MiniMax M2.1 minimax-minimax-m2.1	Imported	2026-05-28
44	Claude Sonnet 4 20250514	33.943%	Claude Sonnet 4 anthropic-claude-sonnet-4	Imported	2026-05-28
45	O4 Mini 2025-04-16	33.791%	o4 Mini openai-o4-mini	Imported	2026-05-28
46	Mistral Medium 3.5	33.752%	Mistral: Mistral Medium 3.5 mistralai-mistral-medium-3-5	Imported	2026-05-28
47	Qwen 3.5 Flash	32.997%	Qwen3.5-Flash qwen-qwen3.5-flash-02-23	Imported	2026-05-28
48	GLM 4.7	32.772%	GLM GLM 4.7 z-ai-glm-4.7	Imported	2026-05-28
49	Claude Haiku 4.5 20251001 Thinking	32.678%	Claude Haiku 4.5 anthropic-claude-haiku-4.5	Imported	2026-05-28
50	Grok 4.20 0309 Reasoning	32.156%	GROK Grok 4.20 x-ai-grok-4.20	Imported	2026-05-28
51	Qwen 3 Vl Plus 2025-09-23	31.651%	—	Imported	2026-05-28
52	Qwen 3 Max 2026-01-23	31.373%	—	Imported	2026-05-28
53	GPT 5 Nano 2025-08-07	30.441%	GPT-5 Nano openai-gpt-5-nano	Imported	2026-05-28
54	Grok 4 Fast Non Reasoning	30.036%	GROK Grok 4 Fast x-ai-grok-4-fast	Imported	2026-05-28
55	Grok 4.1 Fast Non Reasoning	28.349%	GROK Grok 4.1 Fast x-ai-grok-4.1-fast	Imported	2026-05-28
56	Grok 4.1 Fast Reasoning	28.08%	GROK Grok 4.1 Fast x-ai-grok-4.1-fast	Imported	2026-05-28
57	Gemini 2.5 Flash Lite	27.115%	Gemini 2.5 Flash Lite google-gemini-2.5-flash-lite	Imported	2026-05-28
58	Gemini 2.5 Flash Lite Preview 09 2025	27.079%	Gemini 2.5 Flash Lite Preview 09-2025 google-gemini-2.5-flash-lite-preview-09-2025	Imported	2026-05-28
59	Llama 4 Scout 17B 16E Instruct	23.311%	Llama 4 Scout meta-llama-llama-4-scout	Imported	2026-05-28
60	Command A Plus 05 2026	19.405%	—	Imported	2026-05-28
