MedCode
Can models support the medical billing process?
60 rows
Primary metric: score
Sampled: 2026-05-28
Metadata
Metrics: Score, Std. error (lower is better), Latency (lower is better), Cost per test (lower is better)
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Gemini 3.1 Pro Preview | 59.062% | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Imported | 2026-05-28 |
| 2 | Gemini 3 Flash Preview | 55.92% | Gemini 3 Flash Preview google-gemini-3-flash-preview | Imported | 2026-05-28 |
| 3 | Gemini 3.5 Flash | 55.825% | Gemini 3.5 Flash google-gemini-3.5-flash | Imported | 2026-05-28 |
| 4 | Claude Opus 4.7 | 54.858% | Claude Opus 4.7 anthropic-claude-opus-4.7 | Imported | 2026-05-28 |
| 5 | Claude Opus 4.8 | 53.217% | Claude Opus 4.8 anthropic-claude-opus-4.8 | Imported | 2026-05-28 |
| 6 | GPT 5.1 2025-11-13 | 52.732% | GPT-5.1 openai-gpt-5.1 | Imported | 2026-05-28 |
| 7 | Gemini 3 Pro Preview | 52.198% | Gemini 3 google-gemini-3 | Imported | 2026-05-28 |
| 8 | Muse Spark | 51.31% | — | Imported | 2026-05-28 |
| 9 | Gemini 2.5 Pro | 50.59% | Gemini 2.5 Pro google-gemini-2.5-pro | Imported | 2026-05-28 |
| 10 | GPT 5.2 2025-12-11 | 49.749% | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-28 |
| 11 | GPT 5 2025-08-07 | 49.634% | GPT-5 openai-gpt-5 | Imported | 2026-05-28 |
| 12 | Claude Opus 4.5 20251101 Thinking | 49.156% | Claude Opus 4.5 anthropic-claude-opus-4.5 | Imported | 2026-05-28 |
| 13 | Claude Opus 4.6 Thinking | 49.129% | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-28 |
| 14 | GPT 5.5 | 49.1% | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-28 |
| 15 | Claude Opus 4.6 | 48.244% | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-28 |
| 16 | Gemini 3.1 Flash Lite Preview | 47.602% | Gemini 3.1 Flash Lite Preview google-gemini-3.1-flash-lite-preview | Imported | 2026-05-28 |
| 17 | O3 2025-04-16 | 47.29% | o3 openai-o3 | Imported | 2026-05-28 |
| 18 | Claude Opus 4.1 20250805 Thinking | 47.235% | Claude Opus 4.1 anthropic-claude-opus-4.1 | Imported | 2026-05-28 |
| 19 | Claude Opus 4.5 20251101 | 45.174% | Claude Opus 4.5 anthropic-claude-opus-4.5 | Imported | 2026-05-28 |
| 20 | Claude Sonnet 4.5 20250929 Thinking | 44.134% | Claude Sonnet 4.5 anthropic-claude-sonnet-4.5 | Imported | 2026-05-28 |
| 21 | GPT 5 Mini 2025-08-07 | 43.045% | GPT-5 Mini openai-gpt-5-mini | Imported | 2026-05-28 |
| 22 | GLM 5.1 Thinking | 41.604% | GLM 5.1 z-ai-glm-5.1 | Imported | 2026-05-28 |
| 23 | Claude Opus 4.1 20250805 | 41.372% | Claude Opus 4.1 anthropic-claude-opus-4.1 | Imported | 2026-05-28 |
| 24 | GPT 5.4 2026-03-05 | 41.292% | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-28 |
| 25 | GPT 5.4 Nano 2026-03-17 | 41.029% | GPT-5.4 Nano openai-gpt-5.4-nano | Imported | 2026-05-28 |
| 26 | Claude Sonnet 4.5 20250929 | 40.569% | Claude Sonnet 4.5 anthropic-claude-sonnet-4.5 | Imported | 2026-05-28 |
| 27 | Gemini 2.5 Flash Preview 09 2025 | 40.538% | — | Imported | 2026-05-28 |
| 28 | DeepSeek V4 Pro | 40.455% | DeepSeek V4 Pro deepseek-deepseek-v4-pro | Imported | 2026-05-28 |
| 29 | Gemini 2.5 Flash Thinking | 40.357% | — | Imported | 2026-05-28 |
| 30 | Gemini 2.5 Flash Preview 09 2025 Thinking | 40.33% | — | Imported | 2026-05-28 |
| 31 | Kimi K2.6 Thinking | 40.142% | MoonshotAI: Kimi K2.6 moonshotai-kimi-k2.6 | Imported | 2026-05-28 |
| 32 | Kimi K2.5 Thinking | 39.316% | MoonshotAI: Kimi K2.5 moonshotai-kimi-k2.5 | Imported | 2026-05-28 |
| 33 | Qwen 3.7 Max | 38.751% | Qwen3.7 Max qwen-qwen3.7-max | Imported | 2026-05-28 |
| 34 | Gemini 2.5 Flash | 38.425% | Gemini 2.5 Flash google-gemini-2.5-flash | Imported | 2026-05-28 |
| 35 | Grok 4 0709 | 38.078% | Grok 4 x-ai-grok-4 | Imported | 2026-05-28 |
| 36 | Grok 4.3 | 38.068% | Grok 4.3 x-ai-grok-4.3 | Imported | 2026-05-28 |
| 37 | Grok 4 Fast Reasoning | 37.385% | Grok 4 Fast x-ai-grok-4-fast | Imported | 2026-05-28 |
| 38 | Qwen 3.6 Plus | 36.894% | Qwen3.6 Plus qwen-qwen3.6-plus | Imported | 2026-05-28 |
| 39 | Llama4 Maverick Instruct Basic | 36.514% | — | Imported | 2026-05-28 |
| 40 | Claude Sonnet 4 20250514 Thinking | 34.959% | — | Imported | 2026-05-28 |
| 41 | MiniMax M2.7 | 34.44% | MiniMax M2.7 minimax-minimax-m2.7 | Imported | 2026-05-28 |
| 42 | Gemini 2.5 Flash Lite Preview 09 2025 Thinking | 34.191% | — | Imported | 2026-05-28 |
| 43 | MiniMax M2.1 | 34.083% | MiniMax M2.1 minimax-minimax-m2.1 | Imported | 2026-05-28 |
| 44 | Claude Sonnet 4 20250514 | 33.943% | Claude Sonnet 4 anthropic-claude-sonnet-4 | Imported | 2026-05-28 |
| 45 | O4 Mini 2025-04-16 | 33.791% | o4 Mini openai-o4-mini | Imported | 2026-05-28 |
| 46 | Mistral Medium 3.5 | 33.752% | Mistral: Mistral Medium 3.5 mistralai-mistral-medium-3-5 | Imported | 2026-05-28 |
| 47 | Qwen 3.5 Flash | 32.997% | Qwen3.5-Flash qwen-qwen3.5-flash-02-23 | Imported | 2026-05-28 |
| 48 | GLM 4.7 | 32.772% | GLM 4.7 z-ai-glm-4.7 | Imported | 2026-05-28 |
| 49 | Claude Haiku 4.5 20251001 Thinking | 32.678% | Claude Haiku 4.5 anthropic-claude-haiku-4.5 | Imported | 2026-05-28 |
| 50 | Grok 4.20 0309 Reasoning | 32.156% | Grok 4.20 x-ai-grok-4.20 | Imported | 2026-05-28 |
| 51 | Qwen 3 Vl Plus 2025-09-23 | 31.651% | — | Imported | 2026-05-28 |
| 52 | Qwen 3 Max 2026-01-23 | 31.373% | — | Imported | 2026-05-28 |
| 53 | GPT 5 Nano 2025-08-07 | 30.441% | GPT-5 Nano openai-gpt-5-nano | Imported | 2026-05-28 |
| 54 | Grok 4 Fast Non Reasoning | 30.036% | Grok 4 Fast x-ai-grok-4-fast | Imported | 2026-05-28 |
| 55 | Grok 4.1 Fast Non Reasoning | 28.349% | Grok 4.1 Fast x-ai-grok-4.1-fast | Imported | 2026-05-28 |
| 56 | Grok 4.1 Fast Reasoning | 28.08% | Grok 4.1 Fast x-ai-grok-4.1-fast | Imported | 2026-05-28 |
| 57 | Gemini 2.5 Flash Lite | 27.115% | Gemini 2.5 Flash Lite google-gemini-2.5-flash-lite | Imported | 2026-05-28 |
| 58 | Gemini 2.5 Flash Lite Preview 09 2025 | 27.079% | Gemini 2.5 Flash Lite Preview 09-2025 google-gemini-2.5-flash-lite-preview-09-2025 | Imported | 2026-05-28 |
| 59 | Llama 4 Scout 17B 16E Instruct | 23.311% | Llama 4 Scout meta-llama-llama-4-scout | Imported | 2026-05-28 |
| 60 | Command A Plus 05 2026 | 19.405% | — | Imported | 2026-05-28 |