MedScribe
Can models support doctors with their administrative work?
60 rows
Primary metric: score
Sampled: 2026-05-28
Metadata
Metrics
Score, Std. error (lower is better), Latency (lower is better), Cost per test (lower is better)
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT 5.1 2025-11-13 | 88.09% | GPT-5.1 openai-gpt-5.1 | Imported | 2026-05-28 |
| 2 | GPT 5.5 | 86.868% | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-28 |
| 3 | Claude Opus 4.6 | 86.738% | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-28 |
| 4 | Claude Opus 4.6 Thinking | 86.13% | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-28 |
| 5 | Muse Spark | 85.902% | — | Imported | 2026-05-28 |
| 6 | Claude Opus 4.8 | 85.755% | Claude Opus 4.8 anthropic-claude-opus-4.8 | Imported | 2026-05-28 |
| 7 | Claude Opus 4.5 20251101 Thinking | 85.321% | Claude Opus 4.5 anthropic-claude-opus-4.5 | Imported | 2026-05-28 |
| 8 | Claude Haiku 4.5 20251001 Thinking | 85.23% | Claude Haiku 4.5 anthropic-claude-haiku-4.5 | Imported | 2026-05-28 |
| 9 | Claude Sonnet 4.5 20250929 | 84.515% | Claude Sonnet 4.5 anthropic-claude-sonnet-4.5 | Imported | 2026-05-28 |
| 10 | GPT 5.2 2025-12-11 | 84.387% | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-28 |
| 11 | Claude Sonnet 4.5 20250929 Thinking | 84.101% | Claude Sonnet 4.5 anthropic-claude-sonnet-4.5 | Imported | 2026-05-28 |
| 12 | GPT 5 2025-08-07 | 83.65% | GPT-5 openai-gpt-5 | Imported | 2026-05-28 |
| 13 | Claude Opus 4.5 20251101 | 83.246% | Claude Opus 4.5 anthropic-claude-opus-4.5 | Imported | 2026-05-28 |
| 14 | Gemini 2.5 Flash Thinking | 82.983% | — | Imported | 2026-05-28 |
| 15 | Claude Opus 4.7 | 82.953% | Claude Opus 4.7 anthropic-claude-opus-4.7 | Imported | 2026-05-28 |
| 16 | Gemini 2.5 Flash | 82.869% | Gemini 2.5 Flash google-gemini-2.5-flash | Imported | 2026-05-28 |
| 17 | Grok 4 Fast Reasoning | 81.632% | Grok 4 Fast x-ai-grok-4-fast | Imported | 2026-05-28 |
| 18 | MiniMax M2.1 | 80.777% | MiniMax M2.1 minimax-minimax-m2.1 | Imported | 2026-05-28 |
| 19 | GPT 5 Mini 2025-08-07 | 80.577% | GPT-5 Mini openai-gpt-5-mini | Imported | 2026-05-28 |
| 20 | MiniMax M2.7 | 79.867% | MiniMax M2.7 minimax-minimax-m2.7 | Imported | 2026-05-28 |
| 21 | Grok 4 Fast Non Reasoning | 79.722% | Grok 4 Fast x-ai-grok-4-fast | Imported | 2026-05-28 |
| 22 | Qwen 3.7 Max | 79.396% | Qwen3.7 Max qwen-qwen3.7-max | Imported | 2026-05-28 |
| 23 | Grok 4.1 Fast Reasoning | 78.732% | Grok 4.1 Fast x-ai-grok-4.1-fast | Imported | 2026-05-28 |
| 24 | Gemini 2.5 Flash Preview 09 2025 Thinking | 78.497% | — | Imported | 2026-05-28 |
| 25 | Grok 4 0709 | 78.152% | Grok 4 x-ai-grok-4 | Imported | 2026-05-28 |
| 26 | Kimi K2.6 Thinking | 78.149% | MoonshotAI: Kimi K2.6 moonshotai-kimi-k2.6 | Imported | 2026-05-28 |
| 27 | Gemini 2.5 Flash Preview 09 2025 | 77.946% | — | Imported | 2026-05-28 |
| 28 | GPT 5.4 2026-03-05 | 77.549% | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-28 |
| 29 | Grok 4.1 Fast Non Reasoning | 77.464% | Grok 4.1 Fast x-ai-grok-4.1-fast | Imported | 2026-05-28 |
| 30 | Qwen 3 Vl Plus 2025-09-23 | 77.129% | — | Imported | 2026-05-28 |
| 31 | GPT 5.4 Nano 2026-03-17 | 77.09% | GPT-5.4 Nano openai-gpt-5.4-nano | Imported | 2026-05-28 |
| 32 | Qwen 3.6 Plus | 76.963% | Qwen3.6 Plus qwen-qwen3.6-plus | Imported | 2026-05-28 |
| 33 | O3 2025-04-16 | 76.654% | o3 openai-o3 | Imported | 2026-05-28 |
| 34 | Gemini 3.5 Flash | 76.574% | Gemini 3.5 Flash google-gemini-3.5-flash | Imported | 2026-05-28 |
| 35 | Kimi K2.5 Thinking | 76.442% | MoonshotAI: Kimi K2.5 moonshotai-kimi-k2.5 | Imported | 2026-05-28 |
| 36 | Gemini 3.1 Pro Preview | 76.114% | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Imported | 2026-05-28 |
| 37 | Gemini 2.5 Flash Lite Preview 09 2025 | 75.824% | Gemini 2.5 Flash Lite Preview 09-2025 google-gemini-2.5-flash-lite-preview-09-2025 | Imported | 2026-05-28 |
| 38 | DeepSeek V4 Pro | 75.144% | DeepSeek V4 Pro deepseek-deepseek-v4-pro | Imported | 2026-05-28 |
| 39 | Grok 4.3 | 74.399% | Grok 4.3 x-ai-grok-4.3 | Imported | 2026-05-28 |
| 40 | Claude Opus 4.1 20250805 Thinking | 73.901% | Claude Opus 4.1 anthropic-claude-opus-4.1 | Imported | 2026-05-28 |
| 41 | Gemini 2.5 Pro | 73.552% | Gemini 2.5 Pro google-gemini-2.5-pro | Imported | 2026-05-28 |
| 42 | GPT 5 Nano 2025-08-07 | 72.865% | GPT-5 Nano openai-gpt-5-nano | Imported | 2026-05-28 |
| 43 | Gemini 2.5 Flash Lite | 72.832% | Gemini 2.5 Flash Lite google-gemini-2.5-flash-lite | Imported | 2026-05-28 |
| 44 | Qwen 3 Max 2026-01-23 | 72.709% | — | Imported | 2026-05-28 |
| 45 | Claude Sonnet 4 20250514 | 72.411% | Claude Sonnet 4 anthropic-claude-sonnet-4 | Imported | 2026-05-28 |
| 46 | GLM 5.1 Thinking | 72.27% | GLM 5.1 z-ai-glm-5.1 | Imported | 2026-05-28 |
| 47 | Gemini 3 Pro Preview | 72.036% | Gemini 3 google-gemini-3 | Imported | 2026-05-28 |
| 48 | Claude Opus 4.1 20250805 | 71.753% | Claude Opus 4.1 anthropic-claude-opus-4.1 | Imported | 2026-05-28 |
| 49 | Qwen 3.5 Flash | 70.619% | Qwen3.5-Flash qwen-qwen3.5-flash-02-23 | Imported | 2026-05-28 |
| 50 | Gemini 3 Flash Preview | 69.917% | Gemini 3 Flash Preview google-gemini-3-flash-preview | Imported | 2026-05-28 |
| 51 | Claude Sonnet 4 20250514 Thinking | 69.353% | — | Imported | 2026-05-28 |
| 52 | O4 Mini 2025-04-16 | 69.139% | o4 Mini openai-o4-mini | Imported | 2026-05-28 |
| 53 | GLM 4.7 | 68.629% | GLM 4.7 z-ai-glm-4.7 | Imported | 2026-05-28 |
| 54 | Mistral Medium 3.5 | 67.728% | Mistral: Mistral Medium 3.5 mistralai-mistral-medium-3-5 | Imported | 2026-05-28 |
| 55 | Gemini 2.5 Flash Lite Preview 09 2025 Thinking | 66.877% | — | Imported | 2026-05-28 |
| 56 | Gemini 3.1 Flash Lite Preview | 63.902% | Gemini 3.1 Flash Lite Preview google-gemini-3.1-flash-lite-preview | Imported | 2026-05-28 |
| 57 | Grok 4.20 0309 Reasoning | 63.412% | Grok 4.20 x-ai-grok-4.20 | Imported | 2026-05-28 |
| 58 | Command A Plus 05 2026 | 55.682% | — | Imported | 2026-05-28 |
| 59 | Llama4 Maverick Instruct Basic | 54.219% | — | Imported | 2026-05-28 |
| 60 | Llama 4 Scout 17B 16E Instruct | 50.593% | Llama 4 Scout meta-llama-llama-4-scout | Imported | 2026-05-28 |