Medmarks
A living benchmark for medical language models, spanning verifiable and open-ended clinical evaluation families, with comparisons across models and systems.
Rows: 83
Primary metric: win_rate
Sampled: 2026-05-27

Metadata
Metrics: Win Rate, Benchmark Mean
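The page names Win Rate as the primary metric and also lists Benchmark Mean, but it does not define either one. The sketch below assumes two common definitions. Win rate is taken to be a mean pairwise win rate: on each benchmark, a model earns the share of other models it outscores, with ties counted as half. Those shares are then averaged over benchmarks. Benchmark Mean is taken to be the plain average of per-benchmark scores. The model names, benchmark names, scores, and the helper `pairwise_share` are hypothetical, and Medmarks' actual aggregation may differ.

```python
from statistics import mean

# Hypothetical per-benchmark scores (higher is better). Model names, benchmark
# names, and values are illustrative only; they are not Medmarks data.
scores: dict[str, dict[str, float]] = {
    "model_a": {"bench_1": 0.80, "bench_2": 0.55, "bench_3": 0.70},
    "model_b": {"bench_1": 0.75, "bench_2": 0.60, "bench_3": 0.70},
    "model_c": {"bench_1": 0.60, "bench_2": 0.50, "bench_3": 0.65},
}


def pairwise_share(mine: float, theirs: float) -> float:
    """Credit for one head-to-head comparison: 1 for a win, 0.5 for a tie, 0 for a loss."""
    if mine > theirs:
        return 1.0
    if mine == theirs:
        return 0.5
    return 0.0


def win_rate(scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """Assumed definition: per benchmark, the share of other models a model
    outscores; then the mean of those shares over the benchmarks it ran on."""
    result: dict[str, float] = {}
    for model, own in scores.items():
        per_benchmark = []
        for bench, score in own.items():
            # Only compare against models that also have a score on this benchmark.
            rivals = [o for o in scores if o != model and bench in scores[o]]
            if rivals:
                credit = sum(pairwise_share(score, scores[o][bench]) for o in rivals)
                per_benchmark.append(credit / len(rivals))
        result[model] = mean(per_benchmark) if per_benchmark else float("nan")
    return result


def benchmark_mean(scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """Assumed definition: the plain average of a model's per-benchmark scores."""
    return {model: mean(own.values()) for model, own in scores.items()}


if __name__ == "__main__":
    print(win_rate(scores))  # model_a and model_b tie at 0.75; model_c is 0.0
    print(benchmark_mean(scores))
```

Under this assumed definition, when every model runs every benchmark, each head-to-head comparison hands out exactly one unit of credit (1 + 0 or 0.5 + 0.5). As a result, the win rates of all models average to 0.5, which makes a quick sanity check. Unlike Benchmark Mean, this metric also ignores the size of a score gap and only counts who came out ahead.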
Open-ended family (12 subjects)

| Rank | Subject | Win Rate | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-5.2 (medium) (open-ended) | 0.6389159522138381 | GPT-5.2 (`openai-gpt-5.2`) | Imported | 2026-05-27 |
| 2 | GPT-5.1 (medium) (open-ended) | 0.6243980841829406 | GPT-5.1 (`openai-gpt-5.1`) | Imported | 2026-05-27 |
| 3 | Baichuan M3 235B (open-ended) | 0.5677789621546633 | — | Imported | 2026-05-27 |
| 4 | gpt-oss 120b (high) (open-ended) | 0.5507240209717496 | gpt-oss-120b (`openai-gpt-oss-120b`) | Imported | 2026-05-27 |
| 5 | Qwen3 235B-A22B Thinking (open-ended) | 0.5160613633727644 | Qwen3 235B A22B Thinking 2507 (`qwen-qwen3-235b-a22b-thinking-2507`) | Imported | 2026-05-27 |
| 6 | Claude Sonnet 4.5 (open-ended) | 0.49977366098330706 | Claude Sonnet 4.5 (`anthropic-claude-sonnet-4.5`) | Imported | 2026-05-27 |
| 7 | Baichuan M2 32B (open-ended) | 0.4761033570298975 | — | Imported | 2026-05-27 |
| 8 | Gemini 3 Pro Preview (open-ended) | 0.4712656820900838 | Gemini 3 (`google-gemini-3`) | Imported | 2026-05-27 |
| 9 | GLM 4.7 FP8 (open-ended) | 0.4518797039744835 | GLM 4.7 (`z-ai-glm-4.7`) | Imported | 2026-05-27 |
| 10 | gpt-oss 20b (high) (open-ended) | 0.4266482358575701 | gpt-oss-20b (`openai-gpt-oss-20b`) | Imported | 2026-05-27 |
| 11 | Qwen3 30B-A3B Thinking (open-ended) | 0.41304451718604834 | — | Imported | 2026-05-27 |
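| 12 | Qwen3 8B (Thinking) (open-ended) | 0.3634064599826541 | Qwen3 8B (`qwen-qwen3-8b`) | Imported | 2026-05-27 |

Verifiable family (71 subjects)

| Rank | Subject | Win Rate | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Gemini 3 Pro Preview (verifiable) | 0.6627770031943667 | Gemini 3 (`google-gemini-3`) | Imported | 2026-05-27 |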
| 2 | GPT-5.1 (medium) (verifiable) | 0.6395161191059014 | GPT-5.1 (`openai-gpt-5.1`) | Imported | 2026-05-27 |
| 3 | Grok 4 (verifiable) | 0.6342733786197539 | Grok 4 (`x-ai-grok-4`) | Imported | 2026-05-27 |
| 4 | Claude Sonnet 4.5 (verifiable) | 0.6257561642171057 | Claude Sonnet 4.5 (`anthropic-claude-sonnet-4.5`) | Imported | 2026-05-27 |
| 5 | GPT-5.2 (medium) (verifiable) | 0.6236362195525137 | GPT-5.2 (`openai-gpt-5.2`) | Imported | 2026-05-27 |
| 6 | GLM 4.7 FP8 (verifiable) | 0.6199015428298814 | GLM 4.7 (`z-ai-glm-4.7`) | Imported | 2026-05-27 |
| 7 | Qwen3 235B-A22B Thinking (verifiable) | 0.6031541433378664 | Qwen3 235B A22B Thinking 2507 (`qwen-qwen3-235b-a22b-thinking-2507`) | Imported | 2026-05-27 |
| 8 | Baichuan M3 235B (verifiable) | 0.5982526399869947 | — | Imported | 2026-05-27 |
| 9 | Qwen3 Next 80B-A3B Thinking (verifiable) | 0.5887883670153007 | Qwen3 Next 80B A3B Thinking (`qwen-qwen3-next-80b-a3b-thinking`) | Imported | 2026-05-27 |
| 10 | MiniMax M2.1 (verifiable) | 0.5882080602904762 | MiniMax M2.1 (`minimax-minimax-m2.1`) | Imported | 2026-05-27 |
| 11 | gpt-oss 120b (high) (verifiable) | 0.5864776402646992 | gpt-oss-120b (`openai-gpt-oss-120b`) | Imported | 2026-05-27 |
| 12 | gpt-oss 120b (medium) (verifiable) | 0.5771191625196621 | gpt-oss-120b (`openai-gpt-oss-120b`) | Imported | 2026-05-27 |
| 13 | MiniMax M2 (verifiable) | 0.5707789479326955 | MiniMax M2 (`minimax-minimax-m2`) | Imported | 2026-05-27 |
| 14 | Qwen3 Next 80B-A3B Instruct (verifiable) | 0.5687149198848898 | Qwen3 Next 80B A3B Instruct (`qwen-qwen3-next-80b-a3b-instruct`) | Imported | 2026-05-27 |
| 15 | Intellect 3 (verifiable) | 0.565976071102146 | INTELLECT-3 (`prime-intellect-intellect-3`) | Imported | 2026-05-27 |
| 16 | Qwen3 30B-A3B Thinking FP8 (verifiable) | 0.560069094508121 | — | Imported | 2026-05-27 |
| 17 | Qwen3 30B-A3B Thinking 8-bit (verifiable) | 0.5591250708061003 | — | Imported | 2026-05-27 |
| 18 | Qwen3 30B-A3B Thinking (verifiable) | 0.5587374669826407 | — | Imported | 2026-05-27 |
| 19 | gpt-oss 120b (low) (verifiable) | 0.552403723762488 | gpt-oss-120b (`openai-gpt-oss-120b`) | Imported | 2026-05-27 |
| 20 | AntAngelMed 100B (verifiable) | 0.5523687077377001 | — | Imported | 2026-05-27 |
| 21 | Baichuan M2 32B (verifiable) | 0.5520088670998679 | — | Imported | 2026-05-27 |
| 22 | Qwen3 VL 30B-A3B Thinking (verifiable) | 0.5508662829903576 | Qwen3 VL 30B A3B Thinking (`qwen-qwen3-vl-30b-a3b-thinking`) | Imported | 2026-05-27 |
| 23 | Qwen3 30B-A3B Thinking 4-bit (verifiable) | 0.5481082240479177 | — | Imported | 2026-05-27 |
| 24 | GLM 4.5 Air (verifiable) | 0.5410241342400426 | GLM 4.5 Air (`z-ai-glm-4.5-air`) | Imported | 2026-05-27 |
| 25 | Qwen3 14B (Thinking) (verifiable) | 0.5366833465791869 | Qwen3 14B (`qwen-qwen3-14b`) | Imported | 2026-05-27 |
| 26 | Llama 3.3 70B Instruct (verifiable) | 0.5363568636656147 | Llama 3.3 70B Instruct (`meta-llama-llama-3.3-70b-instruct`) | Imported | 2026-05-27 |
| 27 | gpt-oss 20b (high) (verifiable) | 0.5361352530208202 | gpt-oss-20b (`openai-gpt-oss-20b`) | Imported | 2026-05-27 |
| 28 | Qwen3 30B-A3B Instruct 8-bit (verifiable) | 0.5320939419817264 | — | Imported | 2026-05-27 |
| 29 | Qwen3 30B-A3B Instruct (verifiable) | 0.5316817490568033 | — | Imported | 2026-05-27 |
| 30 | Qwen3 30B-A3B Instruct FP8 (verifiable) | 0.5310679149932498 | — | Imported | 2026-05-27 |
| 31 | Qwen3 30B-A3B Instruct 4-bit (verifiable) | 0.5253449626609891 | — | Imported | 2026-05-27 |
| 32 | gpt-oss 20b (medium) (verifiable) | 0.5197952213409743 | gpt-oss-20b (`openai-gpt-oss-20b`) | Imported | 2026-05-27 |
| 33 | Ling Flash 2.0 (verifiable) | 0.517443364568796 | — | Imported | 2026-05-27 |
| 34 | Nemotron Nano V3 30B-A3B (verifiable) | 0.511383018613083 | — | Imported | 2026-05-27 |
| 35 | Olmo 3.1 32B Think (verifiable) | 0.5067271118560839 | — | Imported | 2026-05-27 |
| 36 | MedGemma 27B (verifiable) | 0.5024248457579267 | — | Imported | 2026-05-27 |
| 37 | Olmo 3 32B Think (verifiable) | 0.5021350069895713 | Olmo 3 32B Think (`allenai-olmo-3-32b-think`) | Imported | 2026-05-27 |
| 38 | Qwen3 8B (Thinking) (verifiable) | 0.5019479387680824 | Qwen3 8B (`qwen-qwen3-8b`) | Imported | 2026-05-27 |
| 39 | Nemotron Nano 12B V2 (verifiable) | 0.500617429141079 | — | Imported | 2026-05-27 |
| 40 | Qwen2.5 32B Instruct (verifiable) | 0.5003027449594964 | — | Imported | 2026-05-27 |
| 41 | Qwen3 4B Thinking (verifiable) | 0.4890572132245168 | — | Imported | 2026-05-27 |
| 42 | Trinity Mini (verifiable) | 0.48530192037153297 | Trinity Mini (`arcee-ai-trinity-mini`) | Imported | 2026-05-27 |
| 43 | Hermes 4 70B (verifiable) | 0.4849971812302666 | Hermes 4 70B (`nousresearch-hermes-4-70b`) | Imported | 2026-05-27 |
| 44 | gpt-oss 20b (low) (verifiable) | 0.4820454322540748 | gpt-oss-20b (`openai-gpt-oss-20b`) | Imported | 2026-05-27 |
| 45 | Phi 4 Reasoning (verifiable) | 0.4814858250317761 | — | Imported | 2026-05-27 |
| 46 | Ministral 3 14B Reasoning (verifiable) | 0.4807102626851982 | — | Imported | 2026-05-27 |
| 47 | Hermes 4 14B (verifiable) | 0.4792513741037972 | — | Imported | 2026-05-27 |
| 48 | Mirothinker 1.5 30B (verifiable) | 0.4767840336895432 | — | Imported | 2026-05-27 |
| 49 | Ministral 3 14B Instruct (verifiable) | 0.47596322969546634 | — | Imported | 2026-05-27 |
| 50 | Magistral Small (verifiable) | 0.4745038458490505 | — | Imported | 2026-05-27 |
| 51 | Gemma 3 27B (verifiable) | 0.4610358850116875 | Gemma 3 27B (`google-gemma-3-27b-it`) | Imported | 2026-05-27 |
| 52 | Olmo 3.1 32B Instruct (verifiable) | 0.45600284965867655 | Olmo 3.1 32B Instruct (`allenai-olmo-3.1-32b-instruct`) | Imported | 2026-05-27 |
| 53 | DASD 4B Thinking (verifiable) | 0.45253644883118105 | — | Imported | 2026-05-27 |
| 54 | Jamba2 Mini 52B (verifiable) | 0.44576937105437486 | — | Imported | 2026-05-27 |
| 55 | Granite 4.0H Small (verifiable) | 0.44519142551470775 | — | Imported | 2026-05-27 |
| 56 | Ministral 3 8B Instruct (verifiable) | 0.4441678595421642 | — | Imported | 2026-05-27 |
| 57 | Gemma 3 12B (verifiable) | 0.4378666803080891 | Gemma 3 12B (`google-gemma-3-12b-it`) | Imported | 2026-05-27 |
| 58 | Ministral 3 8B Reasoning (verifiable) | 0.431176364193834 | — | Imported | 2026-05-27 |
| 59 | Olmo 3 7B Think (verifiable) | 0.41298909104304243 | — | Imported | 2026-05-27 |
| 60 | DASD 30B-A3B (verifiable) | 0.4078997105339978 | — | Imported | 2026-05-27 |
| 61 | Llama 3.1 8B Instruct (verifiable) | 0.39674525595082544 | Llama 3.1 8B Instruct (`meta-llama-llama-3.1-8b-instruct`) | Imported | 2026-05-27 |
| 62 | Ministral 3 3B Instruct (verifiable) | 0.39273238228563234 | — | Imported | 2026-05-27 |
| 63 | MedGemma 4B (verifiable) | 0.37785575712118635 | — | Imported | 2026-05-27 |
| 64 | MedGemma 4B 1.5 (verifiable) | 0.3759376783134903 | — | Imported | 2026-05-27 |
| 65 | Ministral 3 3B Reasoning (verifiable) | 0.37411793955645295 | — | Imported | 2026-05-27 |
| 66 | Trinity Nano Preview (verifiable) | 0.35961788899182445 | — | Imported | 2026-05-27 |
| 67 | SmolLM3 3B (verifiable) | 0.3548178824161098 | — | Imported | 2026-05-27 |
| 68 | Granite 4.0H Tiny (verifiable) | 0.35220342825432055 | — | Imported | 2026-05-27 |
| 69 | Olmo 3 7B Instruct (verifiable) | 0.3503147766281301 | — | Imported | 2026-05-27 |
| 70 | Gemma 3 4B (verifiable) | 0.3214340555088692 | Gemma 3 4B (`google-gemma-3-4b-it`) | Imported | 2026-05-27 |
| 71 | AFM 4.5B (verifiable) | 0.31930428516320947 | — | Imported | 2026-05-27 |