Clembench Multimodal v1.6.5
Clembench Multimodal evaluates chat-optimized multimodal models as conversational agents through visual language games. It reports clemscore, the percentage of games played, the quality score, and per-game metrics.
Metadata
Metrics
clemscore, all Average % Played, all Average Quality Score, matchit % Played, matchit Quality Score, matchit Quality Score (std), mm_mapworld % Played, mm_mapworld Quality Score, mm_mapworld Quality Score (std), mm_mapworld_graphs % Played, mm_mapworld_graphs Quality Score, mm_mapworld_graphs Quality Score (std), mm_mapworld_specificroom % Played, mm_mapworld_specificroom Quality Score, mm_mapworld_specificroom Quality Score (std), multimodal_referencegame % Played, multimodal_referencegame Quality Score, multimodal_referencegame Quality Score (std)
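The page does not say how the aggregate metrics relate to each other. As a minimal sketch, assuming the definition given in the clembench paper (clemscore is the average quality score weighted by the average proportion of games played), the aggregate could be computed as below. The function name `clemscore`, the input layout, and the numbers are illustrative assumptions, not values from this leaderboard.

```python
from statistics import mean


def clemscore(per_game):
    """Combine per-game results into a single score.

    per_game maps a game name to (percent_played, quality_score),
    both on a 0-100 scale. Assumed formula:
    (average % played / 100) * average quality score.
    """
    avg_played = mean(played for played, _ in per_game.values())
    avg_quality = mean(quality for _, quality in per_game.values())
    return (avg_played / 100.0) * avg_quality


# Hypothetical per-game results, chosen only to make the arithmetic easy to follow.
example = {
    "matchit": (100.0, 80.0),
    "mm_mapworld": (90.0, 60.0),
    "multimodal_referencegame": (80.0, 40.0),
}

# Average % played is 90 and average quality is 60, so the score is 0.9 * 60.
score = clemscore(example)
print(round(score, 2))
```

Under this assumed definition, a model that plays few games to completion is penalized even when its quality score on the completed games is high.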
| Rank | Subject | clemscore | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | claude-3-5-sonnet-20240620 | 80.77 | Claude 3.5 Sonnet (`anthropic-claude-3.5-sonnet`) | Imported | 2026-05-06 |
| 2 | gpt-4o-2024-08-06 | 80.04 | GPT-4o (2024-08-06) (`openai-gpt-4o-2024-08-06`) | Imported | 2026-05-06 |
| 3 | gpt-4-1106-vision-preview | 73.55 | GPT-4 (`openai-gpt-4`) | Imported | 2026-05-06 |
| 4 | gpt-4o-2024-05-13 | 69.56 | GPT-4o (2024-05-13) (`openai-gpt-4o-2024-05-13`) | Imported | 2026-05-06 |
| 5 | claude-3-opus-20240229 | 68.16 | — | Imported | 2026-05-06 |
| 6 | gemma-3-27b-it | 61.39 | Gemma 3 27B (`google-gemma-3-27b-it`) | Imported | 2026-05-06 |
| 7 | gpt-4o-mini-2024-07-18 | 58.46 | GPT-4o-mini (2024-07-18) (`openai-gpt-4o-mini-2024-07-18`) | Imported | 2026-05-06 |
| 8 | gemini-1.5-flash-latest | 47.73 | — | Imported | 2026-05-06 |
| 9 | InternVL2-26B | 37.45 | — | Imported | 2026-05-06 |
| 10 | InternVL2-Llama3-76B | 33.84 | — | Imported | 2026-05-06 |
| 11 | InternVL2-40B | 32.23 | — | Imported | 2026-05-06 |
| 12 | idefics-80b-instruct | 29.55 | — | Imported | 2026-05-06 |
| 13 | Pixtral-12B-2409 | 28.64 | — | Imported | 2026-05-06 |
| 14 | InternVL2-8B | 23.17 | — | Imported | 2026-05-06 |
| 15 | Idefics3-8B-Llama3 | 17.52 | — | Imported | 2026-05-06 |
| 16 | internlm-xcomposer2d5-7b | 16.95 | — | Imported | 2026-05-06 |
| 17 | Phi-3.5-vision-instruct | 15.64 | — | Imported | 2026-05-06 |
| 18 | idefics-9b-instruct | 12.29 | — | Imported | 2026-05-06 |
| 19 | dolphin-vision-72b | 4.65 | — | Imported | 2026-05-06 |
| 20 | Phi-3-vision-128k-instruct | 3.34 | — | Imported | 2026-05-06 |