SWE-bench Multimodal
SWE-bench extension with 517 software issues that include visual elements such as screenshots, mockups, diagrams, and visual error context.
24 rows
Primary metric: resolved
Sampled: 2026-05-28
Resolved
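Showing the 2 latest source slices: self-reported results sampled 2026-05-28 and imported results sampled 2025-11-17. Ranks restart at 1 within each slice.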
| Rank | Subject | Resolved | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Claude Opus 4.8 | 38.4% | Claude Opus 4.8 (`anthropic-claude-opus-4.8`) | Self-reported | 2026-05-28 |
| 2 | Claude Opus 4.7 | 34.5% | Claude Opus 4.7 (`anthropic-claude-opus-4.7`) | Self-reported | 2026-05-28 |
| 1 | GUIRepair + o3 (2025-04-16) | 35.98% | — | Imported | 2025-11-17 |
| 2 | Codefuse_Pycfuse_SVR | 35.98% | — | Imported | 2025-11-17 |
| 3 | Refact.ai Agent | 35.59% | — | Imported | 2025-11-17 |
| 4 | OpenHands-Versa (Claude-Sonnet 4) | 34.43% | — | Imported | 2025-11-17 |
| 5 | GUIRepair + o4-mini (2025-04-16) | 33.85% | — | Imported | 2025-11-17 |
| 6 | OpenHands-Versa (Claude-3.7 Sonnet) | 31.33% | — | Imported | 2025-11-17 |
| 7 | GUIRepair + GPT 4.1 (2025-04-14) | 31.14% | — | Imported | 2025-11-17 |
| 8 | Zencoder (2025-04-01) | 30.56% | — | Imported | 2025-11-17 |
| 9 | GUIRepair + GPT 4o (2024-08-06) | 30.37% | — | Imported | 2025-11-17 |
| 10 | Globant Code Fixer Agent | 29.59% | — | Imported | 2025-11-17 |
| 11 | Zencoder (2025-03-10) | 27.08% | — | Imported | 2025-11-17 |
| 12 | Agentless Lite + Claude-3.5 Sonnet | 25.34% | — | Imported | 2025-11-17 |
| 13 | SWE-agent Multimodal + GPT 4o (2024-08-06) | 12.19% | — | Imported | 2025-11-17 |
| 14 | SWE-agent + Claude Sonnet 3.5 | 12.19% | — | Imported | 2025-11-17 |
| 15 | SWE-agent JavaScript + Claude Sonnet 3.5 | 11.99% | — | Imported | 2025-11-17 |
| 16 | SWE-agent + GPT 4o (2024-08-06) | 11.99% | — | Imported | 2025-11-17 |
| 17 | SWE-agent Multimodal + Claude 3.5 Sonnet | 11.41% | — | Imported | 2025-11-17 |
| 18 | SWE-agent JavaScript + GPT 4o (2024-08-06) | 9.28% | — | Imported | 2025-11-17 |
| 19 | Agentless + Claude 3.5 Sonnet | 6.19% | — | Imported | 2025-11-17 |
| 20 | RAG + GPT 4o (2024-08-06) | 6.00% | — | Imported | 2025-11-17 |
| 21 | RAG + Claude 3.5 Sonnet | 5.03% | — | Imported | 2025-11-17 |
| 22 | Agentless + GPT 4o (2024-08-06) | 3.09% | — | Imported | 2025-11-17 |
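
The Resolved percentages can be read as issue counts. The sketch below converts between the two, assuming each score is computed over all 517 issues mentioned in the description. That assumption is not confirmed by the page, particularly for the self-reported rows, and the function names are hypothetical.

```python
# Benchmark size taken from the description at the top of the page.
TOTAL_ISSUES = 517


def resolved_count(percent: float, total: int = TOTAL_ISSUES) -> int:
    """Estimate the number of resolved issues behind a Resolved percentage.

    Assumes the percentage was computed over `total` issues.
    """
    return round(percent / 100 * total)


def resolved_percent(count: int, total: int = TOTAL_ISSUES) -> float:
    """Convert a resolved-issue count back to a percentage with two decimals."""
    return round(count / total * 100, 2)


if __name__ == "__main__":
    # Scores copied from the imported slice of the table.
    for score in (35.98, 12.19, 6.19):
        n = resolved_count(score)
        print(f"{score}% is about {n} of {TOTAL_ISSUES} issues "
              f"(round trip: {resolved_percent(n)}%)")
```

Under this assumption, two systems that share a score, such as the two entries at 35.98%, would have resolved the same number of issues, though not necessarily the same issues.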