RepairBench
RepairBench: Evaluates software-engineering agents on realistic issue resolution, repository navigation, testing, or maintenance workflows.
35 rows
Primary metric: total_plausible_at_1
Sampled: 2026-05-27
Metadata
Metrics
Total Plausible @1, Total AST Match @1, Total Cost (lower is better), Total Prompt Tokens (lower is better), Total Completion Tokens (lower is better), Total Tokens (lower is better), Defects4J Plausible @1, GitBug-Java Plausible @1
| Rank | Subject | Total Plausible @1 | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | o4-mini-2025-04-16-high | 0.503 | — | Imported | 2026-05-27 |
| 2 | o3-mini-2025-01-31-high | 0.464 | o3 Mini High openai-o3-mini-high | Imported | 2026-05-27 |
| 3 | deepseek-r1 | 0.452 | R1 deepseek-r1 | Imported | 2026-05-27 |
| 4 | claude-3-7-sonnet-20250219 | 0.44 | Claude 3.7 Sonnet anthropic-claude-3.7-sonnet | Imported | 2026-05-27 |
| 5 | claude-3-5-sonnet-20241022 | 0.418 | Claude 3.5 Sonnet anthropic-claude-3.5-sonnet | Imported | 2026-05-27 |
| 6 | gpt-4.1-2025-04-14 | 0.413 | GPT-4.1 openai-gpt-4.1 | Imported | 2026-05-27 |
| 7 | gemini-2.5-flash-preview-05-20 | 0.406 | — | Imported | 2026-05-27 |
| 8 | deepseek-v3-0324 | 0.396 | DeepSeek V3 0324 deepseek-deepseek-chat-v3-0324 | Imported | 2026-05-27 |
| 9 | claude-3-5-sonnet-20240620 | 0.391 | Claude 3.5 Sonnet anthropic-claude-3.5-sonnet | Imported | 2026-05-27 |
| 10 | gemini-2.5-pro-preview-03-25 | 0.383 | Gemini 2.5 Pro Preview 05-06 google-gemini-2.5-pro-preview-05-06 | Imported | 2026-05-27 |
| 11 | deepseek-v3 | 0.371 | DeepSeek V3 deepseek-deepseek-chat | Imported | 2026-05-27 |
| 12 | gemini-1.5-pro-002 | 0.332 | — | Imported | 2026-05-27 |
| 13 | gpt-4o-2024-11-20 | 0.326 | GPT-4o openai-gpt-4o | Imported | 2026-05-27 |
| 14 | mistral-medium-2505 | 0.321 | — | Imported | 2026-05-27 |
| 15 | gpt-4o-2024-08-06 | 0.317 | GPT-4o openai-gpt-4o | Imported | 2026-05-27 |
| 16 | llama-4-maverick | 0.308 | Llama 4 Maverick meta-llama-4-maverick | Imported | 2026-05-27 |
| 17 | gemini-2.0-flash-001 | 0.304 | Gemini 2.0 Flash google-gemini-2.0-flash | Imported | 2026-05-27 |
| 18 | magistral-medium-2506 | 0.295 | — | Imported | 2026-05-27 |
| 19 | grok-2-1212 | 0.29 | — | Imported | 2026-05-27 |
| 20 | gemini-1.5-pro-001 | 0.282 | — | Imported | 2026-05-27 |
| 21 | llama-3.1-405b-instruct | 0.27 | — | Imported | 2026-05-27 |
| 22 | deepseek-v2.5 | 0.251 | — | Imported | 2026-05-27 |
| 23 | qwen-2.5-coder-32b-instruct | 0.25 | Qwen2.5 Coder 32B Instruct qwen-qwen-2.5-coder-32b-instruct | Imported | 2026-05-27 |
| 24 | qwen-2.5-72b-instruct | 0.242 | Qwen2.5 72B Instruct qwen-qwen-2.5-72b-instruct | Imported | 2026-05-27 |
| 25 | command-a | 0.24 | Command A cohere-command-a | Imported | 2026-05-27 |
| 26 | mistral-large-2411 | 0.237 | Mistral Large 2411 mistralai-mistral-large-2411 | Imported | 2026-05-27 |
| 27 | mistral-large-2407 | 0.23 | Mistral Large 2407 mistralai-mistral-large-2407 | Imported | 2026-05-27 |
| 28 | llama-3.3-70b-instruct | 0.224 | Llama 3.3 70B Instruct meta-llama-llama-3.3-70b-instruct | Imported | 2026-05-27 |
| 29 | llama-3.1-nemotron-70b-instruct | 0.214 | Llama 3.1 Nemotron 70B Instruct nvidia-llama-3.1-nemotron-70b-instruct | Imported | 2026-05-27 |
| 30 | deepseek-r1-distill-llama-70b | 0.208 | R1 Distill Llama 70B deepseek-deepseek-r1-distill-llama-70b | Imported | 2026-05-27 |
| 31 | mistral-small-2503 | 0.204 | — | Imported | 2026-05-27 |
| 32 | deepseek-r1-distill-qwen-32b | 0.198 | R1 Distill Qwen 32B deepseek-deepseek-r1-distill-qwen-32b | Imported | 2026-05-27 |
| 33 | codestral-2501 | 0.187 | — | Imported | 2026-05-27 |
| 34 | gemma-3-27b-it | 0.173 | Gemma 3 27B google-gemma-3-27b-it | Imported | 2026-05-27 |
| 35 | codestral-2405 | 0.164 | — | Imported | 2026-05-27 |
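The page does not define how Total Plausible @1 is computed. The sketch below shows one common reading, assuming it follows the usual pass@1 convention: for each bug, take the fraction of sampled patches that compile and pass the test suite, then average over bugs. The function name `plausible_at_1`, the input layout, and the bug IDs are hypothetical and are not taken from RepairBench itself.

```python
from typing import Mapping, Sequence


def plausible_at_1(results: Mapping[str, Sequence[bool]]) -> float:
    """Average, over bugs, of the share of sampled patches that pass all tests.

    ASSUMPTION: this mirrors the usual pass@1 estimate (c / n per bug); the
    leaderboard page itself does not state the formula.
    """
    if not results:
        raise ValueError("no bugs to score")
    per_bug = []
    for bug_id, outcomes in results.items():
        if not outcomes:
            raise ValueError(f"bug {bug_id!r} has no sampled patches")
        # True counts as 1, so sum(outcomes) is the number of plausible patches.
        per_bug.append(sum(outcomes) / len(outcomes))
    return sum(per_bug) / len(per_bug)


# Hypothetical outcomes: True means the patch passed the bug's test suite.
sample = {
    "bug-a": [True, True],
    "bug-b": [False, False],
    "bug-c": [True, False],
    "bug-d": [False, False],
}
score = plausible_at_1(sample)
print(round(score, 3))  # 0.375
```

With a single sampled patch per bug this reduces to the fraction of bugs whose patch passes the tests, which is the scale on which the scores in the table (for example 0.503) would be read under this assumption.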