Defects4J
Defects4J: Evaluates software-engineering agents on realistic issue resolution, repository navigation, testing, or maintenance workflows.
35 rows
Primary metric: defects4j_plausible_at_1
Sampled: 2026-05-27
Metadata
Metrics: Defects4J Plausible @1, Defects4J AST Match @1, Defects4J Exact Match @1, Defects4J Cost (lower is better), Defects4J Prompt Tokens (lower is better), Defects4J Completion Tokens (lower is better), Defects4J Total Tokens (lower is better)
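The page lists these metrics without defining them. As a rough sketch, assume each "@1" metric is the fraction of bugs for which the first sampled patch meets the criterion. "Plausible" is read here as "passes the project's tests", a common usage in program-repair work that this page does not confirm. Under those assumptions the rates could be computed as below. `PatchResult`, its field names, the bug identifiers, and the sample data are all hypothetical and are not taken from the leaderboard.

```python
from dataclasses import dataclass


@dataclass
class PatchResult:
    """Outcome of the first sampled patch for one bug (hypothetical schema)."""
    bug_id: str          # illustrative identifier only
    passes_tests: bool   # assumed meaning of "plausible"
    ast_match: bool      # patch AST equals the reference fix AST
    exact_match: bool    # patch text equals the reference fix text


def rate_at_1(results, field):
    """Fraction of bugs whose first patch satisfies the given boolean field."""
    if not results:
        return 0.0
    return sum(bool(getattr(r, field)) for r in results) / len(results)


# Made-up sample: four bugs, one patch attempt each.
results = [
    PatchResult("bug-1", passes_tests=True, ast_match=True, exact_match=True),
    PatchResult("bug-2", passes_tests=True, ast_match=False, exact_match=False),
    PatchResult("bug-3", passes_tests=False, ast_match=False, exact_match=False),
    PatchResult("bug-4", passes_tests=True, ast_match=False, exact_match=False),
]

plausible_at_1 = rate_at_1(results, "passes_tests")
ast_match_at_1 = rate_at_1(results, "ast_match")
exact_match_at_1 = rate_at_1(results, "exact_match")
```

On this reading an exact match implies an AST match, and a patch matching the reference fix would be expected to pass the tests, so the three rates would normally satisfy exact <= AST <= plausible.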
| Rank | Subject | Defects4J Plausible @1 | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | o4-mini-2025-04-16-high | 0.538 | — | Imported | 2026-05-27 |
| 2 | o3-mini-2025-01-31-high | 0.488 | o3 Mini High openai-o3-mini-high | Imported | 2026-05-27 |
| 3 | claude-3-7-sonnet-20250219 | 0.478 | Claude 3.7 Sonnet anthropic-claude-3.7-sonnet | Imported | 2026-05-27 |
| 4 | deepseek-r1 | 0.475 | R1 deepseek-r1 | Imported | 2026-05-27 |
| 5 | gpt-4.1-2025-04-14 | 0.452 | GPT-4.1 openai-gpt-4.1 | Imported | 2026-05-27 |
| 6 | claude-3-5-sonnet-20241022 | 0.441 | Claude 3.5 Sonnet anthropic-claude-3.5-sonnet | Imported | 2026-05-27 |
| 7 | gemini-2.5-flash-preview-05-20 | 0.434 | — | Imported | 2026-05-27 |
| 8 | deepseek-v3-0324 | 0.43 | DeepSeek V3 0324 deepseek-deepseek-chat-v3-0324 | Imported | 2026-05-27 |
| 9 | claude-3-5-sonnet-20240620 | 0.415 | Claude 3.5 Sonnet anthropic-claude-3.5-sonnet | Imported | 2026-05-27 |
| 10 | gemini-2.5-pro-preview-03-25 | 0.414 | Gemini 2.5 Pro Preview 05-06 google-gemini-2.5-pro-preview-05-06 | Imported | 2026-05-27 |
| 11 | deepseek-v3 | 0.399 | DeepSeek V3 deepseek-deepseek-chat | Imported | 2026-05-27 |
| 12 | gemini-1.5-pro-002 | 0.364 | — | Imported | 2026-05-27 |
| 13 | gpt-4o-2024-11-20 | 0.35 | GPT-4o openai-gpt-4o | Imported | 2026-05-27 |
| 14 | mistral-medium-2505 | 0.349 | — | Imported | 2026-05-27 |
| 15 | gpt-4o-2024-08-06 | 0.341 | GPT-4o openai-gpt-4o | Imported | 2026-05-27 |
| 16 | llama-4-maverick | 0.337 | Llama 4 Maverick meta-llama-4-maverick | Imported | 2026-05-27 |
| 17 | gemini-2.0-flash-001 | 0.33 | Gemini 2.0 Flash google-gemini-2.0-flash | Imported | 2026-05-27 |
| 18 | magistral-medium-2506 | 0.321 | — | Imported | 2026-05-27 |
| 19 | grok-2-1212 | 0.31 | — | Imported | 2026-05-27 |
| 20 | gemini-1.5-pro-001 | 0.303 | — | Imported | 2026-05-27 |
| 21 | llama-3.1-405b-instruct | 0.289 | — | Imported | 2026-05-27 |
| 22 | qwen-2.5-coder-32b-instruct | 0.271 | Qwen2.5 Coder 32B Instruct qwen-qwen-2.5-coder-32b-instruct | Imported | 2026-05-27 |
| 23 | deepseek-v2.5 | 0.266 | — | Imported | 2026-05-27 |
| 24 | command-a | 0.261 | Command A cohere-command-a | Imported | 2026-05-27 |
| 25 | qwen-2.5-72b-instruct | 0.255 | Qwen2.5 72B Instruct qwen-qwen-2.5-72b-instruct | Imported | 2026-05-27 |
| 26 | mistral-large-2411 | 0.252 | Mistral Large 2411 mistralai-mistral-large-2411 | Imported | 2026-05-27 |
| 27 | mistral-large-2407 | 0.245 | Mistral Large 2407 mistralai-mistral-large-2407 | Imported | 2026-05-27 |
| 28 | llama-3.3-70b-instruct | 0.234 | Llama 3.3 70B Instruct meta-llama-llama-3.3-70b-instruct | Imported | 2026-05-27 |
| 29 | llama-3.1-nemotron-70b-instruct | 0.228 | Llama 3.1 Nemotron 70B Instruct nvidia-llama-3.1-nemotron-70b-instruct | Imported | 2026-05-27 |
| 30 | mistral-small-2503 | 0.225 | — | Imported | 2026-05-27 |
| 31 | deepseek-r1-distill-llama-70b | 0.221 | R1 Distill Llama 70B deepseek-deepseek-r1-distill-llama-70b | Imported | 2026-05-27 |
| 32 | deepseek-r1-distill-qwen-32b | 0.214 | R1 Distill Qwen 32B deepseek-deepseek-r1-distill-qwen-32b | Imported | 2026-05-27 |
| 33 | codestral-2501 | 0.198 | — | Imported | 2026-05-27 |
| 34 | gemma-3-27b-it | 0.184 | Gemma 3 27B google-gemma-3-27b-it | Imported | 2026-05-27 |
| 35 | codestral-2405 | 0.177 | — | Imported | 2026-05-27 |