Defects4J

Defects4J: Evaluates software-engineering agents on realistic issue resolution, repository navigation, testing, or maintenance workflows.

Rows: 35
Primary metric: defects4j_plausible_at_1
Sampled: 2026-05-27

Metadata

Metrics

Defects4J Plausible @1, Defects4J AST Match @1, Defects4J Exact Match @1, Defects4J Cost (lower is better), Defects4J Prompt Tokens (lower is better), Defects4J Completion Tokens (lower is better), Defects4J Total Tokens (lower is better)
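The metrics above mix higher-is-better scores with lower-is-better cost and token counts, so any ranking routine needs to know each metric's direction. A minimal sketch follows, assuming rows are held as plain dictionaries. Only the key `defects4j_plausible_at_1` appears on this page; the other key spellings and the `rank` helper are hypothetical and used purely for illustration.

```python
# Only "defects4j_plausible_at_1" is named on the page; the keys below are
# hypothetical spellings for the metrics marked "lower is better".
LOWER_IS_BETTER = {
    "defects4j_cost",
    "defects4j_prompt_tokens",
    "defects4j_completion_tokens",
    "defects4j_total_tokens",
}


def rank(rows, metric):
    """Return rows sorted best-first for the given metric (hypothetical helper)."""
    # Higher is better unless the metric is listed in LOWER_IS_BETTER.
    reverse = metric not in LOWER_IS_BETTER
    return sorted(rows, key=lambda row: row[metric], reverse=reverse)


# Two scores taken from the results on this page.
rows = [
    {"subject": "deepseek-r1", "defects4j_plausible_at_1": 0.475},
    {"subject": "o4-mini-2025-04-16-high", "defects4j_plausible_at_1": 0.538},
]
best = rank(rows, "defects4j_plausible_at_1")[0]["subject"]
```

Keeping the direction in one set means a new lower-is-better metric only needs a single entry there rather than a change to the sorting logic.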

Latest Results

Rows are imported from the Defects4J v2 subset columns in the public RepairBench leaderboard static JS payload, covering 484 bugs.
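To put the fractional scores in concrete terms, a score can be scaled by the 484 bugs in the subset. The sketch below assumes that Plausible @1 can be read as the expected fraction of those bugs for which a single generated patch is plausible; that reading, and the `approx_bugs` helper, are assumptions rather than something stated on this page.

```python
TOTAL_BUGS = 484  # size of the Defects4J v2 subset, as stated above


def approx_bugs(score: float, total: int = TOTAL_BUGS) -> int:
    """Approximate number of bugs a fractional score corresponds to.

    Hypothetical helper: assumes the score is a fraction of `total`.
    """
    return round(score * total)


# 0.538 is the highest Plausible @1 score on this page.
top_count = approx_bugs(0.538)
```

Under that assumption, a one-point difference in the third decimal place corresponds to roughly half a bug, so small gaps between adjacent ranks should be read with care.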

| Rank | Subject | Defects4J Plausible @1 | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | o4-mini-2025-04-16-high | 0.538 | | Imported | 2026-05-27 |
| 2 | o3-mini-2025-01-31-high | 0.488 | o3 Mini High (openai-o3-mini-high) | Imported | 2026-05-27 |
| 3 | claude-3-7-sonnet-20250219 | 0.478 | Claude 3.7 Sonnet (anthropic-claude-3.7-sonnet) | Imported | 2026-05-27 |
| 4 | deepseek-r1 | 0.475 | R1 (deepseek-r1) | Imported | 2026-05-27 |
| 5 | gpt-4.1-2025-04-14 | 0.452 | GPT-4.1 (openai-gpt-4.1) | Imported | 2026-05-27 |
| 6 | claude-3-5-sonnet-20241022 | 0.441 | Claude 3.5 Sonnet (anthropic-claude-3.5-sonnet) | Imported | 2026-05-27 |
| 7 | gemini-2.5-flash-preview-05-20 | 0.434 | | Imported | 2026-05-27 |
| 8 | deepseek-v3-0324 | 0.43 | DeepSeek V3 0324 (deepseek-deepseek-chat-v3-0324) | Imported | 2026-05-27 |
| 9 | claude-3-5-sonnet-20240620 | 0.415 | Claude 3.5 Sonnet (anthropic-claude-3.5-sonnet) | Imported | 2026-05-27 |
| 10 | gemini-2.5-pro-preview-03-25 | 0.414 | Gemini 2.5 Pro Preview 05-06 (google-gemini-2.5-pro-preview-05-06) | Imported | 2026-05-27 |
| 11 | deepseek-v3 | 0.399 | DeepSeek V3 (deepseek-deepseek-chat) | Imported | 2026-05-27 |
| 12 | gemini-1.5-pro-002 | 0.364 | | Imported | 2026-05-27 |
| 13 | gpt-4o-2024-11-20 | 0.35 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-27 |
| 14 | mistral-medium-2505 | 0.349 | | Imported | 2026-05-27 |
| 15 | gpt-4o-2024-08-06 | 0.341 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-27 |
| 16 | llama-4-maverick | 0.337 | Llama 4 Maverick (meta-llama-4-maverick) | Imported | 2026-05-27 |
| 17 | gemini-2.0-flash-001 | 0.33 | Gemini 2.0 Flash (google-gemini-2.0-flash) | Imported | 2026-05-27 |
| 18 | magistral-medium-2506 | 0.321 | | Imported | 2026-05-27 |
| 19 | grok-2-1212 | 0.31 | | Imported | 2026-05-27 |
| 20 | gemini-1.5-pro-001 | 0.303 | | Imported | 2026-05-27 |
| 21 | llama-3.1-405b-instruct | 0.289 | | Imported | 2026-05-27 |
| 22 | qwen-2.5-coder-32b-instruct | 0.271 | Qwen2.5 Coder 32B Instruct (qwen-qwen-2.5-coder-32b-instruct) | Imported | 2026-05-27 |
| 23 | deepseek-v2.5 | 0.266 | | Imported | 2026-05-27 |
| 24 | command-a | 0.261 | Command A (cohere-command-a) | Imported | 2026-05-27 |
| 25 | qwen-2.5-72b-instruct | 0.255 | Qwen2.5 72B Instruct (qwen-qwen-2.5-72b-instruct) | Imported | 2026-05-27 |
| 26 | mistral-large-2411 | 0.252 | Mistral Large 2411 (mistralai-mistral-large-2411) | Imported | 2026-05-27 |
| 27 | mistral-large-2407 | 0.245 | Mistral Large 2407 (mistralai-mistral-large-2407) | Imported | 2026-05-27 |
| 28 | llama-3.3-70b-instruct | 0.234 | Llama 3.3 70B Instruct (meta-llama-llama-3.3-70b-instruct) | Imported | 2026-05-27 |
| 29 | llama-3.1-nemotron-70b-instruct | 0.228 | Llama 3.1 Nemotron 70B Instruct (nvidia-llama-3.1-nemotron-70b-instruct) | Imported | 2026-05-27 |
| 30 | mistral-small-2503 | 0.225 | | Imported | 2026-05-27 |
| 31 | deepseek-r1-distill-llama-70b | 0.221 | R1 Distill Llama 70B (deepseek-deepseek-r1-distill-llama-70b) | Imported | 2026-05-27 |
| 32 | deepseek-r1-distill-qwen-32b | 0.214 | R1 Distill Qwen 32B (deepseek-deepseek-r1-distill-qwen-32b) | Imported | 2026-05-27 |
| 33 | codestral-2501 | 0.198 | | Imported | 2026-05-27 |
| 34 | gemma-3-27b-it | 0.184 | Gemma 3 27B (google-gemma-3-27b-it) | Imported | 2026-05-27 |
| 35 | codestral-2405 | 0.177 | | Imported | 2026-05-27 |