RepairBench

RepairBench: Evaluates software-engineering agents on realistic issue resolution, repository navigation, testing, or maintenance workflows.

Rows: 35
Primary metric: total_plausible_at_1
Sampled: 2026-05-27

Metadata

Metrics

Total Plausible @1, Total AST Match @1, Total Cost (lower is better), Total Prompt Tokens (lower is better), Total Completion Tokens (lower is better), Total Tokens (lower is better), Defects4J Plausible @1, GitBug-Java Plausible @1

Latest Results

Rows are imported from the public RepairBench static JS payload. The leaderboard states it was retired after the June 11, 2025 update and covers 574 total bugs.
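As a rough reading aid, a Total Plausible @1 score can be converted to an approximate bug count. The sketch below assumes the score is a plain fraction of the 574 bugs. The page does not spell that out, so treat the result as an estimate. The function name `approx_plausible_bugs` is hypothetical.

```python
TOTAL_BUGS = 574  # total bug count stated by the leaderboard


def approx_plausible_bugs(score: float, total_bugs: int = TOTAL_BUGS) -> int:
    """Estimate how many bugs received a plausible first patch.

    Assumption: `score` is the fraction of `total_bugs` with a plausible
    patch on the first attempt. The result is an estimate only.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a fraction between 0 and 1")
    return round(score * total_bugs)


# Scores for the first and last ranked subjects in the table.
top = approx_plausible_bugs(0.503)     # o4-mini-2025-04-16-high
bottom = approx_plausible_bugs(0.164)  # codestral-2405
print(top, bottom)
```

Under that assumption, the gap between the first and last ranked subjects is roughly 195 bugs out of 574.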

| Rank | Subject | Total Plausible @1 | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | o4-mini-2025-04-16-high | 0.503 | | Imported | 2026-05-27 |
| 2 | o3-mini-2025-01-31-high | 0.464 | o3 Mini High (openai-o3-mini-high) | Imported | 2026-05-27 |
| 3 | deepseek-r1 | 0.452 | R1 (deepseek-r1) | Imported | 2026-05-27 |
| 4 | claude-3-7-sonnet-20250219 | 0.44 | Claude 3.7 Sonnet (anthropic-claude-3.7-sonnet) | Imported | 2026-05-27 |
| 5 | claude-3-5-sonnet-20241022 | 0.418 | Claude 3.5 Sonnet (anthropic-claude-3.5-sonnet) | Imported | 2026-05-27 |
| 6 | gpt-4.1-2025-04-14 | 0.413 | GPT-4.1 (openai-gpt-4.1) | Imported | 2026-05-27 |
| 7 | gemini-2.5-flash-preview-05-20 | 0.406 | | Imported | 2026-05-27 |
| 8 | deepseek-v3-0324 | 0.396 | DeepSeek V3 0324 (deepseek-deepseek-chat-v3-0324) | Imported | 2026-05-27 |
| 9 | claude-3-5-sonnet-20240620 | 0.391 | Claude 3.5 Sonnet (anthropic-claude-3.5-sonnet) | Imported | 2026-05-27 |
| 10 | gemini-2.5-pro-preview-03-25 | 0.383 | Gemini 2.5 Pro Preview 05-06 (google-gemini-2.5-pro-preview-05-06) | Imported | 2026-05-27 |
| 11 | deepseek-v3 | 0.371 | DeepSeek V3 (deepseek-deepseek-chat) | Imported | 2026-05-27 |
| 12 | gemini-1.5-pro-002 | 0.332 | | Imported | 2026-05-27 |
| 13 | gpt-4o-2024-11-20 | 0.326 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-27 |
| 14 | mistral-medium-2505 | 0.321 | | Imported | 2026-05-27 |
| 15 | gpt-4o-2024-08-06 | 0.317 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-27 |
| 16 | llama-4-maverick | 0.308 | Llama 4 Maverick (meta-llama-4-maverick) | Imported | 2026-05-27 |
| 17 | gemini-2.0-flash-001 | 0.304 | Gemini 2.0 Flash (google-gemini-2.0-flash) | Imported | 2026-05-27 |
| 18 | magistral-medium-2506 | 0.295 | | Imported | 2026-05-27 |
| 19 | grok-2-1212 | 0.29 | | Imported | 2026-05-27 |
| 20 | gemini-1.5-pro-001 | 0.282 | | Imported | 2026-05-27 |
| 21 | llama-3.1-405b-instruct | 0.27 | | Imported | 2026-05-27 |
| 22 | deepseek-v2.5 | 0.251 | | Imported | 2026-05-27 |
| 23 | qwen-2.5-coder-32b-instruct | 0.25 | Qwen2.5 Coder 32B Instruct (qwen-qwen-2.5-coder-32b-instruct) | Imported | 2026-05-27 |
| 24 | qwen-2.5-72b-instruct | 0.242 | Qwen2.5 72B Instruct (qwen-qwen-2.5-72b-instruct) | Imported | 2026-05-27 |
| 25 | command-a | 0.24 | Command A (cohere-command-a) | Imported | 2026-05-27 |
| 26 | mistral-large-2411 | 0.237 | Mistral Large 2411 (mistralai-mistral-large-2411) | Imported | 2026-05-27 |
| 27 | mistral-large-2407 | 0.23 | Mistral Large 2407 (mistralai-mistral-large-2407) | Imported | 2026-05-27 |
| 28 | llama-3.3-70b-instruct | 0.224 | Llama 3.3 70B Instruct (meta-llama-llama-3.3-70b-instruct) | Imported | 2026-05-27 |
| 29 | llama-3.1-nemotron-70b-instruct | 0.214 | Llama 3.1 Nemotron 70B Instruct (nvidia-llama-3.1-nemotron-70b-instruct) | Imported | 2026-05-27 |
| 30 | deepseek-r1-distill-llama-70b | 0.208 | R1 Distill Llama 70B (deepseek-deepseek-r1-distill-llama-70b) | Imported | 2026-05-27 |
| 31 | mistral-small-2503 | 0.204 | | Imported | 2026-05-27 |
| 32 | deepseek-r1-distill-qwen-32b | 0.198 | R1 Distill Qwen 32B (deepseek-deepseek-r1-distill-qwen-32b) | Imported | 2026-05-27 |
| 33 | codestral-2501 | 0.187 | | Imported | 2026-05-27 |
| 34 | gemma-3-27b-it | 0.173 | Gemma 3 27B (google-gemma-3-27b-it) | Imported | 2026-05-27 |
| 35 | codestral-2405 | 0.164 | | Imported | 2026-05-27 |