GitTaskBench
Repository-level code-agent benchmark covering real GitHub tasks, reporting task pass rate, execution completion rate, token usage, and cost.
28 rows
Primary metric: task_pass_rate
Sampled: 2026-05-27
Metadata
Metrics
Task Pass Rate, Execution Completion Rate, Input Tokens (lower is better), Output Tokens (lower is better), Cost (lower is better)
| Rank | Subject | Task Pass Rate | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | RepoMaster + Claude 3.5 | 62.96 | — | Imported | 2026-05-27 |
| 2 | OpenHands + Claude 3.7 | 48.15 | — | Imported | 2026-05-27 |
| 3 | RepoMaster + DeepSeekV3 | 44.44 | — | Imported | 2026-05-27 |
| 4 | SWE-Agent + Claude 3.7 | 42.59 | — | Imported | 2026-05-27 |
| 5 | OpenHands + GPT-4.1 | 42.59 | — | Imported | 2026-05-27 |
| 6 | OpenHands + Claude 3.5 | 40.74 | — | Imported | 2026-05-27 |
| 7 | RepoMaster + GPT-4o | 40.74 | — | Imported | 2026-05-27 |
| 8 | OpenHands + Gemini-2.5-pro | 35.19 | — | Imported | 2026-05-27 |
| 9 | SWE-Agent + GPT-4.1 | 31.48 | — | Imported | 2026-05-27 |
| 10 | OpenHands + Qwen3-32b* | 29.63 | — | Imported | 2026-05-27 |
| 11 | OpenHands + DeepSeekV3 | 26.85 | — | Imported | 2026-05-27 |
| 12 | OpenHands + Qwen3-32b* | 25.93 | — | Imported | 2026-05-27 |
| 13 | SWE-Agent + Claude 3.5 | 22.23 | — | Imported | 2026-05-27 |
| 14 | OpenHands + o3-mini | 22.22 | — | Imported | 2026-05-27 |
| 15 | SWE-Agent + o3-mini | 20.37 | — | Imported | 2026-05-27 |
| 16 | OpenHands + Llama3.3-70b* | 20.37 | — | Imported | 2026-05-27 |
| 17 | SWE-Agent + Llama3.3-70b* | 18.52 | — | Imported | 2026-05-27 |
| 18 | Aider + DeepSeekV3 | 16.67 | — | Imported | 2026-05-27 |
| 19 | OpenHands + GPT-4o | 14.82 | — | Imported | 2026-05-27 |
| 20 | Aider + Claude 3.5 | 12.96 | — | Imported | 2026-05-27 |
| 21 | SWE-Agent + DeepSeekV3 | 12.04 | — | Imported | 2026-05-27 |
| 22 | SWE-Agent + Qwen3-32b* | 11.11 | — | Imported | 2026-05-27 |
| 23 | SWE-Agent + GPT-4o | 10.19 | — | Imported | 2026-05-27 |
| 24 | Aider + GPT-4.1 | 7.41 | — | Imported | 2026-05-27 |
| 25 | OpenHands + Qwen3-14b* | 5.56 | — | Imported | 2026-05-27 |
| 26 | SWE-Agent + Qwen3-32b* | 3.70 | — | Imported | 2026-05-27 |
| 27 | Aider + GPT-4o | 1.85 | — | Imported | 2026-05-27 |
| 28 | OpenHands + Qwen3-8b* | 1.85 | — | Imported | 2026-05-27 |
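
The table is ordered by task pass rate, where higher is better, while the token and cost metrics are marked "lower is better". A minimal Python sketch of direction-aware ranking is shown below. The `rank` helper, the `LOWER_IS_BETTER` set, and the metric key names other than `task_pass_rate` are hypothetical names chosen for illustration, and the three sample rows are copied from the table. This is a sketch under those assumptions, not the leaderboard's own implementation.

```python
# Metrics listed as "lower is better"; these key names are assumed, not from the source.
LOWER_IS_BETTER = {"input_tokens", "output_tokens", "cost"}


def rank(rows, metric):
    """Return rows sorted best-first for the given metric.

    Python's sort is stable, so rows with equal values keep their input order.
    """
    descending = metric not in LOWER_IS_BETTER
    return sorted(rows, key=lambda r: r[metric], reverse=descending)


# Three rows copied from the table, deliberately out of order.
rows = [
    {"subject": "SWE-Agent + Claude 3.7", "task_pass_rate": 42.59},
    {"subject": "RepoMaster + Claude 3.5", "task_pass_rate": 62.96},
    {"subject": "OpenHands + Claude 3.7", "task_pass_rate": 48.15},
]

ranked = rank(rows, "task_pass_rate")
for position, row in enumerate(ranked, start=1):
    print(position, row["subject"], row["task_pass_rate"])
```

Because the sort is stable, tied subjects (for example the two rows at 42.59) keep the order in which they were supplied, so any tie-breaking rule would need to be applied explicitly.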