SWE-bench Verified Mini (HAL)

HAL's cost-aware agent leaderboard for the SWE-bench Verified Mini software engineering subset.

Rows: 33
Primary metric: accuracy
Sampled: 2026-05-27

Metadata

Metrics

Accuracy (%), Cost (USD; lower is better), Runs
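Because the leaderboard reports both accuracy and cost, one entry is only unambiguously better than another when it is at least as accurate, at most as expensive, and strictly better on one of the two. A common way to read two such metrics together is a Pareto frontier: the set of entries that no other entry beats on both axes at once. The sketch below assumes each entry has a single accuracy percentage and a single total cost in USD; the `Entry` type, the `pareto_frontier` function, and the agent names and numbers are hypothetical and are not taken from HAL or from this table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str
    accuracy: float  # percent of tasks resolved; higher is better
    cost_usd: float  # total cost of the run in USD; lower is better


def pareto_frontier(entries: list[Entry]) -> list[Entry]:
    """Return the entries not dominated on (higher accuracy, lower cost), cheapest first."""
    frontier: list[Entry] = []
    best_accuracy = float("-inf")
    # Cheapest first; among equal costs, the most accurate entry comes first.
    for entry in sorted(entries, key=lambda e: (e.cost_usd, -e.accuracy)):
        # Keep an entry only if it beats every cheaper-or-equal entry on accuracy.
        if entry.accuracy > best_accuracy:
            frontier.append(entry)
            best_accuracy = entry.accuracy
    return frontier


# Hypothetical entries for illustration only (not values from the HAL table).
example = [
    Entry("agent_a", accuracy=70.0, cost_usd=40.0),
    Entry("agent_b", accuracy=55.0, cost_usd=10.0),
    Entry("agent_c", accuracy=50.0, cost_usd=25.0),
    Entry("agent_d", accuracy=30.0, cost_usd=2.0),
]

# agent_c is dropped: agent_b is both cheaper and more accurate.
for entry in pareto_frontier(example):
    print(entry.name, entry.accuracy, entry.cost_usd)
```

Sorting by cost first lets a single pass suffice: an entry joins the frontier only if it beats the best accuracy seen at equal or lower cost. Exact duplicates keep only their first copy.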

Latest Results

Rows are parsed from the public HAL static leaderboard table. Scaffold and model display names are preserved as they appear in the source, and the score is the table's Accuracy value, in percent.

Rank Subject Accuracy Model Match Provenance Sampled
1 SWE-Agent / Claude Sonnet 4.5 High (September 2025) 72 Verified 2026-05-27
2 SWE-Agent / Claude Sonnet 4.5 (September 2025) 68 Verified 2026-05-27
3 SWE-Agent / Claude Opus 4.1 (August 2025) 61 Verified 2026-05-27
4 SWE-Agent / o4-mini Low (April 2025) 54 Verified 2026-05-27
5 SWE-Agent / Claude-3.7 Sonnet High (February 2025) 54 Verified 2026-05-27
6 SWE-Agent / Claude Opus 4.1 High (August 2025) 54 Verified 2026-05-27
7 SWE-Agent / o4-mini High (April 2025) 50 Verified 2026-05-27
8 SWE-Agent / Claude-3.7 Sonnet (February 2025) 50 Verified 2026-05-27
9 SWE-Agent / Claude Opus 4 (May 2025) 50 Verified 2026-05-27
10 SWE-Agent / GPT-5 Medium (August 2025) 46 Verified 2026-05-27
11 HAL Generalist Agent / Claude Opus 4.1 High (August 2025) 46 Verified 2026-05-27
12 SWE-Agent / o3 Medium (April 2025) 46 Verified 2026-05-27
13 HAL Generalist Agent / Claude Haiku 4.5 High (October 2025) 44 Verified 2026-05-27
14 SWE-Agent / GPT-4.1 (April 2025) 44 Verified 2026-05-27
15 HAL Generalist Agent / Claude Opus 4.1 (August 2025) 42 Verified 2026-05-27
16 HAL Generalist Agent / Claude Sonnet 4.5 High (September 2025) 40 Verified 2026-05-27
17 HAL Generalist Agent / Claude Sonnet 4.5 (September 2025) 34 Verified 2026-05-27
18 HAL Generalist Agent / Claude Opus 4 (May 2025) 34 Verified 2026-05-27
19 HAL Generalist Agent / Claude Opus 4 High (May 2025) 30 Verified 2026-05-27
20 HAL Generalist Agent / Claude-3.7 Sonnet (February 2025) 26 Verified 2026-05-27
21 SWE-Agent / Gemini 2.0 Flash (February 2025) 24 Verified 2026-05-27
22 SWE-Agent / DeepSeek V3 (March 2025) 24 Verified 2026-05-27
23 HAL Generalist Agent / Claude-3.7 Sonnet High (February 2025) 24 Verified 2026-05-27
24 HAL Generalist Agent / Claude Haiku 4.5 (October 2025) 24 Verified 2026-05-27
25 HAL Generalist Agent / GPT-5 Medium (August 2025) 12 Verified 2026-05-27
26 HAL Generalist Agent / DeepSeek V3 (March 2025) 10 Verified 2026-05-27
27 HAL Generalist Agent / o4-mini Low (April 2025) 6 Verified 2026-05-27
28 HAL Generalist Agent / DeepSeek R1 (January 2025) 6 Verified 2026-05-27
29 HAL Generalist Agent / Gemini 2.0 Flash (February 2025) 2 Verified 2026-05-27
30 HAL Generalist Agent / o4-mini High (April 2025) 2 Verified 2026-05-27
31 HAL Generalist Agent / GPT-4.1 (April 2025) 2 Verified 2026-05-27
32 SWE-Agent / DeepSeek R1 (January 2025) 0 Verified 2026-05-27
33 HAL Generalist Agent / o3 Medium (April 2025) 0 Verified 2026-05-27
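Each Subject pairs an agent scaffold with a model, and most models appear under both SWE-Agent and HAL Generalist Agent, so the table can also be read as a per-model comparison of the two scaffolds. A minimal sketch of that reading, assuming every row follows the single-line layout shown above (rank, `scaffold / model (release month)`, accuracy, the `Verified` label, sample date); `ROW_RE`, `parse_rows`, and `scaffold_gaps` are hypothetical helpers, not HAL's own parsing code.

```python
import re
from collections import defaultdict

# Assumed row layout, inferred from the rows above:
# "<rank> <scaffold> / <model> (<Month Year>) <accuracy> <label> <YYYY-MM-DD>"
ROW_RE = re.compile(
    r"^(?P<rank>\d+) (?P<scaffold>.+?) / (?P<model>.+?) "
    r"\((?P<release>[A-Z][a-z]+ \d{4})\) "
    r"(?P<accuracy>\d+(?:\.\d+)?) (?P<label>\S+) (?P<sampled>\d{4}-\d{2}-\d{2})$"
)


def parse_rows(lines):
    """Parse leaderboard lines into dicts, skipping headers and malformed lines."""
    rows = []
    for line in lines:
        match = ROW_RE.match(line.strip())
        if match is None:
            continue
        rows.append({
            "rank": int(match["rank"]),
            "scaffold": match["scaffold"],
            # Include the release month in the key, matching how the table labels models.
            "model": f'{match["model"]} ({match["release"]})',
            "accuracy": float(match["accuracy"]),
        })
    return rows


def scaffold_gaps(rows, a="SWE-Agent", b="HAL Generalist Agent"):
    """Accuracy of scaffold `a` minus scaffold `b`, for models listed under both."""
    by_model = defaultdict(dict)
    for row in rows:
        by_model[row["model"]][row["scaffold"]] = row["accuracy"]
    return {m: s[a] - s[b] for m, s in by_model.items() if a in s and b in s}


# A few rows copied from the table above.
SAMPLE = """\
Rank Subject Accuracy Model Match Provenance Sampled
1 SWE-Agent / Claude Sonnet 4.5 High (September 2025) 72 Verified 2026-05-27
13 HAL Generalist Agent / Claude Haiku 4.5 High (October 2025) 44 Verified 2026-05-27
16 HAL Generalist Agent / Claude Sonnet 4.5 High (September 2025) 40 Verified 2026-05-27
28 HAL Generalist Agent / DeepSeek R1 (January 2025) 6 Verified 2026-05-27
32 SWE-Agent / DeepSeek R1 (January 2025) 0 Verified 2026-05-27
"""

rows = parse_rows(SAMPLE.splitlines())
print(scaffold_gaps(rows))
# {'Claude Sonnet 4.5 High (September 2025)': 32.0, 'DeepSeek R1 (January 2025)': -6.0}
```

A positive gap means SWE-Agent scored higher with that model. Models listed under only one scaffold, such as Claude Haiku 4.5 High in this sample, are left out of the comparison.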