CORE-Bench Hard

HAL's cost-aware agent leaderboard for CORE-Bench Hard scientific programming tasks.

Rows: 49
Primary metric: accuracy
Sampled: 2026-05-27

Metadata

Metrics

Accuracy, Cost (USD; lower is better), Runs

Latest Results

Rows are parsed from the public HAL static leaderboard table. Each Subject is a scaffold / model pair, with display names preserved exactly as they appear in the source. Accuracy is the table's Accuracy value, in percent.

Rank Subject Accuracy Model Match Provenance Sampled
1 Claude Code / Claude Opus 4.5 77.78 Verified 2026-05-27
2 Claude Code / Claude Sonnet 4.5 (September 2025) 62.22 Verified 2026-05-27
3 CORE-Agent / Claude Opus 4.1 (August 2025) 51.11 Verified 2026-05-27
4 Claude Code / Claude Sonnet 4 (May 2025) 46.67 Verified 2026-05-27
5 CORE-Agent / Claude Sonnet 4.5 High (September 2025) 44.44 Verified 2026-05-27
6 CORE-Agent / Claude Opus 4.5 High (November 2025) 42.22 Verified 2026-05-27
7 CORE-Agent / Claude Opus 4.5 (November 2025) 42.22 Verified 2026-05-27
8 Claude Code / Claude Opus 4.1 42.22 Verified 2026-05-27
9 CORE-Agent / Claude Opus 4.1 High (August 2025) 42.22 Verified 2026-05-27
10 CORE-Agent / Gemini 3 Pro Preview High (November 2025) 40 Verified 2026-05-27
11 HAL Generalist Agent / Claude-3.7 Sonnet High (February 2025) 37.78 Verified 2026-05-27
12 CORE-Agent / Claude Sonnet 4.5 (September 2025) 37.78 Verified 2026-05-27
13 HAL Generalist Agent / o4-mini High (April 2025) 35.56 Verified 2026-05-27
14 CORE-Agent / Claude-3.7 Sonnet (February 2025) 35.56 Verified 2026-05-27
15 HAL Generalist Agent / Gemini 3 Pro Preview High (November 2025) 35.56 Verified 2026-05-27
16 HAL Generalist Agent / Claude Opus 4.1 (August 2025) 35.56 Verified 2026-05-27
17 HAL Generalist Agent / Claude Sonnet 4.5 (September 2025) 33.33 Verified 2026-05-27
18 CORE-Agent / Claude Sonnet 4 High (May 2025) 33.33 Verified 2026-05-27
19 CORE-Agent / GPT-4.1 (April 2025) 33.33 Verified 2026-05-27
20 HAL Generalist Agent / Claude Opus 4.5 (November 2025) 33.33 Verified 2026-05-27
21 HAL Generalist Agent / Claude Opus 4.1 High (August 2025) 33.33 Verified 2026-05-27
22 HAL Generalist Agent / Claude-3.7 Sonnet (February 2025) 31.11 Verified 2026-05-27
23 HAL Generalist Agent / Claude Opus 4.5 High (November 2025) 31.11 Verified 2026-05-27
24 CORE-Agent / Claude Sonnet 4 (May 2025) 28.89 Verified 2026-05-27
25 HAL Generalist Agent / Claude Sonnet 4.5 High (September 2025) 28.89 Verified 2026-05-27
26 CORE-Agent / GPT-5 Medium (August 2025) 26.67 Verified 2026-05-27
27 CORE-Agent / o4-mini High (April 2025) 26.67 Verified 2026-05-27
28 CORE-Agent / Claude-3.7 Sonnet High (February 2025) 24.44 Verified 2026-05-27
29 CORE-Agent / o3 Medium (April 2025) 24.44 Verified 2026-05-27
30 HAL Generalist Agent / GPT-4.1 (April 2025) 22.22 Verified 2026-05-27
31 HAL Generalist Agent / o3 Medium (April 2025) 22.22 Verified 2026-05-27
32 CORE-Agent / Gemini 2.5 Pro Preview (March 2025) 22.22 Verified 2026-05-27
33 CORE-Agent / DeepSeek V3.1 (August 2025) 20 Verified 2026-05-27
34 CORE-Agent / DeepSeek V3 (March 2025) 17.78 Verified 2026-05-27
35 CORE-Agent / o4-mini Low (April 2025) 17.78 Verified 2026-05-27
36 HAL Generalist Agent / o4-mini Low (April 2025) 15.56 Verified 2026-05-27
37 CORE-Agent / GPT-OSS-120B (August 2025) 11.11 Verified 2026-05-27
38 CORE-Agent / GPT-OSS-120B High (August 2025) 11.11 Verified 2026-05-27
39 CORE-Agent / Gemini 2.0 Flash (February 2025) 11.11 Verified 2026-05-27
40 HAL Generalist Agent / GPT-5 Medium (August 2025) 11.11 Verified 2026-05-27
41 CORE-Agent / Claude Haiku 4.5 (October 2025) 11.11 Verified 2026-05-27
42 HAL Generalist Agent / GPT-OSS-120B High (August 2025) 8.89 Verified 2026-05-27
43 HAL Generalist Agent / GPT-OSS-120B (August 2025) 8.89 Verified 2026-05-27
44 HAL Generalist Agent / DeepSeek V3 (March 2025) 8.89 Verified 2026-05-27
45 HAL Generalist Agent / DeepSeek R1 (May 2025) 8.89 Verified 2026-05-27
46 CORE-Agent / DeepSeek R1 (January 2025) 6.67 Verified 2026-05-27
47 HAL Generalist Agent / DeepSeek R1 (January 2025) 4.45 Verified 2026-05-27
48 HAL Generalist Agent / Gemini 2.0 Flash (February 2025) 4.44 Verified 2026-05-27
49 HAL Generalist Agent / Gemini 2.5 Pro Preview (March 2025) 4.44 Verified 2026-05-27
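
The note above says rows are parsed from the HAL table with scaffold and model names preserved. The following is a minimal sketch of how one flattened row from the listing above could be split back into fields. It assumes the layout `<rank> <scaffold> / <model> <accuracy> <status> <date>`, inferred only from the rows shown here. The names `LeaderboardRow`, `ROW_RE`, and `parse_row` are hypothetical and are not part of HAL's tooling, and the sketch does not decide which header column the `Verified` token belongs to.

```python
import re
from dataclasses import dataclass

# Assumed row layout, inferred from the listing above (not a HAL specification):
#   <rank> <scaffold> / <model> <accuracy> <status> <YYYY-MM-DD>
ROW_RE = re.compile(
    r"^(?P<rank>\d+)\s+"
    r"(?P<scaffold>.+?)\s+/\s+"          # text before the first " / "
    r"(?P<model>.+)\s+"                  # display name, may contain digits and parentheses
    r"(?P<accuracy>\d+(?:\.\d+)?)\s+"    # percentage, e.g. 77.78 or 40
    r"(?P<status>\S+)\s+"                # single token, e.g. "Verified"
    r"(?P<sampled>\d{4}-\d{2}-\d{2})$"   # sampling date
)


@dataclass(frozen=True)
class LeaderboardRow:
    """One parsed row; a hypothetical container for illustration."""
    rank: int
    scaffold: str
    model: str
    accuracy: float  # percent, as printed in the table
    status: str
    sampled: str     # kept as the original ISO date string


def parse_row(line: str) -> LeaderboardRow:
    """Split one flattened table row into typed fields.

    Raises ValueError if the line does not follow the assumed layout.
    """
    match = ROW_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized row layout: {line!r}")
    return LeaderboardRow(
        rank=int(match["rank"]),
        scaffold=match["scaffold"],
        model=match["model"],
        accuracy=float(match["accuracy"]),
        status=match["status"],
        sampled=match["sampled"],
    )


# Two rows copied verbatim from the listing above.
SAMPLE_ROWS = [
    "1 Claude Code / Claude Opus 4.5 77.78 Verified 2026-05-27",
    "10 CORE-Agent / Gemini 3 Pro Preview High (November 2025) 40 Verified 2026-05-27",
]
PARSED_ROWS = [parse_row(row) for row in SAMPLE_ROWS]
```

Anchoring the accuracy, status, and date at the end of the line lets the model name keep embedded numbers such as `4.5` or `(November 2025)` without being mistaken for the score.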