Scicode (HAL)

HAL's cost-aware agent leaderboard for Scicode scientific programming tasks.

33 rows
Primary metric: accuracy
Sampled: 2026-05-27

Metadata

Metrics

Accuracy (higher is better), Cost in USD (lower is better), Runs

Latest Results

Rows are parsed from the public HAL static leaderboard table. Source scaffold/model display names are preserved; score is the table's Accuracy percentage.
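As an illustration of the parsing described above, here is a minimal sketch that splits one flattened row into fields. It assumes every row follows the layout `rank scaffold / model accuracy status date` seen in this listing. The names `LeaderboardRow`, `parse_row`, and the field name `status` are hypothetical and not part of HAL; the source does not say which header column the "Verified" value belongs to, so the sketch keeps it under a neutral name.

```python
import re
from typing import NamedTuple, Optional


class LeaderboardRow(NamedTuple):
    rank: int
    scaffold: str   # e.g. "Scicode Zero Shot Agent"
    model: str      # display name kept verbatim, including the date suffix
    accuracy: float # percentage, as shown in the table
    status: str     # e.g. "Verified" (column assignment is an assumption)
    sampled: str    # ISO date string


# Anchored at both ends so that numbers inside model names
# (e.g. "Claude Opus 4.1", "Gemini 2.0 Flash") are not mistaken for the score.
ROW_RE = re.compile(
    r"^(?P<rank>\d+)\s+"
    r"(?P<scaffold>.+?) / (?P<model>.+?)\s+"
    r"(?P<accuracy>\d+(?:\.\d+)?)\s+"
    r"(?P<status>\S+)\s+"
    r"(?P<sampled>\d{4}-\d{2}-\d{2})$"
)


def parse_row(line: str) -> Optional[LeaderboardRow]:
    """Return a LeaderboardRow, or None if the line is not a data row."""
    match = ROW_RE.match(line.strip())
    if match is None:
        return None
    return LeaderboardRow(
        rank=int(match["rank"]),
        scaffold=match["scaffold"],
        model=match["model"],
        accuracy=float(match["accuracy"]),
        status=match["status"],
        sampled=match["sampled"],
    )
```

Header lines and blank lines return `None`, so the function can be mapped over the whole listing and the results filtered.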

Rank Subject Accuracy (%) Model Match Provenance Sampled
1 Scicode Zero Shot Agent / o4-mini Low (April 2025) 9.23 Verified 2026-05-27
2 Scicode Tool Calling Agent / o3 Medium (April 2025) 9.23 Verified 2026-05-27
3 Scicode Tool Calling Agent / Claude Opus 4.1 (August 2025) 7.69 Verified 2026-05-27
4 Scicode Tool Calling Agent / Claude Opus 4.1 High (August 2025) 6.92 Verified 2026-05-27
5 Scicode Zero Shot Agent / GPT-4.1 (April 2025) 6.15 Verified 2026-05-27
6 Scicode Zero Shot Agent / o4-mini High (April 2025) 6.15 Verified 2026-05-27
7 HAL Generalist Agent / o4-mini Low (April 2025) 6.15 Verified 2026-05-27
8 Scicode Tool Calling Agent / GPT-5 Medium (August 2025) 6.15 Verified 2026-05-27
9 Scicode Zero Shot Agent / o3 Medium (April 2025) 4.62 Verified 2026-05-27
10 Scicode Tool Calling Agent / o4-mini Low (April 2025) 4.62 Verified 2026-05-27
11 Scicode Tool Calling Agent / o4-mini High (April 2025) 4.62 Verified 2026-05-27
12 Scicode Tool Calling Agent / Claude-3.7 Sonnet High (February 2025) 4.62 Verified 2026-05-27
13 Scicode Zero Shot Agent / DeepSeek V3 (March 2025) 3.08 Verified 2026-05-27
14 Scicode Zero Shot Agent / Claude-3.7 Sonnet High (February 2025) 3.08 Verified 2026-05-27
15 HAL Generalist Agent / Claude-3.7 Sonnet (February 2025) 3.08 Verified 2026-05-27
16 HAL Generalist Agent / o3 Medium (April 2025) 3.08 Verified 2026-05-27
17 Scicode Tool Calling Agent / Claude Sonnet 4.5 (September 2025) 3.08 Verified 2026-05-27
18 HAL Generalist Agent / Claude-3.7 Sonnet High (February 2025) 3.08 Verified 2026-05-27
19 Scicode Tool Calling Agent / Claude-3.7 Sonnet (February 2025) 3.08 Verified 2026-05-27
20 Scicode Zero Shot Agent / Gemini 2.0 Flash (February 2025) 1.54 Verified 2026-05-27
21 Scicode Tool Calling Agent / Gemini 2.0 Flash (February 2025) 1.54 Verified 2026-05-27
22 Scicode Tool Calling Agent / GPT-4.1 (April 2025) 1.54 Verified 2026-05-27
23 HAL Generalist Agent / GPT-4.1 (April 2025) 1.54 Verified 2026-05-27
24 HAL Generalist Agent / o4-mini High (April 2025) 1.54 Verified 2026-05-27
25 Scicode Tool Calling Agent / Claude Sonnet 4.5 High (September 2025) 1.54 Verified 2026-05-27
26 Scicode Zero Shot Agent / DeepSeek R1 (May 2025) 0 Verified 2026-05-27
27 Scicode Zero Shot Agent / Claude-3.7 Sonnet (February 2025) 0 Verified 2026-05-27
28 Scicode Tool Calling Agent / DeepSeek V3 (March 2025) 0 Verified 2026-05-27
29 Scicode Tool Calling Agent / DeepSeek R1 (May 2025) 0 Verified 2026-05-27
30 HAL Generalist Agent / Gemini 2.0 Flash (February 2025) 0 Verified 2026-05-27
31 HAL Generalist Agent / DeepSeek V3 (March 2025) 0 Verified 2026-05-27
32 Scicode Tool Calling Agent / Claude Haiku 4.5 (October 2025) 0 Verified 2026-05-27
33 HAL Generalist Agent / DeepSeek R1 (January 2025) 0 Verified 2026-05-27