Scicode (HAL)
HAL's cost-aware agent leaderboard for Scicode scientific programming tasks.
33 rows
Primary metric: accuracy
Sampled: 2026-05-27
Metadata
Metrics: Accuracy, Cost (USD, lower is better), Runs
| Rank | Subject | Accuracy | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Scicode Zero Shot Agent / o4-mini Low (April 2025) | 9.23 | — | Verified | 2026-05-27 |
| 2 | Scicode Tool Calling Agent / o3 Medium (April 2025) | 9.23 | — | Verified | 2026-05-27 |
| 3 | Scicode Tool Calling Agent / Claude Opus 4.1 (August 2025) | 7.69 | — | Verified | 2026-05-27 |
| 4 | Scicode Tool Calling Agent / Claude Opus 4.1 High (August 2025) | 6.92 | — | Verified | 2026-05-27 |
| 5 | Scicode Zero Shot Agent / GPT-4.1 (April 2025) | 6.15 | — | Verified | 2026-05-27 |
| 6 | Scicode Zero Shot Agent / o4-mini High (April 2025) | 6.15 | — | Verified | 2026-05-27 |
| 7 | HAL Generalist Agent / o4-mini Low (April 2025) | 6.15 | — | Verified | 2026-05-27 |
| 8 | Scicode Tool Calling Agent / GPT-5 Medium (August 2025) | 6.15 | — | Verified | 2026-05-27 |
| 9 | Scicode Zero Shot Agent / o3 Medium (April 2025) | 4.62 | — | Verified | 2026-05-27 |
| 10 | Scicode Tool Calling Agent / o4-mini Low (April 2025) | 4.62 | — | Verified | 2026-05-27 |
| 11 | Scicode Tool Calling Agent / o4-mini High (April 2025) | 4.62 | — | Verified | 2026-05-27 |
| 12 | Scicode Tool Calling Agent / Claude-3.7 Sonnet High (February 2025) | 4.62 | — | Verified | 2026-05-27 |
| 13 | Scicode Zero Shot Agent / DeepSeek V3 (March 2025) | 3.08 | — | Verified | 2026-05-27 |
| 14 | Scicode Zero Shot Agent / Claude-3.7 Sonnet High (February 2025) | 3.08 | — | Verified | 2026-05-27 |
| 15 | HAL Generalist Agent / Claude-3.7 Sonnet (February 2025) | 3.08 | — | Verified | 2026-05-27 |
| 16 | HAL Generalist Agent / o3 Medium (April 2025) | 3.08 | — | Verified | 2026-05-27 |
| 17 | Scicode Tool Calling Agent / Claude Sonnet 4.5 (September 2025) | 3.08 | — | Verified | 2026-05-27 |
| 18 | HAL Generalist Agent / Claude-3.7 Sonnet High (February 2025) | 3.08 | — | Verified | 2026-05-27 |
| 19 | Scicode Tool Calling Agent / Claude-3.7 Sonnet (February 2025) | 3.08 | — | Verified | 2026-05-27 |
| 20 | Scicode Zero Shot Agent / Gemini 2.0 Flash (February 2025) | 1.54 | — | Verified | 2026-05-27 |
| 21 | Scicode Tool Calling Agent / Gemini 2.0 Flash (February 2025) | 1.54 | — | Verified | 2026-05-27 |
| 22 | Scicode Tool Calling Agent / GPT-4.1 (April 2025) | 1.54 | — | Verified | 2026-05-27 |
| 23 | HAL Generalist Agent / GPT-4.1 (April 2025) | 1.54 | — | Verified | 2026-05-27 |
| 24 | HAL Generalist Agent / o4-mini High (April 2025) | 1.54 | — | Verified | 2026-05-27 |
| 25 | Scicode Tool Calling Agent / Claude Sonnet 4.5 High (September 2025) | 1.54 | — | Verified | 2026-05-27 |
| 26 | Scicode Zero Shot Agent / DeepSeek R1 (May 2025) | 0 | — | Verified | 2026-05-27 |
| 27 | Scicode Zero Shot Agent / Claude-3.7 Sonnet (February 2025) | 0 | — | Verified | 2026-05-27 |
| 28 | Scicode Tool Calling Agent / DeepSeek V3 (March 2025) | 0 | — | Verified | 2026-05-27 |
| 29 | Scicode Tool Calling Agent / DeepSeek R1 (May 2025) | 0 | — | Verified | 2026-05-27 |
| 30 | HAL Generalist Agent / Gemini 2.0 Flash (February 2025) | 0 | — | Verified | 2026-05-27 |
| 31 | HAL Generalist Agent / DeepSeek V3 (March 2025) | 0 | — | Verified | 2026-05-27 |
| 32 | Scicode Tool Calling Agent / Claude Haiku 4.5 (October 2025) | 0 | — | Verified | 2026-05-27 |
| 33 | HAL Generalist Agent / DeepSeek R1 (January 2025) | 0 | — | Verified | 2026-05-27 |