USACO (HAL)

HAL's cost-aware agent leaderboard for USACO competitive programming tasks.

Rows: 13
Primary metric: accuracy
Sampled: 2026-05-27

Metadata

Metrics

Accuracy, Cost (USD) (lower is better), Runs

Latest Results

Rows are parsed from the public HAL static leaderboard table. Source scaffold/model display names are preserved; score is the table's Accuracy percentage.

Rank Subject Accuracy Model Match Provenance Sampled
1 USACO Episodic + Semantic / GPT-5 Medium (August 2025) 69.71 Verified 2026-05-27
2 USACO Episodic + Semantic / o4-mini High (April 2025) 57.98 Verified 2026-05-27
3 USACO Episodic + Semantic / Claude Opus 4.1 High (August 2025) 51.47 Verified 2026-05-27
4 USACO Episodic + Semantic / Claude Opus 4.1 (August 2025) 48.21 Verified 2026-05-27
5 USACO Episodic + Semantic / o3 Medium (April 2025) 46.25 Verified 2026-05-27
6 USACO Episodic + Semantic / GPT-4.1 (April 2025) 44.95 Verified 2026-05-27
7 USACO Episodic + Semantic / DeepSeek V3 (March 2025) 39.09 Verified 2026-05-27
8 USACO Episodic + Semantic / DeepSeek R1 (January 2025) 38.11 Verified 2026-05-27
9 USACO Episodic + Semantic / o4-mini Low (April 2025) 30.94 Verified 2026-05-27
10 USACO Episodic + Semantic / Claude-3.7 Sonnet (February 2025) 29.32 Verified 2026-05-27
11 USACO Episodic + Semantic / Gemini 2.0 Flash (February 2025) 27.04 Verified 2026-05-27
12 USACO Episodic + Semantic / Claude-3.7 Sonnet High (February 2025) 26.71 Verified 2026-05-27
13 HAL Generalist Agent / GPT-4.1 (April 2025) 25.41 Verified 2026-05-27
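
The note above says each Subject keeps the source "scaffold / model" display name and that the score is the Accuracy percentage. As a rough illustration only, one flattened row could be split back into fields as sketched below. The names `Row`, `ROW_RE`, and `parse_row` are hypothetical and not part of HAL. The field name `status` is also an assumption: the rows carry a single "Verified" token, and the flattened text does not show whether it belongs to Model Match or Provenance.

```python
import re
from typing import NamedTuple


class Row(NamedTuple):
    """One leaderboard row (hypothetical structure, for illustration)."""
    rank: int
    scaffold: str   # text before " / " in the Subject column
    model: str      # text after " / ", including the "(Month Year)" suffix
    accuracy: float # Accuracy percentage as shown in the table
    status: str     # the single "Verified" token (column is an assumption)
    sampled: str    # ISO date string, e.g. "2026-05-27"


# Assumes the row shape seen above:
# "<rank> <scaffold> / <model> <accuracy> <status> <YYYY-MM-DD>"
ROW_RE = re.compile(
    r"^(?P<rank>\d+)\s+"
    r"(?P<scaffold>.+?)\s+/\s+"
    r"(?P<model>.+?)\s+"
    r"(?P<accuracy>\d+(?:\.\d+)?)\s+"
    r"(?P<status>\S+)\s+"
    r"(?P<sampled>\d{4}-\d{2}-\d{2})$"
)


def parse_row(line: str) -> Row:
    """Parse one flattened row; raise ValueError if it does not match."""
    match = ROW_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized row: {line!r}")
    return Row(
        rank=int(match["rank"]),
        scaffold=match["scaffold"],
        model=match["model"],
        accuracy=float(match["accuracy"]),
        status=match["status"],
        sampled=match["sampled"],
    )
```

For example, `parse_row("13 HAL Generalist Agent / GPT-4.1 (April 2025) 25.41 Verified 2026-05-27")` yields a `Row` whose `scaffold` is `"HAL Generalist Agent"` and whose `model` is `"GPT-4.1 (April 2025)"`. The model name is matched lazily, so version numbers inside it (such as the "4.1" in "Claude Opus 4.1 High") are not mistaken for the accuracy value, because the trailing status and date must also match.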