USACO (HAL)
HAL's cost-aware agent leaderboard for USACO competitive programming tasks.
13 rows
Primary metric: accuracy
Sampled: 2026-05-27
Metadata
Metrics: Accuracy, Cost (USD) (lower is better), Runs
| Rank | Subject | Accuracy (%) | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | USACO Episodic + Semantic / GPT-5 Medium (August 2025) | 69.71 | — | Verified | 2026-05-27 |
| 2 | USACO Episodic + Semantic / o4-mini High (April 2025) | 57.98 | — | Verified | 2026-05-27 |
| 3 | USACO Episodic + Semantic / Claude Opus 4.1 High (August 2025) | 51.47 | — | Verified | 2026-05-27 |
| 4 | USACO Episodic + Semantic / Claude Opus 4.1 (August 2025) | 48.21 | — | Verified | 2026-05-27 |
| 5 | USACO Episodic + Semantic / o3 Medium (April 2025) | 46.25 | — | Verified | 2026-05-27 |
| 6 | USACO Episodic + Semantic / GPT-4.1 (April 2025) | 44.95 | — | Verified | 2026-05-27 |
| 7 | USACO Episodic + Semantic / DeepSeek V3 (March 2025) | 39.09 | — | Verified | 2026-05-27 |
| 8 | USACO Episodic + Semantic / DeepSeek R1 (January 2025) | 38.11 | — | Verified | 2026-05-27 |
| 9 | USACO Episodic + Semantic / o4-mini Low (April 2025) | 30.94 | — | Verified | 2026-05-27 |
| 10 | USACO Episodic + Semantic / Claude-3.7 Sonnet (February 2025) | 29.32 | — | Verified | 2026-05-27 |
| 11 | USACO Episodic + Semantic / Gemini 2.0 Flash (February 2025) | 27.04 | — | Verified | 2026-05-27 |
| 12 | USACO Episodic + Semantic / Claude-3.7 Sonnet High (February 2025) | 26.71 | — | Verified | 2026-05-27 |
| 13 | HAL Generalist Agent / GPT-4.1 (April 2025) | 25.41 | — | Verified | 2026-05-27 |