GAIA (HAL)
HAL's standardized, cost-aware agent leaderboard for GAIA web assistance tasks.
32 rows
Primary metric: accuracy
Sampled: 2026-05-27
Metrics: Accuracy, Level 1, Level 2, Level 3 (accuracy by GAIA difficulty level), Cost (USD; lower is better), Runs
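HAL describes this leaderboard as cost-aware: each entry carries a cost in USD (lower is better) alongside accuracy. One common way to read the two metrics together is a Pareto frontier, the set of entries for which no other entry is both at least as accurate and no more expensive (and strictly better on one of the two). The Python below is a minimal sketch of that idea, not HAL's own code. `Entry` and `pareto_frontier` are hypothetical names, and because the table below omits the cost column, the sketch takes costs as input rather than supplying any.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One leaderboard row; field names are illustrative, not HAL's schema."""
    name: str
    accuracy: float  # percent; higher is better
    cost_usd: float  # total cost in USD; lower is better


def pareto_frontier(entries: list[Entry]) -> list[Entry]:
    """Return entries that no other entry beats on both accuracy and cost.

    Results are ordered from cheapest to most expensive. If two entries tie
    on both metrics, only the first one in input order is kept.
    """
    # Cheapest first; among equal costs, the most accurate comes first.
    ordered = sorted(entries, key=lambda e: (e.cost_usd, -e.accuracy))
    frontier: list[Entry] = []
    best_accuracy = float("-inf")
    for entry in ordered:
        # Every earlier entry is at most as expensive, so this one survives
        # only if it is strictly more accurate than all of them.
        if entry.accuracy > best_accuracy:
            frontier.append(entry)
            best_accuracy = entry.accuracy
    return frontier
```

Sorting by cost first turns the dominance check into a single pass: an entry stays on the frontier only if it raises the best accuracy seen so far.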
| Rank | Subject | Accuracy (%) | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | HAL Generalist Agent / Claude Sonnet 4.5 (September 2025) | 74.55 | — | Verified | 2026-05-27 |
| 2 | HAL Generalist Agent / Claude Sonnet 4.5 High (September 2025) | 70.91 | — | Verified | 2026-05-27 |
| 3 | HAL Generalist Agent / Claude Opus 4.1 High (August 2025) | 68.48 | — | Verified | 2026-05-27 |
| 4 | HAL Generalist Agent / Claude Opus 4 High (May 2025) | 64.85 | — | Verified | 2026-05-27 |
| 5 | HAL Generalist Agent / Claude-3.7 Sonnet High (February 2025) | 64.24 | — | Verified | 2026-05-27 |
| 6 | HAL Generalist Agent / Claude Opus 4.1 (August 2025) | 64.24 | — | Verified | 2026-05-27 |
| 7 | HF Open Deep Research / GPT-5 Medium (August 2025) | 62.8 | — | Verified | 2026-05-27 |
| 8 | HAL Generalist Agent / GPT-5 Medium (August 2025) | 59.39 | — | Verified | 2026-05-27 |
| 9 | HAL Generalist Agent / o4-mini Low (April 2025) | 58.18 | — | Verified | 2026-05-27 |
| 10 | HF Open Deep Research / Claude Opus 4 (May 2025) | 57.58 | — | Verified | 2026-05-27 |
| 11 | HAL Generalist Agent / Claude-3.7 Sonnet (February 2025) | 56.36 | — | Verified | 2026-05-27 |
| 12 | HAL Generalist Agent / Claude Haiku 4.5 (October 2025) | 56.36 | — | Verified | 2026-05-27 |
| 13 | HF Open Deep Research / o4-mini High (April 2025) | 55.76 | — | Verified | 2026-05-27 |
| 14 | HAL Generalist Agent / o4-mini High (April 2025) | 54.55 | — | Verified | 2026-05-27 |
| 15 | HF Open Deep Research / GPT-4.1 (April 2025) | 50.3 | — | Verified | 2026-05-27 |
| 16 | HAL Generalist Agent / GPT-4.1 (April 2025) | 49.7 | — | Verified | 2026-05-27 |
| 17 | HF Open Deep Research / o4-mini Low (April 2025) | 47.88 | — | Verified | 2026-05-27 |
| 18 | HF Open Deep Research / Claude-3.7 Sonnet (February 2025) | 36.97 | — | Verified | 2026-05-27 |
| 19 | HF Open Deep Research / Claude-3.7 Sonnet High (February 2025) | 35.76 | — | Verified | 2026-05-27 |
| 20 | HAL Generalist Agent / Gemini 2.0 Flash (February 2025) | 32.73 | — | Verified | 2026-05-27 |
| 21 | HF Open Deep Research / o3 Medium (April 2025) | 32.73 | — | Verified | 2026-05-27 |
| 22 | HF Open Deep Research / Claude Sonnet 4.5 (September 2025) | 30.91 | — | Verified | 2026-05-27 |
| 23 | HF Open Deep Research / Claude Sonnet 4.5 High (September 2025) | 30.91 | — | Verified | 2026-05-27 |
| 24 | HAL Generalist Agent / DeepSeek R1 (January 2025) | 30.3 | — | Verified | 2026-05-27 |
| 25 | HAL Generalist Agent / Claude Opus 4 (May 2025) | 30.3 | — | Verified | 2026-05-27 |
| 26 | HAL Generalist Agent / DeepSeek V3 (March 2025) | 29.39 | — | Verified | 2026-05-27 |
| 27 | HF Open Deep Research / DeepSeek V3 (March 2025) | 28.48 | — | Verified | 2026-05-27 |
| 28 | HF Open Deep Research / Claude Opus 4.1 (August 2025) | 28.48 | — | Verified | 2026-05-27 |
| 29 | HAL Generalist Agent / o3 Medium (April 2025) | 28.48 | — | Verified | 2026-05-27 |
| 30 | HF Open Deep Research / Claude Opus 4.1 High (August 2025) | 25.45 | — | Verified | 2026-05-27 |
| 31 | HF Open Deep Research / DeepSeek R1 (January 2025) | 24.85 | — | Verified | 2026-05-27 |
| 32 | HF Open Deep Research / Gemini 2.0 Flash (February 2025) | 19.39 | — | Verified | 2026-05-27 |
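Several models appear twice, once under the HAL Generalist Agent scaffold and once under HF Open Deep Research, and their scores can differ sharply (Claude Sonnet 4.5 scores 74.55 with the former and 30.91 with the latter). The sketch below pairs rows by model and reports the accuracy difference in percentage points. It is a reading aid built from a handful of rows copied from the table; `ROWS` and `scaffold_gaps` are hypothetical names, and it assumes every subject has the form `<scaffold> / <model>`.

```python
from collections import defaultdict

# A few (subject, accuracy %) rows copied from the table above.
ROWS = [
    ("HAL Generalist Agent / Claude Sonnet 4.5 (September 2025)", 74.55),
    ("HF Open Deep Research / Claude Sonnet 4.5 (September 2025)", 30.91),
    ("HAL Generalist Agent / Claude Opus 4 (May 2025)", 30.3),
    ("HF Open Deep Research / Claude Opus 4 (May 2025)", 57.58),
    ("HAL Generalist Agent / GPT-4.1 (April 2025)", 49.7),
    ("HF Open Deep Research / GPT-4.1 (April 2025)", 50.3),
    ("HAL Generalist Agent / Claude Haiku 4.5 (October 2025)", 56.36),
]


def scaffold_gaps(rows, a="HAL Generalist Agent", b="HF Open Deep Research"):
    """Return {model: accuracy under a minus accuracy under b} for models run under both."""
    by_model = defaultdict(dict)
    for subject, accuracy in rows:
        # Assumed subject format: "<scaffold> / <model>".
        scaffold, model = subject.split(" / ", 1)
        by_model[model][scaffold] = accuracy
    return {
        model: round(scores[a] - scores[b], 2)
        for model, scores in by_model.items()
        if a in scores and b in scores
    }


for model, gap in scaffold_gaps(ROWS).items():
    print(f"{model}: {gap:+.2f} points")
```

For these rows it prints +43.64 for Claude Sonnet 4.5, -27.28 for Claude Opus 4, and -0.60 for GPT-4.1; Claude Haiku 4.5 is skipped because it appears under only one scaffold.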