OMLAB Open Agent Math Leaderboard
The math-reasoning track of the Open Agent Leaderboard compares prompting and agent algorithms across GSM8K, AQuA, and MATH-500, reporting score and cost metrics.
Rows: 66
Primary metric: avg_score
Sampled: 2026-05-06
Metadata
Metrics
Avg Score, GSM8K Score, AQuA Score, MATH-500 Score, GSM8K Cost (lower is better), AQuA Cost (lower is better), MATH-500 Cost (lower is better)
| Rank | Subject | Avg Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | SC-CoT + Qwen2.5-72B-Instruct | 86.67 | — | Imported | 2026-05-06 |
| 2 | CoT + Qwen2.5-72B-Instruct | 86.43 | — | Imported | 2026-05-06 |
| 3 | SC-CoT + gpt-4o | 85.07 | — | Imported | 2026-05-06 |
| 4 | SC-CoT + Llama-3.3-70B-Instruct | 84.09 | — | Imported | 2026-05-06 |
| 5 | CoT + Llama-3.3-70B-Instruct | 82.86 | — | Imported | 2026-05-06 |
| 6 | CoT + gpt-4o | 81.59 | — | Imported | 2026-05-06 |
| 7 | IO + Llama-3.3-70B-Instruct | 81.45 | — | Imported | 2026-05-06 |
| 8 | SC-CoT + Qwen2.5-7B-Instruct | 80.57 | — | Imported | 2026-05-06 |
| 9 | IO + Qwen2.5-72B-Instruct | 80.34 | — | Imported | 2026-05-06 |
| 10 | CoT + Qwen2.5-7B-Instruct | 78.73 | — | Imported | 2026-05-06 |
| 11 | SC-CoT + Doubao-lite-32k | 77.92 | — | Imported | 2026-05-06 |
| 12 | ReAct-Pro* + Llama-3.3-70B-Instruct | 77.12 | — | Imported | 2026-05-06 |
| 13 | CoT + Doubao-lite-32k | 77.00 | — | Imported | 2026-05-06 |
| 14 | ReAct-Pro* + Qwen2.5-72B-Instruct | 74.43 | — | Imported | 2026-05-06 |
| 15 | PoT + Qwen2.5-72B-Instruct | 71.58 | — | Imported | 2026-05-06 |
| 16 | PoT + gpt-4o | 71.50 | — | Imported | 2026-05-06 |
| 17 | ReAct-Pro* + Doubao-lite-32k | 70.12 | — | Imported | 2026-05-06 |
| 18 | ReAct-Pro* + Qwen2.5-7B-Instruct | 68.69 | — | Imported | 2026-05-06 |
| 19 | IO + gpt-4o | 68.60 | — | Imported | 2026-05-06 |
| 20 | IO + Qwen2.5-7B-Instruct | 65.13 | — | Imported | 2026-05-06 |
| 21 | PoT + Llama-3.3-70B-Instruct | 65.07 | — | Imported | 2026-05-06 |
| 22 | CoT + deepseek-r1:1.5b | 63.90 | — | Imported | 2026-05-06 |
| 23 | IO + Doubao-lite-32k | 62.85 | — | Imported | 2026-05-06 |
| 24 | PoT + Doubao-lite-32k | 61.29 | — | Imported | 2026-05-06 |
| 25 | ToT + Qwen2.5-72B-Instruct | 60.26 | — | Imported | 2026-05-06 |
| 26 | CoT + gpt-3.5-turbo | 59.84 | — | Imported | 2026-05-06 |
| 27 | CoT + Internllm2_5-7B | 59.02 | — | Imported | 2026-05-06 |
| 28 | IO + deepseek-r1:1.5b | 58.95 | — | Imported | 2026-05-06 |
| 29 | ToT + Llama-3.3-70B-Instruct | 58.79 | — | Imported | 2026-05-06 |
| 30 | ToT + gpt-4o | 58.61 | — | Imported | 2026-05-06 |
| 31 | ReAct-Pro* + gpt-4o | 58.26 | — | Imported | 2026-05-06 |
| 32 | SC-CoT + deepseek-r1:1.5b | 57.91 | — | Imported | 2026-05-06 |
| 33 | SC-CoT + gpt-3.5-turbo | 56.25 | — | Imported | 2026-05-06 |
| 34 | PoT + Qwen2.5-7B-Instruct | 55.51 | — | Imported | 2026-05-06 |
| 35 | PoT + gpt-3.5-turbo | 55.04 | — | Imported | 2026-05-06 |
| 36 | ReAct-Pro* + gpt-3.5-turbo | 54.43 | — | Imported | 2026-05-06 |
| 37 | CoT + Llama-3.1-8B-Instruct | 53.96 | — | Imported | 2026-05-06 |
| 38 | ReAct-Pro* + Llama-3.1-8B-Instruct | 50.70 | — | Imported | 2026-05-06 |
| 39 | IO + Llama-3.1-8B-Instruct | 48.98 | — | Imported | 2026-05-06 |
| 40 | ToT + gpt-3.5-turbo | 44.94 | — | Imported | 2026-05-06 |
| 41 | SC-CoT + Llama-3.1-8B-Instruct | 44.54 | — | Imported | 2026-05-06 |
| 42 | ToT + Qwen2.5-7B-Instruct | 42.52 | — | Imported | 2026-05-06 |
| 43 | ToT + Llama-3.1-8B-Instruct | 41.97 | — | Imported | 2026-05-06 |
| 44 | ReAct-Pro* + deepseek-r1:1.5b | 38.22 | — | Imported | 2026-05-06 |
| 45 | CoT + Qwen2-1.5B-Instruct | 37.08 | — | Imported | 2026-05-06 |
| 46 | PoT + Llama-3.1-8B-Instruct | 33.56 | — | Imported | 2026-05-06 |
| 47 | IO + gpt-3.5-turbo | 31.34 | — | Imported | 2026-05-06 |
| 48 | SC-CoT + Internllm2_5-7B | 30.81 | — | Imported | 2026-05-06 |
| 49 | PoT + Internllm2_5-7B | 29.94 | — | Imported | 2026-05-06 |
| 50 | ReAct-Pro* + Internllm2_5-7B | 29.75 | — | Imported | 2026-05-06 |
| 51 | ToT + Doubao-lite-32k | 28.10 | — | Imported | 2026-05-06 |
| 52 | IO + Internllm2_5-7B | 27.35 | — | Imported | 2026-05-06 |
| 53 | CoT + Qwen2-0.5B-Instruct | 25.07 | — | Imported | 2026-05-06 |
| 54 | PoT + deepseek-r1:1.5b | 22.54 | — | Imported | 2026-05-06 |
| 55 | ReAct-Pro* + Qwen2-1.5B-Instruct | 19.55 | — | Imported | 2026-05-06 |
| 56 | ToT + Internllm2_5-7B | 18.96 | — | Imported | 2026-05-06 |
| 57 | IO + Qwen2-1.5B-Instruct | 17.60 | — | Imported | 2026-05-06 |
| 58 | ToT + Qwen2-1.5B-Instruct | 17.31 | — | Imported | 2026-05-06 |
| 59 | PoT + Qwen2-1.5B-Instruct | 16.67 | — | Imported | 2026-05-06 |
| 60 | ToT + deepseek-r1:1.5b | 16.11 | — | Imported | 2026-05-06 |
| 61 | IO + Qwen2-0.5B-Instruct | 14.83 | — | Imported | 2026-05-06 |
| 62 | ReAct-Pro* + Qwen2-0.5B-Instruct | 10.76 | — | Imported | 2026-05-06 |
| 63 | ToT + Qwen2-0.5B-Instruct | 9.97 | — | Imported | 2026-05-06 |
| 64 | PoT + Qwen2-0.5B-Instruct | 8.98 | — | Imported | 2026-05-06 |
| 65 | SC-CoT + Qwen2-0.5B-Instruct | 7.90 | — | Imported | 2026-05-06 |
| 66 | SC-CoT + Qwen2-1.5B-Instruct | 6.94 | — | Imported | 2026-05-06 |
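
Each Subject entry pairs an algorithm with a model, so the ranking can also be read per model. The following is a minimal sketch of that regrouping, assuming the `Algorithm + Model` naming seen in the table holds for every row. The helper names `split_subject` and `best_per_model` are hypothetical and not part of the leaderboard, and `ROWS` holds only a handful of rows copied from the table.

```python
# A few (subject, avg_score) pairs copied verbatim from the table above.
ROWS = [
    ("SC-CoT + Qwen2.5-72B-Instruct", 86.67),
    ("CoT + Qwen2.5-72B-Instruct", 86.43),
    ("SC-CoT + gpt-4o", 85.07),
    ("CoT + gpt-4o", 81.59),
    ("IO + Qwen2.5-72B-Instruct", 80.34),
    ("IO + gpt-4o", 68.60),
    ("ReAct-Pro* + gpt-4o", 58.26),
]


def split_subject(subject):
    """Split 'Algorithm + Model' into its two parts (assumed naming scheme)."""
    algorithm, model = subject.split(" + ", 1)
    return algorithm, model


def best_per_model(rows):
    """Return {model: (algorithm, avg_score)} keeping the top score per model."""
    best = {}
    for subject, score in rows:
        algorithm, model = split_subject(subject)
        if model not in best or score > best[model][1]:
            best[model] = (algorithm, score)
    return best


if __name__ == "__main__":
    for model, (algorithm, score) in best_per_model(ROWS).items():
        print(f"{model}: {algorithm} ({score:.2f})")
```

This compares Avg Score only. The cost metrics are not shown in this table view, so a cost-aware comparison would need those columns as well.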