OMLAB Open Agent Math Leaderboard

Open Agent Leaderboard math-reasoning track comparing prompting and agent algorithms across GSM8K, AQuA, and MATH-500 with score and cost metrics.

Metadata

Rows: 66
Primary metric: avg_score
Sampled: 2026-05-06

Metrics

Avg Score, gsm8k Score, AQuA Score, MATH-500 Score, gsm8k Cost (lower is better), AQuA Cost (lower is better), MATH-500 Cost (lower is better)
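The Metrics list mixes two ranking directions. Below is a minimal Python sketch of how a consumer of these rows might rank by any listed metric. The page marks the costs as lower-is-better; treating the scores as higher-is-better is an assumption, though the descending ranking by Avg Score suggests it. The `Metric` class and `rank` helper are hypothetical and are not part of OMLAB's tooling. The example rows reuse Avg Score values from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    higher_is_better: bool


# Costs are marked "(lower is better)" on this page; treating the
# scores as higher-is-better is an assumption.
METRICS = [
    Metric("Avg Score", True),
    Metric("gsm8k Score", True),
    Metric("AQuA Score", True),
    Metric("MATH-500 Score", True),
    Metric("gsm8k Cost", False),
    Metric("AQuA Cost", False),
    Metric("MATH-500 Cost", False),
]


def rank(rows: list[dict], metric: Metric) -> list[dict]:
    """Return rows ordered best-first for the given metric (hypothetical helper)."""
    return sorted(rows, key=lambda r: r[metric.name], reverse=metric.higher_is_better)


# Avg Score values copied from the results table on this page.
rows = [
    {"Subject": "CoT + gpt-4o", "Avg Score": 81.59},
    {"Subject": "SC-CoT + gpt-4o", "Avg Score": 85.07},
    {"Subject": "IO + gpt-4o", "Avg Score": 68.60},
]
best_first = rank(rows, METRICS[0])
print([r["Subject"] for r in best_first])
# ['SC-CoT + gpt-4o', 'CoT + gpt-4o', 'IO + gpt-4o']
```

Storing the direction next to each metric name keeps a single `rank` function correct for both score and cost columns.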

Latest Results

Rows are parsed from OMLAB's public Open Agent Leaderboard math results CSV. Each Subject joins the algorithm and LLM display strings, unchanged, into a single agent configuration (for example, SC-CoT + gpt-4o).
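Below is a sketch of the parsing step under stated assumptions. This page does not show the CSV header, so the column names `Algorithm`, `LLM`, and `Avg Score` and the helper `parse_rows` are hypothetical. The sketch joins the two display strings verbatim, keeping the `*` in `ReAct-Pro*`. It then ranks rows by the primary metric and prints scores to two decimals.

```python
import csv
import io

# Hypothetical CSV layout: the real OMLAB column names are not given on this page.
SAMPLE_CSV = """Algorithm,LLM,Avg Score
CoT,Qwen2.5-72B-Instruct,86.43
ReAct-Pro*,Llama-3.3-70B-Instruct,77.12
SC-CoT,Qwen2.5-72B-Instruct,86.67
"""


def parse_rows(text: str) -> list[tuple[str, float]]:
    """Turn each CSV record into (subject, avg_score), keeping display strings verbatim."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        # Join the algorithm and LLM display strings unchanged into one agent configuration.
        subject = f"{record['Algorithm']} + {record['LLM']}"
        rows.append((subject, float(record["Avg Score"])))
    # Rank by the primary metric, highest first.
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows


if __name__ == "__main__":
    for position, (subject, score) in enumerate(parse_rows(SAMPLE_CSV), start=1):
        print(position, subject, f"{score:.2f}")
```

Formatting with `:.2f` also keeps scores consistent, so a raw value such as `77` is displayed as `77.00`.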

Rank  Subject                              Avg Score  Model Match  Provenance  Sampled
1     SC-CoT + Qwen2.5-72B-Instruct        86.67      -            Imported    2026-05-06
2     CoT + Qwen2.5-72B-Instruct           86.43      -            Imported    2026-05-06
3     SC-CoT + gpt-4o                      85.07      -            Imported    2026-05-06
4     SC-CoT + Llama-3.3-70B-Instruct      84.09      -            Imported    2026-05-06
5     CoT + Llama-3.3-70B-Instruct         82.86      -            Imported    2026-05-06
6     CoT + gpt-4o                         81.59      -            Imported    2026-05-06
7     IO + Llama-3.3-70B-Instruct          81.45      -            Imported    2026-05-06
8     SC-CoT + Qwen2.5-7B-Instruct         80.57      -            Imported    2026-05-06
9     IO + Qwen2.5-72B-Instruct            80.34      -            Imported    2026-05-06
10    CoT + Qwen2.5-7B-Instruct            78.73      -            Imported    2026-05-06
11    SC-CoT + Doubao-lite-32k             77.92      -            Imported    2026-05-06
12    ReAct-Pro* + Llama-3.3-70B-Instruct  77.12      -            Imported    2026-05-06
13    CoT + Doubao-lite-32k                77.00      -            Imported    2026-05-06
14    ReAct-Pro* + Qwen2.5-72B-Instruct    74.43      -            Imported    2026-05-06
15    PoT + Qwen2.5-72B-Instruct           71.58      -            Imported    2026-05-06
16    PoT + gpt-4o                         71.50      -            Imported    2026-05-06
17    ReAct-Pro* + Doubao-lite-32k         70.12      -            Imported    2026-05-06
18    ReAct-Pro* + Qwen2.5-7B-Instruct     68.69      -            Imported    2026-05-06
19    IO + gpt-4o                          68.60      -            Imported    2026-05-06
20    IO + Qwen2.5-7B-Instruct             65.13      -            Imported    2026-05-06
21    PoT + Llama-3.3-70B-Instruct         65.07      -            Imported    2026-05-06
22    CoT + deepseek-r1:1.5b               63.90      -            Imported    2026-05-06
23    IO + Doubao-lite-32k                 62.85      -            Imported    2026-05-06
24    PoT + Doubao-lite-32k                61.29      -            Imported    2026-05-06
25    ToT + Qwen2.5-72B-Instruct           60.26      -            Imported    2026-05-06
26    CoT + gpt-3.5-turbo                  59.84      -            Imported    2026-05-06
27    CoT + Internllm2_5-7B                59.02      -            Imported    2026-05-06
28    IO + deepseek-r1:1.5b                58.95      -            Imported    2026-05-06
29    ToT + Llama-3.3-70B-Instruct         58.79      -            Imported    2026-05-06
30    ToT + gpt-4o                         58.61      -            Imported    2026-05-06
31    ReAct-Pro* + gpt-4o                  58.26      -            Imported    2026-05-06
32    SC-CoT + deepseek-r1:1.5b            57.91      -            Imported    2026-05-06
33    SC-CoT + gpt-3.5-turbo               56.25      -            Imported    2026-05-06
34    PoT + Qwen2.5-7B-Instruct            55.51      -            Imported    2026-05-06
35    PoT + gpt-3.5-turbo                  55.04      -            Imported    2026-05-06
36    ReAct-Pro* + gpt-3.5-turbo           54.43      -            Imported    2026-05-06
37    CoT + Llama-3.1-8B-Instruct          53.96      -            Imported    2026-05-06
38    ReAct-Pro* + Llama-3.1-8B-Instruct   50.70      -            Imported    2026-05-06
39    IO + Llama-3.1-8B-Instruct           48.98      -            Imported    2026-05-06
40    ToT + gpt-3.5-turbo                  44.94      -            Imported    2026-05-06
41    SC-CoT + Llama-3.1-8B-Instruct       44.54      -            Imported    2026-05-06
42    ToT + Qwen2.5-7B-Instruct            42.52      -            Imported    2026-05-06
43    ToT + Llama-3.1-8B-Instruct          41.97      -            Imported    2026-05-06
44    ReAct-Pro* + deepseek-r1:1.5b        38.22      -            Imported    2026-05-06
45    CoT + Qwen2-1.5B-Instruct            37.08      -            Imported    2026-05-06
46    PoT + Llama-3.1-8B-Instruct          33.56      -            Imported    2026-05-06
47    IO + gpt-3.5-turbo                   31.34      -            Imported    2026-05-06
48    SC-CoT + Internllm2_5-7B             30.81      -            Imported    2026-05-06
49    PoT + Internllm2_5-7B                29.94      -            Imported    2026-05-06
50    ReAct-Pro* + Internllm2_5-7B         29.75      -            Imported    2026-05-06
51    ToT + Doubao-lite-32k                28.10      -            Imported    2026-05-06
52    IO + Internllm2_5-7B                 27.35      -            Imported    2026-05-06
53    CoT + Qwen2-0.5B-Instruct            25.07      -            Imported    2026-05-06
54    PoT + deepseek-r1:1.5b               22.54      -            Imported    2026-05-06
55    ReAct-Pro* + Qwen2-1.5B-Instruct     19.55      -            Imported    2026-05-06
56    ToT + Internllm2_5-7B                18.96      -            Imported    2026-05-06
57    IO + Qwen2-1.5B-Instruct             17.60      -            Imported    2026-05-06
58    ToT + Qwen2-1.5B-Instruct            17.31      -            Imported    2026-05-06
59    PoT + Qwen2-1.5B-Instruct            16.67      -            Imported    2026-05-06
60    ToT + deepseek-r1:1.5b               16.11      -            Imported    2026-05-06
61    IO + Qwen2-0.5B-Instruct             14.83      -            Imported    2026-05-06
62    ReAct-Pro* + Qwen2-0.5B-Instruct     10.76      -            Imported    2026-05-06
63    ToT + Qwen2-0.5B-Instruct            9.97       -            Imported    2026-05-06
64    PoT + Qwen2-0.5B-Instruct            8.98       -            Imported    2026-05-06
65    SC-CoT + Qwen2-0.5B-Instruct         7.90       -            Imported    2026-05-06
66    SC-CoT + Qwen2-1.5B-Instruct         6.94       -            Imported    2026-05-06
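Each Subject pairs one algorithm with one LLM, so pivoting the rows by LLM makes per-model comparisons easier. Below is a minimal sketch that uses the Avg Score values for gpt-4o and Qwen2-1.5B-Instruct, copied from the table. The `by_llm` helper is hypothetical.

```python
from collections import defaultdict

# (algorithm, LLM, Avg Score) copied from the leaderboard table for two LLMs.
ROWS = [
    ("SC-CoT", "gpt-4o", 85.07),
    ("CoT", "gpt-4o", 81.59),
    ("PoT", "gpt-4o", 71.50),
    ("IO", "gpt-4o", 68.60),
    ("ToT", "gpt-4o", 58.61),
    ("ReAct-Pro*", "gpt-4o", 58.26),
    ("CoT", "Qwen2-1.5B-Instruct", 37.08),
    ("ReAct-Pro*", "Qwen2-1.5B-Instruct", 19.55),
    ("IO", "Qwen2-1.5B-Instruct", 17.60),
    ("ToT", "Qwen2-1.5B-Instruct", 17.31),
    ("PoT", "Qwen2-1.5B-Instruct", 16.67),
    ("SC-CoT", "Qwen2-1.5B-Instruct", 6.94),
]


def by_llm(rows):
    """Pivot (algorithm, llm, score) rows into {llm: {algorithm: score}} (hypothetical helper)."""
    table = defaultdict(dict)
    for algo, llm, score in rows:
        table[llm][algo] = score
    return table


if __name__ == "__main__":
    for llm, scores in by_llm(ROWS).items():
        best = max(scores, key=scores.get)  # algorithm with the highest Avg Score
        delta = scores["SC-CoT"] - scores["CoT"]  # effect of self-consistency over plain CoT
        print(f"{llm}: best={best} ({scores[best]:.2f}), SC-CoT - CoT = {delta:+.2f}")
```

With these rows, SC-CoT is the highest-scoring algorithm for gpt-4o (85.07) and the lowest for Qwen2-1.5B-Instruct (6.94). For that model, plain CoT leads at 37.08.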