INVESTORBENCH

Financial decision-making benchmark for investment agents, evaluating portfolio or trading decisions rather than only financial QA.

Rows: 14
Primary metric: cumulative_return
Sampled: 2026-05-27

Metadata

Metrics

Average stock cumulative return, Average stock Sharpe ratio, Average stock annualized volatility (lower is better), Average stock maximum drawdown (lower is better)
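The four metrics above can be computed from a single stock's return series. The sketch below is a minimal illustration under stated assumptions: simple daily returns, 252 trading days per year, and a constant risk-free rate that defaults to zero. None of these conventions are taken from INVESTORBENCH, and the function names are hypothetical. Maximum drawdown is returned as a positive fraction, so a lower value is better, matching the "lower is better" note.

```python
import math
import statistics
from typing import Sequence

# Assumptions (not taken from INVESTORBENCH): simple daily returns,
# 252 trading days per year, and a constant per-period risk-free rate.
TRADING_DAYS_PER_YEAR = 252


def cumulative_return(returns: Sequence[float]) -> float:
    """Compound per-period simple returns into one total return (0.10 == 10%)."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def annualized_volatility(returns: Sequence[float]) -> float:
    """Sample standard deviation of per-period returns, scaled to one year."""
    return statistics.stdev(returns) * math.sqrt(TRADING_DAYS_PER_YEAR)


def sharpe_ratio(returns: Sequence[float], risk_free: float = 0.0) -> float:
    """Annualized Sharpe ratio: mean excess return divided by its std. dev."""
    excess = [r - risk_free for r in returns]
    std = statistics.stdev(excess)
    if std == 0:
        raise ValueError("Sharpe ratio is undefined for zero-volatility returns")
    return statistics.mean(excess) / std * math.sqrt(TRADING_DAYS_PER_YEAR)


def max_drawdown(returns: Sequence[float]) -> float:
    """Largest peak-to-trough drop of the equity curve, as a positive fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0  # start from 1 unit of capital
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


if __name__ == "__main__":
    daily = [0.01, -0.02, 0.015, 0.005]  # hypothetical daily returns
    print(f"cumulative return:     {cumulative_return(daily):.4%}")
    print(f"annualized volatility: {annualized_volatility(daily):.4%}")
    print(f"Sharpe ratio:          {sharpe_ratio(daily):.3f}")
    print(f"max drawdown:          {max_drawdown(daily):.4%}")
```

The benchmark's own definitions may differ, for example in annualization factor, risk-free rate, or whether log returns are used.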

Latest Results

Rows are transcribed from Table 2 of the public INVESTORBENCH paper on arXiv, using the Average columns across the seven stock-trading tasks. The primary score is average cumulative return; the risk metrics are preserved as reported.
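The ranking rule described above (average each subject's metric over the seven tasks, then sort by the averaged cumulative return) can be sketched as follows. This assumes the "Average" column is an equal-weight arithmetic mean over tasks, which is an assumption rather than something stated here; the task names and per-task values in the example are hypothetical, not INVESTORBENCH data.

```python
from statistics import mean

# Assumption: each subject's "Average" figure is an equal-weight arithmetic
# mean over the seven stock-trading tasks.
N_TASKS = 7


def average_over_tasks(per_task: dict[str, float], n_tasks: int = N_TASKS) -> float:
    """Equal-weight mean of one metric across all tasks for one subject."""
    if len(per_task) != n_tasks:
        raise ValueError(f"expected {n_tasks} tasks, got {len(per_task)}")
    return mean(per_task.values())


def rank_by_average(results: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Rank subjects by their task-averaged score, highest first."""
    averaged = {subject: average_over_tasks(tasks) for subject, tasks in results.items()}
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical per-task cumulative returns (%), not INVESTORBENCH data.
    results = {
        "agent_a": {f"task_{i}": 10.0 + i for i in range(1, 8)},
        "agent_b": {f"task_{i}": 20.0 - i for i in range(1, 8)},
    }
    for rank, (subject, score) in enumerate(rank_by_average(results), start=1):
        print(rank, subject, f"{score:.3f}%")
```

An equal-weight mean lets a single very volatile stock task move the average substantially, which is one reason the risk metrics are worth reading alongside the primary score.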

| Rank | Subject | Average stock cumulative return | Model match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Qwen2.5-72B-Instruct | 46.153% | Qwen2.5 72B Instruct (qwen-qwen-2.5-72b-instruct) | Imported | 2026-05-27 |
| 2 | GPT-4 | 43.696% | GPT-4 (openai-gpt-4) | Imported | 2026-05-27 |
| 3 | GPT-4o | 39.031% | GPT-4o (openai-gpt-4o) | Imported | 2026-05-27 |
| 4 | Llama-3.1-70B-Instruct | 38.946% | Llama 3.1 70B Instruct (meta-llama-llama-3.1-70b-instruct) | Imported | 2026-05-27 |
| 5 | Yi-1.5-34B-Chat | 37.966% | | Imported | 2026-05-27 |
| 6 | Buy & Hold | 34.099% | | Imported | 2026-05-27 |
| 7 | Qwen-2.5-Instruct-7B | 29.515% | | Imported | 2026-05-27 |
| 8 | DeepSeek-V2-Lite (15.7B) | 28.745% | | Imported | 2026-05-27 |
| 9 | DeepSeek-67B-Chat | 26.941% | | Imported | 2026-05-27 |
| 10 | Llama-3.1-8B-Instruct | 25.463% | Llama 3.1 8B Instruct (meta-llama-llama-3.1-8b-instruct) | Imported | 2026-05-27 |
| 11 | GPT-o1-preview | 25.057% | | Imported | 2026-05-27 |
| 12 | Yi-1.5-9B-Chat | 22.913% | | Imported | 2026-05-27 |
| 13 | Qwen2.5-32B-Instruct | 20.884% | | Imported | 2026-05-27 |
| 14 | Palmyra-Fin-70B | -0.453% | | Imported | 2026-05-27 |
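Because Buy & Hold appears in the table as a passive baseline, a natural check is which subjects beat it on the primary metric and by how many percentage points. The sketch below uses only the average cumulative returns transcribed in the table; the dictionary and function names are hypothetical, and the comparison ignores the risk metrics entirely.

```python
# Average stock cumulative return (%), transcribed from the table above.
AVG_CUM_RETURN = {
    "Qwen2.5-72B-Instruct": 46.153,
    "GPT-4": 43.696,
    "GPT-4o": 39.031,
    "Llama-3.1-70B-Instruct": 38.946,
    "Yi-1.5-34B-Chat": 37.966,
    "Buy & Hold": 34.099,
    "Qwen-2.5-Instruct-7B": 29.515,
    "DeepSeek-V2-Lite (15.7B)": 28.745,
    "DeepSeek-67B-Chat": 26.941,
    "Llama-3.1-8B-Instruct": 25.463,
    "GPT-o1-preview": 25.057,
    "Yi-1.5-9B-Chat": 22.913,
    "Qwen2.5-32B-Instruct": 20.884,
    "Palmyra-Fin-70B": -0.453,
}
BASELINE = "Buy & Hold"


def beats_baseline(scores: dict[str, float], baseline: str = BASELINE) -> list[tuple[str, float]]:
    """Subjects scoring above the baseline, best first, with their margin in percentage points."""
    base = scores[baseline]
    ranked = sorted(
        ((subject, value) for subject, value in scores.items() if subject != baseline),
        key=lambda item: item[1],
        reverse=True,
    )
    return [(subject, round(value - base, 3)) for subject, value in ranked if value > base]


if __name__ == "__main__":
    for subject, margin in beats_baseline(AVG_CUM_RETURN):
        print(f"{subject}: +{margin:.3f} pp over {BASELINE}")
```

On these transcribed averages, five subjects finish above Buy & Hold and eight finish below it. Beating the baseline on cumulative return alone does not imply a better risk-adjusted result, so the Sharpe ratio, volatility, and drawdown columns of Table 2 remain relevant.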