YC-Bench

Long-horizon agent benchmark where a model acts as CEO of an AI startup for one simulated year through CLI tool use.

Rows: 11
Primary metric: score
Sampled: 2026-05-06

Metadata

Metrics

Average final funds

Latest Results

Rows are ordered by the rank returned by the Hugging Face leaderboard API. Model display names are preserved from the source modelId values.

Rank  Subject                          Average final funds  Model Match                                               Provenance  Sampled
1     zai-org/GLM-5.1                  1510772              GLM 5.1 (z-ai-glm-5.1)                                    Imported    2026-05-06
2     zai-org/GLM-5                    1208190              GLM 5 (z-ai-glm-5)                                        Imported    2026-05-06
3     deepseek-ai/DeepSeek-V4-Pro      1066426              DeepSeek V4 Pro (deepseek-deepseek-v4-pro)                Imported    2026-05-06
4     moonshotai/Kimi-K2.6             511137               MoonshotAI: Kimi K2.6 (moonshotai-kimi-k2.6)              Imported    2026-05-06
5     moonshotai/Kimi-K2.5             408822               MoonshotAI: Kimi K2.5 (moonshotai-kimi-k2.5)              Imported    2026-05-06
6     zai-org/GLM-4.7                  398410               GLM 4.7 (z-ai-glm-4.7)                                    Imported    2026-05-06
7     MiniMaxAI/MiniMax-M2.5           230465               (none)                                                    Imported    2026-05-06
8     deepseek-ai/DeepSeek-V3.2        125263               DeepSeek V3.2 (deepseek-deepseek-v3.2)                    Imported    2026-05-06
9     Qwen/Qwen3.5-397B-A17B           90787                Qwen3.5 397B A17B (qwen-qwen3.5-397b-a17b)                Imported    2026-05-06
10    arcee-ai/Trinity-Large-Thinking  32667                Trinity Large Thinking (arcee-ai-trinity-large-thinking)  Imported    2026-05-06
11    Qwen/Qwen3.5-122B-A10B           0                    Qwen3.5-122B-A10B (qwen-qwen3.5-122b-a10b)                Imported    2026-05-06
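
As a small illustration of how these results can be worked with, the sketch below copies the rank, subject, and average final funds values from the table above into a Python list. It then checks that the API rank order agrees with descending average final funds, and computes each model's gap to the top score. The names `ROWS`, `is_rank_consistent`, and `gap_to_leader` are hypothetical helpers made up for this example; they are not part of YC-Bench or of any Hugging Face API.

```python
# Values copied by hand from the results table (rank, subject, average final funds).
# ROWS and both helper functions are illustrative names, not part of any real API.
ROWS = [
    (1, "zai-org/GLM-5.1", 1510772),
    (2, "zai-org/GLM-5", 1208190),
    (3, "deepseek-ai/DeepSeek-V4-Pro", 1066426),
    (4, "moonshotai/Kimi-K2.6", 511137),
    (5, "moonshotai/Kimi-K2.5", 408822),
    (6, "zai-org/GLM-4.7", 398410),
    (7, "MiniMaxAI/MiniMax-M2.5", 230465),
    (8, "deepseek-ai/DeepSeek-V3.2", 125263),
    (9, "Qwen/Qwen3.5-397B-A17B", 90787),
    (10, "arcee-ai/Trinity-Large-Thinking", 32667),
    (11, "Qwen/Qwen3.5-122B-A10B", 0),
]


def is_rank_consistent(rows):
    """Return True if walking the rows in rank order never increases the score."""
    by_rank = sorted(rows, key=lambda row: row[0])
    scores = [score for _, _, score in by_rank]
    return all(a >= b for a, b in zip(scores, scores[1:]))


def gap_to_leader(rows):
    """Map each subject to the difference between the top score and its own score."""
    leader = max(score for _, _, score in rows)
    return {subject: leader - score for _, subject, score in rows}


if __name__ == "__main__":
    print("rank order matches score order:", is_rank_consistent(ROWS))
    for subject, gap in gap_to_leader(ROWS).items():
        print(f"{subject}: {gap} behind the leader")
```

The check only compares the sampled values listed here; it says nothing about how the leaderboard API itself computes ranks.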