YC-Bench
Long-horizon agent benchmark in which a model acts as the CEO of an AI startup for one simulated year, operating through CLI tool use.
Rows: 11
Primary metric: score
Sampled: 2026-05-06
Average final funds
| Rank | Subject | Average final funds | Model match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | zai-org/GLM-5.1 | 1,510,772 | GLM 5.1 (z-ai-glm-5.1) | Imported | 2026-05-06 |
| 2 | zai-org/GLM-5 | 1,208,190 | GLM 5 (z-ai-glm-5) | Imported | 2026-05-06 |
| 3 | deepseek-ai/DeepSeek-V4-Pro | 1,066,426 | DeepSeek V4 Pro (deepseek-deepseek-v4-pro) | Imported | 2026-05-06 |
| 4 | moonshotai/Kimi-K2.6 | 511,137 | MoonshotAI: Kimi K2.6 (moonshotai-kimi-k2.6) | Imported | 2026-05-06 |
| 5 | moonshotai/Kimi-K2.5 | 408,822 | MoonshotAI: Kimi K2.5 (moonshotai-kimi-k2.5) | Imported | 2026-05-06 |
| 6 | zai-org/GLM-4.7 | 398,410 | GLM 4.7 (z-ai-glm-4.7) | Imported | 2026-05-06 |
| 7 | MiniMaxAI/MiniMax-M2.5 | 230,465 | — | Imported | 2026-05-06 |
| 8 | deepseek-ai/DeepSeek-V3.2 | 125,263 | DeepSeek V3.2 (deepseek-deepseek-v3.2) | Imported | 2026-05-06 |
| 9 | Qwen/Qwen3.5-397B-A17B | 90,787 | Qwen3.5 397B A17B (qwen-qwen3.5-397b-a17b) | Imported | 2026-05-06 |
| 10 | arcee-ai/Trinity-Large-Thinking | 32,667 | Trinity Large Thinking (arcee-ai-trinity-large-thinking) | Imported | 2026-05-06 |
| 11 | Qwen/Qwen3.5-122B-A10B | 0 | Qwen3.5-122B-A10B (qwen-qwen3.5-122b-a10b) | Imported | 2026-05-06 |
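The ranks in the table follow directly from sorting subjects by their reported average final funds, highest first. The sketch below reproduces that ordering from the values listed in the table. It is only an illustration: the benchmark's own scoring tooling is not described here, and the names `REPORTED` and `rank_by_funds` are hypothetical.

```python
# Reported "Average final funds" per subject, copied from the leaderboard table.
REPORTED = {
    "zai-org/GLM-5.1": 1510772,
    "zai-org/GLM-5": 1208190,
    "deepseek-ai/DeepSeek-V4-Pro": 1066426,
    "moonshotai/Kimi-K2.6": 511137,
    "moonshotai/Kimi-K2.5": 408822,
    "zai-org/GLM-4.7": 398410,
    "MiniMaxAI/MiniMax-M2.5": 230465,
    "deepseek-ai/DeepSeek-V3.2": 125263,
    "Qwen/Qwen3.5-397B-A17B": 90787,
    "arcee-ai/Trinity-Large-Thinking": 32667,
    "Qwen/Qwen3.5-122B-A10B": 0,
}


def rank_by_funds(scores: dict[str, int]) -> list[tuple[int, str, int]]:
    """Hypothetical helper: return (rank, subject, funds), highest funds = rank 1."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(i, subject, funds) for i, (subject, funds) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    for rank, subject, funds in rank_by_funds(REPORTED):
        # Right-align funds with thousands separators, as in the table.
        print(f"{rank:>2}  {subject:<35} {funds:>10,}")
```

All reported values are distinct, so this simple descending sort needs no tie-breaking rule. A real leaderboard would have to define one.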