AgentQuest
AgentQuest evaluates autonomous agent performance on multi-step tasks requiring planning, state tracking, tool use, and recovery.
Rows: 6
Primary metric: success_rate
Sampled: 2026-05-28
Metadata
Metrics: Success Rate, Steps (lower is better), Progress Rate, Repetition Rate (lower is better)
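This page lists the four metrics but does not define them. The sketch below shows one plausible way to compute them from per-episode logs. Every definition in it is an assumption, not the benchmark's official formula: progress rate is taken as the share of task milestones reached, and repetition rate as the share of actions that repeat an earlier action in the same episode. The `Episode` type and its fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Episode:
    """Hypothetical per-episode log; field names are illustrative only."""
    actions: List[str]        # actions the agent took, in order
    milestones_reached: int   # milestones hit by the end of the episode
    milestones_total: int     # milestones defined for the task
    success: bool             # whether the task was solved


def progress_rate(ep: Episode) -> float:
    # Assumed definition: fraction of milestones reached.
    if ep.milestones_total == 0:
        return 0.0
    return ep.milestones_reached / ep.milestones_total


def repetition_rate(ep: Episode) -> float:
    # Assumed definition: repeated actions divided by (steps - 1),
    # since the first action can never be a repeat.
    if len(ep.actions) < 2:
        return 0.0
    seen = set()
    repeats = 0
    for action in ep.actions:
        if action in seen:
            repeats += 1
        seen.add(action)
    return repeats / (len(ep.actions) - 1)


def summarize(episodes: List[Episode]) -> Dict[str, float]:
    # Average each metric over all episodes.
    n = len(episodes)
    if n == 0:
        raise ValueError("need at least one episode")
    return {
        "success_rate": sum(e.success for e in episodes) / n,
        "steps": sum(len(e.actions) for e in episodes) / n,
        "progress_rate": sum(progress_rate(e) for e in episodes) / n,
        "repetition_rate": sum(repetition_rate(e) for e in episodes) / n,
    }


if __name__ == "__main__":
    demo = [
        Episode(["look", "open drawer", "look"], 2, 4, False),
        Episode(["go to desk", "take pen"], 2, 2, True),
    ]
    print(summarize(demo))
```

Under these assumptions, higher success and progress rates are better, while fewer steps and a lower repetition rate are better, which matches the "lower is better" notes in the metric list.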
| Rank | Subject | Success Rate | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Modified LangChain chat agent + GPT-4 on ALFWorld | 93% | — | Imported | 2026-05-28 |
| 2 | LangChain chat agent + GPT-4 on ALFWorld | 86% | — | Imported | 2026-05-28 |
| 3 | Modified LangChain chat agent + GPT-4 on Mastermind | 60% | — | Imported | 2026-05-28 |
| 4 | LangChain chat agent + GPT-4 on Mastermind | 47% | — | Imported | 2026-05-28 |
| 5 | LangChain chat agent + GPT-4 on Lateral Thinking Puzzles | 20% | — | Imported | 2026-05-28 |
| 6 | LangChain chat agent + GPT-4 on Sudoku | 0% | — | Imported | 2026-05-28 |