DedeuceBench
LLM-agent active-learning benchmark with budgeted search results, trap-free rate, query usage, and token-count telemetry.
4 rows
Primary metric: Score100
Sampled: 2026-05-06
Metadata
Metrics
Score100, Success@Budget, TrapFreeRate, EffSucc, QueriesUsed (lower is better), BudgetLeft, TokensIn (lower is better), TokensOut (lower is better), TokensTotal (lower is better)
| Rank | Subject | Score100 | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | openai:gpt-5-mini | 95.31 | — | Imported | 2026-05-06 |
| 2 | openai:gpt-5-nano | 50 | — | Imported | 2026-05-06 |
| 3 | openai:gpt-5-mini | 39.06 | — | Imported | 2026-05-06 |
| 4 | openai:gpt-5-nano | 1.56 | — | Imported | 2026-05-06 |
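The ranking above can be reproduced with a short sketch. It assumes Score100 is higher-is-better, which matches the table order but is not stated on the page, and it treats only the metrics marked "lower is better" as ascending. The names `Row`, `LOWER_IS_BETTER`, and `rank` are hypothetical and not part of any DedeuceBench tooling.

```python
from dataclasses import dataclass

# Metrics the page marks "lower is better". Every other metric is
# ASSUMED to be higher-is-better (the page does not say so explicitly).
LOWER_IS_BETTER = {"QueriesUsed", "TokensIn", "TokensOut", "TokensTotal"}


@dataclass(frozen=True)
class Row:
    subject: str
    score100: float


# Values copied from the leaderboard table.
ROWS = [
    Row("openai:gpt-5-nano", 1.56),
    Row("openai:gpt-5-mini", 95.31),
    Row("openai:gpt-5-nano", 50.0),
    Row("openai:gpt-5-mini", 39.06),
]


def rank(rows, metric="Score100", key=lambda r: r.score100):
    """Return (rank, subject, value) tuples ordered best-first for `metric`."""
    descending = metric not in LOWER_IS_BETTER
    ordered = sorted(rows, key=key, reverse=descending)
    return [(i + 1, r.subject, key(r)) for i, r in enumerate(ordered)]


if __name__ == "__main__":
    for position, subject, value in rank(ROWS):
        print(position, subject, value)
```

Keeping the sort direction in a per-metric set means the same helper could order rows by a cost metric such as TokensTotal without changing the sorting logic, provided those columns are available.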