ts-bench
Benchmark CLI for comparing AI coding agents on TypeScript workloads, currently publishing pass/fail results for agent/model runs on Exercism-style TypeScript tasks.
Rows: 8
Primary metric: success_rate
Sampled: 2026-05-06
Metadata
Metrics
Success Rate, Problems Solved, Total Problems, Avg Execution Time (lower is better), Total Execution Time (lower is better)
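
The page does not say how these metrics are derived from one another. A minimal sketch, assuming Success Rate is the percentage of problems solved and Avg Execution Time is total time divided by total problems, could look like this. The `RunSummary` shape, its field names, and the sample numbers are hypothetical and are not taken from ts-bench output.

```typescript
// Hypothetical summary of one agent/model run.
// Field names are illustrative assumptions, not the ts-bench schema.
interface RunSummary {
  subject: string;
  problemsSolved: number;
  totalProblems: number;
  totalExecutionTimeSec: number;
}

// Success rate as a percentage of problems solved (higher is better).
// Multiplying before dividing avoids small floating-point drift.
function successRate(run: RunSummary): number {
  if (run.totalProblems === 0) return 0;
  return (run.problemsSolved * 100) / run.totalProblems;
}

// Average execution time per problem in seconds (lower is better).
function avgExecutionTime(run: RunSummary): number {
  if (run.totalProblems === 0) return 0;
  return run.totalExecutionTimeSec / run.totalProblems;
}

// Made-up numbers for illustration only, not a benchmark result.
const example: RunSummary = {
  subject: "example-agent / example-model",
  problemsSolved: 3,
  totalProblems: 5,
  totalExecutionTimeSec: 50,
};

console.log(successRate(example)); // 60
console.log(avgExecutionTime(example)); // 10
```

Returning 0 for an empty run is a design choice in this sketch; a real tool might prefer to report such a run as missing.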
| Rank | Subject | Success Rate (%) | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Codex CLI / gpt-5.4 | 60 | — | Imported | 2026-05-06 |
| 2 | Claude Code / claude-sonnet-4-6 | 60 | — | Imported | 2026-05-06 |
| 3 | Gemini CLI / gemini-3.1-pro-preview | 60 | — | Imported | 2026-05-06 |
| 4 | Gemini CLI / gemini-2.5-flash | 40 | — | Imported | 2026-05-06 |
| 5 | Claude Code / claude-opus-4-6 | 20 | — | Imported | 2026-05-06 |
| 6 | Claude Code / claude-haiku-4-5 | 20 | — | Imported | 2026-05-06 |
| 7 | Gemini CLI / gemini-3-flash-preview | 20 | — | Imported | 2026-05-06 |
| 8 | Codex CLI / gpt-5.4-mini | 0 | — | Imported | 2026-05-06 |
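
Several rows above share the same success rate, and the page does not state how ties are ordered. The sketch below shows one way a rank column like this could be produced, assuming rows are sorted by success rate in descending order and tied rows keep their input order. The `LeaderboardRow` type, the `rankRows` function, and the sample subjects are hypothetical.

```typescript
// Hypothetical row shape; names are illustrative only.
interface LeaderboardRow {
  subject: string;
  successRate: number; // percent
}

// Sort by success rate, highest first, and assign 1-based ranks.
// Array.prototype.sort is stable in ES2019+, so tied rows keep input order.
// The real tie-break rule is not documented on the page; this is an assumption.
function rankRows(
  rows: LeaderboardRow[],
): Array<LeaderboardRow & { rank: number }> {
  return [...rows]
    .sort((a, b) => b.successRate - a.successRate)
    .map((row, index) => ({ ...row, rank: index + 1 }));
}

// Placeholder rows for illustration, not benchmark data.
const ranked = rankRows([
  { subject: "agent-a", successRate: 20 },
  { subject: "agent-b", successRate: 60 },
  { subject: "agent-c", successRate: 60 },
]);

console.log(ranked.map((r) => `${r.rank}. ${r.subject}`).join("\n"));
```

Copying the array before sorting leaves the caller's data untouched, which makes the function safe to reuse on the same rows.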