ts-bench

Benchmark CLI for comparing AI coding agents on TypeScript workloads, currently publishing pass/fail results for agent/model runs on Exercism-style TypeScript tasks.

Rows: 8
Primary metric: success_rate
Sampled: 2026-05-06

Metrics

Success Rate, Problems Solved, Total Problems, Avg Execution Time (lower is better), Total Execution Time (lower is better)
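The listed metrics plausibly relate by simple arithmetic. Success rate would be problems solved divided by total problems, and average execution time would be total execution time divided by the number of problems. The minimal TypeScript sketch below shows that arithmetic under loudly stated assumptions. The `ProblemResult` and `RunMetrics` shapes, the `computeMetrics` function, the millisecond unit, the 0-100 percentage scale, and averaging over all attempted problems (rather than only solved ones) are hypothetical choices. They are not ts-bench's documented definitions.

```typescript
// Hypothetical per-problem record; ts-bench's real output format may differ.
interface ProblemResult {
  problem: string;
  passed: boolean;
  executionTimeMs: number; // assumed unit: milliseconds
}

// Hypothetical aggregate mirroring the metric names listed on this page.
interface RunMetrics {
  successRate: number; // percent of problems passed (assumed 0-100 scale)
  problemsSolved: number;
  totalProblems: number;
  avgExecutionTimeMs: number; // lower is better
  totalExecutionTimeMs: number; // lower is better
}

function computeMetrics(results: ProblemResult[]): RunMetrics {
  const totalProblems = results.length;
  const problemsSolved = results.filter((r) => r.passed).length;
  const totalExecutionTimeMs = results.reduce((sum, r) => sum + r.executionTimeMs, 0);
  return {
    // Multiply before dividing to avoid float error on whole-number percentages.
    successRate: totalProblems === 0 ? 0 : (problemsSolved * 100) / totalProblems,
    problemsSolved,
    totalProblems,
    // Assumption: average taken over all attempted problems, not only solved ones.
    avgExecutionTimeMs: totalProblems === 0 ? 0 : totalExecutionTimeMs / totalProblems,
    totalExecutionTimeMs,
  };
}

// Usage: five hypothetical problems, three of them solved.
const sample: ProblemResult[] = [
  { problem: "problem-1", passed: true, executionTimeMs: 1000 },
  { problem: "problem-2", passed: true, executionTimeMs: 2000 },
  { problem: "problem-3", passed: false, executionTimeMs: 3000 },
  { problem: "problem-4", passed: true, executionTimeMs: 4000 },
  { problem: "problem-5", passed: false, executionTimeMs: 5000 },
];
console.log(computeMetrics(sample));
// successRate: 60, problemsSolved: 3, totalProblems: 5,
// avgExecutionTimeMs: 3000, totalExecutionTimeMs: 15000
```

The example uses five problems only for illustration. The page does not state how many problems each run in the table covered.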

Latest Results

Rows are imported from the public ts-bench latest-results.json file. Source display names are preserved. The upstream documentation describes these numbers as directional, community-operational results rather than statistically controlled measurements.

Rank  Subject                              Success Rate  Model Match  Provenance  Sampled
1     Codex CLI / gpt-5.4                  60            -            Imported    2026-05-06
2     Claude Code / claude-sonnet-4-6      60            -            Imported    2026-05-06
3     Gemini CLI / gemini-3.1-pro-preview  60            -            Imported    2026-05-06
4     Gemini CLI / gemini-2.5-flash        40            -            Imported    2026-05-06
5     Claude Code / claude-opus-4-6        20            -            Imported    2026-05-06
6     Claude Code / claude-haiku-4-5       20            -            Imported    2026-05-06
7     Gemini CLI / gemini-3-flash-preview  20            -            Imported    2026-05-06
8     Codex CLI / gpt-5.4-mini             0             -            Imported    2026-05-06
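The rows appear to be ordered by the primary metric, success_rate, with each subject displayed as "agent / model". The minimal TypeScript sketch below covers that import-and-rank step. Everything in it is hypothetical and is not ts-bench's code or the upstream schema:

* the row fields `agent`, `model`, and `successRate`
* the functions `parseResults` and `rankRows`
* the tie-break rule

The page does not say how tied rows such as ranks 1-3 are ordered. The sketch simply assumes that ties keep their source order.

```typescript
// Hypothetical row shape for one entry of latest-results.json.
// The real upstream field names are NOT documented on this page.
interface ResultRow {
  agent: string; // e.g. "Codex CLI"
  model: string; // e.g. "gpt-5.4"
  successRate: number; // primary metric (success_rate)
}

interface RankedRow extends ResultRow {
  rank: number;
  subject: string; // display name, "<agent> / <model>"
}

// Parse the JSON text and check that it is an array of plausible rows.
function parseResults(json: string): ResultRow[] {
  const data: unknown = JSON.parse(json);
  if (!Array.isArray(data)) {
    throw new Error("expected a JSON array of result rows");
  }
  for (const row of data) {
    if (
      typeof row?.agent !== "string" ||
      typeof row?.model !== "string" ||
      typeof row?.successRate !== "number"
    ) {
      throw new Error(`malformed row: ${JSON.stringify(row)}`);
    }
  }
  return data as ResultRow[];
}

// Order by success rate, highest first. Array.prototype.sort is stable
// (guaranteed since ES2019), so tied rows keep their source order; this
// tie-break is an assumption, not a documented ts-bench rule.
function rankRows(rows: ResultRow[]): RankedRow[] {
  return [...rows]
    .sort((a, b) => b.successRate - a.successRate)
    .map((row, i) => ({
      ...row,
      rank: i + 1,
      subject: `${row.agent} / ${row.model}`,
    }));
}

// Usage with a few rows taken from the table above, in scrambled order.
const rowsJson = JSON.stringify([
  { agent: "Codex CLI", model: "gpt-5.4-mini", successRate: 0 },
  { agent: "Codex CLI", model: "gpt-5.4", successRate: 60 },
  { agent: "Gemini CLI", model: "gemini-2.5-flash", successRate: 40 },
  { agent: "Claude Code", model: "claude-sonnet-4-6", successRate: 60 },
]);
for (const r of rankRows(parseResults(rowsJson))) {
  console.log(r.rank, r.subject, r.successRate);
}
```

Copying the array before sorting leaves the parsed input untouched. Relying on sort stability makes the tie-break deterministic without inventing a secondary metric.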