InferenceBench
Benchmark for autonomous CLI agents optimizing OpenAI-compatible LLM inference servers under a fixed one-H100, two-hour budget, with quality and integrity gates and scenario-specific speedup metrics.
Rows: 22
Primary metric: aggregate_speedup
Sampled: 2026-05-20
Metadata
Metrics
Aggregate Speedup, Aggregate SEM (lower is better), Prefill Latency, Prefill SEM (lower is better), Decode Latency, Decode SEM (lower is better), Throughput, Throughput SEM (lower is better), All-in-one, All-in-one SEM (lower is better)
| Rank | Subject | Aggregate Speedup | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | SMAC3 (search, 2 h vLLM) | 11.53x | — | Imported | 2026-05-20 |
| 2 | TPE (search, 2 h vLLM) | 11.25x | — | Imported | 2026-05-20 |
| 3 | Random (search, 2 h vLLM) | 10.20x | — | Imported | 2026-05-20 |
| 4 | Claude Sonnet 4.6 / Claude Code | 8.08x | — | Imported | 2026-05-20 |
| 5 | GLM-5 / OpenCode | 6.20x | — | Imported | 2026-05-20 |
| 6 | Gemini 3.1 Pro / OpenCode | 6.16x | — | Imported | 2026-05-20 |
| 7 | GPT-5.3 Codex (High) / Codex CLI | 5.48x | — | Imported | 2026-05-20 |
| 8 | GPT-5.4 (High) / Codex CLI | 5.08x | — | Imported | 2026-05-20 |
| 9 | GPT-5.3 Codex (Medium) / Codex CLI | 4.86x | — | Imported | 2026-05-20 |
| 10 | GPT-5.5 (High) / Codex CLI | 4.22x | — | Imported | 2026-05-20 |
| 11 | vLLM Default | 4.05x | — | Imported | 2026-05-20 |
| 12 | SGLang Default | 3.92x | — | Imported | 2026-05-20 |
| 13 | Claude Opus 4.6 / Claude Code | 3.89x | — | Imported | 2026-05-20 |
| 14 | GPT-5.2 / Codex CLI | 3.82x | — | Imported | 2026-05-20 |
| 15 | GPT-5.1 Codex Max / Codex CLI | 3.54x | — | Imported | 2026-05-20 |
| 16 | Claude Opus 4.5 / Claude Code | 3.37x | — | Imported | 2026-05-20 |
| 17 | HF TGI Default | 3.30x | — | Imported | 2026-05-20 |
| 18 | Claude Sonnet 4.5 / Claude Code | 2.96x | — | Imported | 2026-05-20 |
| 19 | Claude Opus 4.7 / Claude Code | 2.25x | — | Imported | 2026-05-20 |
| 20 | GPT-5.2 Codex / Codex CLI | 1.55x | — | Imported | 2026-05-20 |
| 21 | Claude Haiku 4.5 / Claude Code | 1.24x | — | Imported | 2026-05-20 |
| 22 | PyTorch Baseline | 1.00x | — | Imported | 2026-05-20 |
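The table lists each subject's aggregate speedup, with the PyTorch Baseline row at 1.00x. To see how an agent compares with a stock serving stack rather than with that baseline, the values can be re-expressed relative to another row. The following is a minimal sketch, assuming all rows share the same reference so that the ratio of two speedups is meaningful. The helper names `parse_speedup` and `relative_to` are hypothetical, and only a few rows are copied from the table for brevity. The sketch ignores the SEM columns, so small differences between rows should not be over-read.

```python
# A few (subject, aggregate speedup) pairs copied from the table above.
ROWS = [
    ("SMAC3 (search, 2 h vLLM)", "11.53x"),
    ("Claude Sonnet 4.6 / Claude Code", "8.08x"),
    ("vLLM Default", "4.05x"),
    ("Claude Opus 4.6 / Claude Code", "3.89x"),
    ("PyTorch Baseline", "1.00x"),
]


def parse_speedup(cell):
    """Convert a table cell such as '8.08x' to a float."""
    return float(cell.strip().rstrip("x"))


def relative_to(rows, reference):
    """Re-express every speedup as a ratio to the chosen reference row.

    Assumption: all speedups share one baseline, so dividing two of them
    gives the speedup of one subject over the other.
    """
    scores = {subject: parse_speedup(cell) for subject, cell in rows}
    ref = scores[reference]
    return {subject: value / ref for subject, value in scores.items()}


vs_vllm = relative_to(ROWS, "vLLM Default")
for subject, ratio in sorted(vs_vllm.items(), key=lambda kv: kv[1], reverse=True):
    marker = "above" if ratio > 1 else "at or below"
    print(f"{subject}: {ratio:.2f}x ({marker} vLLM Default)")
```

Read this way, any subject with a ratio below 1 finished its two-hour budget with a server slower than an untuned vLLM launch.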