UI-Bench
An expert pairwise-comparison benchmark that evaluates the visual design quality of web interfaces generated by AI text-to-app and website-generation tools.
Rows: 10
Primary metric: trueskill_rating
Sampled: 2026-05-06
Metadata
Metrics
TrueSkill Rating, 95% CI Lower, 95% CI Upper, Win Rate
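
As a rough illustration of how these four metrics could relate to one another, here is a minimal sketch. It assumes, and the source does not confirm, that each subject's skill is summarized by a Gaussian with mean `mu` and standard deviation `sigma`, that the 95% bounds are `mu ± 1.96 * sigma`, and that win rate is wins divided by decisive comparisons. The names `SkillEstimate`, `confidence_bounds`, and `win_rate`, and the sample numbers, are hypothetical.

```python
from dataclasses import dataclass

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


@dataclass
class SkillEstimate:
    """Hypothetical container for a Gaussian skill estimate."""
    mu: float     # mean skill (assumed to be what the rating column reports)
    sigma: float  # uncertainty of the estimate


def confidence_bounds(est: SkillEstimate, z: float = Z_95) -> tuple[float, float]:
    """Return (lower, upper) bounds under the assumed mu +/- z * sigma rule."""
    return est.mu - z * est.sigma, est.mu + z * est.sigma


def win_rate(wins: int, losses: int) -> float:
    """Fraction of decisive pairwise comparisons won (ties excluded by assumption)."""
    total = wins + losses
    if total == 0:
        raise ValueError("no decisive comparisons recorded")
    return wins / total


# Made-up values for illustration only; these are not UI-Bench data.
example = SkillEstimate(mu=25.0, sigma=1.0)
lower, upper = confidence_bounds(example)
rate = win_rate(wins=3, losses=1)
```

A narrower interval means the rating is better determined, so two subjects whose intervals overlap heavily should not be treated as clearly separated by rank alone.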
| Rank | Subject | TrueSkill Rating | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Orchids | 30.08 | — | Imported | 2026-05-06 |
| 2 | Figma Make | 27.46 | — | Imported | 2026-05-06 |
| 3 | Lovable | 27.14 | — | Imported | 2026-05-06 |
| 4 | Anything | 25.46 | — | Imported | 2026-05-06 |
| 5 | Bolt | 24.44 | — | Imported | 2026-05-06 |
| 6 | Magic Patterns | 24.23 | — | Imported | 2026-05-06 |
| 7 | Same.new | 23.57 | — | Imported | 2026-05-06 |
| 8 | Base44 by Wix | 23.47 | — | Imported | 2026-05-06 |
| 9 | v0 | 22.24 | — | Imported | 2026-05-06 |
| 10 | Replit | 20.95 | — | Imported | 2026-05-06 |
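
The ranking can be read more carefully by looking at the rating gaps between adjacent subjects rather than at rank alone. The sketch below copies the ratings from the table and computes those gaps; the helper name `adjacent_gaps` is hypothetical. Because the table shows no confidence bounds, the gaps alone do not say which differences are statistically meaningful.

```python
# Ratings copied from the table above, in rank order.
ratings = [
    ("Orchids", 30.08),
    ("Figma Make", 27.46),
    ("Lovable", 27.14),
    ("Anything", 25.46),
    ("Bolt", 24.44),
    ("Magic Patterns", 24.23),
    ("Same.new", 23.57),
    ("Base44 by Wix", 23.47),
    ("v0", 22.24),
    ("Replit", 20.95),
]


def adjacent_gaps(rows):
    """Return (higher, lower, gap) for each pair of neighboring ranks."""
    return [
        (a_name, b_name, round(a_score - b_score, 2))
        for (a_name, a_score), (b_name, b_score) in zip(rows, rows[1:])
    ]


gaps = adjacent_gaps(ratings)

# Sanity check: the table should already be sorted by rating, descending.
is_sorted = all(gap >= 0 for _, _, gap in gaps)

# Total spread between the top and bottom subjects.
spread = round(ratings[0][1] - ratings[-1][1], 2)

# The closest and widest adjacent pairs.
closest = min(gaps, key=lambda g: g[2])
widest = max(gaps, key=lambda g: g[2])

for higher, lower, gap in gaps:
    print(f"{higher} vs {lower}: {gap:.2f}")
```

On these numbers the widest adjacent gap is between ranks 1 and 2, while ranks 7 and 8 are separated by only 0.10 rating points.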