UI-Bench

Expert pairwise-comparison benchmark that evaluates the visual design quality of web interfaces generated by AI text-to-app and website generation tools.

Rows: 10
Primary metric: trueskill_rating
Sampled: 2026-05-06

Metadata

Metrics

TrueSkill Rating, 95% CI Lower, 95% CI Upper, Win Rate
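As a rough illustration of how a TrueSkill-style rating can move after a single expert pairwise judgment, here is a minimal sketch in pure Python. It assumes the conventional TrueSkill defaults (mu = 25, sigma = 25/3, beta = sigma/2), treats every comparison as decisive (no draws), and omits the dynamics term; UI-Bench's actual parameters and pipeline are not stated on this page, so the constants and the helper names `update_1v1` and `win_rate` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for pdf and cdf

# Conventional TrueSkill defaults; assumed here, not taken from UI-Bench.
MU0 = 25.0
SIGMA0 = MU0 / 3
BETA = SIGMA0 / 2


def update_1v1(winner, loser, beta=BETA):
    """Return updated (mu, sigma) pairs after one decisive comparison.

    winner and loser are (mu, sigma) tuples. Draws and the dynamics
    term (tau) are left out to keep the sketch short.
    """
    mu_w, sigma_w = winner
    mu_l, sigma_l = loser
    # Total uncertainty of the performance difference.
    c = sqrt(2 * beta**2 + sigma_w**2 + sigma_l**2)
    t = (mu_w - mu_l) / c
    v = STD_NORMAL.pdf(t) / STD_NORMAL.cdf(t)  # mean shift factor
    w = v * (v + t)                            # variance shrink factor
    new_winner = (mu_w + sigma_w**2 / c * v,
                  sigma_w * sqrt(1 - sigma_w**2 / c**2 * w))
    new_loser = (mu_l - sigma_l**2 / c * v,
                 sigma_l * sqrt(1 - sigma_l**2 / c**2 * w))
    return new_winner, new_loser


def win_rate(wins, comparisons):
    """Fraction of comparisons won; 0.0 when there are no comparisons."""
    return wins / comparisons if comparisons else 0.0
```

Calling `update_1v1((MU0, SIGMA0), (MU0, SIGMA0))` for two unrated tools raises the winner's mean, lowers the loser's by the same amount, and narrows both uncertainties. An upset (a low-rated tool beating a high-rated one) produces a larger shift than an expected result.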

Latest Results

Rows ranked by highest TrueSkill rating.

Rank  Subject         TrueSkill Rating  Model Match  Provenance  Sampled
1     Orchids         30.08                          Imported    2026-05-06
2     Figma Make      27.46                          Imported    2026-05-06
3     Lovable         27.14                          Imported    2026-05-06
4     Anything        25.46                          Imported    2026-05-06
5     Bolt            24.44                          Imported    2026-05-06
6     Magic Patterns  24.23                          Imported    2026-05-06
7     Same.new        23.57                          Imported    2026-05-06
8     Base44 by Wix   23.47                          Imported    2026-05-06
9     v0              22.24                          Imported    2026-05-06
10    Replit          20.95                          Imported    2026-05-06