UniREditBench
Unified reasoning-based image editing benchmark covering real-world edits and game-world reasoning tasks.
14 rows
Primary metric: Overall
Sampled: 2026-05-06
Metadata
Metrics
Overall, Real-World-Overall, Viewpoint Transformation, Material Modification, Pose Adjustment, Temporal Evolution, Structural Integrity Change, Motion State Change, Spatial Arrangement, Mechanical Reaction, Medium Interaction, Game-World-Overall, 3D Reconstruction, Space Invader, Jewel2, Pacman, Word Search, Tictactoe, Sudoku, Maze, Sokoban
| Rank | Subject | Overall | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | UniREdit-Bagel (Ours) | 78.15 | — | Imported | 2026-05-06 |
| 2 | GPT-4o | 73.39 | GPT-4o (`openai-gpt-4o`) | Imported | 2026-05-06 |
| 3 | Nano Banana | 68.26 | Nano Banana (Gemini 2.5 Flash Image) (`google-gemini-2.5-flash-image`) | Imported | 2026-05-06 |
| 4 | Wan2.5 | 61.36 | — | Imported | 2026-05-06 |
| 5 | Qwen-Image-Edit | 56.52 | — | Imported | 2026-05-06 |
| 6 | Seedream4.0 | 55.77 | — | Imported | 2026-05-06 |
| 7 | UniWorld-V2 | 54.87 | — | Imported | 2026-05-06 |
| 8 | DreamOmni2 | 52.81 | — | Imported | 2026-05-06 |
| 9 | Bagel-Think | 50.96 | — | Imported | 2026-05-06 |
| 10 | Step1X-Edit | 50.15 | — | Imported | 2026-05-06 |
| 11 | Lumina-DiMOO | 48.54 | — | Imported | 2026-05-06 |
| 12 | FLUX-Kontext-Pro | 45.77 | — | Imported | 2026-05-06 |
| 13 | OmniGen2 | 43.41 | — | Imported | 2026-05-06 |
| 14 | MagicBrush | 40.77 | — | Imported | 2026-05-06 |