This research introduces Complex-Edit, a benchmark for evaluating how well image editing models follow instructions of varying complexity. The benchmark was built with GPT-4o, which generates atomic editing tasks that are then simplified and composed into progressively more intricate instructions. The authors also present a suite of metrics and a VLM-based evaluation pipeline that assesses instruction following, identity preservation, and the perceptual quality of edited images. Experiments on Complex-Edit show that open-source models lag behind proprietary ones, especially on more complex instructions, and that higher complexity degrades both the preservation of original image elements and overall aesthetic quality. The study also examines sequential editing and a Best-of-N strategy as ways to handle complex edits, and notes that models trained on synthetic data, including advanced ones, produce increasingly synthetic-looking results as instruction complexity rises.
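The Best-of-N strategy mentioned above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: `edit_image` stands in for a diffusion-based editor and `score_edit` for the VLM-based evaluator, both hypothetical names.

```python
import random

def edit_image(image, instruction, seed):
    """Hypothetical stand-in for an instruction-based image editor.

    A real editor would return an edited image; here we return a mock
    result with a seeded pseudo-random quality proxy so the sketch runs.
    """
    random.seed(seed)
    return {"image": f"{image}-edit-{seed}", "noise": random.random()}

def score_edit(candidate):
    """Hypothetical stand-in for the VLM-based evaluator, which in the
    paper scores instruction following, identity preservation, and
    perceptual quality. Lower mock noise -> higher score."""
    return 1.0 - candidate["noise"]

def best_of_n(image, instruction, n=8):
    """Generate n candidate edits and keep the highest-scoring one."""
    candidates = [edit_image(image, instruction, seed) for seed in range(n)]
    return max(candidates, key=score_edit)

best = best_of_n("photo.png", "make the sky stormy", n=8)
print(best["image"])
```

The design point is simply that sampling several candidate edits and selecting with an automatic evaluator can mask individual failures on complex instructions, at the cost of N times the inference compute.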

The podcast and its cover image on this page belong to Neural Intelligence Network. The podcast's content is created by Neural Intelligence Network and not by, or in collaboration with, Poddtoppen.

Neural intel Pod

Complex Instruction-Based Image Editing Benchmark
