'Aspect Ratio Bucketing', the technique that lets image generation AI 'NovelAI' train at resolutions other than 512 × 512, is released under the MIT license



'Aspect Ratio Bucketing', a technique used in 'NovelAI', an AI that automatically generates text and images, to greatly improve the quality of output images, has been released as open source software under the MIT license. The technique is intended to stop models from generating unnaturally cropped images, a recognized problem of image generation AI.

GitHub - NovelAI/novelai-aspect-ratio-bucketing: Implementation of aspect ratio bucketing for training generative image models as described in: https://blog.novelai.net/novelai-improvements-on-stable-diffusion-e10d38db82ac

https://github.com/NovelAI/novelai-aspect-ratio-bucketing



NovelAI Improvements on Stable Diffusion | by NovelAI | Oct, 2022 | Medium

https://blog.novelai.net/novelai-improvements-on-stable-diffusion-e10d38db82ac

One of the problems with existing image generation models is that they generate unnaturally cropped images. The reason is that the models are trained to generate square images, while many of the photographs and artworks they learn from are not square.

When training image generation models, it is common to process multiple training samples at once as a batch for GPU efficiency, which requires every sample in the batch to have the same size. As a compromise, non-square images have been center-cropped to a square and used for training.

As a specific example, take an image of a knight wearing a crown. Because the image is vertically long, only the central part is kept for training and the top and bottom are cut off. As a result, even if 'crown' is included as a tag on the original image, the crown has disappeared from the image actually used in training. Switching from center cropping to random cropping reportedly brought only a slight improvement.
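The center crop described above can be sketched in a few lines. This is only an illustration: the function name `center_crop_box` and the 512 × 1024 example size are assumptions made for this sketch, not code from the NovelAI repository.

```python
def center_crop_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def kept_fraction(width: int, height: int) -> float:
    """Fraction of the original pixels that survive the center crop."""
    side = min(width, height)
    return (side * side) / (width * height)


if __name__ == "__main__":
    # Hypothetical vertically long image, 512 wide and 1024 tall.
    print(center_crop_box(512, 1024))
    print(kept_fraction(512, 1024))
```

For a vertically long image the crop box starts well below the top edge, so anything near the top of the picture, such as a crown, never reaches the model even though its tag does.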



One possible solution is to fit the whole image onto a fixed-size canvas and mask out the padded area, but this wastes computation on the padding during training.
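To get a rough sense of how much computation the padding approach wastes, the following sketch estimates the share of a square canvas that would be padding after an image is scaled to fit inside it. The function name `padding_fraction` and the 512-pixel canvas default are assumptions for illustration only.

```python
def padding_fraction(width: int, height: int, canvas: int = 512) -> float:
    """Share of a canvas x canvas square left empty after fitting the image.

    The image is scaled so that its longer side equals the canvas side,
    preserving the aspect ratio.
    """
    scale = canvas / max(width, height)
    fitted_w = round(width * scale)
    fitted_h = round(height * scale)
    return 1.0 - (fitted_w * fitted_h) / (canvas * canvas)


if __name__ == "__main__":
    # A 1:2 portrait image fills only half of a square canvas.
    print(padding_fraction(512, 1024))
```

Under these assumptions, a 1:2 portrait image leaves half of the canvas as masked padding, and the model still spends compute on those pixels.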

For that reason, 'Aspect Ratio Bucketing' was devised. Based on the images in the dataset, this method prepares multiple buckets such as '256 × 1024 (aspect ratio 0.25)', '320 × 1024 (aspect ratio 0.3125)', '384 × 1024 (aspect ratio 0.375)', '384 × 960 (aspect ratio 0.4)', and '512 × 512 (aspect ratio 1)', assigns each image to the bucket with the closest aspect ratio, and trains on every bucket without bias.
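The assignment step can be sketched as follows. This is a minimal illustration, not the code in the NovelAI repository: it uses only the five example buckets listed above, takes aspect ratio as width divided by height, and assumes "closest" means the smallest absolute difference in aspect ratio. The names `BUCKETS`, `nearest_bucket`, and `group_by_bucket` are hypothetical.

```python
from collections import defaultdict

# Example buckets from the article, as (width, height).
BUCKETS = [(256, 1024), (320, 1024), (384, 1024), (384, 960), (512, 512)]


def nearest_bucket(width: int, height: int) -> tuple[int, int]:
    """Pick the bucket whose aspect ratio is closest to the image's."""
    aspect = width / height
    return min(BUCKETS, key=lambda b: abs(b[0] / b[1] - aspect))


def group_by_bucket(sizes: list[tuple[int, int]]) -> dict[tuple[int, int], list[int]]:
    """Map each bucket to the indices of the images assigned to it."""
    groups: dict[tuple[int, int], list[int]] = defaultdict(list)
    for index, (width, height) in enumerate(sizes):
        groups[nearest_bucket(width, height)].append(index)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical image sizes, as (width, height).
    images = [(600, 2000), (500, 500), (400, 1000)]
    for bucket, indices in group_by_bucket(images).items():
        print(bucket, indices)
```

Because every image in a bucket is resized to the same resolution, each batch can be drawn from a single bucket and still have a uniform shape, without cropping most of the image away.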

According to gcem156, who tried Aspect Ratio Bucketing with Waifu diffusion v1.4, it brought some improvement with respect to margins and vignettes.

I tried a new additional learning method for Waifu diffusion. (Aspect ratio bucketing)|gcem156|note
https://note.com/gcem156/n/n2fd6d96fb36a

in Web Application, Posted by logc_nt