NVIDIA and other research teams announce 'Sana,' an AI model that can automatically generate images with a resolution of up to 4096 x 4096 within seconds
A research team from NVIDIA, the Massachusetts Institute of Technology (MIT), and Tsinghua University has announced 'Sana,' an image generation AI that can generate images with a maximum resolution of 4096 x 4096 within a few seconds.
[2410.10629] SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers
Sana
https://nvlabs.github.io/Sana/
Below is an example of an image actually created with Sana. With the prompt 'astronaut in a jungle, cold color palette, muted colors, detailed, 8k', you can generate an image like this.
Below is the image generated by the prompt 'a cyberpunk cat with a neon sign that says "SANA"'.
When I entered the prompt 'portrait photo of a girl, photograph, highly detailed face, depth of field,' a realistic image of a person was generated.
According to the Sana development team, whereas conventional autoencoders compress images by only 8x, Sana trains an autoencoder that compresses images by up to 32x. This greatly reduces the number of latent tokens, so that ultra-high-resolution images such as 4K can be trained on and generated efficiently.
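To make the effect of the compression ratio concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the autoencoder shrinks each side of the image by the stated factor and that each resulting position becomes one token. The function name `latent_token_count` and the `patch_size` parameter are hypothetical and are not taken from Sana's code.

```python
def latent_token_count(height: int, width: int, downsample: int, patch_size: int = 1) -> int:
    """Count tokens for an image after autoencoder compression (illustrative only).

    downsample: per-side compression factor of the autoencoder (e.g. 8 or 32).
    patch_size: hypothetical extra grouping of latent positions into one token.
    """
    stride = downsample * patch_size
    if height % stride or width % stride:
        raise ValueError("image size must be divisible by downsample * patch_size")
    return (height // stride) * (width // stride)


# Compare an 8x autoencoder with a 32x autoencoder on a 4096 x 4096 image.
tokens_8x = latent_token_count(4096, 4096, downsample=8)
tokens_32x = latent_token_count(4096, 4096, downsample=32)
print(tokens_8x, tokens_32x, tokens_8x // tokens_32x)  # 262144 16384 16
```

Under these assumptions, moving from 8x to 32x compression per side cuts the token count for a 4096 x 4096 image by a factor of 16. Real models may also group positions into patches, which is what the `patch_size` parameter stands in for.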
In addition, Sana uses Gemma, a decoder-only language model, as its text encoder to improve prompt understanding and reasoning. Compared with the conventional T5, Gemma has stronger text comprehension, so it can improve image-text alignment, and the team also addresses the training instability that comes with it. Furthermore, a mechanism called 'Flow-DPM-Solver' is introduced, which reduces the number of sampling steps from the 28-50 needed by 'Flow-Euler-Solver' to 14-20. Efficient caption labeling and selection are also used to make training more efficient.
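As a rough illustration of why the number of sampling steps matters, the sketch below shows a plain fixed-step Euler sampler for a flow-based model in Python. It is not Flow-DPM-Solver and not Sana's implementation: `euler_flow_sample`, `toy_velocity`, and `TARGET` are hypothetical names, the toy velocity function stands in for the neural network, and the convention of integrating from t = 0 to t = 1 is an assumption made for this example.

```python
from typing import Callable, List, Tuple

Vector = List[float]

# Hypothetical "clean sample" that the toy flow moves toward.
TARGET: Vector = [1.0, -2.0]


def toy_velocity(x: Vector, t: float) -> Vector:
    """Stand-in for the network: velocity of a straight-line flow toward TARGET."""
    return [(target_i - x_i) / (1.0 - t) for target_i, x_i in zip(TARGET, x)]


def euler_flow_sample(
    velocity: Callable[[Vector, float], Vector],
    x_start: Vector,
    num_steps: int,
) -> Tuple[Vector, int]:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler.

    Returns the final sample and the number of velocity evaluations used.
    """
    x = list(x_start)
    dt = 1.0 / num_steps
    evaluations = 0
    for step in range(num_steps):
        t = step * dt
        v = velocity(x, t)  # one (expensive, in a real model) network call per step
        evaluations += 1
        x = [x_i + dt * v_i for x_i, v_i in zip(x, v)]
    return x, evaluations


sample_28, calls_28 = euler_flow_sample(toy_velocity, [0.0, 0.0], num_steps=28)
sample_14, calls_14 = euler_flow_sample(toy_velocity, [0.0, 0.0], num_steps=14)
print(calls_28, calls_14)  # 28 14
```

In this sketch every step costs one evaluation of the velocity function, so halving the step count halves the number of evaluations. The toy flow is a straight line, so it reaches the target at any step count; a real model's sample quality generally depends on the solver, which is the problem a better solver is meant to address.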
As a result of these efforts, Sana is as competitive as the latest high-performance image generation AIs such as
Below is a table comparing the performance of Sana with various image generation AIs. It has been reported that each model of Sana has higher performance than other image generation AIs in terms of
At the time of writing, Sana's source code is scheduled to be released soon.
in Software, Posted by log1r_ut