Developer of the image generation AI 'Stable Diffusion' announces 'DeepFloyd IF', which can generate images from natural-language text
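Stability AI, which developed the image generation AI 'Stable Diffusion', has announced 'DeepFloyd IF', a text-to-image model that incorporates a large-scale language model.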
DeepFloyd IF — DeepFloyd
https://deepfloyd.ai/deepfloyd-if
Stability AI announces 'DeepFloyd IF', a high-performance text-to-image conversion model that incorporates a large-scale language model
https://ja.stability.ai/blog/deepfloyd-if-text-to-image-model
After the DeepFloyd IF demo page was published, I tried it out. First, enter a prompt and click 'Generate'. This time I entered the prompt 'a koala wearing clothes with the words "good night" written on its abdomen' in Japanese and left the Negative Prompt blank.
The result was an image that appeared unrelated to the prompt. Any prompt entered in Japanese produced an image like this, so at the time of writing it is better not to enter prompts in Japanese.
I then entered the prompt again in English, with the following result. Four candidate images are displayed, all at low resolution, so the next step is to upscale one of them.
Select one image you like and click 'Upscale'.
Then, the upscaled image is displayed like this.
The image below shows the generation flow of DeepFloyd IF. The input prompt is converted into a qualitative text representation by the frozen T5-XXL language model, and that representation is then turned into a 64×64 image by a base model that comes in three sizes: IF-I 400M, IF-I 900M, and IF-I 4.3B.
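The first-stage data flow described above (prompt, frozen text encoder, 64×64 base image) can be pictured with a toy Python sketch. Everything in it is a hypothetical stand-in: `encode_prompt`, `base_stage`, and `generate_base_image` are illustrative names, the "encoder" is just a hash, and no real DeepFloyd IF model or API is involved. Only the model names and the 64×64 resolution come from the text.

```python
import hashlib

# Base-model sizes named in the article; the resolution is the stage-one output size.
BASE_MODELS = ("IF-I 400M", "IF-I 900M", "IF-I 4.3B")
BASE_RESOLUTION = 64


def encode_prompt(prompt: str, dim: int = 8) -> list[float]:
    """Hypothetical stand-in for the frozen T5-XXL encoder.

    "Frozen" is modelled here as a pure function with no trainable state:
    the same prompt always maps to the same vector.
    """
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def base_stage(embedding: list[float], size: int = BASE_RESOLUTION) -> list[list[float]]:
    """Hypothetical stand-in for the base model: text vector -> size x size grid."""
    n = len(embedding)
    return [[embedding[(x + y) % n] for x in range(size)] for y in range(size)]


def generate_base_image(prompt: str, model: str = "IF-I 4.3B") -> list[list[float]]:
    """Run the toy first stage: encode the prompt, then produce a 64x64 grid."""
    if model not in BASE_MODELS:
        raise ValueError(f"unknown base model: {model}")
    return base_stage(encode_prompt(prompt))


image = generate_base_image("a koala wearing clothes")
print(len(image), len(image[0]))
```

The point of the sketch is only the shape of the pipeline: the text encoder is fixed, and whichever base-model size is chosen, the first stage hands a 64×64 image to the later upscaling steps.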
In the second stage, we apply the '
DeepFloyd IF was trained on the LAION-A dataset. LAION-A was derived from the LAION-5B dataset through similarity-hash-based deduplication, cleaning, and other modifications to the original dataset. DeepFloyd's custom filters were used to remove watermarked, NSFW, and other inappropriate content.
DeepFloyd IF is good at rendering text in images, something other models struggle with, and can reproduce the requested text correctly in the image. The video below animates images generated by DeepFloyd IF from the lyrics of a song. In multiple scenes, the lyrics appear in the image exactly as written.
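To give a rough idea of what similarity-hash-based deduplication means, here is a minimal Python sketch using an average hash and Hamming distance. This is an illustrative technique chosen for the example, not DeepFloyd's actual filter, which the article does not describe. The function names, the `max_distance` threshold, and the tiny 4×4 grayscale grids are all assumptions; a real pipeline would first downscale and grayscale each image.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Average hash: one bit per pixel, set when the pixel is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def deduplicate(images: dict[str, list[list[int]]], max_distance: int = 2) -> list[str]:
    """Keep an image only if its hash is not within max_distance of a kept one."""
    kept: list[tuple[str, int]] = []
    for name, pixels in images.items():
        h = average_hash(pixels)
        if all(hamming(h, kept_hash) > max_distance for _, kept_hash in kept):
            kept.append((name, h))
    return [name for name, _ in kept]


# Toy data: "a_brighter" is "a" with every pixel raised by 5; "b" has a different layout.
a = [[10, 10, 200, 200] for _ in range(4)]
a_brighter = [[p + 5 for p in row] for row in a]
b = [[200] * 4, [200] * 4, [10] * 4, [10] * 4]

unique = deduplicate({"a": a, "a_brighter": a_brighter, "b": b})
print(unique)
```

Because the hash depends only on which pixels are above the image's own mean, a uniformly brightened copy hashes identically and is dropped, while an image with a different layout is kept.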
Lyric video, but it's AI Generated (The Smiths - There Is a Light That Never Goes Out)-YouTube
Images were generated from the same prompt with Stable Diffusion 2.1 and DeepFloyd IF, and are compared side by side below.
This time a comparison image with
This is a comparison image with