A record of the trial and error needed to get the intended image out of 'DALL·E 2', an AI that generates images from text



DALL·E 2, the AI developed by OpenAI that generates images from input text, was released to researchers and experts in April 2022 and to the general public in July 2022. Joy Zhang, co-founder of the AI programming contest Coder One, has published a blog post documenting the trial and error it took to get DALL·E 2 to generate the intended image.

I spent $15 in DALL·E 2 credits creating this AI image, and here's what I learned | by Joy Zhang | Aug, 2022 | Towards AI
https://pub.towardsai.net/i-spent-15-in-dall-e-2-credits-creating-this-ai-image-and-heres-what-i-learned-52f352912025

The following article explains what kind of AI DALL·E 2 is.

'DALL·E 2', a higher-resolution, lower-latency version of 'DALL·E' that creates images from input text, appears - GIGAZINE



Although DALL·E 2 has been released to the general public, at the time of writing it is offered as a closed beta, and people who registered on the waiting list in early May 2022 are being given access in turn. Each use generates 3 to 4 images and costs 1 credit. Users receive 50 credits in the first month and 15 credits every month after that, and an additional 115 credits can be purchased for $15 (about 2,050 yen). DALL·E mini, which anyone can use for free, is also available, but it is slow to generate and the image quality is low.
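The credit figures above translate into a rough cost per use and per image. The following is a minimal sketch of that arithmetic, assuming the pack price and images-per-use quoted in this article; the helper names are hypothetical and not part of any OpenAI tool.

```python
from typing import Tuple

# Figures quoted in the article; everything else here is illustrative.
PACK_PRICE_USD = 15.0    # price of one purchasable credit pack
PACK_CREDITS = 115       # credits contained in that pack
IMAGES_PER_USE = (3, 4)  # one use (1 credit) yields 3 to 4 images


def cost_of_credits(credits: int) -> float:
    """Dollar value of `credits` at the pack rate (hypothetical helper)."""
    return credits * PACK_PRICE_USD / PACK_CREDITS


def cost_per_image_range() -> Tuple[float, float]:
    """Return (cheapest, most expensive) cost per image, at 4 and 3 images per credit."""
    per_credit = cost_of_credits(1)
    fewest, most = IMAGES_PER_USE
    return per_credit / most, per_credit / fewest


if __name__ == "__main__":
    print(f"1 credit    = ${cost_of_credits(1):.3f}")
    print(f"100 credits = ${cost_of_credits(100):.2f}")
    cheapest, priciest = cost_per_image_range()
    print(f"per image   = ${cheapest:.3f} to ${priciest:.3f}")
```

At this rate 100 credits come to roughly $13, which is consistent with the amount the article says was spent on the final image.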

To get a feel for how DALL·E 2 behaves, Zhang first entered the string 'llama playing basketball'. The output is shown below, and Zhang commented, 'Why was a cartoon-style image generated for this input? I think it has to do with the lack of real photographs of llamas playing basketball in the training data.'



Zhang then added the keyword 'realistic photo of' and generated again. The llama no longer looked like an outright illustration, but it now looked like an image cut out and pasted in with Photoshop.



Repeating the cycle of input and output many times is of course important, but each use of DALL·E 2 costs credits, so images cannot be generated indefinitely. Zhang therefore consulted the following page, which collects DALL·E 2 input strings together with their results.

The DALL·E 2 Prompt Book – DALL·Ery GALL·Ery

https://dallery.gallery/the-dalle-2-prompt-book/

Of the strings listed on this page, Zhang found 'dramatic backlighting' particularly effective. The following image was generated with 'Film still of a llama dunking a basketball, low angle, extreme long shot, indoors, dramatic backlighting'. Compared with the llama that looked like a Photoshop cutout, a background and backlighting have been added, making the image look much closer to a real photograph.



Zhang also said, 'It is important to convey exactly what you want to DALL·E 2.' For example, if you want the llama to be dressed as a basketball player, you must say so in the input. The following image was output with 'film still of an alpaca wearing a jersey, dunking a basketball, low angle, long shot, indoors, dramatic backlighting, high detail'.



Specifying the llama's movement in more detail produces the following. The text entered for this image was 'film still of a llama in a jersey dunking a basketball like Michael Jordan, low angle, show from below, tilted frame, 35°, Dutch angle, extreme long shot, high detail, indoors, dramatic backlighting'.
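The prompts in this article all follow the same pattern: a subject phrase followed by comma-separated keywords for camera angle, framing, and lighting. As a minimal sketch of that pattern, the snippet below assembles the prompt quoted above from its parts. The split into a "subject" and a list of "modifiers" is an assumption made for illustration, `build_prompt` is a hypothetical helper, and nothing here calls DALL·E 2 itself.

```python
from typing import Iterable


def build_prompt(subject: str, modifiers: Iterable[str]) -> str:
    """Join a subject phrase and modifier keywords into one comma-separated prompt."""
    parts = [subject.strip()]
    # Skip empty entries so a stray "" does not leave a dangling comma.
    parts += [m.strip() for m in modifiers if m.strip()]
    return ", ".join(parts)


# Subject and keywords copied verbatim from the prompt quoted in the article.
subject = "film still of a llama in a jersey dunking a basketball like Michael Jordan"
modifiers = [
    "low angle", "show from below", "tilted frame", "35°", "Dutch angle",
    "extreme long shot", "high detail", "indoors", "dramatic backlighting",
]

prompt = build_prompt(subject, modifiers)
print(prompt)
```

Keeping the keywords in a list makes it easy to add or drop one modifier at a time between attempts, which matters when every attempt costs a credit.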



After generating this many images, Zhang said, 'DALL·E 2 is not good at composition. In the context of "dunking a basketball", the relative positions of the llama, the ball, and the hoop should be obvious (to a human), but in the images DALL·E 2 actually produced, the llama dunked in the wrong direction and the ball was out of place.' The image below was generated with the same input as the image above, 'film still of a llama in a jersey dunking a basketball like Michael Jordan, low angle, show from below, tilted frame, 35°, Dutch angle, extreme long shot, high detail, indoors, dramatic backlighting', and the positions of the llama, ball, and hoop are indeed jumbled.



'Extreme long shot' and 'low angle' were the keywords meant to capture the llama's whole body, but these instructions were apparently ignored completely in many cases, producing a large number of images that show the llama only from the neck up.



The AI may also generate strange images because it does not correctly understand 'basketball'. In the image below, the net of the hoop in the upper left is, for some reason, made of fur.



DALL·E 2 is said to be intentionally designed not to generate realistic faces, in order to prevent the creation of deepfakes, so faces in its output tend to look distorted or crushed. This behavior seems to apply to llamas as well, and Zhang reports that some rather spooky llamas were generated.



Furthermore, DALL·E 2 processes text as input to its algorithm but does not understand the language itself. As a result, there were cases where, for example, meaningless strings of characters appeared on the jersey the llama was wearing.



On the other hand, DALL·E 2 seems to excel at rendering images in different styles. For example, the image below was output with the keyword 'Abstract painting'.



'Vaporwave'



'Screenshots from the Miyazaki anime movie'
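The style keywords above lend themselves to generating one prompt per style for the same subject. The sketch below shows one way to do that. Note that the article does not say how each style keyword was combined with the rest of the input, so appending it after a comma is purely an assumption, as are the subject phrase chosen here and the helper name `style_variants`.

```python
from typing import Dict, List

# Style keywords as quoted in the article.
STYLE_KEYWORDS: List[str] = [
    "Abstract painting",
    "Vaporwave",
    "Screenshots from the Miyazaki anime movie",
]


def style_variants(subject: str, styles: List[str]) -> Dict[str, str]:
    """Map each style keyword to a prompt that appends it to the subject.

    Appending with a comma is an assumed convention, not the article's method.
    """
    return {style: f"{subject}, {style}" for style in styles}


variants = style_variants("a llama dunking a basketball", STYLE_KEYWORDS)
for style, prompt in variants.items():
    print(prompt)
```

Each generated prompt would cost one credit to try, so three style variants of one subject use three credits.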



After generating images over and over and using up 100 credits, worth about $13 (about 1,800 yen), Zhang finally arrived at the image below. The input string was 'Film still of a llama in a jersey dunking a basketball like Michael Jordan, low angle, show from below, tilted frame, 35°, Dutch angle, extreme long shot, high detail, indoors, dramatic backlighting.' 'It's not a perfect image, but it meets about 80% of what I wanted,' Zhang said.

in Software,   Art, Posted by log1i_yk