Why image generation AI struggles to render text: commenters liken it to a foreigner's mysterious kanji tattoo



When using image generation AI such as Stable Diffusion or DALL-E 3, you often run into problems such as "mysterious patterns appear instead of letters" and "even short words are misspelled." A lively discussion is under way on the social news site Hacker News about why image generation AI is bad at rendering text.

Ask HN: Why can't image generation models spell? | Hacker News

https://news.ycombinator.com/item?id=39727376

Below is an example of generating an image containing text with image generation AI. We used 'Image Creator', which is powered by DALL-E 3, with the prompt "Photo of the exterior of a ramen restaurant with the name 'Ramen Fantasy' written on it." The phrase 'Ramen Fantasy' did not appear in the result. Instead, the image showed the misspelled word 'RAIMEN' and a mysterious kanji-like pattern.



It seems that Japanese text in a prompt is converted to English before processing, so we changed the prompt to "Photo of the exterior of a ramen shop with the name 'Ramen Eater' written on it" to generate an image containing English words. The result is below. 'Eater' has become 'EEATER'.



The inability of image generation AI to render text correctly seems to be troubling users all over the world. On Hacker News, one user posted, "I wanted to generate an image that included my son's name, but the generated images are misspelled, even though the name is only five letters long. Why does image generation AI misspell it?" The post drew many comments.

Gwern Branwen, an expert on AI, points out that image generation AI is bad at rendering text for reasons such as "many image generation AI models are unable to learn text well enough" and "the tokenization applied to prompts is not designed with character output in mind."
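The tokenization point can be illustrated with a minimal sketch. The vocabulary and the greedy longest-match scheme below are hypothetical simplifications chosen for illustration. They are not the tokenizer of DALL-E 3, Stable Diffusion, or any other real model, but they show how a prompt can reach a model as a handful of subword IDs rather than as individual letters.

```python
# Hypothetical toy vocabulary (an assumption for illustration only).
# Real models use learned vocabularies with tens of thousands of entries.
TOY_VOCAB = {"ramen": 0, "eat": 1, "er": 2, "fan": 3, "tasy": 4, " ": 5}


def tokenize(text, vocab=TOY_VOCAB):
    """Greedy longest-match subword tokenization.

    This is a simplified stand-in for learned schemes such as BPE,
    not the algorithm used by any particular model.
    """
    text = text.lower()
    ids = []
    i = 0
    while i < len(text):
        # Try the longest piece starting at position i that is in the vocabulary.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary entry covers {text[i]!r}")
    return ids


print(tokenize("Ramen Eater"))    # [0, 5, 1, 2]
print(tokenize("Ramen Fantasy"))  # [0, 5, 3, 4]
```

In this sketch 'Eater' arrives as two IDs for 'eat' and 'er'. Nothing in those IDs says that the word is spelled E, A, T, E, R, so a model would have to learn the spelling from other signals in its training data.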

In addition, barkingcat explained that "the training data of image generation AI does not contain enough text information," and offered an analogy: "If an English-speaking tattoo artist who does not know Japanese at all creates a tattoo that includes kanji, the artist does not know how the kanji are actually written, so funny tattoos are the result."


by Pablo Manriquez

Developers of image generation AI models are also aware that their models cannot render text well, and research and development to improve accuracy is under way. For example, 'Stable Diffusion 3', announced in February 2024, touts its ability to render text accurately.

High-quality image generation AI 'Stable Diffusion 3' announced, capable of high-precision 'rendering of specified text' and 'depiction of multiple subjects', both of which image generation AI is weak at - GIGAZINE



in Software, Posted by log1o_hf