Does the AI image generator 'DALL-E 2' have its own 'secret language'?



Researchers have pointed out that the AI image generator 'DALL-E 2', which combines natural language processing and image generation, may have a 'secret language': entering certain character strings that are meaningless to humans can consistently generate similar images.

However, another researcher has argued that this finding is 'just a coincidence.'

Discovering the Hidden Vocabulary of DALLE-2
(PDF file) https://giannisdaras.github.io/publications/Discovering_the_Secret_Language_of_Dalle.pdf



Giannis Daras and Alexandros G. Dimakis of the University of Texas at Austin pointed out that DALL-E 2 has a secret language. For example, 'Apoploe vesrreaitais' means birds, and 'Contarra ccetnxniams luryca tanniounons' means insects or pests. Therefore, if you enter a sentence such as 'Apoploe vesrreaitais eats Contarra ccetnxniams luryca tanniounons', you get an image of a bird eating an insect.



DALL-E 2 is not good at rendering text, so entering the prompt 'two farmers talking about vegetables, with subtitles' produces an image in which the two farmers speak incomprehensible words. According to Daras, however, the text in the image does not appear to be random.



When an image was generated from the word 'Vicootes', which appeared in the earlier image, the result was an image of vegetables. Similarly, generating an image from 'Apoploe vesrreaitais', which also appeared in that image, produced an image of a bird. In other words, the seemingly incomprehensible conversation is thought to have been generated by DALL-E 2 as 'farmers talking about birds that eat vegetables.'



Entering the prompt 'Two whales talking about food, with subtitles' generates an image of whales saying 'Wa ch zod rea'. Since generating an image from 'Wa ch zod rea' actually produced images of food, Daras said, 'I interpret this as an image of whales talking about food (in the language of DALL-E 2).'



According to Daras, some words in the DALL-E 2 language produce absurd sentences. For example, entering 'picture of Apoploe vesrreaitais' yields a picture of a bird, but depending on the model, 'Apoploe vesrreaitais' seems to mean something closer to 'flying in the sky'.



The discovery of the DALL-E 2 language raises many interesting challenges for security and interpretability, says Daras.



Meanwhile, researcher Benjamin Hilton said, 'There is no secret language in DALL-E, or at least we haven't found it yet.'



In Daras's paper, 'Contarra ccetnxniams luryca tanniounons' means insects and pests, but in Hilton's tests, many images of animals also appeared.



If DALL-E had a secret language, every prompt should be converted according to the same rules. However, when Hilton added the condition '3D rendering' to 'Contarra ccetnxniams luryca tanniounons', only images of sea creatures were generated, not insects.



Similarly, adding the conditions 'cartoon' and 'picture' produced nothing but images of grandmothers.



Next, Hilton tested 'Apoploe vesrreaitais', which Daras claims means 'bird'.



Similarly, when the conditions 'cartoon' and 'picture' were added, many insects were displayed and no birds appeared at all. From this, Hilton speculated that either 'the result is just a coincidence' or the word has two meanings.



Then, when Hilton tried the combination 'Apoploe vesrreaitais eats Contarra ccetnxniams luryca tanniounons', birds appeared, but insects did not.



Next, Hilton tested 'Vicootes', which was claimed to mean 'vegetables'. However, when conditions were added, the results differed for each condition.



In response to these results, Hilton commented, 'This looks more like stochastic, random noise than a secret DALL-E language.'



Hilton also examined the character strings that appear in images generated by DALL-E.



Using Daras's prompt about two whales talking about food, the first generated image that contained a reasonably legible character string showed text that can be read as 'Evve waeles'.



Entering 'Evve waeles' into DALL-E 2 produced images of desserts, animals, sports, and so on.



'Evve waeles' probably either has no particular meaning or is a garbled rendering of the word 'whales'. From this, Hilton said, 'Daras was lucky that the whales said "Wa ch zod rea" and that this string actually produced images of food.'



To be fair, Hilton acknowledged that if entering 'Apoploe vesrreaitais' always produces a picture of a bird, as Daras reports, then something is still going on.



However, he concluded that this does not prove that 'DALL-E has a secret language' or that 'the character strings output by DALL-E mean something', adding that if he is wrong, he looks forward to being proved so.



in Science, Posted by logc_nt