Does the AI image generator 'DALL-E 2' have its own 'secret language'?
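Researchers have reported that the AI image generator 'DALL-E 2' appears to have its own 'secret language', in which seemingly meaningless strings consistently produce particular images.
However, another researcher has argued that these results are 'just a coincidence.'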
Discovering the Hidden Vocabulary of DALLE-2
(PDF file) https://giannisdaras.github.io/publications/Discovering_the_Secret_Language_of_Dalle.pdf
Giannis Daras and Alexandros G. Dimakis of the University of Texas at Austin claim that DALL-E 2 has a secret language. For example, 'Apoploe vesrreaitais' means birds, and 'Contarra ccetnxniams luryca tanniounons' means bugs or pests. Accordingly, entering a prompt such as 'Apoploe vesrreaitais eating Contarra ccetnxniams luryca tanniounons' yields images of birds eating bugs.
DALLE-2 has a secret language.
— Giannis Daras (@giannis_daras) May 31, 2022
'Apoploe vesrreaitais' means birds.
'Contarra ccetnxniams luryca tanniounons' means bugs or pests.
The prompt: 'Apoploe vesrreaitais eating Contarra ccetnxniams luryca tanniounons' gives images of birds eating bugs.
A thread (1 / n) pic.twitter.com/VzWfsCFnZo
DALL-E 2 is known to struggle with rendering text, so the prompt 'two farmers talking about vegetables, with subtitles' yields an image in which the two farmers appear to speak gibberish. According to Daras, however, the text in the image is not as random as it first appears.
A known limitation of DALLE-2 is that it struggles with text. For example, the prompt: 'Two farmers talking about vegetables, with subtitles' gives an image that appears to have gibberish text on it.
— Giannis Daras (@giannis_daras) May 31, 2022
However, the text is not as random as it initially appears ... (2 / n) pic.twitter.com/B3e5qVsTKu
When the word 'Vicootes', which appeared in the text of that image, was used as a prompt, the result was images of vegetables. Similarly, using 'Apoploe vesrreaitais', which also appeared in the image, as a prompt produced images of birds. In other words, Daras suggests that the seemingly incomprehensible conversation was generated by DALL-E 2 as 'farmers talking about birds messing with their vegetables.'
We feed the text 'Vicootes' from the previous image to DALLE-2. Surprisingly, we get (dishes with) vegetables! We then feed the words: 'Apoploe vesrreaitars' and we get birds. It seems that the farmers are talking about birds , messing with their vegetables! (3 / n) pic.twitter.com/OiU7NPTbor
— Giannis Daras (@giannis_daras) May 31, 2022
Entering the prompt 'Two whales talking about food, with subtitles' generates an image in which the whales appear to say 'Wach zod rea'. Daras reports that using 'Wach zod rea' as a prompt in turn generated images of food, which he interprets as the whales talking about their food in the DALL-E 2 language.
Another example: 'Two whales talking about food, with subtitles'. We get an image with the text 'Wach zod rea' written on it. Apparently, the whales are actually talking about their food in the DALLE-2 language. (4 /n) pic.twitter.com/cqlUYXlLvf
— Giannis Daras (@giannis_daras) May 31, 2022
According to Daras, some words in the DALL-E 2 language can be learned and used to build absurd prompts. For example, 'painting of Apoploe vesrreaitais' yields a painting of a bird; to the model, 'Apoploe vesrreaitais' seems to mean 'something that flies', and it works across diverse styles.
Some words from the DALLE-2 language can be learned and used to create absurd prompts. For example, 'painting of Apoploe vesrreaitais' gives a painting of a bird. 'Apoploe vesrreaitais' means to the model 'something that flies' and can be used across diverse styles. (5 / n) pic.twitter.com/w73iKN4kM1
— Giannis Daras (@giannis_daras) May 31, 2022
The discovery of the DALL-E 2 language has created many interesting challenges in terms of security and interpretability, says Daras.
The discovery of the DALLE-2 language creates many interesting security and interpretability challenges.
— Giannis Daras (@giannis_daras) May 31, 2022
Currently, NLP systems filter text prompts that violate the policy rules. Gibberish prompts may be used to bypass these filters. (6 / n)
Meanwhile, researcher Benjamin Hilton said, 'DALL-E doesn't have a secret language, or at least we haven't found one yet.'
No, DALL-E doesn't have a secret language.
— Benjamin Hilton (@benjamin_hilton) May 31, 2022
(or at least, we haven't found one yet)
This viral DALL-E thread has some pretty astounding claims. But maybe the reason they're so astounding is that, for the most part, they're not true.
Thread (1/15) https://t.co/8F2WDp7lTK
Daras claims that 'Contarra ccetnxniams luryca tanniounons' means bugs or pests, but when Hilton entered the same string, DALL-E generated many different kinds of animals.
Let's start with some of the basic claims.
— Benjamin Hilton (@benjamin_hilton) May 31, 2022
1) @giannis_daras says 'Contarra ccetnxniams luryca tanniounons' means bugs or pests.
This just seems wrong.
Here's what I get if I put 'Contarra ccetnxniams luryca tanniounons' into DALL-E --lots of different animals.
(2/15) pic.twitter.com/RGHeRw1pmb
If DALL-E had a secret language, a term should keep its meaning across prompts, including more complex ones that ask for a particular style. But when Hilton appended ', 3d render' to 'Contarra ccetnxniams luryca tanniounons', DALL-E generated sea-related things rather than bugs.
The key to claims of a DALL-E 'secret language' is that these terms apply across DALL-E prompts --including when used in more complex prompts, like asking DALL-E to output in other styles.
— Benjamin Hilton (@benjamin_hilton) May 31, 2022
But if I add “, 3d render” to the prompt I get sea-related things, not bugs.
(3/15) pic.twitter.com/YUspbCyqgS
Similarly, appending ', cartoon' or ', painting' to the same string produced images of grandmothers.
The prompts 'Contarra ccetnxniams luryca tanniounons, cartoon' and 'Contarra ccetnxniams luryca tanniounons, painting' give me ... grandmas?!
— Benjamin Hilton (@benjamin_hilton) May 31, 2022
(4/15) pic.twitter.com/eBQY4bWSzL
Next, Hilton tested 'Apoploe vesrreaitais,' which Daras claims means 'birds.' Used on its own, the prompt did generate birds.
2) How about the claim that “Apoploe vesrreaitais” means “birds” or “things that fly”?
— Benjamin Hilton (@benjamin_hilton) May 31, 2022
This does better. For the prompt “Apoploe vesrreaitais”, DALL-E does generate birds.
(5/15) pic.twitter.com/4LHUYGqWyZ
However, when Hilton asked for a cartoon or a 3D render, DALL-E generated many bugs and no birds at all. From this, Hilton guessed that the result is random chance, or possibly that 'Apoploe vesrreaitais' looks like a binomial name for some birds or bugs.
If I try a cartoon, or a 3D render, DALL-E generates lots of bugs (some of which can fly) and no birds.
— Benjamin Hilton (@benjamin_hilton) May 31, 2022
My best guess? It's random chance. Or just maybe (if you really press me) “Apoploe vesrreaitais” looks like a binomial name for some birds or bugs.
(6/15) pic.twitter.com/hC3g2B9HRS
Hilton then tried the combined prompt 'Apoploe vesrreaitais eating Contarra ccetnxniams luryca tanniounons'. Birds appeared, but he was not sure that any bugs did.
3) Combining claims 1 and 2: does 'Apoploe vesrreaitais eating Contarra ccetnxniams luryca tanniounons' give images of birds eating bugs?
— Benjamin Hilton (@benjamin_hilton) May 31, 2022
As you might expect from our previous results, this prompt definitely generates some birds, but I'm not sure there are any bugs.
(7/15) pic.twitter.com/lym1KZVLKe
Next, Hilton tested 'Vicootes,' which Daras claims means 'vegetables.' On its own it did produce vegetable dishes, but once a style was appended, the results differed for each style.
4) @giannis_daras says 'Vicootes' means “vegetables”.
— Benjamin Hilton (@benjamin_hilton) May 31, 2022
Again, yes, 'Vicootes' does give us some vegetable dishes. But:
--“Vicootes, cartoon” gives some weird characters
--“Vicootes, 3d render” gives objects
--“Vicootes, painting” gives flowers and landscapes
(8/15) pic.twitter.com/oq0KBI4zjh
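Hilton's consistency check amounts to pairing each candidate term with each style suffix and comparing the resulting images by eye. The following is a minimal sketch of that prompt grid in Python. It only builds the prompt strings: the helper name `build_prompts` is hypothetical, the terms and suffixes are those quoted in the threads, and how the prompts are submitted to DALL-E is left out because neither thread describes it.

```python
from itertools import product

# Terms and style suffixes quoted in the two threads.
TERMS = [
    "Apoploe vesrreaitais",
    "Contarra ccetnxniams luryca tanniounons",
    "Vicootes",
]
STYLES = ["", "cartoon", "3d render", "painting"]  # "" means the bare term


def build_prompts(terms, styles):
    """Hypothetical helper: return (term, style, prompt) for every combination."""
    grid = []
    for term, style in product(terms, styles):
        # Append the style after a comma, as in "Vicootes, 3d render".
        prompt = f"{term}, {style}" if style else term
        grid.append((term, style, prompt))
    return grid


if __name__ == "__main__":
    for _term, _style, prompt in build_prompts(TERMS, STYLES):
        print(prompt)
```

If a term really belonged to a stable vocabulary, every row for that term would be expected to show the same subject regardless of the style suffix.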
In light of these results, Hilton commented, 'This is all starting to look a lot more like stochastic, random noise than a secret DALL-E language.'
To me this is all starting to look a lot more like stochastic, random noise, than a secret DALL-E language.
— Benjamin Hilton (@benjamin_hilton) May 31, 2022
(9/15)
Hilton also examined whether the text that appears in images generated by DALL-E means anything.
Ok, let's dig a bit deeper.
— Benjamin Hilton (@benjamin_hilton) May 31, 2022
5) Does the text in DALL-E images mean something? @Giannis_daras uses the example of “Two whales talking about food, with subtitles”
He then claims that using text from one of these pictures as a prompt will generate images of food.
(10/15) pic.twitter.com/5aCP2eqvRp
None of the images Hilton first obtained from Daras's prompt 'Two whales talking about food, with subtitles' contained text clear enough to transcribe, so he kept generating until one did. The following is the first such image, with text that can be read as 'Evve waeles'.
None of these pictures really have transcribable text, so I asked DALL-E to generate more whales until there was an image with text to copy. This is the first one DALL-E gave me.
— Benjamin Hilton (@benjamin_hilton) May 31, 2022
(11/15) pic.twitter.com/6o5tb91JPx
Entering 'Evve waeles' into DALL-E produced an image of a dessert, but also footballers, animals, and a kettle.
And look, prompting DALL-E with 'Evve waeles' gave me a picture of a delicious dessert!
— Benjamin Hilton (@benjamin_hilton) May 31, 2022
But also --some footballers, some animals and a kettle?
(12/15) pic.twitter.com/jncHq0W13Q
Hilton argues that 'Evve waeles' is either nonsense or a corruption of the word 'whales', and that Daras simply got lucky when his whales said 'Wach zod rea' and that string happened to generate pictures of food.
What do I think? 'Evve waeles' is either nonsense, or a corruption of the word 'whales'. Giannis got lucky when his whales said 'Wach zod rea' and that happened to generate pictures of food.
— Benjamin Hilton (@benjamin_hilton) May 31, 2022
(13/15)
To be fair to Daras, Hilton acknowledged that it is definitely weird that 'Apoploe vesrreaitais' produces birds every time despite seeming to be nonsense, so there is something to the finding.
CONCLUSION:
— Benjamin Hilton (@benjamin_hilton) May 31, 2022
To be fair to @giannis_daras , it's definitely weird that “Apoploe vesrreaitais” gives you birds, every time, despite seeming nonsense.
So there's for sure something to this.
(14/15)
However, Hilton concluded that he sees no evidence of a secret language that holds across prompts, or that the text in DALL-E images means anything, and that if such evidence exists, he looks forward to being proven wrong.
But I don't think there's evidence there's a secret language across prompts --or that the text in DALL-E images means anything.
— Benjamin Hilton (@benjamin_hilton) May 31, 2022
And if there is evidence, I'm looking forward to being proven wrong! @Giannis_daras --next round's on you :)
(15/15)
in Science, Posted by logc_nt