What is the 'one banana problem' that highlights a weakness of generative AI?

With the advent of generative AI such as Stable Diffusion and ChatGPT, anyone can easily create text and images. Daniel Hook, CEO of the technology company Digital Science, uses the 'one banana problem', which he ran into while using an image generation AI, to illustrate a problem with generative AI.

The Lone Banana Problem. Or, the new programming: "speaking" AI


Mr. Hook, who likes bananas, said he had been joking with his colleagues that "we should use more bananas in our branding." So when the image generation AI Midjourney appeared, he thought it would be a great opportunity to generate the ideal banana image.

Mr. Hook generated images in Midjourney with the prompt 'A single banana casting a shadow on a gray background'. The result was the following four images.

The images certainly show a banana casting a shadow on a gray background, and the bananas are drawn so precisely that they could be mistaken for photographs. However, the bananas come in pairs, not alone. Hook therefore tried variations of the prompt, such as 'a perfect ripe banana on a pure gray background casting a light shadow, hyperrealistic' and 'a single perfect ripe banana alone on a pure gray background casting a light shadow, hyperrealistic photographic', but none of the resulting images showed just one banana.

Still determined to generate an image of a single banana, Mr. Hook consulted a friend familiar with programming. The friend suggested generating 'a monkey holding a banana' and specifying in the prompt that the monkey should be transparent. The result of that prompt is shown below.

The monkey, which should have been invisible, was rendered in full, and for some reason it is holding the banana shyly. Moreover, it was holding two bananas instead of one. No matter how many times Mr. Hook tried after that, two or more bananas were always drawn.

According to Hook, the tendency to draw 'two bananas in the picture' is an example of a small bias that AI has. The datasets used to train image generation AIs such as Midjourney contain images of bananas labeled 'banana'. However, while such images are labeled 'banana', they are very unlikely to be labeled with a count such as 'three bananas', so the model cannot learn how many bananas are in a picture.

"One of the problems with generative AI is that it is almost impossible to understand what is going on inside the AI," Hook said. "As with the human brain, we cannot fully understand the processes going on inside deep learning algorithms."

Hook points out that AI technology is developing rapidly and the results are spectacular, but there is still a gap between expectation and reality. Human skills are augmented by common sense, context, and the physical reality that surrounds us, but AI lacks these capabilities and cannot go beyond the datasets humans provide for training.

Of course, this way of thinking raises an uncomfortable question for humans: "Isn't human intelligence just the result of pattern matching grounded in physical reality?" Hook acknowledges that human imagination also has limits. Similarly, he argues that AI's creativity is bounded by the limits of its imagination.

Mr. Hook said, "The output of generative AI such as ChatGPT and Midjourney gives the impression that it understands reality, but because it has no sense of the physical world, it has no concept of a single banana. At the current level of development, AI does not perceive objects the way we humans do. AI is born in the logical world, not the physical world."

After two weeks of trying to generate an image of a single banana, Mr. Hook finally succeeded with the prompt 'A single banana on its own casting a shadow on a gray background', producing the image below. However, when he entered the same successful prompt into Midjourney again, it still produced two bananas, or an image of a single banana that looked about to split into two.

