Microsoft develops "AttnGAN", a new AI technology that automatically generates imaginary images from text, realistic enough to be mistaken for real photos



Microsoft has developed AI technology that automatically generates images from sentences. Entering a description such as "a bird with a yellow body, black wings and a short beak" automatically produces a natural-looking image, as if it were a photo of a real bird.

[1711.10485] AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks
https://arxiv.org/abs/1711.10485

Microsoft researchers build an AI that draws what you tell it to
https://blogs.microsoft.com/ai/drawing-ai/

AI that automatically generates descriptive text (captions) from photos and images has been developed by Google and others. Microsoft has developed similar technologies, some of which have already been introduced in Office.

"CaptionBot" lets you try out Microsoft's image recognition, which automatically analyzes images and describes them in words - GIGAZINE

Microsoft adds a new AI function to its Office tools that automatically recognizes images and creates descriptions - GIGAZINE


AI technology has so far been developed in the "image → caption" direction, but the technology newly announced by Microsoft's research lab works in the "caption → image" direction. In other words, if you enter a text description of the image you want, the AI automatically generates a corresponding image. The researchers who developed this AI technology reportedly call it simply the "drawing bot".

In the image generation process below, the text "a bird with a yellow body, black wings and a short beak" is input, and the drawing bot generates an image that matches the description. Surprisingly, the finished bird image (lower right) was not selected from photos of real birds. It is a "computer-imagined bird" created from scratch, pixel by pixel. In other words, in the future an image that turns up when you search Bing with a combination of phrases may not show a real bird at all, says Xiaodong He, principal researcher at Microsoft Research's Deep Learning Technology Center.


Conventional "image → caption" AI only has to pick out certain information from among many features. The drawing bot, a "caption → image" AI, has to start from a small amount of information, fill in the missing details by itself and reconstruct a whole image, so the technical difficulty is dramatically higher. The core technology behind the drawing bot is the "Generative Adversarial Network" (GAN). Images generated by the GAN are refined using a model called the "discriminator", which judges their quality.
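The interplay between the image generator and the "discriminator" can be illustrated with the standard GAN losses. The following is only a minimal sketch of the generic GAN objective, not AttnGAN's actual training code; the function names and the use of raw scores (logits) are assumptions made for illustration.

```python
import math

def sigmoid(x: float) -> float:
    """Map a raw discriminator score (logit) to a probability of 'real'."""
    return 1.0 / (1.0 + math.exp(-x))

def discriminator_loss(real_scores, fake_scores):
    """Binary cross-entropy for the discriminator.

    Low when real images score high and generated images score low.
    """
    eps = 1e-12  # guards against log(0)
    real_term = -sum(math.log(sigmoid(s) + eps) for s in real_scores) / len(real_scores)
    fake_term = -sum(math.log(1.0 - sigmoid(s) + eps) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term

def generator_loss(fake_scores):
    """Non-saturating generator loss.

    Low when the discriminator scores generated images as real.
    """
    eps = 1e-12
    return -sum(math.log(sigmoid(s) + eps) for s in fake_scores) / len(fake_scores)

# Hypothetical scores: the discriminator is confident about two real photos
# and fairly sure that two generated images are fake.
d_loss = discriminator_loss(real_scores=[3.0, 2.5], fake_scores=[-2.0, -1.5])
g_loss = generator_loss(fake_scores=[-2.0, -1.5])
```

The two losses pull in opposite directions: the discriminator is rewarded for telling real and generated images apart, while the generator is rewarded for producing images the discriminator accepts as real.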

In the "caption → image" process, generating images from simple text inputs such as "blue bird" or "evergreen tree" is not that difficult. However, when more complex conditions such as "yellow feathers" or "red belly" were added, the whole sentence was treated as a single piece of information, so the details of the description were lost. When people draw a picture, they refer back to the descriptive text repeatedly and pay close attention to the words describing the part they are currently drawing. Taking this behavior as a model, the researchers created "AttnGAN", which expresses the concept of "attention" mathematically. It breaks the input text into individual words and uses them to build up the details of the image.
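The idea of attending to individual words can be sketched as a softmax-weighted average of word vectors for one region of the image. This is a simplified illustration of dot-product attention in general, not the exact formulation used in AttnGAN; the two-dimensional vectors are hand-picked toy values and the function names are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(region_vec, word_vecs):
    """Return (weights, context) for one image region over all words.

    Each weight says how much the region 'looks at' a word; the context
    vector is the weighted average of the word vectors.
    """
    scores = [sum(r * w for r, w in zip(region_vec, wv)) for wv in word_vecs]
    weights = softmax(scores)
    dim = len(region_vec)
    context = [sum(wt * wv[i] for wt, wv in zip(weights, word_vecs)) for i in range(dim)]
    return weights, context

# Toy example: words from the caption and made-up 2-D word vectors.
words = ["yellow", "body", "black", "wings"]
word_vecs = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.1, 0.9]]

# Hypothetical feature vector for the image region where the wing is drawn.
wing_region = [0.0, 4.0]
weights, context = attend(wing_region, word_vecs)
```

With these toy values the wing region puts its largest weight on "black", followed by "wings", so the details drawn in that region are driven mainly by those words rather than by the sentence as a whole.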

In addition, AttnGAN learns the human concept of "common sense" through machine learning. Training is carried out on pairs of images and captions, and because many of the bird images show birds perched on tree branches, the model is said to learn this as "common sense" about where a bird belongs.


As described above, in the drawing bot, AttnGAN combines two machine learning processes, "attention" and "common sense", to generate an image that matches the description, and a "discriminator" model judges the quality of the generated image. This has made it possible to generate images good enough to be mistaken for real photos. The quality of images generated by AttnGAN is said to be three times that of images made with conventional GAN technology.

The drawing bot has reached a remarkable level, with almost none of the unnaturalness typical of AI-generated output. According to Microsoft, however, a number of small flaws can still be seen, and the technology is not yet finished. With further improvements, it is aimed at future uses such as assisting with sketches, refining photos by voice command, and making animated movies from text-based scripts without manual work.

in Software, Design, Posted by darkhorse_log