Art and engineering experts explain why AI is not good at drawing 'hands'



Image generation AIs are now widespread, including 'Stable Diffusion' by the startup Stability AI, 'Midjourney', which is operated through Discord commands, 'NovelAI', which specializes in anime-style illustrations, and 'Adobe Firefly', which is marketed as free of copyright problems. Image generation AI can produce fairly realistic people and high-quality character illustrations from nothing more than a prompt, but there are expressions and body parts it handles poorly, and failures to draw 'human hands' in particular are increasingly common. Online media outlet Vox explains why AI is not good at drawing hands correctly.

Why AI art struggles with hands - YouTube


People and characters produced by image generation AI have become quite high quality, and sites such as 'Which Face Is Real?' and 'Human or AI', a quiz in which you guess whether the creator is a human or an AI, have been released. Research has shown that AI-generated images of people can be identified by the shape of their pupils, and some point out that AI illustrations are poor at depicting contextual relationships and stylized deformation. Among these weaknesses, it is often pointed out that AI is not good at drawing 'hands'.

A fiendishly hard quiz asking whether an illustration's author is human or AI has appeared; what clues do editorial staff experienced with image generation AI use to tell them apart? - GIGAZINE



According to Vox, the way image generation AI fails at drawing hands reveals how AI art works.



Stan Prokopenko, an artist and art teacher, points out that 'pattern recognition' is an important part of training to become an artist. Humans have not only observed many hand shapes and movements, but have also lived their whole lives aware of their own hands and those of others, which lets them understand what a hand is.



AI is similar in that it learns patterns, but AI training is like being trapped in a museum, able to look only at photographs or drawings and the placards that accompany them.



For example, if you want to observe an apple in detail, it is desirable to hold it in your hand and rotate it while looking at it carefully.



However, the AI only sees a picture of an apple and the description 'Apple on a brown table'.



Also, the way humans and AI learn things they observe is very different. Human artists generally try to understand some rules when they start training, and tend to simplify to basic shapes when drawing complex things like hands.



By treating the palm and the back of the hand as a thick rectangular block and then placing the fingers on it, an artist can build up a drawing of the whole hand.



AI, on the other hand, gets the basic shapes quite wrong, as in the image below. When you zoom in, however, the lighting and the texture of the skin are rendered in fine detail.



AI can easily reproduce patterns on a pixel-by-pixel basis, so it reproduces elements such as color and texture very well. What it has not been able to learn are rules such as 'it does not bend that way'.



In short, AI only ever observes hands on a canvas, so even if it understands the pixel-by-pixel arrangement of a hand, it cannot understand how a hand moves.



We can conclude that 'AI cannot draw hands because it is not human'.



To better understand the learning models behind image generation AI, Vox spoke with Yilun Du, a graduate student in robotics at the Massachusetts Institute of Technology (MIT), and Roy Shilkrot, a principal investigator who has been teaching generative art at MIT since 2018. As a result, Vox says it found 'three big reasons' why image generation AI has difficulty drawing hands.



The three problems Vox cites are 'data size and quality', 'human hand movement', and 'low error tolerance'. 'Data size and quality' simply means that far fewer photos and drawings of hands are available for training than of human faces.



Some sites do publish hand datasets for sketching reference and similar uses, but these were not made for training image generation AI, so in many cases they lack annotations describing what kind of photo or drawing each one is, or what pose and movement the hand is in.



According to Shilkrot, when a model learns 'a person with an umbrella', the machine is given few cues beyond 'a person is holding an umbrella'. In reality, there are fine details such as 'the thumb sticks out on one side of the umbrella's handle', 'the fingers gripping the handle are bent', and 'the thumb covers the index finger'.



Because it grasps only rough information about how to hold an umbrella, something every person understands naturally, AI generates images in which the umbrella and the hand appear fused together.



The second reason, 'human hand movement', stems from the fact that hands move in far more complex ways than faces. A face in a portrait photo has a more or less standard layout, with fairly fixed rules about where the eyes go and how far apart each feature should be. Du points out, however, that hands have no such simple rules, given that they have a front and a back and that each finger moves on its own.



Prokopenko points out a similar complexity: the number of fingers that are visible changes with the movement and orientation of the hand, but because AI does not understand that 'a hand has five fingers', it learns the number exactly as it sees it. The same thing happens with 'the legs of a running horse', where AI ends up misjudging the number of legs because fast movement makes five or more appear, or because overlapping legs make only three or fewer visible.
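The counting problem can be illustrated with a toy sketch. This is not how a diffusion model works; it is a hypothetical stand-in for a learner that only records how many fingers are visible in each training image. The function names and the 30% chance that any finger is hidden are assumptions made purely for illustration.

```python
import random
from collections import Counter

def visible_fingers(rng, hide_prob=0.3):
    """Return how many of a hand's five fingers are visible in one image.

    Each finger is independently hidden with probability hide_prob
    (an illustrative assumption, not a measured value).
    """
    return sum(1 for _ in range(5) if rng.random() >= hide_prob)

def learn_counts(num_images, seed=0):
    """Tally visible-finger counts over a synthetic 'dataset' of hands."""
    rng = random.Random(seed)
    return Counter(visible_fingers(rng) for _ in range(num_images))

counts = learn_counts(10_000)
total = sum(counts.values())
for n in sorted(counts):
    print(f"{n} fingers visible: {counts[n] / total:.1%} of images")
```

Every hand in this synthetic dataset has five fingers, yet under the assumed 30% occlusion rate all five are visible in only about 17% of images in expectation (0.7 to the fifth power). A learner that sees nothing but these counts has no basis for concluding that a hand always has five fingers.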



Du describes this tendency of AI to take things exactly as it sees them by saying that 'AI does not have as many preconceptions as we do'. This is also related to the third reason AI is poor at hands, 'low error tolerance'. The image below shows 'a man with an apple' generated with Midjourney; across the four images, the man's mouth, his clothes, and the apple all look different. Even if the man's face, his clothes, or the texture of the apple do not exactly match what we expect, we accept them without any sense of unease. But if a hand looks even slightly off, we perceive it as 'absolutely impossible', Vox points out.



There are two main training approaches for overcoming these weaknesses. Shilkrot says that training AI on a much larger number of photos can solve the problem to some extent, but that doing so requires processing a huge number of images and enormous resources to retrain the model. Du, meanwhile, says that having large numbers of users continually rank AI-generated images, much like the ChatGPT feedback mechanism in which users rate AI answers as good or bad, would make it possible to label the training data.
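As a rough sketch of how user rankings could be turned into labels, the example below combines several rankings with a Borda count. The Borda count is a technique chosen here for illustration; it is not something Du or Vox describes, nor is it ChatGPT's actual feedback mechanism. The image IDs, the rankings, and the 50% cutoff are all hypothetical.

```python
from collections import defaultdict

def borda_scores(rankings):
    """Combine several best-to-worst rankings into one score per image.

    An image earns (n - 1 - position) points from each ranking of n images,
    so a higher total means users preferred it more often.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, image_id in enumerate(ranking):
            scores[image_id] += n - 1 - position
    return dict(scores)

def label_images(rankings, keep_fraction=0.5):
    """Label the top-scoring fraction of images 'good' and the rest 'bad'."""
    scores = borda_scores(rankings)
    # Sort by descending score, breaking ties by image ID for determinism.
    ordered = sorted(scores, key=lambda image_id: (-scores[image_id], image_id))
    cutoff = max(1, int(len(ordered) * keep_fraction))
    return {
        image_id: ("good" if index < cutoff else "bad")
        for index, image_id in enumerate(ordered)
    }

# Hypothetical rankings of four generated hand images by three users.
rankings = [
    ["img_a", "img_c", "img_b", "img_d"],
    ["img_c", "img_a", "img_d", "img_b"],
    ["img_a", "img_c", "img_d", "img_b"],
]
labels = label_images(rankings)
print(labels)
```

Labels produced this way could then serve as annotations for further training, which is the role Du attributes to user rankings.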

in Software,   Web Service,   Video,   Art, Posted by log1e_dh