How does 'CLIP', the image recognition AI developed by OpenAI, think?
OpenAI, the non-profit organization behind 'GPT-3', an AI that generates highly accurate text, and 'DALL·E', an AI that generates images from text, has explained the characteristics of how its newly developed image recognition AI 'CLIP' thinks.
Multimodal Neurons in Artificial Neural Networks
The human brain is known to contain neurons that respond uniformly to different types of information, such as a neuron that responds to a photo of actress Halle Berry's face, an illustration of Halle Berry, and the string 'Halle Berry'. According to OpenAI, CLIP can likewise treat different forms of information as the same thing, as humans do.
Traditional image recognition models that recognize a human face as a 'human face' do not respond to illustrations of a human face or to text that says 'human face.' CLIP, however, can treat Spider-Man cosplay photos, Spider-Man illustrations, and the string 'SPIDER' as the same thing.
In addition, CLIP recognizes images by combining different features. For example, it recognizes a 'piggy bank' by combining distinct elements such as 'finance' and 'dolls, toys'.
CLIP also subtracts elements. For example, the emotion 'surprised' is recognized by combining elements such as 'blessing, hug', 'shock', and 'smile, grin', while 'intimate' is recognized by subtracting the element 'illness' from the combination of 'soft smile' and 'heart'.
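The adding and subtracting of elements described above can be pictured as simple vector arithmetic. The following is only a toy sketch: the five-dimensional vectors, the `CONCEPTS` table, and the `combine` and `cosine` helpers are all hypothetical and are not taken from CLIP, whose actual features are learned and far higher-dimensional.

```python
import math

# Hypothetical one-hot "concept directions" used purely for illustration.
# They are NOT real CLIP features.
CONCEPTS = {
    "finance":     [1.0, 0.0, 0.0, 0.0, 0.0],
    "dolls, toys": [0.0, 1.0, 0.0, 0.0, 0.0],
    "soft smile":  [0.0, 0.0, 1.0, 0.0, 0.0],
    "heart":       [0.0, 0.0, 0.0, 1.0, 0.0],
    "illness":     [0.0, 0.0, 0.0, 0.0, 1.0],
}


def combine(plus, minus=()):
    """Add the vectors named in `plus` and subtract those named in `minus`."""
    dim = len(next(iter(CONCEPTS.values())))
    total = [0.0] * dim
    for name in plus:
        total = [t + c for t, c in zip(total, CONCEPTS[name])]
    for name in minus:
        total = [t - c for t, c in zip(total, CONCEPTS[name])]
    return total


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# 'piggy bank' as 'finance' + 'dolls, toys'
piggy_bank = combine(["finance", "dolls, toys"])

# 'intimate' as 'soft smile' + 'heart' - 'illness'
intimate = combine(["soft smile", "heart"], minus=["illness"])

print(cosine(piggy_bank, CONCEPTS["finance"]))  # positive: shares the 'finance' element
print(cosine(intimate, CONCEPTS["illness"]))    # negative: 'illness' was subtracted
```

In this toy setup, a combined concept stays similar to each element that was added and becomes dissimilar to any element that was subtracted, which is the intuition behind the 'piggy bank' and 'intimate' examples.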
OpenAI also explains the weaknesses of CLIP's thinking. For example, CLIP can accurately recognize an image of a Standard Poodle as 'Standard Poodle', but if multiple '$' marks are placed on the image, it is recognized as a piggy bank.
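One way to picture this weakness is as a nearest-label lookup in which text pasted onto the image strongly activates an unrelated element. The sketch below is a toy model under stated assumptions: the three feature axes, the numeric values, and the `classify` helper are hypothetical and do not reflect CLIP's actual features or API.

```python
import math

# Hypothetical feature axes: [dog, finance, toy]. Values are made up.
LABELS = {
    "Standard Poodle": [1.0, 0.0, 0.0],
    "piggy bank":      [0.0, 1.0, 1.0],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify(image_features):
    """Return the label whose vector is most similar to the image features."""
    return max(LABELS, key=lambda label: cosine(image_features, LABELS[label]))


# Assumed features for a plain photo of a poodle.
poodle = [1.0, 0.0, 0.2]

# Assumption: '$' marks on the image add a strong 'finance' activation.
dollar_marks = [0.0, 3.0, 0.0]
poodle_with_dollars = [p + d for p, d in zip(poodle, dollar_marks)]

print(classify(poodle))               # Standard Poodle
print(classify(poodle_with_dollars))  # piggy bank
```

Because the 'finance' activation outweighs the 'dog' activation in this toy model, the nearest label flips even though the animal in the picture has not changed.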
OpenAI has released the tools it used to analyze CLIP's thinking, and says it will continue to research CLIP and work on solving these problems.