MIT and IBM research team releases "ObjectNet", a dataset made up only of unusual images, to overcome the "blind spots" of image recognition models

Image recognition models based on artificial intelligence (AI) are meant to accurately identify the objects that appear in photos and videos, and they are used in a wide range of applications, such as the environment-sensing functions of self-driving cars. In a self-driving car, for example, the object recognition accuracy of the model is directly tied to the safety of the vehicle, so the dataset used to train the model plays a very important role. With that in mind, a team of researchers from the Massachusetts Institute of Technology (MIT) and IBM has created "ObjectNet", a dataset for image recognition models that contains a wide variety of objects.

This object-recognition dataset stumped the world's best computer vision models | MIT News

ObjectNet is a dataset for image recognition models that contains no training set for training a model. It consists only of a test set for verifying a model's accuracy. It contains 50,000 images, the same number as the test set of ImageNet, the crowdsourced dataset that helped set off the AI boom.


ImageNet contains images collected from photo-sharing services such as Flickr, whereas ObjectNet is compiled from photos taken by freelance photographers and others. By deliberately tipping objects on their side, shooting from odd angles that people would not normally use, or shooting in deliberately cluttered rooms, it gathers images that make recognition difficult.

ImageNet (left) contains only clear, easy-to-recognize photos. ObjectNet (right), by contrast, includes chairs placed in cluttered rooms, chairs photographed from behind, and photos that even a human would find hard to judge.

Image recognition models improve their accuracy by training on a dataset with deep learning. However, even a huge dataset such as ImageNet has blind spots: it contains no images like the "back of a chair" or "fallen chair" in the example above. As a result, a model trained on a conventional dataset such as ImageNet cannot accurately recognize an image when it runs into an irregular case such as the back of a chair or a fallen chair.

Unlike other datasets, ObjectNet also does not include a training set. Most datasets provide a training set for training the model and a test set for verifying its accuracy, but the two are often so similar that the test cannot give a reliable measure of accuracy.

In fact, when leading image recognition models were tested on images from ImageNet and ObjectNet, they recognized ImageNet images correctly with an accuracy of up to 97%, but on ObjectNet their accuracy dropped to about 50-55%. This reflects the fact that image recognition models cannot accurately recognize objects seen from behind. IBM researcher Dan Gutfreund, who was involved in developing ObjectNet, said the result indicates that the architectures of the latest image recognition models do not incorporate the concept of recognizing the back of an object or unusual angles.
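The comparison described above comes down to measuring accuracy on two different test sets and looking at the gap. The following is only a minimal sketch of that calculation, not the researchers' evaluation code: the function name `top_k_accuracy`, the toy predictions, and the labels are all hypothetical and chosen purely for illustration.

```python
from typing import Sequence


def top_k_accuracy(ranked_predictions: Sequence[Sequence[str]],
                   labels: Sequence[str], k: int = 1) -> float:
    """Fraction of images whose true label is among the model's top-k guesses."""
    if len(ranked_predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty test set")
    hits = sum(1 for ranked, truth in zip(ranked_predictions, labels)
               if truth in ranked[:k])
    return hits / len(labels)


# Hypothetical toy data: each inner list is a model's guesses, best first.
# "familiar" stands in for a test set that resembles the training data.
familiar_preds = [["chair", "stool"], ["banana", "lemon"],
                  ["mug", "cup"], ["chair", "bench"]]
familiar_labels = ["chair", "banana", "mug", "chair"]

# "unusual" stands in for a test set with odd angles and cluttered scenes.
unusual_preds = [["ladder", "chair"], ["banana", "lemon"],
                 ["fence", "gate"], ["stool", "table"]]
unusual_labels = ["chair", "banana", "chair", "chair"]

familiar_acc = top_k_accuracy(familiar_preds, familiar_labels, k=1)
unusual_acc = top_k_accuracy(unusual_preds, unusual_labels, k=1)

print(f"familiar test set: {familiar_acc:.0%}")
print(f"unusual test set:  {unusual_acc:.0%}")
print(f"accuracy gap:      {familiar_acc - unusual_acc:.0%}")
```

The same function can be run on any pair of test sets. A large gap between the two numbers suggests that the first test set overstates how well the model handles images unlike those it was trained on.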

Boris Katz, a research scientist at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) and at CBMM who took part in the research, said of image recognition models that "we need better and smarter algorithms." The ObjectNet results will be presented at NeurIPS, the conference on neural information processing systems, held from December 8 to 14, 2019.

NeurIPS | 2019

Andrei Barbu, a researcher at CSAIL and CBMM, said, "If you want to know how well an algorithm works in the real world, you need to test the image recognition model on images it has never seen before," explaining that ObjectNet is a dataset created to validate image recognition models rather than to build them.

Because the image data for ObjectNet was collected through Amazon Mechanical Turk, it includes photographs taken not only in the United States but in countries around the world. As a result, the dataset contains a wide range of variation. For example, among photos of the same object, a banana, some are yellow and some are green.

in Software, Posted by logu_ii