It turns out that hundreds of pedestrians were not labeled in the popular data set for automatic driving



Algorithms that control driving are important in automatic driving, in which a car runs automatically without human driving. In order to perform highly safe automated driving, it is necessary to train the algorithm by machine learning using a huge data set, but with the image of the data set commonly used in open source automatic driving algorithm It has been reported that ' pedestrians are incorrectly labeled '.

Self-driving car dataset missing labels for hundreds of pedestrians

https://blog.roboflow.ai/self-driving-car-dataset-missing-pedestrians/

In order to drive safely, self-driving vehicles use cameras and sensors to monitor their surroundings one by one, and algorithms control the vehicle based on that information. The accuracy of the algorithm for autonomous driving has been dramatically improved by machine learning, but machine learning includes a large number of images taken with an in-vehicle camera for autonomous driving and `` What is in the image? You need a huge data set that contains information (labels)

One of the hardest points for machine learning developers is building this dataset. Collecting a large amount of images and labeling them one by one requires a lot of effort, so for students developing autonomous driving systems on a limited budget, such as university research projects, this data set Building is a big wall.

Therefore, datasets published on GitHub etc. are often used as open source, and among them, Udacity Dataset 2, which contains about 15,000 images, is a particularly popular dataset. However, Brad Dwyer, the founder of the machine learning development company `` roboflow '' manually checked all images, and found that about 3386% of the 4986 images, `` not correctly labeled 'There was a problem.

The following images are examples of problematic images. The part of the image that is surrounded by a light blue frame is labeled. On the other hand, the part surrounded by the red frame was not labeled. For example, in the image on the upper left, the bicycle is popping out in front of the car, but the bicycle is not labeled, and learning with this data can recognize that the bicycle has popped out in front of you Accuracy drops. Other vehicles, such as cars crossing intersections, pedestrians crossing pedestrian crossings, and pedestrians on sidewalks, are not labeled.



Also, of the problematic images, 217 were not labeled at all. However, they found that these images included bicycles, pedestrians, and parking on the street.



'Open source datasets are great, but we need to do better work to ensure that the data we share is complete and accurate. If you do, do a due diligence to ensure the integrity of your dataset before using it in a live environment. '

Mr Dwyer said he would fix the problematic Udacity Dataset 2 and release his own full version.

in Software,   Ride, Posted by log1i_yk