Google releases 'YouTube-8M', the largest video dataset to date

Techniques for automatically recognizing and classifying objects in still images have advanced thanks to research in machine learning and machine perception, and databases such as ImageNet, which contains millions of labeled images, are accelerating research on image understanding. Video, on the other hand, is harder to analyze because it contains more information than a still image, and datasets for video are scarce compared with those for still images. To improve this situation, Google's research team has released "YouTube-8M", a dataset of 8 million YouTube videos tagged with 4,800 Knowledge Graph entities.

YouTube-8M: A Large and Diverse Labeled Video Dataset for Video Understanding Research

Until now, the largest video dataset was "Sports-1M", a collection of 1 million YouTube videos tagged with 500 kinds of sports. YouTube-8M is about eight times larger and far more diverse, so research on video analysis is expected to advance further.

Google's research team says that in creating YouTube-8M it worked to overcome two challenges: "1: Manual annotation takes longer for video than for still images" and "2: Processing and storing video is computationally very expensive".

To overcome the first challenge, the team relied on an annotation system that automatically generates annotations by matching publicly available YouTube videos with appropriate Knowledge Graph topics. The quality of the automatically generated annotations is said to be high enough for use in video analysis research. To ensure the stability and quality of the tagged dataset, YouTube-8M uses only public videos with more than 1,000 views. Entering a keyword from the video categories filters the videos down to those with related tags, which demonstrates the diversity and scale of the dataset.
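The selection rule and the keyword filter described above can be pictured with a rough sketch. This is only an illustration, not the team's actual pipeline: the `Video` record, its field names, the sample entries, and the functions `is_eligible` and `filter_by_keyword` are all hypothetical, and the only detail taken from the article is the threshold of more than 1,000 views on public videos.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical record for illustration; the real dataset format is not shown here.
@dataclass
class Video:
    video_id: str
    is_public: bool
    view_count: int
    tags: List[str]

MIN_VIEWS = 1000  # from the article: only videos with more than 1,000 views


def is_eligible(video: Video) -> bool:
    """Keep only public videos with more than MIN_VIEWS views."""
    return video.is_public and video.view_count > MIN_VIEWS


def filter_by_keyword(videos: List[Video], keyword: str) -> List[Video]:
    """Return eligible videos that have at least one tag containing the keyword."""
    needle = keyword.lower()
    return [
        v for v in videos
        if is_eligible(v) and any(needle in tag.lower() for tag in v.tags)
    ]


# Made-up sample entries, not real dataset rows.
SAMPLE = [
    Video("a", True, 25000, ["Sports", "Football"]),
    Video("b", True, 1000, ["Sports"]),       # exactly 1,000 views: excluded
    Video("c", False, 90000, ["Sports"]),     # not public: excluded
    Video("d", True, 5000, ["Cooking"]),      # eligible, but no matching tag
]

matches = filter_by_keyword(SAMPLE, "sports")
print([v.video_id for v in matches])
```

Applying the eligibility check before the tag match keeps the quality rule independent of whichever keyword is being searched.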

As for the second challenge, that researchers lack the storage and computing resources to study video, YouTube-8M is optimized for research, so it is said that even students without expensive machines can use it. Google's research team says it hopes YouTube-8M will spur new research and lead to approaches that make effective use of incomplete tags.

