IBM releases a huge dataset of face images of 1 million people, aiming for "fairness in face recognition technology"

IBM Research released a dataset called "Diversity in Faces (DiF)" on January 29, 2019. The dataset was created to improve the fairness and accuracy of face recognition technology. It reportedly contains face images of about 1 million people of various genders and races, along with annotations.

Diversity in Faces (PDF file)

IBM Research Releases 'Diversity in Faces' Dataset to Advance Study of Fairness in Facial Recognition Systems - IBM Research Blog

Face recognition technology, which uses cameras and algorithms to recognize faces and identify individuals, is finally seeing wider development and practical use. Systems such as Face ID are used to unlock smartphones, and a growing number of police forces have adopted face recognition for investigations and security, with real results. However, the accuracy of face recognition systems still seems far from perfect: when British police introduced a face recognition system, its false detection rate turned out to exceed 90%.

More than 2,000 soccer fans were wrongly flagged as criminal suspects by a face recognition system using surveillance cameras - GIGAZINE

It has been pointed out that bias in face recognition systems contributes to their high misrecognition rates, and one study found that accuracy for African Americans is 5 to 10% lower than for white people. The research team at IBM Research said, "An important factor that actually limits the performance of face recognition systems is intrinsic facial diversity, and the performance of face recognition systems should not differ between individuals or groups."

The DiF dataset released by IBM Research contains face images of about 1 million people of various races and genders.

In addition, each image is annotated with objective measures of the face, such as head shape, facial symmetry, nose length, and forehead height, as well as with data such as age and gender. According to the research team, the annotations summarize the sizes and features of more than 47 facial regions, which should improve the fairness and accuracy of face recognition systems and make algorithm performance more robust.

"IBM Research is committed to continuing research into fairer face recognition systems, but we do not believe we can advance face recognition on our own. By releasing DiF, we hope to contribute to other face recognition research and help advance this important scientific agenda," the team said, expressing hope that DiF will become a new first step in face recognition research.

The DiF dataset is available to the face recognition research community worldwide. To access it, you need to answer IBM Research's questionnaire and apply by e-mail via the following page.

Diversity in Faces Dataset - Trusted AI - IBM Research AI

in Software, Posted by log1i_yk