Child sexual abuse material was found in the dataset 'LAION-5B,' which was used to train Stable Diffusion, and the developer has released 'Re-LAION-5B' with the offending links removed.



Following the discovery of child sexual abuse material (CSAM) in the LAION-5B dataset, which was used to train well-known image generation AIs such as Stable Diffusion and Midjourney, LAION, the developer of LAION-5B, has announced a new version, Re-LAION-5B, from which the CSAM has been removed.

Releasing Re-LAION 5B: transparent iteration on LAION-5B with additional safety fixes | LAION
https://laion.ai/blog/relaion-5b/

Nonprofit scrubs illegal content from controversial AI training dataset | Ars Technica
https://arstechnica.com/tech-policy/2024/08/nonprofit-scrubs-illegal-content-from-controversial-ai-training-dataset/

In December 2023, the Stanford Internet Observatory, which studies internet safety, reported that LAION-5B contained CSAM. The report found that of the roughly 5.8 billion image links collected from the internet, 1,008 were judged to be CSAM or suspected CSAM, and argued that the existence of such datasets is one reason some image generation AIs can easily create deepfakes depicting children.

It was discovered that the 5 billion+ image set 'LAION-5B' used in the image generation AI 'Stable Diffusion' contained 1,008 child pornography images and will be deleted - GIGAZINE



Following the report, LAION immediately took LAION-5B offline and worked with the Stanford Internet Observatory and anti-abuse organizations in Canada and the UK to remove the problematic links. After eight months of work, LAION removed a total of 2,236 suspected CSAM links from the dataset, including the 1,008 originally reported, and released Re-LAION-5B as a 'clean dataset' that excludes these links.

In addition to removing the links, LAION announced that it has instituted 'new safety standards.' According to LAION, illegal content had previously been able to slip through its filters, but the filtering for Re-LAION-5B has been strengthened and the majority of suspicious links have been removed.



LAION said, 'LAION-5B was built from crawl data collected up to September 2022, and Re-LAION-5B contains no new content beyond the links already included in LAION-5B. Therefore, no new unchecked, potentially suspicious links will enter the dataset. Re-LAION-5B has been checked against all CSAM links identified by our partner organizations, so researchers can use it more safely.'



in Software, Posted by log1p_kr