Image-generation AI user discovers "photos of their own medical records" in an AI training dataset

An artist who creates works using image-generation AI tools such as DALL·E 2 has reported finding a photo taken during their own hospital treatment among the images provided in an AI training dataset. The discovery once again highlights how extremely difficult it is to erase data that has leaked onto the internet.

Artist finds private medical record photos in popular AI training data set | Ars Technica

AI artist Lapine posted on Twitter on September 17, 2022: "My face was included in the LAION dataset. The photo was taken by my doctor; he passed away in 2009, but the photo must have leaked somewhere on the internet and ended up in the dataset." The tweet was accompanied by a photo of a consent form authorizing the image's use as a medical record.

The dataset in question is "LAION-5B," which was created by collecting more than 5 billion images published on the internet. Lapine stumbled upon the face photo while using "Have I Been Trained?", a site that lets you check whether your work is included in LAION-5B. The following article explains what kind of service "Have I Been Trained?" is.

"Have I Been Trained?", a site that lets you search whether your work has been used without permission to train image-generation AI - GIGAZINE

Lapine told tech news site Ars Technica that she suffers from Dyskeratosis Congenita, a genetic condition affecting many parts of the body, including the skin, teeth, and bones. As part of her treatment, Lapine underwent surgery to reconstruct the contours of her face. The photograph of her face taken by the surgeon at that time was apparently stolen by someone after the doctor's death and leaked onto the internet, and Lapine believes it may then have been collected into LAION-5B.

When Ars Technica compared the photos and records provided by Lapine, it confirmed that the LAION-5B dataset did indeed contain photos from Lapine's medical records. The photos were not associated with her real name, but a search based on them also turned up thousands of images that appeared to be other patients' medical records, some of dubious ethical and legal legitimacy.

Ars Technica points out that these photos may have been incorporated into popular commercial image-generation services offered by Midjourney and Stability AI.

Lapine said of the unauthorized circulation of her medical record photos and their use in AI training: "It's bad enough that the photos were leaked, but now they're part of a product. This can happen to anyone's photo, whether it's a medical record or not, and it's very likely that such photos will be exploited in the future."

According to Ars Technica, LAION-5B is a dataset that compiles the URLs of images on the web, so LAION does not hold the images themselves. When Lapine asked LAION how to remove her images from the dataset, she was told: "The best way to remove an image from the internet is to ask the hosting site to stop hosting it. We do not host any of these images."
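Concretely, a "dataset of URLs" like the one described above can be pictured as rows of image-URL-plus-caption metadata; the image bytes themselves live on whatever sites host them. The following is a minimal sketch, not LAION's actual code; the field names loosely echo LAION-5B's published metadata columns, and the rows are invented placeholders:

```python
# Sketch of a LAION-style dataset: it stores image URLs plus captions
# and metadata, not the image bytes themselves.
from dataclasses import dataclass

@dataclass
class LaionEntry:
    url: str     # where the image is hosted (the dataset does not host it)
    text: str    # the caption scraped alongside the image
    width: int
    height: int

# Invented placeholder rows for illustration only.
dataset = [
    LaionEntry("https://example.com/cat.jpg", "a cat on a sofa", 640, 480),
    LaionEntry("https://example.org/scan.png", "clinical photo", 1024, 768),
]

def remove_host(entries, host):
    """Drop every entry whose URL points at the given host."""
    return [e for e in entries if host not in e.url]

# Removing an entry only deletes the pointer; the image itself stays
# online until the hosting site takes it down.
remaining = remove_host(dataset, "example.org")
print(len(remaining))  # prints 1
```

This is why LAION's reply centers on the hosting site: deleting a row from the dataset does nothing to the copy of the image that remains on the web.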

In the United States, a 2019 court ruling held that web scraping, the practice of collecting and aggregating data posted publicly on the internet, is legal, so it is reportedly difficult to directly compel a service like LAION to delete images. As a workaround, LAION suggested creating and distributing a list of URLs one does not want used, and asking each AI trainer to blacklist those images.
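The suggested workaround amounts to each trainer filtering its URL list against an opt-out blacklist before any images are downloaded. A minimal sketch, where the blacklist format and all URLs are assumptions for illustration rather than an actual LAION tool:

```python
# Hypothetical opt-out filtering: drop any training URL that appears
# on a distributed blacklist before images are fetched for training.
blacklist = {
    "https://example.org/medical/face-photo.jpg",  # invented opt-out URL
}

training_urls = [
    "https://example.com/cat.jpg",
    "https://example.org/medical/face-photo.jpg",
    "https://example.net/landscape.png",
]

# Keep only URLs that are not on the blacklist.
filtered = [u for u in training_urls if u not in blacklist]
print(len(filtered))  # prints 2
```

The weakness Ars Technica's framing implies is visible in the sketch: the scheme only works if every trainer voluntarily applies the list.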

Regarding the future of technologies such as image-generation AI, Ars Technica wrote: "It is becoming clearer by the day that AI-powered creative tools are an inevitable technological advance. But the question remains: is it ethical to expect everyone who uploaded images to the internet ten years ago, or whose images were uploaded illegally, to silently accept that their data will be used to train the AI of the future? And if the answer is no, does it matter?"

in Web Service, Posted by log1l_ks