About 5% of the training dataset for Adobe's image generation AI 'Firefly' consists of images generated by other image generation AIs such as Midjourney



Adobe's image generation AI 'Firefly' is trained on Adobe Stock, Adobe's library of photos and videos, and Adobe claims that this makes it a commercially safe AI, unlike other image generation AIs that are trained on images scraped from the Internet. However, the American business news outlet Bloomberg reports that the dataset used to train Firefly actually contains images generated by Midjourney and other rival image generation AIs.

Adobe's AI Firefly Used AI-Generated Images From Rivals for Training - Bloomberg
https://www.bloomberg.com/news/articles/2024-04-12/adobe-s-ai-firefly-used-ai-generated-images-from-rivals-for-training



Training an image generation AI requires a huge amount of data. Image generation AIs such as Midjourney, DALL-E, and Stable Diffusion are trained on datasets composed of images collected from the Internet, and class action lawsuits have been filed against some of them alleging that they infringe the copyrights of creators.

Class action lawsuit filed against image generation AI 'Stable Diffusion' and 'Midjourney' - GIGAZINE



In contrast, Adobe says Firefly is trained only on images whose copyright status is clear, such as images licensed through Adobe Stock or images in the public domain. Adobe is confident in the commercial safety and transparency of Firefly, and has declared that it will indemnify customers who are sued over images generated by Firefly.

Adobe announces that it will fully compensate any lawsuits filed over images created with its image generation AI 'Firefly Enterprise Edition,' a sign of confidence that its AI does not infringe on copyrights - GIGAZINE



Adobe operates a stock media library called 'Adobe Stock,' and the images and videos registered in it are used to train Firefly. Adobe Stock began accepting AI-generated content at the end of 2022, and at the time of writing, 14% of its images were tagged as AI-generated.

Bloomberg reports that, from the early stages of development, there were differences of opinion within Adobe about training on a dataset that included AI-generated images.

When Adobe Firefly was released in beta in March 2023, Raul Cerón, Adobe Stock Community Manager, said, 'When Firefly leaves beta and is officially released, we plan to prepare a new training database that excludes generative AI content.' However, according to Bloomberg, about 5% of the images used to train the enterprise version, Firefly's first commercial model, were generated by other image generation AIs.



Brian Penny, who has registered work generated by Midjourney on Adobe Stock, said he was surprised to receive a bonus payment from Adobe. Adobe pays bonuses to Adobe Stock contributors whose content was used to train Firefly's commercial model, so the fact that Penny was paid means that his work was also used for training. However, Penny said, 'Adobe needs to be ethical, be more transparent, and do more,' arguing that it is wrong to train Firefly on his content, which was itself generated by an image generation AI.

Professor Rebecca Tushnet, a Harvard University legal scholar and expert on copyright and trademark law, said, 'Even if Adobe's Firefly had been trained on AI-generated content, it would likely not be any less safe in terms of copyright or trademark, and Adobe would not have to disclose the content of its training data unless it misleads consumers.' However, Professor Tushnet pointed out that training on images generated by Midjourney contradicts the idea that Firefly is different from other image generation AIs.

in Software, Posted by log1i_yk