Tumblr and WordPress reportedly sought a contract to provide user data for AI training



It has been revealed that Automattic, which owns the social networking service Tumblr and WordPress, was negotiating contracts with the AI companies OpenAI and Midjourney to provide user data for AI training. It is unclear whether any data has already been handed over, but there are indications from within the company that preparations were underway to provide even personal data that was not supposed to be covered by the contract. The report comes from the news site 404 Media, which obtained internal documents.

Tumblr and WordPress to Sell Users' Data to Train AI Tools

https://www.404media.co/tumblr-and-wordpress-to-sell-users-data-to-train-ai-tools/




Tumblr's owner is striking deals with OpenAI and Midjourney for training data, says report - The Verge
https://www.theverge.com/2024/2/27/24084884/tumblr-midjourney-openai-training-data-deal-report

Tumblr and WordPress posts will reportedly be used for OpenAI and Midjourney training
https://www.engadget.com/tumblr-and-wordpress-posts-will-reportedly-be-used-for-openai-and-midjourney-training-204425798.html

According to 404 Media, the contract between Automattic and OpenAI/Midjourney is close to being concluded, and a new setting that lets users opt out of data sharing with third parties, including AI companies, is scheduled to be introduced on Tumblr and WordPress on February 28, 2024 (local time).

According to an internal post by Tumblr product manager Cyle Gage, the data to be provided to the AI companies was gathered through dedicated data collection queries.

According to Gage, password-protected posts, DMs, and media files flagged as CSAM or as violating other community guidelines were not included, and engineers are compiling a list of post IDs that should not have been part of the collection. However, it appears that all of the following posts from 2014 to 2023 were included.

・Private posts on public blogs
・Posts on blogs that have been deleted or suspended
・Unanswered questions, which are supposed to stay private until they are answered
・Private answers that can only be viewed by the person who asked the question
・Posts flagged as "Adult" or "NSFW"
・Posts from premium partner blogs, such as Apple's former blog, that Automattic does not have the right to share

When 404 Media contacted Automattic about this matter, Automattic released a statement titled "Protecting User Choice."

Protecting User Choice – Automattic
https://automattic.com/2024/02/27/protecting-user-choice/



In the statement, Automattic said it "blocks major AI platform crawlers by default and updates the list as new crawlers are released," indicating that it does not allow external crawlers to collect data.
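As an illustration of what default crawler blocking can look like, here is a minimal sketch based on robots.txt rules, one common and purely advisory mechanism. Automattic's statement does not say which mechanism or which crawler list it uses, so the robots.txt content, the example.com URL, and the helper names below are assumptions made for illustration. GPTBot and CCBot are real crawler user-agent names, chosen only as examples.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. The listed crawlers are examples only and are
# NOT taken from Automattic's actual block list.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

if __name__ == "__main__":
    for agent in ("GPTBot", "CCBot", "Mozilla/5.0"):
        allowed = is_allowed(parser, agent, "https://example.com/post/1")
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Rules like these only work if the crawler chooses to honor them, and they have no bearing on data that a platform hands to a partner directly.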

On the other hand, the company said, "When the Automattic community's interests (attribution, opt-outs, control) align with a specific AI company's project, we work directly with them," making it clear that it does not rule out working with AI companies directly.

It is unclear whether the data collected in this case has already been sent to the AI companies.

In addition, about a week earlier, a question posted on Tumblr asked, "What is this about the tumblr staff wanting to sell art data to midjourney?" The reply indicated that a contract with Midjourney was in progress.

Tumblr Press any key to start: What is this about the tumblr staff wanting to sell art data to midjourney?

https://www.tumblr.com/jv/742956751128805376/what-is-this-about-the-tumblr-staff-wanting-to

Automattic acquired Tumblr from Verizon in 2019.

Blogging service Tumblr is acquired by WordPress' parent company - GIGAZINE



However, despite large investments, efforts to revive the service did not yield results, and the team operating it was significantly downsized in 2023.

Although more than 15 billion yen was spent to revive Tumblr, it did not reach its peak and the management team was significantly downsized - GIGAZINE




in Web Service, Posted by logc_nt