Google announces option to prevent websites from being used to train its generative AI; some point out it is already too late
An update on web publisher controls
https://blog.google/technology/ai/an-update-on-web-publisher-controls/
Google adds a switch for publishers to opt out of becoming AI training data - The Verge
https://www.theverge.com/2023/9/28/23894779/google-ai-extended-training-data-toggle-bard-vertex
Your website can now opt out of training Google's Bard and future AIs | TechCrunch
https://techcrunch.com/2023/09/28/your-website-can-now-opt-out-of-training-googles-bard-and-future-ais/
Google introduces Google-Extended to let you block Bard, Vertex AI via robots.txt
https://searchengineland.com/google-extended-crawler-432636
Google has been developing Bard and various other AI products for some time, but in July 2023 it stated explicitly for the first time that it uses publicly available information from across the web to train its AI models. Regarding this, a Google spokesperson told the technology media The Verge: 'Our privacy policy has long been transparent that Google uses publicly available information from the open web to train language models for services like Google Translate. This latest update simply clarifies that newer services like Bard are also included.'
Google announces ``We will scrape everything published online for AI'' - GIGAZINE
In a blog post published on September 28, 2023, Google announced an option that allows a website to continue appearing in Google search results while preventing it from being used to train Google's generative AI models.
'We're committed to developing AI responsibly, guided by our AI principles and in line with our consumer privacy commitment,' said Danielle Romain, Google's Vice President of Trust. 'We've also heard from web publishers that they want greater choice and control over how their content is used for emerging generative AI use cases.'
Going forward, website operators can prevent their sites from being used to train Bard and other Google generative AI models by adding the following directive to robots.txt, the file that controls access by search engine crawlers:
[code]User-agent: Google-Extended
Disallow: /[/code]
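As a quick sanity check, publishers can verify how a crawler that honors robots.txt would interpret the directive above. The following sketch uses Python's standard `urllib.robotparser`; the `example.com` URLs are placeholders, and this only models robots.txt semantics, not Google's actual crawling behavior:

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content recommended in the article: block only
# the Google-Extended token, which governs generative AI training.
robots_txt = """User-agent: Google-Extended
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Google-Extended is disallowed everywhere on the site...
print(parser.can_fetch("Google-Extended", "https://example.com/article"))  # False

# ...while agents with no matching rule, such as ordinary search
# crawling by Googlebot, remain unaffected.
print(parser.can_fetch("Googlebot", "https://example.com/article"))  # True
```

Because the rule targets only the `Google-Extended` token, search indexing is untouched; this is exactly the separation between search visibility and AI training that the announcement describes.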
'Making simple and scalable controls like Google-Extended available through robots.txt is an important step in providing transparency and control that we believe all providers of AI models should make available,' said Romain.
The technology media TechCrunch notes that Google avoids the word 'train' in its blog post. Even without the word, it is obvious that Google uses web content to train its AI models, but Google apparently wants to avoid giving that impression.
Instead, the blog post consistently uses phrases such as 'help improve Bard and Vertex AI generative APIs' and 'help these AI models become more accurate and capable over time.' TechCrunch points out that this wording shifts the question from 'whether Google will use the content' to 'whether users will help Google.'
TechCrunch also notes that while the announcement looks at first glance like Google offering users an ethical option, Google has in fact already been using content from across the web to train its AI models. 'What this action reveals is that Google, having exploited its unfettered access to the web's data to get what it wanted, is now asking for permission after the fact so that it appears consent and ethical data collection are priorities,' TechCrunch criticized. 'If they truly were priorities, this setting would have been in place years ago.'
in Software, Web Service, Posted by log1h_ik