OpenAI announces a web crawler 'GPTBot' for improving future AI models, and at the same time, a blocking method to prevent unauthorized learning by AI is also released



Large-scale language models such as GPT-3.5 and GPT-4 respond to user questions and prompts by learning various content on the Internet. The web crawler `` GPTBot '', which OpenAI released technical documents etc. in August 2023, automatically acquires information from websites to which access is permitted, and GPT-4 and GPT-5 to be released in the future. It is said to be useful for improving the large-scale language model of

GPTBot - OpenAI API

https://platform.openai.com/docs/gptbot



OpenAI Launches GPTBot With Details On How To Restrict Access

https://www.searchenginejournal.com/openai-launches-gptbot-how-to-restrict-access/493394/



Now you can block OpenAI's web crawler - The Verge

https://www.theverge.com/2023/8/7/23823046/openai-data-scrape-block-ai

GPTBot: OpenAI releases new web crawler
https://searchengineland.com/gptbot-openais-new-web-crawler-430360

In August 2023, OpenAI released the web crawler ``GPTBot'', which is used to learn its AI products. It is suggested that learning by GPTBot may help improve the accuracy of AI models, general ability, and safety.

On the other hand, some users may not want the content of their site to be used without permission for OpenAI's AI-related products that will appear in the future. So OpenAI introduces a method to block crawling by GPTBot.



To completely block access to the site by GPTBot, add the following code to ' robots.txt ' in the directory.
[code]User-agent: GPTBot
Disallow: /[/code]



Also, to allow access to some content within the site, such as specific directories and files, make the following changes to robot.txt.
[code]User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/[/code]



In addition, OpenAI also publishes the IP address of the crawler used by OpenAI, including GPTBot, and it is possible to deny access by IP address.

Until now, OpenAI has been the subject of various discussions and lawsuits from the perspective of copyright and privacy, regarding learning using content on the Internet without asking for consent or warning users. It has become

OpenAI developed by ChatGPT is filed a class action lawsuit over AI learning data - GIGAZINE



``The publication of GPTBot has taken the first step in a complex debate over content ownership, fair use, and incentives for content creators,'' said the overseas media Search Engine Journal.

OpenAI said, ``Paid content, content containing personal information, and content containing text that violates our policy will be excluded from access by GPTBot and filtered, and will be used to improve new language models in the future.' said. He also said, 'By allowing GPTBot to crawl web pages, we can contribute to AI accuracy, improved privacy, and expanded possibilities.'

in Software, Posted by log1r_ut