It has been pointed out that the robots.txt file that prevents Anthropic from scraping sites is already out of date, and there are also cases where crawlers have accessed 1 million sites in 24 hours



With the popularity of generative AI, companies and organizations that publish content online are beginning to take measures to prevent their content from being used as training data. One of these measures is to use '

robots.txt ', which indicates which pages a site's crawlers can and cannot access, but in reality, the content intended to prevent Anthropic's crawlers is the name of a crawler that is no longer in use, so it is not preventing crawlers that are currently in use.

Websites are Blocking the Wrong AI Scrapers (Because AI Companies Keep Making New Ones)
https://www.404media.co/websites-are-blocking-the-wrong-ai-scrapers-because-ai-companies-keep-making-new-ones/



Anthropic is scraping websites so fast it's causing problems – Pivot to AI

https://pivot-to-ai.com/2024/07/29/anthropic-is-scraping-websites-so-fast-its-causing-problems/

The news site 404media points out that while the news agency Reuters and Condé Nast, which publishes fashion magazines such as Vogue and GQ, are using robots.txt to block the crawlers ``ANTHROPIC-AI'' and ``CLAUDE-WEB'' from the AI company Anthropic, these two are no longer active and are no longer useful.

According to 404media, Anthropic's active crawler is 'CLAUDEBOT,' and it is not blocked by the robots.txt used by Reuters and other sites. This means that sites that use similar robots.txt or block lists are also not protected against this.

Kyle Wiens, CEO of iFixit, which publishes repair manuals for smartphones, laptops, and other devices, pointed out that Anthropic had accessed iFixit 1 million times within 24 hours.



He also said that if any of the requests had been to access the terms of service, it would have been clear that the use of the content was explicitly prohibited, and urged people to 'ask Claude (Anthropic's AI)' if they had any questions about using the content commercially.



Anthropic CEO Dario Amodei said that AI training costs could rise to as much as $100 billion over the three years from 2025 to 2027.

Anthropic CEO predicts that the cost of training AI could rise to $100 billion in just three years - GIGAZINE



in Note, Posted by logc_nt