It has been pointed out that the robots.txt file that prevents Anthropic from scraping sites is already out of date, and there are also cases where crawlers have accessed 1 million sites in 24 hours
With the popularity of generative AI, companies and organizations that publish content online are beginning to take measures to prevent their content from being used as training data. One of these measures is to use '
Websites are Blocking the Wrong AI Scrapers (Because AI Companies Keep Making New Ones)
https://www.404media.co/websites-are-blocking-the-wrong-ai-scrapers-because-ai-companies-keep-making-new-ones/
Anthropic is scraping websites so fast it's causing problems – Pivot to AI
The news site 404media points out that while the news agency Reuters and Condé Nast, which publishes fashion magazines such as Vogue and GQ, are using robots.txt to block the crawlers ``ANTHROPIC-AI'' and ``CLAUDE-WEB'' from the AI company Anthropic, these two are no longer active and are no longer useful.
According to 404media, Anthropic's active crawler is 'CLAUDEBOT,' and it is not blocked by the robots.txt used by Reuters and other sites. This means that sites that use similar robots.txt or block lists are also not protected against this.
Kyle Wiens, CEO of iFixit, which publishes repair manuals for smartphones, laptops, and other devices, pointed out that Anthropic had accessed iFixit 1 million times within 24 hours.
Hey @AnthropicAI : I get you're hungry for data. Claude is really smart! But do you really need to hit our servers a million times in 24 hours?
— Kyle Wiens (@kwiens) July 24, 2024
You're not only taking our content without paying, you're tying up our devops resources. Not cool.
He also said that if any of the requests had been to access the terms of service, it would have been clear that the use of the content was explicitly prohibited, and urged people to 'ask Claude (Anthropic's AI)' if they had any questions about using the content commercially.
If any of those requests accessed our terms of service, they would have told you that use of our content expressly forbidden. But don't ask me, ask Claude!
— Kyle Wiens (@kwiens) July 24, 2024
If you want to have a conversation about licensing our content for commercial use, we're right here. pic.twitter.com/CAkOQDnLjD
Anthropic CEO Dario Amodei said that AI training costs could rise to as much as $100 billion over the three years from 2025 to 2027.
Anthropic CEO predicts that the cost of training AI could rise to $100 billion in just three years - GIGAZINE
Related Posts:
in Note, Posted by logc_nt