Cloudflare launches 'AI Labyrinth' to trap AI crawlers in infinitely generating mazes

Cloud computing service
Trapping misbehaving bots in an AI Labyrinth
https://blog.cloudflare.com/ai-labyrinth/

Crawlers are bots used to scrape the internet for data to be used in training AI. Because crawlers collect all kinds of information from the internet, AI companies have been sued by content creators for using their content to train generative AI models.
To counter this trend, some AI companies offer options to prevent crawlers from using the data they collect for training AI. Site operators can also use a 'robots.txt' file to tell crawlers that scrape for AI training to stay away. However, each AI company uses a different crawler, the names of crawlers are frequently updated, and some companies simply ignore the 'robots.txt' request.
Perplexity, a generative AI search engine, ignores crawler-preventing 'robots.txt' to extract information from websites - GIGAZINE
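To make the limitation concrete, here is a minimal sketch of how a well-behaved crawler would consult 'robots.txt' before fetching a page, using Python's standard urllib.robotparser module. The robots.txt contents, the crawler names, and the example.com URLs are illustrative assumptions, not rules taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; crawler names and paths are examples only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "SomeOtherBot"):
        allowed = is_allowed(ROBOTS_TXT, agent, "https://example.com/articles/1")
        print(agent, "allowed" if allowed else "disallowed")
```

The check runs on the crawler's side, so it only works if the crawler chooses to perform it. A crawler that skips the check, or that identifies itself under a name the file does not list, is not stopped by anything in robots.txt.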

On March 19, 2025, Cloudflare announced 'AI Labyrinth' as a new approach to confuse and drain resources from 'crawlers that don't follow anti-scraping instructions.'
When AI Labyrinth detects a crawler that doesn't follow the no-scraping instructions, rather than blocking the crawler's request, it serves up a series of convincing AI-generated links to pages that the crawler will want to visit. Although this content looks authentic, it is AI-generated content, not content from a website protected by Cloudflare, so the crawler wastes time and resources.
To generate convincing, human-like content, AI Labyrinth uses Workers AI with an open-source model to create unique HTML pages on a variety of topics. Rather than creating this content on demand, the team implemented a pre-generation pipeline that sanitizes the content to prevent cross-site scripting (XSS) vulnerabilities and stores it in Cloudflare R2 for faster retrieval.
In addition, Cloudflare found that first generating a set of diverse topics and then creating content for each topic produces more varied and convincing results. Cloudflare also explains that it is important not to generate inaccurate content that contributes to the spread of misinformation on the Internet, so the generated content is real and related to scientific facts, just not relevant or proprietary to the site being crawled.
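The pre-generation flow described above (topics first, then one page per topic, sanitized and stored ahead of time) might be sketched roughly as follows. This is not Cloudflare's implementation: fake_topics and fake_body are hypothetical stand-ins for calls to a language model, a plain dict stands in for an object store such as R2, and the "labyrinth/" key prefix is invented for the example.

```python
import html
from typing import Callable, Dict, List


def sanitize(text: str) -> str:
    """Escape HTML metacharacters so generated text cannot inject markup or scripts."""
    return html.escape(text, quote=True)


def render_page(topic: str, body: str) -> str:
    """Wrap sanitized topic and body text in a minimal HTML page."""
    safe_topic, safe_body = sanitize(topic), sanitize(body)
    return (
        "<!doctype html><html><head>"
        f"<title>{safe_topic}</title></head>"
        f"<body><h1>{safe_topic}</h1><p>{safe_body}</p></body></html>"
    )


def pregenerate(
    topics: List[str],
    generate_body: Callable[[str], str],
    store: Dict[str, str],
) -> Dict[str, str]:
    """Generate one page per topic up front and save it under a stable key."""
    for i, topic in enumerate(topics):
        store[f"labyrinth/page-{i}.html"] = render_page(topic, generate_body(topic))
    return store


def fake_topics() -> List[str]:
    """Hypothetical stand-in for the topic-generation step."""
    return ["Ocean tides", "Soil <b>pH</b>"]


def fake_body(topic: str) -> str:
    """Hypothetical stand-in for the model; emits a script tag to exercise sanitizing."""
    return f"An overview of {topic}. <script>alert('x')</script>"


if __name__ == "__main__":
    pages = pregenerate(fake_topics(), fake_body, {})
    for key, page in pages.items():
        print(key, len(page))
```

Generating pages ahead of time means the request path only has to read a stored object, and sanitizing at generation time means that anything the model emits is stored as inert text rather than live markup.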
Cloudflare will be making AI Labyrinth available to all users, including those on the free plan.
AI Labyrinth-generated content is seamlessly integrated into existing pages as hidden links through a custom HTML conversion process, without disturbing the original structure or content of the page. Each page generated by AI Labyrinth includes the appropriate meta directives to prevent search engine indexing and protect the site's SEO.
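As a rough illustration of the idea, the sketch below appends a link that is invisible to human visitors just before the closing body tag and builds a decoy page carrying a robots meta tag. The inline style, the attribute choices, the exact "noindex, nofollow" directive, and the /labyrinth/ URL are all assumptions made for the example; Cloudflare has not published the markup it actually emits.

```python
import html
import re

# Assumption: a standard robots meta tag asking search engines not to index the page.
NOINDEX_META = '<meta name="robots" content="noindex, nofollow">'


def build_hidden_link(decoy_url: str) -> str:
    """Build an anchor that is not rendered or focusable for human visitors."""
    href = html.escape(decoy_url, quote=True)
    return (
        f'<a href="{href}" style="display:none" '
        'aria-hidden="true" tabindex="-1" rel="nofollow"></a>'
    )


def inject_hidden_link(page_html: str, decoy_url: str) -> str:
    """Insert the hidden link before the last </body>, leaving all other markup untouched."""
    link = build_hidden_link(decoy_url)
    matches = list(re.finditer(r"</body\s*>", page_html, re.IGNORECASE))
    if not matches:
        return page_html + link  # no body tag found: append at the end
    idx = matches[-1].start()
    return page_html[:idx] + link + page_html[idx:]


def build_decoy_page(title: str, body_text: str) -> str:
    """Build a decoy page whose head carries the noindex directive."""
    return (
        f"<!doctype html><html><head>{NOINDEX_META}"
        f"<title>{html.escape(title)}</title></head>"
        f"<body><p>{html.escape(body_text)}</p></body></html>"
    )


if __name__ == "__main__":
    original = "<html><body><p>Hello</p></body></html>"
    print(inject_hidden_link(original, "/labyrinth/page-0.html"))
```

Because the link is inserted rather than substituted, removing it from the output yields the original page byte for byte, which matches the requirement that the existing structure and content stay undisturbed.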
The graph below summarizes the number of requests per day for each crawler. The horizontal axis represents time. The types of crawlers are AI scrapers (blue line), AI search (orange line), and AI assistant (green line). It is clear that only the requests by AI scrapers are increasing rapidly over time.

AI Labyrinth is particularly effective as one part of a continuously evolving bot detection system. Because the links created by AI Labyrinth are hidden from human visitors, no human sees or follows them, so any request for one of these links can immediately be attributed to a crawler. Cloudflare explained that this 'provides a powerful identification mechanism and also generates valuable data to feed into machine learning models.' By analyzing which crawlers follow which links, Cloudflare can identify new bot patterns and signatures that might otherwise go undetected, allowing it to continually improve its response to abusive crawlers.
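The identification mechanism can be sketched as a simple honeypot tracker: any request whose path falls under the decoy prefix marks the client as automated, and the sequence of decoy pages it visited is kept for later analysis. The DecoyTracker class, the /labyrinth/ prefix, and the use of an opaque client identifier are hypothetical; real detection would combine this signal with many others.

```python
from collections import defaultdict
from typing import Dict, List, Set

DECOY_PREFIX = "/labyrinth/"  # hypothetical path prefix for decoy pages


class DecoyTracker:
    """Record which clients request decoy pages that humans never see."""

    def __init__(self, prefix: str = DECOY_PREFIX) -> None:
        self.prefix = prefix
        self.hits: Dict[str, List[str]] = defaultdict(list)

    def observe(self, client_id: str, path: str) -> bool:
        """Log the request; return True if it touched a decoy page."""
        if path.startswith(self.prefix):
            self.hits[client_id].append(path)
            return True
        return False

    def flagged_clients(self) -> Set[str]:
        """Clients that followed at least one hidden link."""
        return set(self.hits)

    def depth(self, client_id: str) -> int:
        """Number of decoy pages the client has requested so far."""
        return len(self.hits.get(client_id, []))


if __name__ == "__main__":
    tracker = DecoyTracker()
    tracker.observe("client-a", "/articles/1")
    tracker.observe("client-b", "/labyrinth/page-0.html")
    tracker.observe("client-b", "/labyrinth/page-7.html")
    print(tracker.flagged_clients(), tracker.depth("client-b"))
```

The recorded paths per client are the kind of data that could later feed pattern analysis, for example comparing how deep different crawlers go into the decoy pages.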
In addition, a method has been developed to trap a crawler in an AI-generated maze similar to AI Labyrinth.
'Nepenthes' is developed to trap crawlers that collect data for AI training in an infinitely generated maze - GIGAZINE

in Software, Web Service, Posted by logu_ii