Jun 05, 2026 15:10:00

Anthropic warns of the risks of a self-improvement loop where 'AI creates AI,' and discusses the possibility of AI itself accelerating AI development.

In software development, engineers are increasingly having AI write code. Anthropic argues that this trend is spreading to AI development itself, and has officially discussed the risks of 'recursive self-improvement,' where AI designs the next generation of AI to create even more powerful AI.

When AI builds itself \ Anthropic

https://www.anthropic.com/institute/recursive-self-improvement

When Anthropic was founded in 2021, humans were writing the code and documentation for Claude.

When Claude was released to the public in 2023, Anthropic employees also began using Claude for development. In its early stages, the chatbot was used to assist with parts of the process, such as creating short code snippets.

In 2025, Anthropic employees completed Claude Code, building it with code generated by Claude. Claude Code can write and edit code on its own, and even rewrite entire files as needed, which has led to its widespread use as a 'coding agent.'

Claude Code has also accelerated the development of Claude itself. As of June 2026, Claude can execute code directly and delegate several hours' worth of work to other agents.

The Anthropic research team believes that if this evolution continues, Claude will eventually be able to build and train its own successor models, closing the AI self-improvement loop.

In fact, the rate of improvement in AI model performance is accelerating. The length of tasks that AI can autonomously and reliably complete increased from approximately 4 minutes in March 2024 to approximately 90 minutes in February 2025, and reached approximately 720 minutes in February 2026. Furthermore, Anthropic states that 'it may be possible to handle tasks lasting several days by 2026.'
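As a rough illustration of the figures above, the doubling time implied by the three reported data points can be worked out directly. The sketch below assumes smooth exponential growth between consecutive points, which the source does not claim, and the function names are illustrative only.

```python
import math

# Reported task lengths in minutes, keyed by (year, month).
# Values are the ones quoted in the article.
POINTS = [
    ((2024, 3), 4.0),
    ((2025, 2), 90.0),
    ((2026, 2), 720.0),
]


def months_between(start, end):
    """Whole months from one (year, month) pair to another."""
    (y0, m0), (y1, m1) = start, end
    return (y1 - y0) * 12 + (m1 - m0)


def doubling_time_months(length_start, length_end, months):
    """Doubling time under the ASSUMPTION of exponential growth."""
    return months / math.log2(length_end / length_start)


def interval_doubling_times(points):
    """Doubling time for each pair of consecutive data points."""
    results = []
    for (d0, l0), (d1, l1) in zip(points, points[1:]):
        months = months_between(d0, d1)
        results.append(doubling_time_months(l0, l1, months))
    return results


if __name__ == "__main__":
    for value in interval_doubling_times(POINTS):
        print(f"implied doubling time: {value:.2f} months")
```

Under this assumption, the first interval works out to a doubling time of roughly two and a half months and the second to four months, so a single fixed growth rate would only be a coarse summary of the reported numbers.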

Changes are also underway within Anthropic. Below is a graph showing the change in the amount of merged code per active contributor at Anthropic. Compared to the average from 2021 to 2024, it increased dramatically after Claude Code was released as a research preview in February 2025, reaching eight times the previous amount in the second quarter of 2026. As of May 2026, more than 80% of the code incorporated into Anthropic's codebase was written by Claude.

However, more code does not necessarily mean better code. Anthropic explains that the number of lines of code measures quantity, not quality, and that the eight-fold increase should not be read directly as a productivity gain. Even so, the data show engineers shifting from writing code themselves to delegating tasks to Claude and reviewing the results.

As the amount of code written increases, the next problem is whether it actually works. Even if the code works, if bugs or security issues remain, the increase in development speed directly translates to an increase in risk. That's why Anthropic also investigates how successful Claude Code sessions are.

The graph below shows the session success rate of Claude Code within Anthropic. For simple and routine tasks, the success rate has risen to the high-80% range, and even for open-ended problems with unclear specifications, it reached 76% by May 2026.

Anthropic also gives an example of how Claude handled a problem with an unclear specification. On one occasion, tens of thousands of training jobs crashed during a routine upgrade. Engineers explained the situation to Claude, which had been given access to the computing cluster. Claude then investigated the running jobs and isolated the cause by testing environment settings one by one, eventually identifying the setting responsible for the crash. It completed in about two hours an investigation that would have taken a human two to three days.

It is also significant that AI can not only write code but also run experiments. Every time Anthropic releases a model, it gives Claude the code that trains a small AI model and tests how much Claude can improve its execution speed while preserving correctness. Claude Opus 4, released in May 2025, sped up the original code by an average of about 3 times, while Claude Mythos Preview, released in April 2026, reached about 52 times.

Furthermore, Anthropic is also investigating whether AI can determine the very course of research. In research, the ability to write code alone is insufficient. When an experiment doesn't go well, it's necessary to make judgments such as whether to formulate a different hypothesis, continue with the current approach, or move on to a different problem.

The graph below shows the results of an analysis of 129 internal Claude Code sessions run by Anthropic researchers. The analysis extracted instances where researchers took detours and had each model suggest 'what to do next.' A separate Claude instance that knew the final outcome then compared the human's decision with the model's suggestion. In November 2025, Claude Opus 4.5 provided a better suggestion than the human's next move 51% of the time. In April 2026, with Claude Mythos Preview, this figure rose to 64%.
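The percentages above are shares of judged comparisons in which the model's suggestion was rated better than the human's move. A minimal sketch of that tally follows; the verdict labels, the handling of ties, and the sample data are all assumptions made for illustration, not details taken from Anthropic's analysis.

```python
from collections import Counter

# Hypothetical verdict labels a judge might emit for each comparison.
VALID_VERDICTS = {"model", "human", "tie"}


def model_win_rate(verdicts):
    """Share of comparisons where the judge preferred the model.

    Ties are counted in the denominator but not as wins (an assumption).
    """
    unknown = set(verdicts) - VALID_VERDICTS
    if unknown:
        raise ValueError(f"unexpected verdicts: {sorted(unknown)}")
    if not verdicts:
        raise ValueError("no verdicts to score")
    counts = Counter(verdicts)
    return counts["model"] / len(verdicts)


if __name__ == "__main__":
    # Made-up data chosen only to exercise the function.
    sample = ["model"] * 64 + ["human"] * 36
    print(f"model preferred in {model_win_rate(sample):.0%} of cases")
```

How ties and ambiguous cases are scored can shift such a figure noticeably, which is one reason a single percentage should be read alongside the method behind it.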

It should be noted that AI has not yet reached the level of completely replacing human researchers. As of June 2026, Claude's strengths lie in 'writing code, running experiments, and summarizing results according to goals given by humans.' Anthropic points out that humans still have the role of making judgments such as 'which research topic to pursue,' 'which results to trust,' and 'when to abandon a dead-end approach.'

However, as AI becomes capable of conducting experiments and writing code at high speed, human review and judgment become a bottleneck in the entire development process. Since the overall speed is constrained by the slowest stage, even if Claude generates a large amount of code and experimental results, development will stall if humans cannot review it all. According to Anthropic, code reviews have already become a new constraint within the company.
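The bottleneck argument can be made concrete with a toy model in which a change must pass through every stage in sequence, so sustained throughput equals the rate of the slowest stage. The stage names and rates below are hypothetical and are not Anthropic's figures.

```python
def pipeline_throughput(stage_rates):
    """Sustained throughput of a serial pipeline: the slowest stage's rate."""
    if not stage_rates:
        raise ValueError("pipeline has no stages")
    return min(stage_rates.values())


def bottleneck_stage(stage_rates):
    """Name of the stage that limits the pipeline."""
    if not stage_rates:
        raise ValueError("pipeline has no stages")
    return min(stage_rates, key=stage_rates.get)


if __name__ == "__main__":
    # Hypothetical rates, in changes per day.
    rates = {"generate": 80, "test": 40, "human_review": 10}
    print(bottleneck_stage(rates), pipeline_throughput(rates))

    # Making generation ten times faster leaves throughput unchanged,
    # because review is still the slowest stage.
    faster = dict(rates, generate=800)
    print(bottleneck_stage(faster), pipeline_throughput(faster))
```

In this simplified model, only raising the rate of the limiting stage (here, human review) increases overall output; speeding up any other stage just lets unreviewed work pile up.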

Anthropic identifies three possible future scenarios: a future where AI capabilities plateau; a future where AI companies continue to improve efficiency through AI while humans continue to determine research directions; and a future where AI begins to create successor models of itself. The biggest risk they highlight is the scenario where a loop of AI creating AI is completed. If AI develops the next generation of AI, and that AI then develops the next generation, the rate of improvement in capabilities may surpass human oversight, potentially making it impossible to keep up with safety checks and problem detection.

On the other hand, Anthropic is not arguing that AI development should simply be stopped. If only cautious companies halt development while less cautious organizations and governments move forward, safety may actually decrease. Anthropic therefore states that a mechanism is needed by which multiple AI development organizations agree on the same conditions for slowing down or pausing, and by which compliance with that agreement can be verified.

Establishing a mechanism to confirm the halt of large-scale AI development is not easy. AI training is not as easily detectable from the outside as a missile base, and the computing resources and data overlap with those used for general purposes. Anthropic explains that a reliable pause requires defining what triggers the halt, what conditions must be met for it to resume, and who will make those decisions.

Anthropic plans to engage in dialogue with policymakers, researchers, other AI companies, and civil society to explore rules and forms of collaboration necessary for an era where AI creates AI.


Jun 05, 2026 15:10:00 in AI, Posted by log1d_ts