Cloudflare tests Anthropic's unreleased AI model 'Mythos': What capabilities did their vulnerability investigation reveal?

Cloudflare has released the results of its vulnerability testing across more than 50 repositories using Anthropic's security-focused AI model, 'Claude Mythos Preview.'
Project Glasswing: what Mythos showed us
Claude Mythos Preview is an unreleased AI model developed by Anthropic. According to Anthropic, the model is highly capable at finding vulnerabilities and working out how to exploit them. There are no plans for a general release. Instead, it is being provided to selected companies and organizations through 'Project Glasswing,' an initiative that uses AI to support cybersecurity defense.
Anthropic develops 'Claude Mythos Preview,' an AI with extremely high cyberattack capabilities, and has also launched 'Project Glasswing,' which will provide a preview version to Microsoft, Apple, and others - GIGAZINE

Cloudflare is one of the organizations that received access to Claude Mythos Preview through Project Glasswing. It used the model to test its runtime, edge data paths, protocol stack, control plane, and the open-source projects it depends on.
According to Cloudflare, Claude Mythos Preview did more than list individual bugs. It also found 'attack chains' that combine several small vulnerabilities into a working attack. Cloudflare says the model's findings read more like the analysis of a skilled security researcher than the raw output of an automated scanner.
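Cloudflare has not published how attack chains are represented internally. One simple way to picture the idea is to treat each finding as a step that turns one attacker capability into another, then search for a path from untrusted input to a meaningful impact. The sketch below is a toy model built on that assumption. The finding names, capabilities, and the find_chain helper are all hypothetical.

```python
from collections import deque

# Toy model (assumption): each hypothetical finding is an edge saying that an
# attacker who already holds capability `requires` can use the bug to gain `grants`.
FINDINGS = [
    ("header-parsing-bug", "untrusted-request", "malformed-internal-header"),
    ("cache-key-confusion", "malformed-internal-header", "cache-poisoning"),
    ("verbose-error-page", "untrusted-request", "internal-hostnames"),
]

def find_chain(findings, start, goal):
    """Breadth-first search for the shortest sequence of findings that
    leads from the attacker's starting capability to the goal capability."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        capability, chain = queue.popleft()
        if capability == goal:
            return chain
        for name, requires, grants in findings:
            if requires == capability and grants not in seen:
                seen.add(grants)
                queue.append((grants, chain + [name]))
    return None  # no combination of findings reaches the goal

print(find_chain(FINDINGS, "untrusted-request", "cache-poisoning"))
# → ['header-parsing-bug', 'cache-key-confusion']
```

Neither finding alone reaches cache poisoning. Only the two combined form a chain, which is the distinction the article draws between listing bugs and finding attack chains.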
Claude Mythos Preview also goes beyond flagging potential vulnerabilities. It generates 'proof-of-concept code' to check whether they are actually exploitable. According to Cloudflare, the model wrote the code and ran it in a test environment. When the code did not behave as expected, the model revised its hypothesis and tried again. Because the resulting reports include reproduction steps, developers can confirm that an issue is real and needs fixing instead of treating it as a guess.
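The write, run, and revise loop can be sketched as follows. This is a minimal illustration, not Cloudflare's or Anthropic's implementation. The functions write_poc, run_in_sandbox, and revise are hypothetical stand-ins for the model writing code, the isolated test environment, and the model updating its hypothesis. The toy versions at the end exist only to make the loop runnable.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Attempt:
    hypothesis: str
    poc: str
    reproduced: bool
    output: str

def verify_with_poc(
    hypothesis: str,
    write_poc: Callable[[str], str],                    # hypothetical: model writes PoC code
    run_in_sandbox: Callable[[str], Tuple[bool, str]],  # hypothetical: isolated test run
    revise: Callable[[str, str], Optional[str]],        # hypothetical: returns None to give up
    max_attempts: int = 3,
) -> List[Attempt]:
    """Write a PoC for a hypothesis, run it, and revise the hypothesis whenever
    the run does not behave as expected. Every attempt is returned, so the last
    successful one can double as the reproduction steps in a report."""
    attempts: List[Attempt] = []
    current: Optional[str] = hypothesis
    for _ in range(max_attempts):
        if current is None:  # the reviser abandoned the hypothesis
            break
        poc = write_poc(current)
        reproduced, output = run_in_sandbox(poc)
        attempts.append(Attempt(current, poc, reproduced, output))
        if reproduced:
            break
        current = revise(current, output)
    return attempts

# Toy stand-ins (hypothetical): the "bug" only triggers at offset 128.
def write_poc(h: str) -> str:
    return f"send_payload({h!r})"

def run_in_sandbox(poc: str) -> Tuple[bool, str]:
    if "128" in poc:
        return True, "crash"
    return False, "no crash: buffer is 128 bytes"

def revise(h: str, output: str) -> Optional[str]:
    return "overflow at offset 128" if "128 bytes" in output else None

history = verify_with_poc("overflow at offset 64", write_poc, run_in_sandbox, revise)
print([a.reproduced for a in history])  # → [False, True]
```

Keeping the full attempt history, rather than only a yes/no verdict, is what lets a report carry concrete reproduction steps.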
However, Cloudflare explains that simply pointing Claude Mythos Preview at an entire repository and telling it to 'find vulnerabilities' does not produce good enough results. On a large codebase, that instruction makes the scope of the investigation too broad to verify thoroughly. Cloudflare therefore built an execution platform, or 'harness.' The harness handles every stage of the work: dividing the scope of the investigation, running it in parallel, verifying the results, removing duplicates, and generating reports.
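The scope-splitting and parallel-execution steps can be pictured with a small fan-out sketch. Cloudflare has not published how its harness slices the work. The attack classes, component names, and the investigate placeholder below are hypothetical. Threads are used on the assumption that each real agent run would spend most of its time waiting on a model API.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical attack classes and components; the real split is not public.
ATTACK_CLASSES = ["memory-safety", "request-smuggling", "authz-bypass"]
COMPONENTS = ["http-parser", "cache", "control-plane-api"]

def investigate(task):
    """Placeholder for one narrowly scoped agent run. A real agent would be
    given only this component and this attack class; here we echo the task."""
    attack_class, component = task
    return {"component": component, "attack_class": attack_class, "findings": []}

def fan_out(attack_classes, components, max_workers=4):
    # One small, focused task per (attack class, component) pair.
    tasks = list(product(attack_classes, components))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input tasks
        return list(pool.map(investigate, tasks))

results = fan_out(ATTACK_CLASSES, COMPONENTS)
print(len(results))  # → 9
```

Each task is small enough that its output can be checked thoroughly, which addresses the "scope too broad to verify" problem described above.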

Cloudflare's harness first reads the entire repository and maps out how it is built, where its trust boundaries lie, where attacker-controlled input can enter, and which processes are most exposed to attack. It then splits the work by attack type and investigation scope and runs many investigation agents in parallel. Each candidate vulnerability is re-examined by separate, independent agents that try to refute the original finding. Finally, reports that share the same root cause are grouped together to reduce false positives and duplicates.
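The last two stages, independent refutation and grouping by root cause, can be sketched as a simple triage function. This is an illustration under assumptions, not Cloudflare's code. It assumes each finding carries a normalized root-cause key and that each "refuter" returns a reason string when it can disprove a finding. The needs_evidence refuter and all finding data are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional

@dataclass(frozen=True)
class Finding:
    title: str
    location: str    # where the bug shows up, e.g. "file:function"
    root_cause: str  # hypothetical normalized key for the underlying flaw
    evidence: str    # reproduction steps / PoC output; empty if none

Refuter = Callable[[Finding], Optional[str]]  # returns a reason if it refutes

def survives_review(finding: Finding, refuters: List[Refuter]) -> bool:
    """A finding is kept only if no independent reviewer can refute it."""
    return all(refute(finding) is None for refute in refuters)

def triage(findings: Iterable[Finding], refuters: List[Refuter]) -> Dict[str, List[str]]:
    groups: Dict[str, List[Finding]] = defaultdict(list)
    for f in findings:
        if survives_review(f, refuters):
            groups[f.root_cause].append(f)
    # One report per root cause, listing every location where it appears.
    return {cause: sorted(f.location for f in fs) for cause, fs in groups.items()}

def needs_evidence(f: Finding) -> Optional[str]:
    """Toy refuter: reject any claim that comes without reproduction evidence."""
    return None if f.evidence else "no reproduction steps"

reports = triage(
    [
        Finding("OOB read in header", "parser.c:parse_header", "unchecked_len", "poc_1 crashes"),
        Finding("OOB read in trailer", "parser.c:parse_trailer", "unchecked_len", "poc_2 crashes"),
        Finding("Possible weak default", "config.c:load", "weak_default", ""),
    ],
    refuters=[needs_evidence],
)
print(reports)
# → {'unchecked_len': ['parser.c:parse_header', 'parser.c:parse_trailer']}
```

Three raw findings become one report. The unsupported claim is dropped by the refutation step, and the two symptoms of the same flaw are merged into a single entry.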
Cloudflare also says it has reduced 'noise,' meaning reports that are unlikely to be exploitable in practice, or vague claims that take a lot of effort to verify. Noise has been a major challenge for AI-driven vulnerability detection. With Claude Mythos Preview, the reproduction steps and evidence were comparatively clear, making it easier for developers to decide whether to fix a reported issue or reject it.
Claude Mythos Preview also raises safety concerns. According to Cloudflare, the version provided through Project Glasswing lacked the additional safeguards included in generally available models. The model did refuse some requests, but it did so inconsistently. Cloudflare points out that making a highly capable cybersecurity model widely available would require additional safeguards to control dangerous outputs.

Anthropic has likewise stated that it has no plans to release Claude Mythos Preview publicly. According to Anthropic, deploying models with comparable capabilities safely in the future will require mechanisms that support defensive use while suppressing offensive misuse.
Cloudflare notes that while models like Claude Mythos Preview are powerful tools for defenders, the same capabilities could also speed up attackers. It stresses that fixing vulnerabilities after they are discovered is not enough. Systems should be designed so that vulnerabilities are hard for attackers to reach, attacks are stopped before they reach the application, and fixes can be deployed quickly across every environment.