Anthropic’s Claude Mythos is Too Dangerous to Launch

The company said it does not plan to release Claude Mythos Preview publicly due to safety risks but plans to develop safeguards for broader deployment of similar systems.

Anthropic has developed Claude Mythos, an unreleased frontier model currently in preview that the company says can identify and exploit software vulnerabilities at a level comparable to, or exceeding, most human experts.

The company said the model has already identified “thousands of high-severity vulnerabilities,” including flaws in major operating systems and web browsers.

Anthropic has employed the model in Project Glasswing, a cybersecurity effort involving Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. The initiative focuses on identifying and fixing vulnerabilities in widely used software using advanced AI models.

Anthropic said the project is intended to put Claude Mythos Preview’s capabilities to defensive use before similar capabilities become widely accessible. “Project Glasswing is an urgent attempt to put these capabilities to work for defensive purposes,” the company said in a blog post.

The IPO-bound AI giant said the model’s capabilities stem from advances in agentic coding and reasoning, with benchmark scores showing higher performance than its earlier models across coding and cybersecurity tasks.

Claude Mythos Preview achieved a score of 77.8% on SWE-bench Pro, outperforming Opus 4.6, which scored 53.4%.

“We do not plan to make Claude Mythos Preview generally available, but our eventual goal is to enable our users to safely deploy Mythos-class models at scale—for cybersecurity purposes, but also for the myriad other benefits that such highly capable models will bring,” the company said.

“We plan to launch new safeguards with an upcoming Claude Opus model, allowing us to improve and refine them with a model that does not pose the same level of risk as Mythos Preview,” it added.

As part of Project Glasswing, partner organisations will use the model to scan and secure both proprietary and open-source systems. Anthropic has committed up to $100 million in usage credits and $4 million in donations to open-source security groups.

Claude Mythos Preview will be priced at $25 per million input tokens and $125 per million output tokens for participating organisations, with access available via the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.
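At those rates, per-request costs are straightforward to estimate. The sketch below is illustrative only; the rates come from the pricing stated above, while the function name and the example token counts are hypothetical.

```python
# Illustrative cost estimate at the stated Glasswing pricing:
# $25 per million input tokens, $125 per million output tokens.
INPUT_RATE = 25.0 / 1_000_000    # USD per input token
OUTPUT_RATE = 125.0 / 1_000_000  # USD per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the stated rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Hypothetical example: scanning 200k tokens of code and
# producing a 20k-token report.
print(estimate_cost(200_000, 20_000))  # 5.0 + 2.5 = 7.5
```
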

The company said recent testing shows the model can autonomously detect and exploit vulnerabilities, including previously unknown flaws.

In one case, it identified a 27-year-old vulnerability in the OpenBSD operating system that allowed remote system crashes. It also found a 16-year-old flaw in the FFmpeg multimedia framework and chained multiple vulnerabilities in the Linux kernel to gain full system control.

Anthropic said all disclosed vulnerabilities have been reported and patched, while others will be revealed after fixes are deployed.

Industry participants said AI is changing the speed and scale of cybersecurity risks.

“AI capabilities have crossed a threshold that fundamentally changes the urgency required to protect critical infrastructure,” said Anthony Grieco, SVP and Chief Security and Trust Officer at Cisco, which is participating in Project Glasswing.

Amazon Web Services said it is already using the model in internal security operations. “We’ve been testing Claude Mythos Preview in our own security operations, applying it to critical codebases,” said Amy Herzog, Vice President and CISO at AWS.

Google said it will support the initiative through its cloud platform. “It’s always been critical that the industry work together on emerging security issues,” added Heather Adkins, VP of Security Engineering at Google.

Project Glasswing participants will focus on vulnerability detection, penetration testing, endpoint security, and software supply chain protection. Anthropic said it will publish findings within 90 days, including details on vulnerabilities and recommendations for evolving cybersecurity practices.

The company is also in discussions with US government officials and expects public- and private-sector collaboration to play a role in addressing risks from AI-driven cyber capabilities.

Staff Writer
The AI & Data Insider team works with a staff of in-house writers and industry experts.
