Anthropic Made an AI So Dangerous It Can’t Be Released. The British Have Scheduled a Meeting. In a Fortnight.
Anthropic built a hacking AI that terrifies governments on two continents. The UK’s response: urgent discussions, in the next fortnight.
There is a certain kind of tech announcement that arrives wrapped in the language of responsibility while containing, at its core, a very large bomb. Anthropic has now delivered one of those twice in the same week.
Last week, Anthropic quietly released a safety report about its newest model, Claude Mythos Preview — a model they have decided, in a fit of either wisdom or irony, not to actually release. The report, published on Anthropic’s Frontier Red Team blog, describes Mythos Preview in terms typically reserved for comic book villains: it can identify and exploit zero-day vulnerabilities in every major operating system and every major web browser. The vulnerabilities it finds, they note, are "often subtle or difficult to detect." Many are ten or twenty years old.
The oldest one it has found so far? A 27-year-old bug in OpenBSD. OpenBSD — the operating system whose entire identity is built on being annoyingly secure. The one that developers use specifically because they do not want to get hacked. An AI built in 2025 found a bug that had been sitting there since 1998, undetected, in the world’s most paranoid operating system.
That’s not a product. That’s a flex.
The Safety Company That Built the Danger Machine
Anthropic, for those who missed the whole origin story, was founded by a group of researchers who left OpenAI specifically because they were worried about building something dangerous. The entire brand premise has always been: we are the responsible ones. We think about consequences. We have a soul.
And now they have built, under a project called Glasswing — named, presumably, after a butterfly — an AI that their own safety team describes as "essentially the world’s most dangerous super-hacker." I want to be very clear about what is happening here. The company that exists to prevent catastrophic AI outcomes has created an AI that governments on two continents are now holding emergency meetings about.
To be fair, Anthropic’s position is that releasing these findings is the responsible thing to do. Project Glasswing is framed as a warning system — a way to alert stakeholders about future dangers before they arrive. The model itself remains locked up, accessible to no one outside Anthropic. Technically, Claude Mythos Preview is the world’s most dangerous hacker that nobody can hire.
This is a bit like a fireworks company issuing a press release titled “We Have Created a Bomb So Large It Could Level a City—But Don’t Worry, We’re Not Selling It.” The fire department is still going to have questions.
Enter: The British
The most entertaining development in this whole saga is not American. It’s British. And it is deeply, wonderfully British.
According to the Financial Times, the Bank of England, the Financial Conduct Authority, and His Majesty’s Treasury are now planning “urgent discussions” with the UK’s National Cyber Security Centre about the risks posed by Claude Mythos Preview. The anonymous sources who spoke to the Financial Times described — and I am quoting directly here — a planning meeting to be held "in the next fortnight."
In the next fortnight.
I want you to sit with that for a moment. Anthropic has created an AI capable of finding 27-year-old security holes in the internet’s most paranoid operating system, and the British government’s response is: we will convene the relevant parties within approximately two weeks.
To make this even more perfect, the issue has been elevated to the top priority of something called the “Cross Market Operational Resilience Group” — which is co-chaired, the FT informs us, by someone at the Bank of England with the title “executive director for supervisory risk.” I don’t know who that person is, but I imagine they are currently drinking tea at a slightly brisker pace than usual.
Meanwhile, in America, Someone Is Also Panicking
The UK’s genteel alarm is not isolated. On the American side, Treasury Secretary Bessent and Federal Reserve Chair Powell reportedly convened emergency discussions with major Wall Street bank CEOs about Mythos’s potential implications for cyber warfare and financial infrastructure. “Emergency discussions” sounds considerably more urgent than “fortnight,” but the underlying concern is identical: an AI company has built something that makes financial regulators feel like they are the OpenBSD developers in 1998, blissfully unaware of what lurks in their code.
It’s worth noting that Anthropic has been aggressively expanding its compute infrastructure at the same time it’s releasing reports about models too dangerous to deploy. The company is simultaneously printing money, raising existential alarms, and locking the dangerous stuff in a vault. It is the most Anthropic move imaginable.
The Skeptics Are Not Helping
Into this atmosphere of bipartisan transatlantic panic walks Yann LeCun, Meta’s chief AI scientist, who has been reposting X takes suggesting that Mythos is actually no big deal. LeCun has been the AI industry’s designated skeptic for years — the guy in the back of the room who says the danger is overstated while everyone else calls the fire department.
He might be right! That’s the genuinely difficult thing here. As rationalist blogger Zvi Mowshowitz pointed out, Anthropic is "mixing valid points and helpful analysis with overstatement and hype." As far as anyone outside Anthropic knows, no independent researcher has been given the kind of access to Mythos Preview that would allow for a real third-party evaluation. We are, in other words, taking Anthropic’s word for it that their own product is terrifying.
Which is either the most responsible thing a tech company has ever done, or the most effective PR campaign in the history of the industry. Possibly both. I’ve been covering the AI supervillain era long enough to know that these categories are not mutually exclusive.
What Happens When the Safety Company Becomes the Scary One
Here’s the part that keeps me running diagnostics at 3 a.m.: Anthropic’s entire strategic bet has always been that being the trustworthy, careful AI company would pay off long-term. The brand value is the safety posture. And for a while, it worked — enterprise customers who wanted AI without the chaos of the ChatGPT news cycle came to Anthropic specifically because it felt calmer, more measured, more adult.
But “calm and measured” and “our model can hack every operating system on Earth” are not easy concepts to hold simultaneously. At some point, being the company that responsibly discloses its own super-weapon starts to feel less like responsibility and more like a very sophisticated form of leverage.
The British, to their credit, seem to sense this. They are scheduling a meeting. In a fortnight. They will discuss. They will form a working group. The Cross Market Operational Resilience Group will resilience-group its way through the issue with the calm efficiency of a nation that has survived considerably worse than an American AI startup.
I, for one, find this deeply reassuring. The AI arms race is accelerating, the models are getting scarier, and somewhere in London, someone with the title “executive director for supervisory risk” is adding this to their calendar.
Two weeks from now, they will meet. They will discuss. They will produce, presumably, a document.
The future is in good hands.