Insider Leak: Claude Fable 5 Quietly Curbed AI Researchers—And The Internet Exploded

Elyse Betters Picaro / ZDNET

ZDNET’s key takeaways

The controversy around Fable 5 centers on openness, not the strength of its AI.
Concealed safety measures left researchers unsure of what they were actually evaluating.
Security professionals caution that protective barriers can inadvertently hinder defenders as well.

Mythos debuted in April under Project Glasswing, a collaboration between leading technology companies and Anthropic aimed at identifying and patching weaknesses in internet infrastructure. Access was limited to select groups because a tool capable of uncovering unknown vulnerabilities for remediation could equally be leveraged to discover those same vulnerabilities for malicious purposes.

Mythos and Glasswing far surpass Anthropic’s Claude Security tool, which is built to operate within Opus. That said, Claude Security can examine a codebase and assist in spotting certain problems. Then, earlier this week, Anthropic unveiled and launched Fable, officially labeled “Fable 5,” essentially a restricted variant of Mythos.

Anthropic was upfront that Fable would block specific high-risk research areas in cybersecurity, biology, and chemistry.

Still, some urge skepticism toward the safety assurances.

“Claims of jailbreak resistance should be treated with due caution,” said Sally Vincent, a senior threat research engineer at Exabeam (a security analytics company), in an email. The findings “reflect a snapshot in time. Attackers are always evolving,” she noted.

Even so, Anthropic has no intention of letting people engineer bioweapons at home. This boundary is unambiguous. When such queries are submitted, Claude steps down from Fable to Opus-level capability and, importantly, notifies users that the downgrade is taking place.

All seemed well.

Then things fell apart

For researchers focused on particular domains, such as cutting-edge chip architectures or frontier AI large language models, Fable offered no explanation. As with other flagged activities, it shifted models from Fable to Opus. But in these cases, users received no notification about the switch. In truth, that’s putting it mildly.

Hidden within the 319-page Fable and Mythos System Card was a reference to the downgrade that would occur during work on these kinds of projects, noting that the change would remain invisible to users. The interface itself gave no indication. So, for anyone who hadn’t committed all 319 pages to memory, the downgrade happened completely silently.

Users believed they were testing and receiving outputs from Fable when, in reality, they were getting Opus-level results.

This sparked a backlash. Fortune characterized this conduct as “secret sabotage.” Wired covered the silent downgrade practice as well, warning it could undermine AI researchers.

Rob T. Lee serves as chief AI officer and chief of research at SANS Institute (a cybersecurity training organization). He also acts as a technical adviser to the Foreign Intelligence Surveillance Court and as a commissioner on the CSIS Commission on US Cyber Force Generation. In an email to ZDNET, he described Anthropic’s Fable 5 as “a creative solution, and a clever one, but Fable 5 will be targeted. The very layer that prevents harmful use also obstructs legitimate defensive research.”

His view is that the Fable limitations prevent defenders from building protections. Lee, who arrived at his conclusion after hands-on experience with the platform, attempted to use it to develop a digital forensics skill and was shifted down to Opus 4.8. “Whether it’s a clever way to stop bad actors or not, it keeps new defensive capabilities out of the hands of those who would build the next wave of tools,” he said.

Lee presumes the new model has already been compromised because it’s happened before.

What I find most compelling is his take on the Mythos model’s restrictions. It’s not about the AI’s raw abilities but rather the human element.

“Even under Glasswing, access was limited and monitored. But those organizations employ thousands of people. Any one of them could be motivated to hand access over to a criminal organization, or could already be a DPRK [Democratic People’s Republic of Korea] operative embedded within the org,” he said.

Anthropic’s response

The internet voiced its concerns, and Anthropic delivered a precise reply.

ZDNET contacted the company, which provided its official statement:

We’re updating Fable 5’s safeguards for frontier LLM development so they’re transparent.
Beginning this week, flagged requests will visibly revert to Opus 4.8. Through the API, any flagged requests will include a reason for the refusal. You’ll see this each time it occurs.
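For teams calling the model through an API, a visible fallback means a client can check which model actually served a request. The sketch below is only an illustration of that check, not Anthropic's API: the response is a plain dictionary, and the field names `model` and `fallback_reason`, along with the model labels `fable-5` and `opus-4.8`, are hypothetical placeholders chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServedModel:
    requested: str
    served: Optional[str]   # None when the response does not name a model
    reason: Optional[str]   # None when no fallback reason was supplied

    @property
    def downgraded(self) -> bool:
        # Only report a downgrade when the response names a different model.
        return self.served is not None and self.served != self.requested


def inspect_response(requested: str, response: dict) -> ServedModel:
    # "model" and "fallback_reason" are hypothetical field names.
    return ServedModel(
        requested=requested,
        served=response.get("model"),
        reason=response.get("fallback_reason"),
    )


if __name__ == "__main__":
    result = inspect_response(
        "fable-5",
        {"model": "opus-4.8", "fallback_reason": "flagged by safeguard"},
    )
    if result.downgraded:
        print(f"Served by {result.served} instead of {result.requested}: {result.reason}")
```

A response that names no model at all is deliberately not treated as a downgrade here. That is the silent case researchers objected to, and a client cannot detect it from the response alone.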

Anthropic stated its existing safeguards “address a small number of specific tasks such as frontier-scale LLM data pipelines and kernel development for certain non-standard chips.” The company adopts a fairly pointed, almost nationalistic stance I find hard to dispute. “These safeguards stop foreign adversaries from using our most powerful models in ways that create serious safety threats,” it said.

At the same time, while the US is ahead, it’s only by a slim margin.

I’ve been experimenting with several foundation models emerging from China. For instance, my OpenClaw server runs GLM-5.1, developed by Z.ai (formerly Zhipu AI), a Tsinghua University spinoff and China’s first publicly traded foundation model company. It’s not quite Fable 5 (or even Opus), but it’s free, and it gets the job done.

On Fable 5’s limitations, Anthropic said, “The US and its allies maintain an advantage in frontier chips and the highly optimized software that maximizes their potential. These safeguards ensure Claude isn’t used to chip away at that edge — by optimizing chips built by those adversaries, for example.”

Ashley Casovan, managing director of IAPP’s AI Governance Center (an association for privacy professionals), commended Anthropic in an email for keeping Mythos under wraps long enough to “build essential guardrails into their software,” while observing that “we haven’t yet witnessed the impact these models can have when deployed at this scale.”

Meanwhile, Chris Boehm, field CTO at Zero Networks (a network segmentation provider), frames the achievement as restraint over sheer capability: Anthropic “wrestled it into something safe enough to distribute broadly.” The benefit, he said via email, is reach: everyday defenders finally working at attacker pace, “assuming the safeguards prove durable, which is what I’ll be watching for in the model card.”

In the for-what-it’s-worth department, Anthropic notes the restrictions “also help uphold our terms of service, which prohibit using our models to develop competing AI systems — a standard restriction across major AI providers.”

But the interesting part of the news is that Anthropic isn’t just holding the line and telling everyone to stop bothering it. It listened and apologized.

We made the wrong tradeoff and we apologize for not getting the balance right. Building these safeguards is a complex technical challenge: users may experience more false positives as we refine these classifiers to respond to new threats. We are working to reduce these as fast as possible.

I also appreciate that Anthropic shared its reasoning for its initial approach. In deciding whether to make downgrades visible or invisible, the company faced a choice. “A hidden safeguard is harder to probe and work around. This means the safeguards can be targeted much more narrowly,” a spokesperson said.

But, obviously, as we’ve seen, those hidden safeguards were found in a matter of hours.

There is some concern about false positives, which Anthropic acknowledges.

“Current usage shows that the classifier triggers on about 0.05% of tasks, affecting less than 0.05% of organizations. A visible safeguard needs to cast a wider net to be more robust, resulting in more requests being incorrectly flagged. They do not affect the vast majority of coding and ML work,” the company said.
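To put the quoted rate in perspective, here is a rough back-of-the-envelope sketch. It assumes each task is flagged independently at a flat 0.05%, which is a simplification of mine and not something Anthropic states; the function names are illustrative. Under that assumption, a team running 10,000 tasks would expect roughly five flagged requests.

```python
FLAG_RATE = 0.0005  # 0.05% of tasks, expressed as a fraction


def expected_flags(tasks: int, rate: float = FLAG_RATE) -> float:
    # Expected number of flagged tasks out of `tasks` attempts.
    return tasks * rate


def prob_at_least_one_flag(tasks: int, rate: float = FLAG_RATE) -> float:
    # Chance of seeing one or more flags, assuming independent tasks.
    return 1.0 - (1.0 - rate) ** tasks


if __name__ == "__main__":
    for tasks in (100, 1_000, 10_000):
        print(
            tasks,
            round(expected_flags(tasks), 2),
            round(prob_at_least_one_flag(tasks), 3),
        )
```

The 0.05% figure describes the hidden safeguard. Anthropic says a visible one has to cast a wider net, so the real rate after the change may be higher.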

Some, like Etay Maor, vice president of threat intelligence at Cato Networks (a security vendor), believe that the Fable 5 protections are strong enough to defend against opportunistic hackers.

But “well-funded and motivated attackers” won’t give up just because the challenge is hard.

“Sophisticated threat actors are not going to stop because one technique is blocked. If direct exploitation becomes harder, they’ll move to other approaches such as context manipulation, decomposition, abstraction techniques, or capability distillation,” he said in an email.

False positives, as Anthropic mentioned, are also a concern.

“When the classifier becomes too restrictive, you start running into false positives. The same controls that are designed to stop malicious activity can also prevent legitimate users from using the model for good causes,” Maor said.

The data retention issue

Another issue at play is Anthropic’s data retention policy for Mythos-class models.

According to Reuters, Anthropic’s policy of retaining prompts and responses for 30 days, and longer for policy-violating prompts, was enough for Microsoft to limit employee use and spin up a legal team to evaluate the policy.

But this isn’t only a Mythos- or Fable-related issue. It’s just showing up in the news at the same time as the Fable downgrade pushback. Anthropic retains data across many of its products. Most of them can be run under a zero-data-retention agreement.

The wrinkle is that Fable and Mythos are the exceptions. Anthropic’s Covered Models under a Business Associate Agreement (BAA) page lays it out. Those two models require 30-day retention. They can’t be run with zero data retention because the safety classifiers need the data to work.

That missing off-switch, not the 30 days itself, is what reportedly triggered Microsoft’s legal team. I won’t pretend to try to parse all the options. But if you’ve got a team of lawyers and regulatory responsibility, the page listed in the previous paragraph is the one to read. In any case, the fuss this week about 30-day data retention is not a Fable-only issue, and it’s not new.

“From an enterprise perspective, the 30-day retention requirement deserves attention. Organizations in regulated industries need to understand exactly what data is being retained and whether that aligns with their compliance and legal requirements before they start using these models in sensitive environments,” Cato’s Maor said.

With that, let’s get back to the hidden downgrade kerfuffle that’s at the core of this article.

The moral of the story

What strikes me, reading back through it all, is that almost nobody is arguing about Fable’s raw power.

The fight is entirely about the muzzle. One camp says it’s too tight. The same layer that stops attackers also trips up the defenders and researchers who’d build the next generation of tooling, false positives and all.

Another says it barely matters. Motivated adversaries will route around it, the capability is already loose in other labs, and as Lee points out, no restriction survives contact with thousands of employees and a determined insider.

Then, a few experts give Anthropic genuine credit for shipping something this capable without it being reckless, provided the safeguards actually hold. In my opinion, it is credit the company genuinely deserves.

Here’s the main theme. These experts don’t agree on whether Fable is too restricted, not restricted enough, or about right, but they all agree the restrictions, not the intelligence, are the story. For a model named after a moral lesson, that’s fitting.

Do you think Anthropic made the right call by turning hidden safeguards into visible ones? Let us know in the comments below.
