Tech

Anthropic’s Fable model draws ire over rigid cybersecurity guardrails

Security experts criticise keyword-based triggers that flag innocuous prompts, though the company maintains the restrictions are necessary to prevent malware and biological weapon development.

Author

Owen Mercer

Markets and Finance Editor

Published

Draft

Source: Hacker News · original

Artificial Intelligence Research

Researchers say public release of Mythos iteration blocks routine code reviews and benign tasks

Cybersecurity researchers have voiced strong dissatisfaction with the safety protocols surrounding Anthropic’s latest release, Fable, which launched on 10 June 2026. Marketed as a public and limited iteration of the powerful Mythos model, Fable employs stringent guardrails that automatically reject requests tangentially related to cybersecurity or biology. While the intent is to mitigate risks associated with malware development and biological weapon creation, experts argue the implementation is overly broad and hampers legitimate professional workflows.

Valentina “Chompie” Palmiotti, a security researcher at IBM X-Force, highlighted that the system rejects any request with even a remote cyber-related connection. She noted that innocuous activities, such as reading a blog post, are flagged. When a prompt triggers these safety measures, the chat pauses and displays a message stating that “safety measures flagged this message for cybersecurity or biology topics.” The model is programmed to fall back to Claude Opus 4.8 in such instances.

Matt Suiche, a cybersecurity veteran at Tolmo, described the restrictions as haphazard and heavily reliant on keyword-based triggers. He explained that the system assumes requests to write secure code are cybersecurity-related work rather than standard software engineering best practices, resulting in users being “downgraded.” Suiche noted that anything within the lexical field of ‘cybersecurity’ tends to activate the guardrails, affecting tasks like code reviews.

Anthropic introduced these measures to address long-standing concerns about its models being used to compromise software or develop harmful tools. The underlying Mythos model was initially released in April under “Project Glasswing,” restricted to a limited number of companies. Last week, Anthropic expanded access to Mythos to hundreds of organisations across 15 countries. However, the public-facing Fable model retains strict limitations to ensure safety during this broader rollout.

Despite the criticism, some experts view the restrictions as a necessary precaution for early-stage deployment. Suiche suggested that it is better to “catch more people than not enough” during initial releases, with the expectation that guardrails will evolve as companies collaborate more closely with cybersecurity firms. For approved professionals, Anthropic operates a Cyber Verification Program that imposes fewer limitations, an approach similar to OpenAI’s Trusted Access for Cyber.
