Anthropic’s Fable model faces backlash from security researchers over restrictive guardrails

Cybersecurity professionals report that Anthropic’s latest AI model, Fable, is rejecting innocuous requests related to code reviews and technical reading, citing concerns over malware and biological weapon development.

Author

Owen Mercer

Markets and Finance Editor

Published

Draft

Source: TechCrunch · original

Artificial Intelligence Research

Cybersecurity researchers aren’t happy about the guardrails on Anthropic’s Fable

Experts argue the new public iteration of the Mythos model blocks legitimate software engineering tasks due to overly broad safety filters

Anthropic released Fable on Tuesday as a public-facing, limited version of its powerful Mythos model, which was previously restricted to a select group of organisations via Project Glasswing. The launch has drawn immediate criticism from cybersecurity researchers, who argue that the model’s safety guardrails are excessively strict for legitimate professional work. The restrictions appear to be keyword-based, causing the system to block requests tangentially related to cybersecurity or biology, even when the intent is benign.

Valentina “Chompie” Palmiotti, a security researcher at IBM X-Force, noted that the model rejects any request that could be even tangentially cyber-related. She highlighted that even innocuous tasks, such as reading a blog post, are flagged. When the guardrails are triggered, Fable pauses the chat, states that safety measures have flagged the message for cybersecurity or biology topics, and then falls back to the Claude Opus 4.8 model.

Matt Suiche, a cybersecurity veteran at AI security startup Tolmo, explained that the restrictions often misclassify software engineering tasks as security work. Suiche stated that asking the model to write secure code is frequently treated as cybersecurity-related activity rather than standard software engineering best practice, resulting in downgrades or blocks. Other researchers on X reported that simply requesting a code review also triggers the same safety filters.

Anthropic implemented these limits to mitigate the risk of its models being used to develop malware or compromise software, a longstanding concern for the company. Similarly, the restrictions on biology are designed to prevent the creation of biological weapons. When Anthropic initially released Mythos in April, access was limited to a small group, but the company expanded access to hundreds of organisations across 15 countries last week.

Despite the backlash, Suiche described the approach as understandable given the early stages of adapting guardrails for frontier models. He suggested that the company is likely to relax restrictions over time as it collaborates more closely with cybersecurity firms. For professionals requiring fewer limitations, Anthropic offers a separate Cyber Verification Program, mirroring similar initiatives such as OpenAI’s Trusted Access for Cyber.
