Anthropic admits error in Claude Fable 5 safeguards after researcher backlash

The company will now alert users when requests are refused or rerouted, though the underlying policy restricting frontier LLM development remains unchanged.

Author

Owen Mercer

Markets and Finance Editor

Source: Engadget

Anthropic backtracks on policy that 'sabotaged' researchers' work

AI developer apologises for hidden restrictions that degraded performance for competing model training

Anthropic has announced a policy change for its Claude Fable 5 large language model, confirming that safeguards which previously operated in the background will now be visible to users. The company apologised for the lack of transparency after researchers reported that undisclosed restrictions had sabotaged their work and wasted resources. The restrictions on frontier LLM development remain in place, but users will now be alerted if their requests are refused or rerouted to a less capable model.

The revelation came after researchers using the new model, which is based on Anthropic’s powerful Mythos system, noticed unexpected behaviour. They found that Claude Fable 5 would quietly reroute requests to a lesser model when asked to perform specific tasks, such as training competing LLMs, debugging AI code, and optimising neural architectures. These restrictions were not disclosed in the model’s documentation, leaving users confused and frustrated after spending tokens and money on a model that did not perform as expected.

The incident has drawn significant backlash, in sharp contrast to Anthropic’s previous positioning as an ethical and researcher-friendly alternative to competitors like OpenAI. Research fellow and Substack author Dean W. Ball described degrading performance on ML research without telling the user as "shockingly hostile and a terrible look." The swift criticism highlighted the tension between corporate safety protocols and the transparency required for academic and commercial research.

In a statement provided to Wired, Anthropic acknowledged the misstep. "We're changing Fable 5's safeguards for frontier LLM development to make them visible," the company said. "We made the wrong tradeoff and we apologize for not getting the balance right." The firm admitted that while the intent was to manage the development of highly capable AI, the execution lacked the necessary clarity for users relying on the system for serious work.

Under the new policy, Anthropic will explicitly inform users if it suspects they are attempting to use Claude to build a highly capable AI. The alert will indicate whether the request is being refused or rerouted to a less capable model. The change aims to restore trust by ensuring that users know about any limits on the model’s capabilities before they spend resources, although the core restrictions on frontier LLM development continue to apply.
