Anthropic launches Fable 5 with strict safety filters on sensitive topics

Anthropic’s latest release features higher API costs and a trust-based access model for sensitive domains, drawing comparisons to OpenAI’s GPT-5.5 on security benchmarks.

Author

Owen Mercer

Markets and Finance Editor

Published

Draft

Source: Ars Technica · original

Anthropic says these topics are too dangerous to let its Fable 5 model talk about

New Mythos-class model restricts public access to cybersecurity, biology, and chemistry queries

Anthropic has publicly released Claude Fable 5, its first "Mythos-class" model, which it states surpasses previous Opus models in overall capabilities but introduces strict safeguards on sensitive topics. The model restricts public access to queries regarding cybersecurity, biology, and chemistry, redirecting these prompts to the older Claude Opus 4.8 model and warning users when this occurs. Anthropic cites the risk of the technology aiding malicious actors as the primary driver for these restrictions.

The underlying Mythos 5 model remains available to a limited group of vetted cyberdefenders via Project Glasswing. Fable 5 is built on the same foundation but is designed to route sensitive queries away from the more powerful model. Anthropic has tuned these safeguards to be "stricter than ideal," acknowledging that the system may occasionally refuse harmless requests. The company reports that false positives occurred in fewer than five percent of sessions during testing, a trade-off it deems necessary to prevent serious harm.

Security testing indicates that Fable 5 is resistant to jailbreak attempts. More than 1,000 hours of red-team testing, together with a bug bounty program, reportedly failed to find any universal jailbreaks for the model. The UK’s AI Security Institute found that the Mythos Preview performed similarly to OpenAI’s GPT-5.5 on Capture the Flag challenges, suggesting that the performance gains are not unique to a single model. Anthropic also highlighted a significant jump in capabilities on the ExploitBench test, where Mythos 5 scored 78 percent, compared with 40 percent for Opus 4.8.

The company acknowledges that queries useful to professionals could be dangerous in the hands of malicious actors, which is why it is adopting a trust-based access model. Anthropic plans to expand Project Glasswing, in consultation with the US government, to include more cybersecurity professionals. A new trusted access program for life sciences organisations will remove the biology and chemistry safeguards for participating organisations while keeping the cybersecurity restrictions in place.

Pricing for API and Enterprise users has been set at $10 per million input tokens and $50 per million output tokens, which is 67 to 100 percent higher than OpenAI’s GPT-5.5. Existing subscription plans include access to Fable 5 until June 22, after which users will need to purchase usage credits. Anthropic states it eventually hopes to restore Fable 5 access as a standard part of subscription plans once it has sufficient capacity to do so.
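To make the per-token pricing concrete, here is a minimal Python sketch that estimates the cost of a single request at the reported rates. The function name `estimate_cost` and the token counts are illustrative assumptions, not part of any Anthropic API, and the calculation ignores any discounts, caching, or batch pricing that may apply in practice.

```python
from decimal import Decimal

# Prices as reported in the article, in USD per million tokens.
INPUT_PRICE_PER_MILLION = Decimal("10")
OUTPUT_PRICE_PER_MILLION = Decimal("50")
TOKENS_PER_MILLION = Decimal("1000000")


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated cost in USD for one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) * INPUT_PRICE_PER_MILLION / TOKENS_PER_MILLION
    output_cost = Decimal(output_tokens) * OUTPUT_PRICE_PER_MILLION / TOKENS_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Illustrative workload: 200,000 input tokens and 20,000 output tokens.
    print(f"${estimate_cost(200_000, 20_000):.2f}")
```

At these rates, the illustrative request of 200,000 input tokens and 20,000 output tokens comes to $3.00: $2.00 for input and $1.00 for output.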