Cybersecurity Researchers Criticize Anthropic's Fable Guardrails

The Limitations of Fable

Anthropic released its latest model, Fable, on Tuesday, billing it as a public, limited version of its powerful and much-hyped cybersecurity model Mythos. However, not everyone is happy with the restrictions built into the model, and a number of cybersecurity researchers and professionals have aired complaints online.

The Guardrails Controversy

“[Fable] rejects any request that could be tangentially cyber related. Even innocuous tasks like reading a blog post,” said Valentina “Chompie” Palmiotti, a well-known security researcher who works at IBM X-Force. When a prompt triggers its guardrails, Fable pauses the chat and says that its “safety measures flagged this message for cybersecurity or biology topics.”

The Rationale

The guardrails were put in place to limit the risk that Fable could be used to develop malware or compromise software. The restrictions on biology come from a similar concern around developing biological weapons.

The Impact Analysis

Despite the good intentions, many cybersecurity experts are still put off by the haphazard nature of the restrictions. Matt Suiche, a cybersecurity veteran, told TechCrunch that “if you ask it to write secure code, it assumes it is cybersecurity related work instead of software engineering best practices, and you get downgraded.” The downgrade refers to Fable's fallback behavior: the model is programmed to fall back to Claude Opus 4.8 if it hits a guardrail.

The Prediction

“It seems to be keyword based, so anything in the lexical field of ‘cybersecurity’ triggers the guardrails,” said Suiche. “But it is understandable as we are still in the early days and they are still adapting their guardrails. I am sure they are going to evolve over time as Anthropic and other frontier model companies will collaborate more with the current new generation of cybersecurity companies.”