Trade-off: Fable 5 may block routine coding tasks
On June 12, the Commerce Department ordered Anthropic to shut off access to its most advanced models for anyone outside the US. The order stemmed from fears that China, Russia, or other countries of concern could exploit the models to attack US infrastructure, such as the electric grid or the banking system. In response, Anthropic shut down all access, because it had no way to block users by country.
In particular, Mythos was viewed as “uniquely attractive to malicious actors who wish to misuse it in cyberattacks,” Anthropic’s blog said. According to Anthropic, the model “can be used to find and exploit software vulnerabilities more effectively than any other model—and all but the most skilled human security experts,” and those “prodigious cybersecurity capabilities” could be used against the US.
Fable 5 shares the “same underlying model,” Anthropic said, but unlike Mythos 5, it “provides no such unique offensive capabilities.” Designed for the general public, Fable 5 already had the strongest safeguards Anthropic has ever applied to a model, and Anthropic said those safeguards are now even stronger ahead of redeployment.
After weeks of testing, Fable 5 is no longer vulnerable to a bypass method discovered by Amazon researchers, which surfaced several software vulnerabilities and triggered the export curbs. Most troubling, Anthropic said, was a case in which the model was manipulated into producing code that demonstrated how a vulnerability could be exploited.
According to Anthropic, testing confirmed that less advanced rival models on the market, like GPT-5.5 and Kimi K2.7, "could identify the same vulnerabilities as Fable 5 did in the report." That showed that "the reported technique did not expose any unique Mythos-level cyber capabilities," Anthropic said, and that it "only involved routine defensive cybersecurity work."
“Even so, we moved quickly to address the reported bypass,” Anthropic wrote. That jailbreak method is currently blocked in over 99 percent of cases, Anthropic said. However, tightening safeguards came with a “trade-off” that may cause some benign prompts to be blocked “during routine coding and debugging tasks,” the company acknowledged.