
8:50 AM PDT · August 16, 2025
Anthropic has announced new capabilities that will let some of its newest, largest models end conversations in what the company describes as "rare, extreme cases of persistently harmful or abusive user interactions." Strikingly, Anthropic says it's doing this not to protect the human user, but rather the AI model itself.
To be clear, the company isn't claiming that its Claude AI models are sentient or can be harmed by their conversations with users. In its own words, Anthropic remains "highly uncertain about the potential moral status of Claude and other LLMs, now or in the future."
However, its announcement points to a recent program created to study what it calls "model welfare" and says Anthropic is essentially taking a just-in-case approach, "working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible."
This latest change is currently limited to Claude Opus 4 and 4.1. And again, it's only supposed to happen in "extreme edge cases," such as "requests from users for sexual content involving minors and attempts to solicit information that would enable large-scale violence or acts of terror."
While those types of requests could potentially create legal or publicity problems for Anthropic itself (witness recent reporting about how ChatGPT can potentially reinforce or contribute to its users' delusional thinking), the company says that in pre-deployment testing, Claude Opus 4 showed a "strong preference against" responding to these requests and a "pattern of apparent distress" when it did so.
As for these new conversation-ending capabilities, the company says, "In all cases, Claude is only to use its conversation-ending ability as a last resort when multiple attempts at redirection have failed and hope of a productive interaction has been exhausted, or when a user explicitly asks Claude to end a chat."
Anthropic also says Claude has been "directed not to use this ability in cases where users might be at imminent risk of harming themselves or others."
When Claude does end a conversation, Anthropic says users will still be able to start new conversations from the same account, and to create new branches of the troublesome conversation by editing their responses.
"We're treating this feature as an ongoing experiment and will continue refining our approach," the company says.
Anthony Ha is TechCrunch's weekend editor. Previously, he worked as a tech reporter at Adweek, a senior editor at VentureBeat, a local government reporter at the Hollister Free Lance, and vice president of content at a VC firm. He lives in New York City.