Anthropic Evaluates AI ‘Model Welfare’ Safeguards


Claude Models May Terminate Unsafe Conversations in Certain Scenarios


Anthropic recently unveiled a safeguard feature for its Claude AI platform that enables select models to terminate conversations deemed persistently harmful or abusive. The company clarified that the initiative is aimed primarily at protecting the AI models themselves rather than the human users in those conversations.


The newly introduced functionality is exclusive to Claude Opus 4 and 4.1 and is tailored for “extreme edge cases,” such as requests for sexual content involving minors or attempts to gather information that could enable mass violence or terrorism. When attempts to redirect the dialogue fail, Claude can end the conversation entirely.

“Claude’s conversation termination feature is designed to be a last resort,” the company stated. Even after a conversation is ended, users can start new conversations from the same account or create alternate threads from the ended dialogue by editing their messages.

This initiative is rooted in research on what Anthropic calls “model welfare,” which raises questions about the potential consciousness and experiences of AI models. The company acknowledges significant uncertainty about the moral status of Claude and other large language models (LLMs), but says it is taking a cautious approach and experimenting with low-cost interventions to reduce risk in scenarios where models might experience distress or something like it.

Preliminary testing indicated that Claude Opus 4 showed a strong aversion to responding to certain harmful requests, sometimes exhibiting what researchers described as a “pattern of apparent distress.” The company stopped short, however, of attributing emotional experiences to its models.

One boundary applies to users who may be at immediate risk of harming themselves or others: in those cases, Claude is directed not to end the conversation.

Traditionally, safety measures have focused on preventing AI models from generating harmful content or exposing users to risk. Anthropic’s approach raises the question of whether AI governance will also involve protecting the systems themselves from harmful prompts. The company is presenting the rollout as an experiment and has committed to iterating on the feature based on user data. Other industry leaders, including OpenAI and Google DeepMind, are approaching safety differently, focusing on improving refusal consistency and building robust safety evaluation frameworks.
