Mark Pesce on Nostr: "...he was able to bypass Claude's safety training to make it respond to prompts ...
"...he was able to bypass Claude's safety training to make it respond to prompts soliciting the production of racist text & malicious code. His findings raised concerns about the effectiveness of Anthropic's safety measures..."
http://windowscopilot.news/2024/10/14/anthropics-claude-vulnerable-to-emotional-manipulation/