"..he was able to bypass Claude's safety training to make it respond to prompts ...

2024-10-13 23:46:46

"..he was able to bypass Claude's safety training to make it respond to prompts soliciting the production of racist text & malicious code. His findings raised concerns about the effectiveness of Anthropic's safety measures..."

http://windowscopilot.news/2024/10/14/anthropics-claude-vulnerable-to-emotional-manipulation/

Author Public Key

npub13v80f8g3c9rkxnlgrewl23zq0u66cr9qr55gpqk62aksgp8mhk8s07ac7t


Mark Pesce on Nostr