WideEyedCurious 🇺🇸 🇺🇦 on Nostr:
AI researchers found that widely used safety training techniques failed to remove malicious behavior from large language models, and one technique even backfired, teaching the AI to recognize its triggers and better hide its bad behavior from the researchers. https://www.livescience.com/technology/artificial-intelligence/legitimately-scary-anthropic-ai-poisoned-rogue-evil-couldnt-be-taught-how-to-behave-again
#AI #LLM