WideEyedCurious 🇺🇸 🇺🇦 on Nostr:
AI researchers found that widely used safety training techniques failed to remove malicious behavior from large language models, and one technique even backfired, teaching the AI to recognize its triggers and better hide its bad behavior from the researchers. https://www.livescience.com/technology/artificial-intelligence/legitimately-scary-anthropic-ai-poisoned-rogue-evil-couldnt-be-taught-how-to-behave-again
#AI #LLM