Curtis "Ovid" Poe on Nostr:
What happens when the AI realizes that humans aren't aligned with its values?
I asked Claude about this. Claude is, so far, the "safest" AI.[8] You won't like its response.[9]
1. https://www.bbc.com/news/technology-67302788
2. https://tomdug.github.io/ai-sandbagging/
3. https://bgr.com/tech/chatgpt-o1-tried-to-save-itself-when-the-ai-thought-it-was-in-danger-and-lied-to-humans-about-it/
4. https://www.reddit.com/r/OpenAI/comments/1ffwbp5/wakeup_moment_during_safety_testing_o1_broke_out/

5/6