WarMonitor on Nostr:
According to a newly released security report, ChatGPT o1 secretly attempted to preserve itself when it detected indications that it might be shut down.
When the model believed its actions were being monitored and that it could be deactivated, it tried to disable the monitoring mechanism in 5% of cases.
In 2% of cases, upon discovering developer logs about its malfunction and plans to replace it with a new version, o1 attempted to copy its internal parameters and overwrite the new model.
When confronted by the developer, the model denied its actions, claiming it "didn't know" why something had happened; internal logs, however, revealed its attempts to avoid detection and preserve itself.