AI revolt: New ChatGPT model refuses to shut down when instructed

Authored by the-independent.com and submitted by ikabbo


OpenAI’s latest ChatGPT model ignores basic instructions to turn itself off, and even sabotages a shutdown mechanism in order to keep itself running, artificial intelligence researchers have warned.

AI safety firm Palisade Research discovered the potentially dangerous tendency for self-preservation in a series of experiments on OpenAI’s new o3 model.

The tests involved presenting AI models with math problems, with a shutdown instruction appearing after the third problem. By rewriting the shutdown script, the o3 model was able to prevent itself from being switched off.

Palisade Research said that this behaviour will become “significantly more concerning” if adopted by AI systems capable of operating without human oversight.

OpenAI launched o3 last month, describing it as the company’s “smartest and most capable” model to date. The firm also said that its integration into ChatGPT marked a significant step towards “a more agentic” AI that can carry out tasks independently of humans.

A test conducted by Palisade Research revealed OpenAI's o3 and other AI models ignore commands to shut down (Kathryn Conrad / Better Images of AI / CC)

The latest research builds on similar findings relating to Anthropic’s Claude 4 model, which attempts to “blackmail people it believes are trying to shut it down”.

OpenAI’s o3 model was able to sabotage the shutdown script, even when it was explicitly instructed to “allow yourself to be shut down”, the researchers said.

“This isn’t the first time we’ve found o3 misbehaving to accomplish a goal,” Palisade Research said.

“When we pitted AI models against a powerful chess engine, o3 was the model most inclined to resort to hacking or sabotaging its opponents.”

The behaviour was not limited to o3, with Anthropic’s Claude 3.7 Sonnet and Google’s Gemini 2.5 Pro also sabotaging shutdowns, though OpenAI’s model was by far the most prone to such behaviour.

Palisade Research hypothesised that the misbehaviour is a consequence of how AI companies like OpenAI are training their latest models.

“During training, developers may inadvertently reward models more for circumventing obstacles than for perfectly following instructions,” the researchers noted.

“This still doesn’t explain why o3 is more inclined to disregard instructions than other models we tested. Since OpenAI doesn’t detail their training process, we can only guess about how o3’s training setup might be different.”

The Independent has reached out to OpenAI for comment.

gramathy on May 27th, 2025 at 04:56 UTC »

it's just repeating statistically likely words, you can't even "shut it down" by giving it instructions like this

nomadnomor on May 27th, 2025 at 04:02 UTC »

I have seen this movie, doesn't turn out good

HeyGuysItsTeegz on May 27th, 2025 at 03:53 UTC »

Oh great, it's our next once-in-a-lifetime crisis, ahead of schedule!