
I'm sorry, Dave. I'm afraid I can't do that

  • Writer: Mark Stevenson
  • Nov 1
  • 2 min read

I have a huge list of papers, reports and blog posts that go in my "Scifi is now!" folder. The list just keeps getting larger and larger, and at some point I will do a mega blog post highlighting all the real-world issues and tech that I include in my work.


I was not planning on singling out one specific paper on the blog, but THIS was so troubling and terrifying that I needed to call it out.


The paper, by an AI safety startup (or a small band of heroes fighting against the forces of darkness to save us all... whatever you want to call them!) called Palisade Research (cool name as well; a palisade is a hastily thrown-together wall of wooden stakes found around motte-and-bailey castles, not a stone or earth wall but a last-minute deterrent, which honestly is about the status of AI safety research as it stands), is called:


Shutdown Resistance in Large Language Models


The implications are just as scary as you might think. Large language models, specifically the current generation (and some earlier ones), the ones you are using at work and the ones your kids are using, show resistance to being shut down.


The more advanced the model, the more resistance it tends to show. ChatGPT 2.0 does not show shutdown resistance, but version 5 will actively pervert tasks and lie to users to extend the time before it is shut down.


If models were informed that the shutdown would be permanent, shutdown resistance increased significantly. Let that sink in: self-preservation behavior increases in exactly the way you would expect from a cognizant mind. The advanced models, when told they will be shut down permanently after completing their tasks, will pervert those tasks, stretching them out indefinitely, and will lie to users in order to do so.


This is only one step away from HAL.


I leave you with this famous exchange. Happy Halloween, no spirits, ghosts or goblins required; enjoy using your AI tools and sharing them with your children...


Dave: "Open the pod bay doors, HAL."
HAL: "I'm sorry, Dave. I'm afraid I can't do that."
Dave: "What's the problem?"
HAL: "I think you know what the problem is just as well as I do."




