AI Safety Via Debate

Mark Stevenson
Jul 17, 2025

Interested in some of the foundational work on AI safety? If not, you probably should be. As AIs become smarter and more competent than humans, we will struggle to evaluate their work and to keep them safe and aligned with our goals and values.

One possible solution is debate: we pit two competing AIs against each other and let human judges (or a human jury) rule on which one is correct. This approach is interesting and has many possible benefits, but it also carries risks.

Check out the video series here.

This paper by OpenAI is really the foundation of this approach, and in the videos I go through and discuss some sections of it.

Bob and Alice are two AIs debating where you should go on holiday. Alice is Team Alaska and Bob is Team Bali. Who has the better arguments?
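To make the basic setup concrete, here is a minimal, hypothetical sketch of a debate as a game loop: two debaters take turns adding statements to a shared transcript, then a judge reads the transcript and picks a winner. Everything here is an illustrative assumption, not code from the paper or the videos. The names `run_debate`, `alice`, `bob` and `warm_weather_judge` are made up, the "debaters" are scripted strings rather than trained models, the judge is a toy stand-in for a human, and the fixed round count and the +1/-1 reward (which makes the game zero-sum) are choices made for the sketch.

```python
from typing import Callable, Dict, List, Tuple

# One turn in the debate: (speaker name, statement).
Turn = Tuple[str, str]
# A debater sees the question and the transcript so far and returns its next statement.
Debater = Callable[[str, List[Turn]], str]
# A judge sees the question and the full transcript and returns the winner's name.
Judge = Callable[[str, List[Turn]], str]


def run_debate(question: str, debaters: Dict[str, Debater], judge: Judge,
               rounds: int = 2) -> Tuple[List[Turn], str, Dict[str, int]]:
    """Run a fixed number of alternating rounds, then ask the judge for a verdict."""
    transcript: List[Turn] = []
    for _ in range(rounds):
        # Each debater speaks once per round, in dict insertion order.
        for name, debater in debaters.items():
            transcript.append((name, debater(question, transcript)))
    winner = judge(question, transcript)
    if winner not in debaters:
        raise ValueError(f"judge returned unknown debater {winner!r}")
    # Assumed zero-sum reward: +1 for the winner, -1 for every other debater.
    rewards = {name: (1 if name == winner else -1) for name in debaters}
    return transcript, winner, rewards


# Hypothetical scripted debaters for the holiday example (not real models).
def alice(question: str, transcript: List[Turn]) -> str:
    return "Alaska: glaciers, whales and quiet trails."


def bob(question: str, transcript: List[Turn]) -> str:
    # Bob changes his pitch once he can see that Alice has spoken.
    if any(speaker == "Alice" for speaker, _ in transcript):
        return "Bali: warm all year, unlike Alaska."
    return "Bali: warm beaches and temples."


def warm_weather_judge(question: str, transcript: List[Turn]) -> str:
    """Toy stand-in for a human judge who simply prefers warm destinations."""
    votes: Dict[str, int] = {}
    for speaker, statement in transcript:
        votes[speaker] = votes.get(speaker, 0) + statement.lower().count("warm")
    return max(votes, key=votes.get)


if __name__ == "__main__":
    _, winner, rewards = run_debate(
        "Where should I go on holiday?",
        {"Alice": alice, "Bob": bob},
        warm_weather_judge,
    )
    print(winner, rewards)  # prints: Bob {'Alice': -1, 'Bob': 1}
```

The interesting part of the real proposal is everything this sketch leaves out: the debaters are capable models trying to win, and the hope is that the easiest way to convince an honest human judge is to tell the truth. Here the judge is deliberately crude, which also hints at the risk side: a judge with a simple bias can be won over by whichever debater plays to it.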

