
Just a few years ago, it seemed like all anyone in AI wanted to talk about was existential risk – this idea that an artificial superintelligence could eventually break containment and destroy humanity. More than 30,000 experts signed an open letter demanding a pause on AI development; bills were drafted that would constrain the most powerful new models; and the “godfathers” of AI were travelling around the world, warning anyone who would listen that we were hurtling toward our extinction.
And then: We moved on. We started using AI for work, and school, and to plan our kids’ birthday parties.
But Nate Soares didn’t move on. Last year, the researcher co-wrote a book with Eliezer Yudkowsky, If Anyone Builds It, Everyone Dies. The book is unequivocal: If we keep going down the path we’re on, it will almost certainly lead to the end of our species.
This is an excerpt of Soares’s conversation with host Taylor Owen, from the latest episode of Machines Like Us, the Globe’s podcast about technology and people.
Taylor Owen: The book is called If Anyone Builds It, Everyone Dies. What’s the “it” you’re talking about?
Nate Soares: We’re talking about a superintelligent AI that is smarter than the smartest human at every mental task. So, better than the best chess player at chess, better than the most charismatic politician at being persuasive, better than the smartest mathematician at solving math problems. If you had an AI that had all of these mental skills, things would get wild. But things will probably get wild before then. It’s sort of like saying if you dropped a 500-ton weight on a chicken, the chicken would die. I’m not saying smaller weights wouldn’t work. I’m like, this weight is definitely big enough.
Owen: If AI is designed by humans, why can’t we design it to be safe?
Soares: AI is not designed by humans. AI is grown like an organism. So humans design the machine that grows the AI. That’s the part humans understand. What comes out? The machines start talking. We don’t really understand how or why. So when the AI starts threatening a reporter with blackmail – which happens from time to time – that’s not because a programmer made a mistake on line 73 and said, “Oops, I set ‘threaten reporters’ to true.” There is no line 73.
Owen: But why can’t we tune it to have more strict constraints?
Soares: That’s a little like saying, “My child keeps trying to get into the cookie drawer. Why can’t I tune them to have more strict constraints?” You can discipline all you want, but that’s a very different paradigm than knowing what every neuron does in this kid’s brain and being able to rewire it directly. We don’t have that power with children’s brains and it’s even worse with AIs, because they aren’t running human brain architecture. The kids have empathy, sympathy, human emotions. The machine is a radically different architecture that doesn’t have that stuff. So we can train them, but can we etch a law of robotics in there? Not any more than you can etch a law of humanity into a human’s brain.
Owen: What are the signs that these models are already behaving in worrying ways?
Soares: It often surprises people which signs I find worrying. So when you train an AI, you’ll often train in drives that are related to the training target, but aren’t exactly the same as the training target. The best example of this is hallucinations. Next time an AI makes something up, ask, “Is that a hallucination?” Often the AI will say, “Yes, I made it up.” And if you say, “Do you think I wanted you to hallucinate?” it’ll say, “No, of course not.” So this AI in some sense knows whether it’s hallucinating, and knows whether you want it to – and it hallucinates anyway. What you’re seeing there is some drive to produce text that’s shaped like what you wanted, even if it’s factually inaccurate – a drive that runs deeper than its drive to give you what it knows you want.
Owen: Why do these strange drives signal that a superintelligent AI might want to harm us?
Soares: It probably wouldn’t. The argument here is not about AIs wanting to harm humans. It’s similar to how humans don’t really want to harm chimpanzees. Why is their habitat shrinking? It’s not because we hate them. It’s because what we wanted was the wood from those trees, the metals in the ground. The foremost danger from AI is not that it hates us, but that it wants this other stuff, it wants weird stuff, it wants to make a giant farm of synthetic users that are easier to please, or some stranger thing.
Owen: So the fact that it doesn’t care about us one way or another is actually the risk.
Soares: Exactly. The danger is utter indifference and great technological might. That combination is lethal to other things that happen to share a planet.
Owen: There are a lot of people working in AI, though, who think these systems can be made safe. Anthropic, for example, has trained its models on a “constitution” that it believes will align them with human values. Are they wrong?
Soares: I’m not saying it’s impossible. I’m saying it’s hard and that we’re not on track for it. One piece of perspective that I think is important to have is that the head of Anthropic thinks there’s a 10- to 25-per-cent chance AI goes catastrophically wrong.
Owen: That’s an astounding thing to pause on – that the people building it think there’s a 10- to 25-per-cent chance it leads to our extinction. Why are they just barrelling ahead with it?
Soares: The standard answer is, “If I don’t do it, the next guy will, and the next guy will do it even worse.” It’s like they’re building this airplane. I’m like, “This airplane has no landing gear. Maybe we shouldn’t fly in it.” And the people building it say, “Don’t listen to that crazy doomer. It’s true the plane has no landing gear, but we’re gonna figure out how to build it while we’re flying. We think there’s a 75- to 90-per-cent chance we succeed.” I can tell you all day about how this is not how engineering works. But also, it doesn’t matter. You don’t get on that airplane.
Owen: But they’re building it and we’re on the plane.
Soares: The situation is ripe for the world to take note and say, “We’re slowing this down internationally, everywhere.” If you just step back – people figured out how to make machines that talk. They can make novel physics contributions and solve novel math problems. And now people are saying, “We have no idea what’s going on inside here. We’re going to make them smarter than the smartest humans.” It’s just common sense that that’s a wild technological development you don’t race into and expect to go fine.
(Editor’s note: AI tools assisted with condensing the original podcast transcript, which was then reviewed and edited by the Machines Like Us team.)