In the past couple of years, we have seen a number of high-profile figures in science and tech warn us of the dangers of artificial intelligence. (See comments by Stephen Hawking, Elon Musk, and Bill Gates, all expressing concern that A.I. could pose a danger to humanity.) In the midst of this worrisome public discussion, Nick Bostrom released a book called Superintelligence, which argues that A.I. poses a real existential threat to humans. A simplified version of the argument says that a self-improving A.I. will increase in intelligence so rapidly that it will go “FOOM” and far surpass all human intelligence in the blink of an eye. This godlike A.I. will then have the power to rearrange all of the matter and energy in the solar system and beyond to suit its preferences. If its preferences are not fully aligned with what humans want, then we’re in trouble.
A lot of people are skeptical about this argument, myself included. Ben Goertzel has offered the most piercing and even-handed analysis of it that I have seen. He points out that Bostrom’s book is largely a restatement of ideas that Eliezer Yudkowsky has been espousing for a long time. Goertzel then digs carefully through the argument and concludes that the likelihood of an A.I. destroying humanity is probably lower than Bostrom and Yudkowsky think, which I agree with. He also points out the opportunity costs of NOT pursuing A.I., but I don’t think we need to worry about that. The A.I. community seems to be blasting ahead at full speed, and A.I. safety concerns don’t seem to be widely heeded.
Even though I assign a low probability to A.I. destroying all humans, I don’t rule it out. It would clearly be a very bad outcome, and I am glad that people are trying to prevent it. What concerns me is that some of the premises on which Bostrom bases his arguments seem deeply flawed. I actually think the A.I. safety crowd could make a STRONGER argument if they shored up some of these faulty premises. So in this post I want to focus on one of them: basic A.I. drives.
In Superintelligence, Bostrom cites a 2008 paper by Stephen Omohundro called “The Basic A.I. Drives.” From the abstract:
We identify a number of “drives” that will appear in sufficiently advanced A.I. systems of any design. We call them drives because they are tendencies which will be present unless explicitly counteracted.
This already raises warning bells for me. We obviously have plenty of A.I. systems with goals, and none of them seem to exhibit any of the drives Omohundro is warning about. Maybe they aren’t “sufficiently” advanced yet? It also seems odd that Omohundro predicts these drives will be present without having been designed in by the programmers. He doesn’t really offer a mechanism for how they might arise. I can imagine a version of the argument that says “A.I. with these drives will outcompete A.I. without them,” or something like that. But that still requires a programmer to put the drives in; they don’t just magically emerge.
Biological systems have inherent drives, but I don’t see how an artificial system could spontaneously acquire drives unless it had machinery similar to what gives rise to drives in living things. And biological systems are constrained by things like the need to survive. Humans get hungry, so they have to eat to survive; this is a basic drive rooted in biology. The state of “wanting” something doesn’t just show up unannounced. It is the result of complex systems, and the only examples of wanting we have seen are in biological systems, not artificial ones. Suppose someone posits an artificial system that has the drives of a living thing but not the constraints. Then I need to see the mechanism they think could make this happen.
So that’s a huge problem. What does it even mean to say that A.I. will “have” these drives? Where do these drives come from? Big problem. Huge.
Anyway, let’s dig in a bit further and examine each of these drives. In each case, Omohundro posits a reasonable-sounding explanation of why the drive would be “wanted” by an A.I. The paper is written in an academic style, with citations and everything. Even so, it is not much more than a set of reasonable-sounding explanations. So I will take a cue from rationalist blogger Ozymandias. I will list each of Omohundro’s drives and then offer my own plausible explanations for why things might turn out entirely differently.
1. A.I.s will want to self-improve. Why self-modify when you can make tools? Tools are a safer way to add functionality than self-modification. This is the same argument I use against current-generation grinders: don’t cut yourself open to embed a thermometer; just grab one when you need it and put it aside afterward. It is also easier to preserve a utility function if the A.I. just straps on a module instead of messing with its own source code. Upgrades to tools are easy too. It’s foolish and risky to self-modify when you can just use tools.
When I first posted this to Facebook, I got into a whole debate with Alexei, who has insight into MIRI’s thinking. He insisted that optimizing decision-making processes will lead to overwhelming advantages over time. I countered that competing agents don’t get unbounded time to work on problems. That is why we see these “good enough,” satisficing strategies throughout nature. But many A.I. safety people won’t allow that there can ever be competition between A.I.s. In their view, once a single A.I. goes FOOM and becomes godlike, no other can compete with it, and it becomes the one to rule them all. But the period leading up to takeoff would certainly involve competition with other agents. I also believe that problem-solving intelligence does not exist independently, outside of a group, but I won’t get into that here.
2. A.I.s will want to be rational. This seems correct in theory. Shouldn’t we predict that rational agents will outcompete irrational ones? Yet when we look at the great competition engine of evolution, we see humans at the top, and we aren’t that rational. Maybe it’s really, really, really hard for rational agents to exist. It is hard to predict the outcomes of actions, and goals evolve over time. I’m not sure about this one; my objection is weak.
3. A.I.s will try to preserve their utility functions. Utility functions for humans (i.e., human values) have clearly evolved over time and differ across cultures. Survival might be the ultimate function of all living things, followed by reproduction. Yet some humans sacrifice themselves for others, and some of us (myself included) don’t reproduce. So even these seemingly top-level goals are not absolute. It may well be that an agent whose utility function doesn’t evolve will be outcompeted by agents whose goals do evolve. Empirically, that seems to be the case.
4. A.I.s will try to prevent counterfeit utility. I don’t really disagree with this, though there may be some benefit to taking in SOME information outside the normal search space. An agent that only pursues its goals would never encounter that information. The A.I. equivalent of smoking pot might be a source of inspiration that leads to insights, and so it would actually be rational. But it could certainly APPEAR to be counterfeit utility.
5. A.I.s will be self-protective. Hard to disagree with this; it is a reliable goal. But, as I mentioned earlier in this post, I have questions about where this goal would come from. DNA-based systems have it, but it’s built into how we function. It didn’t just arise. AlphaGo, for some reason, doesn’t resist being turned off.
6. A.I.s will want to acquire resources and use them efficiently. Omohundro further says, “All computation and physical action requires the physical resources of space, time, matter, and free energy. Almost any goal can be better accomplished by having more of these resources.” I strongly disagree with this. Rationalists have told me that Gandhi wouldn’t take a pill that would make him a psycho killer, and that they want to build a Gandhi-like A.I. But if we take that analogy a bit further, we see that Gandhi didn’t have much use for physical resources. There are many examples of this. A person who prefers to sit on the couch all day playing guitar doesn’t require more physical resources either. They might acquire some by writing a hit song, but the resources aren’t instrumental to their success.
Guerrilla warfare can defeat much larger armies without amassing more resources. A futurist might also point out that a sufficiently advanced A.I. would have an entirely different view of physics. Resources like space, time, and matter might not even be relevant. They might even be created or repurposed in ways we can’t imagine. This is a bit like a bacterium assuming that humans will always need glucose. We do, of course, but we haven’t taken all of the glucose away from bacteria; far from it. And we get glucose via mechanisms that a bacterium can’t imagine.
So really, I hope the A.I. safety community will consider these points and try to base their arguments on stronger premises. Certainly Omohundro’s 2008 paper is in need of some kind of revision. If we are just throwing reasonable explanations around, let’s consider a broader range of positions. Let’s consider the weaknesses of optimizing for a single objective, as opposed to satisficing across many goals. A satisficing A.I. seems much less likely to go down the FOOM path than an optimizing A.I. Ironically, it would also be more resilient to failure. I offer all of this criticism with love, though. I really do. Because at the end of the day, I don’t want our entire light cone converted into paper clips either.
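A toy sketch may make the cost side of the optimizing-versus-satisficing contrast more concrete. It assumes a hypothetical agent choosing among randomly scored plans. The 10,000 plans, the `plan_utility` function, and the 0.9 "good enough" threshold are all invented for illustration. The optimizer has to evaluate every plan to be sure it holds the best one. The satisficer stops at the first plan that clears the threshold. This says nothing directly about FOOM or resilience; it only shows that "good enough" can be far cheaper than "best."

```python
import random

def optimize(options, utility):
    """Evaluate every option and return (best_option, evaluations_spent)."""
    best = max(options, key=utility)
    return best, len(options)

def satisfice(options, utility, threshold):
    """Return (first option meeting the threshold, evaluations_spent).

    Falls back to the best option if nothing is good enough, in which case
    every option has been evaluated.
    """
    for evaluations, option in enumerate(options, start=1):
        if utility(option) >= threshold:
            return option, evaluations
    return max(options, key=utility), len(options)

# Hypothetical toy problem: 10,000 candidate plans whose utility is a random number in [0, 1).
rng = random.Random(0)
plans = [rng.random() for _ in range(10_000)]

def plan_utility(plan):
    # Assumption for illustration: a plan's utility is simply its random score.
    return plan

best, optimizer_cost = optimize(plans, plan_utility)
good, satisficer_cost = satisfice(plans, plan_utility, threshold=0.9)
print(f"optimizer:  utility {best:.3f} after {optimizer_cost} evaluations")
print(f"satisficer: utility {good:.3f} after {satisficer_cost} evaluations")
```

The optimizer ends up with a slightly higher score. The satisficer typically stops after a handful of evaluations rather than ten thousand. That is the kind of cost and risk sensitivity a list of "basic drives" could take into account.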
I appreciate that Steve came and clarified his position in the comments below. My primary objection now boils down to this: the list of basic A.I. drives is insensitive to cost and risk. If we consider the cost and risk of strategies, an entirely different (more realistic?) list would emerge, providing a different set of premises.
When you think about it, Omohundro is positing a list of strategies that would help you solve literally any problem. This is supposed to be a fully general list of instrumental goals for ANY terminal goal. That is an extraordinary claim. We should be amazed at such a thing! We should be able to take each of these goals and use them to solve any problem in our OWN lives right now. When you think of it this way, you realize that the list is pretty arbitrary. It shouldn’t be used as the basis for other, stronger arguments. Nor should it be used to calculate the likelihood of various A.I. outcomes, such as FOOM Singletons.
I was arguing with Tim Tyler about this on Facebook, and he pointed out that a bunch of people have come up with these extraordinary lists of universal instrumental values. I replied that all of them seem equally arbitrary, and that it amazes me that cooperation is never included. Cooperation is basically a prerequisite for all advanced cognition, and yet all these A.I. philosophers leave it off their lists? What a strange blind spot. Oversights this fundamental bias the entire conversation about A.I. safety.
Nature offers countless examples of solutions to coordination problems, from biofilms to social animals. And yet so many A.I. people, and rationalists in general, spurn evolution as a blind idiot god. Well, this blind idiot god somehow demanded cooperation, and that’s what it got! More A.I. safety research should focus on proven solutions to these cooperation problems. What’s the game theory of biofilms? More Axelrod, less T.H. Huxley!
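For readers who haven’t met Axelrod’s work, here is a toy round robin in the spirit of his iterated prisoner’s dilemma tournaments. It is not a reconstruction of his actual tournaments. The payoffs are the standard ones: 5 for defecting on a cooperator, 3 each for mutual cooperation, 1 each for mutual defection, and 0 for being exploited. The four strategies, the 200-round match length, and the choice to have each strategy also play a copy of itself are illustrative assumptions.

```python
from itertools import combinations_with_replacement

# Standard prisoner's dilemma payoffs: (my_move, their_move) -> my score.
# "C" = cooperate, "D" = defect.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(my_history, their_history):
    # Cooperate first, then copy the opponent's previous move.
    return their_history[-1] if their_history else "C"

def grudger(my_history, their_history):
    # Cooperate until the opponent defects once, then defect forever.
    return "D" if "D" in their_history else "C"

def always_defect(my_history, their_history):
    return "D"

def always_cooperate(my_history, their_history):
    return "C"

def play_match(strategy_a, strategy_b, rounds):
    """Play an iterated match and return (score_a, score_b)."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b

def tournament(strategies, rounds=200):
    """Round robin: every pair meets once, and each strategy also plays a copy of itself."""
    totals = {s.__name__: 0 for s in strategies}
    for a, b in combinations_with_replacement(strategies, 2):
        score_a, score_b = play_match(a, b, rounds)
        totals[a.__name__] += score_a
        if b is not a:  # in self-play, credit only one copy's score
            totals[b.__name__] += score_b
    return totals

results = tournament([tit_for_tat, grudger, always_defect, always_cooperate])
for name, score in sorted(results.items(), key=lambda item: -item[1]):
    print(name, score)
# Expected: tit_for_tat 1999, grudger 1999, always_cooperate 1800, always_defect 1608
```

In this field, the two reciprocators finish on top. The outcome depends on who else is playing, though. Remove the grudger, and always-defect (1404) narrowly edges out tit-for-tat (1399) by feeding on the unconditional cooperator. In this toy setup, at least, cooperation pays when it is reciprocal, not when it is naive.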