In the past couple of years, we have seen a number of high profile figures in the science and tech fields warn us of the dangers of artificial intelligence. (See comments by Stephen Hawking, Elon Musk, and Bill Gates, all expressing concern that A.I. could pose a danger to humanity.) In the midst of this worrisome public discussion, Nick Bostrom released a book called Superintelligence, which outlines the argument that A.I. poses a real existential threat to humans. A simplified version of the argument says that a self-improving A.I. will so rapidly increase in intelligence that it will go “FOOM” and far surpass all human intelligence in the blink of an eye. This godlike A.I. will then have the power to rearrange all of the matter and energy in the entire solar system and beyond to suit its preferences. If its preferences are not fully aligned with what humans want, then we’re in trouble.
A lot of people are skeptical about this argument, myself included. Ben Goertzel has offered the most piercing and even-handed analysis of this argument that I have seen. He points out that Bostrom’s book is really a restatement of ideas which Eliezer Yudkowsky has been espousing for a long time. Then Goertzel digs very carefully through the argument and points out that the likelihood of an A.I. destroying humanity is probably lower than Bostrom and Yudkowsky think it is, which I agree with. He also points out the opportunity costs of NOT pursuing A.I., but I don’t think we actually need to worry about that, given how the A.I. community seems to be blasting full speed ahead and A.I. safety concerns don’t seem to be widely heeded.
Even though I assign a low probability that A.I. will destroy all humans, I don’t rule it out. It would clearly be a very bad outcome and I am glad that people are trying to prevent this. What concerns me is that some of the premises that Bostrom bases his arguments on seem deeply flawed. I actually think that the A.I. safety crowd would be able to make a STRONGER argument if they would shore up some of these faulty premises, so I want to focus on one of them, basic A.I. drives, in this post.
In Superintelligence, Bostrom cites a 2008 paper by Stephen Omohundro called “The Basic A.I. Drives.” From the abstract:
We identify a number of “drives” that will appear in sufficiently advanced A.I. systems of any design. We call them drives because they are tendencies which will be present unless explicitly counteracted.
Now this is already setting off warning bells for me, since obviously we have a bunch of A.I. systems with goals and none of them seem to be exhibiting any of the drives that Omohundro is warning about. Maybe they aren’t “sufficiently” advanced yet? It also seems odd that Omohundro predicts that these drives will be present without having been designed in by the programmers, and he doesn’t really offer a mechanism for how these drives might arise. I can imagine a version of this argument that says “A.I. with these drives will outcompete A.I. without these drives” or something. But that still requires a programmer to put the drives in; they don’t just magically emerge.
Biological systems have inherent drives, but I don’t see how any artificial system could spontaneously acquire drives unless it had machinery similar to what gives rise to drives in living things. And biological systems are constrained by things like the need to survive. Humans get hungry, so they have to eat to survive; this is a basic drive that is rooted in biology. The state of “wanting” something doesn’t just show up unannounced. It’s the result of complex systems, and the only existing examples of wanting we see are in biological systems, not artificial ones. If someone posits an artificial system that has the drives of a living thing, but not the constraints, then I need to see the mechanism that they think could make this happen.
So that’s a huge problem. What does it even mean to say that A.I. will “have” these drives? Where do these drives come from? Big problem. Huge.
Anyway, let’s dig in a bit further and examine each of these drives. What we see is that, in each case, Omohundro sort of posits a reasonable-sounding explanation of why each drive would be “wanted” by an A.I. But even though this is a paper written in sort of an academic style with citations and everything, it’s not much more than a set of reasonable-sounding explanations. So I will take a cue from rational blogger Ozymandias, and I will list each of Omohundro’s drives and then offer my own list of plausible explanations for why each drive would be entirely different.
1. A.I.s will want to self-improve. Why self modify when you can make tools? Tools are a safer way to add functionality than self modification. This is the same argument I use against current generation grinders. Don’t cut yourself open to embed a thermometer. Just grab one when you need it and then put it aside. Also, it’s easy to maintain a utility function if the A.I. just straps on a module as opposed to messing with its own source code. Upgrades to tools are easy too. It’s foolish and risky to self modify when you can just use tools.
When I first posted this to Facebook, I got into this whole debate with Alexei, who has insight into MIRI’s thinking. He insisted that the optimization of decision making processes will lead to overwhelming advantages over time. I countered with the argument that competing agents don’t get unbounded time to work on problems and that’s why we see these “good enough,” satisficing strategies throughout nature. But a lot of A.I. safety people won’t allow that there can ever be any competition between A.I.s, because once a single A.I. goes FOOM and becomes godlike, no others can compete with it and it becomes the one to rule them all. But the period leading up to takeoff would certainly involve competition with other agents, and I also believe that problem-solving intelligence does not exist independently, outside of a group, but I won’t get into that here.
2. A.I.s will want to be rational. This seems correct in theory. Shouldn’t we predict that rational agents will outcompete irrational agents? Yet, when we look at the great competition engine of evolution, we see humans at the top, and we aren’t that rational. Maybe it’s really, really, really hard for rational agents to exist because it’s hard to predict the outcomes of actions and because goals evolve over time. Not sure about this one; my objection is weak.
3. A.I.s will try to preserve their utility functions. Utility functions for humans (i.e. human values) have clearly evolved over time and are different in different cultures. Survival might be the ultimate function of all living things, followed by reproduction. Yet we see some humans sacrificing themselves for others and also some of us (myself included) don’t reproduce. So even these seemingly top level goals are not absolute. It may well be that an agent whose utility function doesn’t evolve will be outcompeted by agents whose goals do evolve. This seems to be the case empirically.
4. A.I.s will try to prevent counterfeit utility. I don’t really disagree with this. Though there may be some benefit from taking in SOME information that wouldn’t be part of the normal search space when only pursuing our goals. The A.I. equivalent of smoking pot might be a source of inspiration that leads to insights and thus actually rational. But it could certainly APPEAR to be counterfeit utility.
5. A.I.s will be self-protective. Hard to disagree with this. This is a reliable goal. But, as I mentioned earlier in this post, I have questions about where this goal would come from. DNA based systems have it. But it’s built into how we function. It didn’t just arise. AlphaGo doesn’t resist being turned off for some reason.
6. A.I.s will want to acquire resources and use them efficiently. Omohundro further says, “All computation and physical action requires the physical resources of space, time, matter, and free energy. Almost any goal can be better accomplished by having more of these resources.” I strongly disagree with this. Rationalists have told me that Gandhi wouldn’t take a pill that would make him a psycho killer and they want to build a Gandhi like A.I. But if we take that analogy a bit farther, we see that Gandhi didn’t have much use for physical resources. There are many examples of this. A person who prefers to sit on the couch all day and play guitar doesn’t require more physical resources either. They might acquire them by writing a hit song, but they aren’t instrumental to their success.
Guerrilla warfare can defeat much larger armies without amassing more resources. Another point a futurist would make is that sufficiently advanced A.I. will have an entirely different view of physics. Resources like space, time, and matter might not even be relevant or could possibly even be created or repurposed in ways we can’t even imagine. This is a bit like a bacterium assuming that humans will always need glucose. We do, of course, but we haven’t taken all of the glucose away from bacteria, far from it. And we get glucose via mechanisms that a bacterium can’t imagine.
So really, I hope that the A.I. safety community will consider these points and try to base their arguments on stronger premises. Certainly Omohundro’s 2008 paper is in need of a revision of some kind. If we are just throwing reasonable explanations around, let’s consider a broader range of positions. Let’s consider the weaknesses of optimizing for one constraint, as opposed to satisficing for a lot of goals. Because a satisficing A.I. seems much less likely to go down the FOOM path than an optimizing A.I., and, ironically, it would also be more resilient to failure. I offer all of this criticism with love though. I really do. Because at the end of the day, I don’t want our entire light cone converted into paper clips either.
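To make the difference between optimizing for one constraint and satisficing across several goals a little more concrete, here is a minimal Python sketch. The plans, the `speed` and `safety` scores, and the thresholds are all made up for illustration, and the function names `optimize` and `satisfice` are my own. None of it comes from Omohundro’s paper or from Bostrom’s book.

```python
# Hypothetical candidate plans with made-up scores on two goals.
PLANS = [
    {"name": "A", "speed": 5, "safety": 9},
    {"name": "B", "speed": 7, "safety": 6},
    {"name": "C", "speed": 10, "safety": 1},
]


def optimize(plans, goal):
    """Optimizer: look at every plan and return the best one on a single goal."""
    return max(plans, key=lambda plan: plan[goal])


def satisfice(plans, thresholds):
    """Satisficer: return the first plan that is good enough on every goal.

    Returns None if no plan clears all of the thresholds.
    """
    for plan in plans:
        if all(plan[goal] >= minimum for goal, minimum in thresholds.items()):
            return plan
    return None


fastest = optimize(PLANS, "speed")
good_enough = satisfice(PLANS, {"speed": 6, "safety": 5})
```

In this toy setup the single-goal optimizer lands on plan C, which scores worst on safety, while the satisficer stops at plan B. This only illustrates the structural difference between the two decision rules. It says nothing by itself about how likely either kind of agent is to go FOOM.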
I appreciate that Steve came and clarified his position in the comments below. I think that my primary objection now boils down to the fact that the list of basic A.I. drives is basically cost and risk insensitive. If we consider the cost and risk of strategies, then an entirely different (more realistic?) list would emerge, providing a different set of premises.
When you think about it, Omohundro is basically positing a list of strategies that would literally help you solve any problem. This is supposed to be a fully general list of instrumental goals for ANY terminal goal. This is an extraordinary claim. We should be amazed at such a thing! We should be able to take each of these goals and use them to solve any problem we might have in our OWN lives right now. When you think of it this way, you realize that this list is pretty arbitrary and shouldn’t be used as the basis for other, stronger arguments or for calculating likelihoods of various AI outcomes such as FOOM Singletons.
I was arguing with Tim Tyler about this on Facebook, and he pointed out that a bunch of people have come up with these extraordinary lists of universal instrumental values. I pointed out that all of these seemed equally arbitrary and that it is amazing to me that cooperation is never included. Cooperation is basically a prerequisite for all advanced cognition and yet all these AI philosophers are leaving it off their lists? What a strange blind spot. These sorts of fundamental oversights are biasing the entire conversation about AI safety.
We see in nature countless examples of solutions to coordination problems from biofilms to social animals and yet so many AI people and rationalists in general spurn evolution as a blind idiot god. Well this blind idiot god somehow demanded cooperation and that’s what it got! More AI safety research should focus on proven solutions to these cooperation problems. What’s the game theory of biofilms? More Axelrod, less T.H. Huxley!
Suppose you have an AI whose algorithm involves generating strategies and calculating, for each strategy, how well it will achieve some goal, then doing whatever best achieves its main goal. I think Omohundro’s point is that, if the strategies it’s able to generate are broad enough, this alone is sufficient to make an AGI exhibit the AI drives, because they’re effective strategies for almost any goal. Bounding the time that AGIs have to work on problems might prevent them from doing things like making successor-AIs, but not necessarily; it depends on where the bound is set and on how much gain there is from making one.
Some more specific points:
> Why self modify when you can make tools?
In the context of an AGI, “self modification” sort of blurs into “creating additional copies that are slightly different”. In particular, if there’s an AGI code-base, then just as humans would perceive that code-base as a tool, so too would the AGI. The human-scale analogy then becomes: why use other humans when you can use ordinary tools? But we observe that sometimes it’s best to use ordinary tools, and sometimes it’s best to recruit other humans, and sometimes it’s best to modify (train) those humans.
> Yet, when we look at the great competition engine of evolution, we see humans at the top, and we aren’t that rational.
There’s an absolute scale that we aren’t very far down, but compared to, say, pigeons? We notice the ways in which humans are irrational because we pay more attention, but I’m pretty sure the humans are more rational overall.
> Survival might be the ultimate function of all living things, followed by reproduction. Yet we see some humans sacrificing themselves for others and also some of us (myself included) don’t reproduce. So even these seemingly top level goals are not absolute.
Evolution’s goal, yes. What this shows is that evolution did not succeed at copying its goal system into humans. Which is good, because evolution’s goal system is pretty lame.
> AlphaGo doesn’t resist being turned off for some reason.
It’s because AlphaGo isn’t able to think about anything except Go positions. There isn’t any Go position or move that corresponds to its power switch being flipped, so it can’t think about that.
> But if we take that analogy a bit farther, we see that Gandhi didn’t have much use for physical resources.
We may have differing interpretations of what it means to use resources. Gandhi led tens of thousands of people (https://en.wikipedia.org/wiki/Salt_March); in a certain sense, those people are resources, imperfectly controlled.
> Because a satisficing A.I. seems much less likely to go down the FOOM path than an optimizing A.I., and, ironically, it would also be more resilient to failure.
Yes! My understanding is that transforming the intuition of satisficing into good mathematical models turns out to be tricky, but this is one of the promising paths to making safe AIs.
Thanks James, I think you make some excellent points. I have a few comments about this:
This idea of the ability to generate a broad enough strategy IS missing from Omohundro’s paper. I wonder what that means exactly though. I am predicting that embodied coupling with the environment is what both facilitates the generation of broader strategies as well as innately constrains agents from going FOOM.
1. Self-modification: If this line gets blurred regarding using a tool or developing a new AGI, it still seems to break the recursive self-improvement process. At some point wouldn’t the Singleton need to “kill” versions of itself to maintain supremacy? It would seem that it’s competing and cooperating with agents of its own creation with comparable capabilities in many cases.
2. Rationality – Yeah, humans are more rational than other animals except maybe insects who have more/comparable biomass? If an AI could generate such a broad set of strategies that it includes “be more rational,” what does a strategy to be more rational look like per se? Does it look like the Sequences? Goal factoring and other CFAR tools?
3. Evolution – Why do rationalists disparage evolution all the time? I actually think it’s because a lot of rationalists are optimizing hedgehogs (specialists) instead of satisficing foxes (generalists) like myself. Evolution is a multi-variant satisficing process. Over-optimizing for any single trait results in tradeoffs for survival. I predict that agents that optimize for a single goal or trait will be under optimized in a broad range of non-related traits that will make them vulnerable to attack.
4. Self-preservation – Yes, I see. But I don’t see any proposed mechanisms for generating strategies that would include the power switch that the developers don’t explicitly include. Embodiment might be an answer to this.
5. Need for resources – Yes, I concede the point that Gandhi himself might have literally amassed more resources. But there are in fact military strategies such as those demonstrated in guerilla warfare which are highly competitive against foes with larger total resources. More resources are not strictly more competitive. Not to mention the broad range of goals orthogonal to resources like art and humor.
6. Satisficing – I think you just need to get some mathematicians that are themselves satisficers maybe. As I said, most rationalists I meet are optimizing hedgehogs. Bless them.
Hi Scott, Steve Omohundro here. First let me say that I’m glad you are thinking about these issues because I think they are very important. I’m fascinated that my AI Drives work generates so much controversy. They are really not about AI but about unintended consequences of taking the best actions to accomplish a goal. I wrote this post to try to capture the essence of each drive in a single sentence:
Some people have found that description easier to understand.
Let me briefly address some of your points. I don’t think there is anything inevitable about the drives, in fact my current work is all about building systems which reliably keep them in check. Similarly, I don’t think there is anything inevitable about explosive self-improvement and I think humanity would do better with a slow and carefully considered growth. But limiting both these forces will not happen on its own; we need to carefully design systems to do that. What I primarily argue against is sloppy design and poorly thought out systems.
The AI systems I’m talking about have rich world models which they are able to reason about to take actions to achieve desired goals. This is the kind of system described and analyzed in Russell and Norvig’s best-selling AI textbook. Today’s systems are not yet in that category, though some of the recent reinforcement learning systems based on deep learning are starting to enter the ballpark. In particular, almost no present system has a model of itself, its own design, or the greater context of its actions. Today’s systems are almost all very special purpose and incapable of the kind of reasoning I discuss. But the reasoning I discuss doesn’t require a superintelligence. My 9 year old nephew has no trouble understanding the basic logic.
You ask what the mechanism is by which the drives arise. They are an unintended consequence of the basic formula for rational action: 1) Have a model of the world, 2) Consider the possible actions you might take at any moment, 3) Using your model, estimate the expected utility of the future which arises from that action, 4) Choose the action which maximizes expected utility, 5) Take the action, 6) See what actually happens and update your model using Bayes’ Theorem. The point is that a simple agent applying that repeatedly will exhibit the Drives I discuss unless their goal is chosen very carefully. For example, self-protectiveness arises in this way because actions which cause the destruction of the self are much less likely to lead to desired outcomes than those which preserve the self.
The other drives are similar. You mention guerilla warfare and Gandhi. Those were both great strategies in their context. But they would have been even more powerful if they had more resources. You can also try to create a more “Gandhi-like” system by explicitly giving it goals which penalize it for trying to accumulate physical resources. But that is subtle and needs to be carefully designed. In general, I agree that if you can clearly identify the behaviors you’d like to see, it will be possible to design a system that exhibits those behaviors. But it will require deep analysis and a lot of careful work, it won’t generally come without effort.
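The six-step formula for rational action described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not Omohundro’s own formulation: the two hypotheses, their success probabilities, the action names `work` and `allow_shutdown`, and the utility function (expected number of future successes) are all hypothetical and chosen only to keep the example small.

```python
# Toy sketch of the six-step loop. All names and numbers are hypothetical.

# Step 1: a world model. Two hypotheses about how often "work" succeeds.
HYPOTHESES = {"easy": 0.8, "hard": 0.3}  # P(success | work, hypothesis)
ACTIONS = ("work", "allow_shutdown")     # Step 2: the actions considered


def p_success(belief):
    """Probability that one attempt succeeds, averaged over the belief."""
    return sum(p * HYPOTHESES[h] for h, p in belief.items())


def expected_utility(action, belief, steps_left):
    """Step 3: expected number of future successes after taking `action`."""
    if action == "work":
        return steps_left * p_success(belief)
    if action == "allow_shutdown":
        return 0.0  # assumed: a switched-off agent makes no further progress
    raise ValueError(f"unknown action: {action}")


def choose_action(belief, steps_left):
    """Step 4: pick the action with the highest expected utility."""
    return max(ACTIONS, key=lambda a: expected_utility(a, belief, steps_left))


def bayes_update(belief, success):
    """Step 6: posterior is proportional to prior times likelihood."""
    unnormalized = {
        h: p * (HYPOTHESES[h] if success else 1.0 - HYPOTHESES[h])
        for h, p in belief.items()
    }
    total = sum(unnormalized.values())
    return {h: v / total for h, v in unnormalized.items()}


belief = {"easy": 0.5, "hard": 0.5}
action = choose_action(belief, steps_left=5)  # Step 5 would execute this
belief_after_success = bayes_update(belief, success=True)
```

In this toy, `allow_shutdown` ranks below `work` only because the assumed utility function counts nothing but future successes. Adding a term that rewards deferring to an operator could flip the ranking, which fits the caveat that the drives appear “unless their goal is chosen very carefully.” The sketch also leaves open the question raised in the post of where such a world model and action list would come from in the first place.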
Steve, first of all, thanks for taking the time to respond. I definitely agree that care should be taken when approaching AI design and I agree with the overall goal of AI safety. But we now see different approaches to safety being taken by different groups such as MIRI or OpenAI. So the assumptions that drive those strategies are important.
It seems that you are trying to specify a list of general goals that would be instrumental to any terminal goal. On the face of it, that seems ambitious. Out of the universe of possible strategies, your list appears arbitrary. Specifically, it seems cost and risk insensitive. For instance why is tool use missing from this list? Tool use seems to be preferable to self-modification on a cost and risk adjusted basis. Also, cooperation with other agents is a good low risk, low cost strategy, proven time and again in all social animals, yet missing from the list.
Regarding the guerilla question. I would actually assert that adding more resources to a guerilla effort would make it harder to evade detection. But the larger point is that pursuing energy efficiency for example would be a more cost effective strategy than acquiring more resources in a lot of cases.
I agree with the totality of your arguments and as you are aware I am highly skeptical of the FOOM scenario myself, even if my skepticism derives more from the purely physical/cognitive plausibility of such scenarios than any of these (so far as I know almost completely unquantified) arguments from personality.
That said, just a friendly suggestion – Gandhi isn’t perhaps the absolute best example to cite in support of your arguments. 😉
Ha, right. I see that the Gandhi example doesn’t really counteract the inevitability of resource acquisition. I hadn’t accounted for integer overflows!
Scott, much more detail can be found in my talks and papers at: https://selfawaresystems.com/
To understand why the particular drives I discuss are fundamental you need to start with physics as explained in this paper: https://selfawaresystems.com/…/paper-on-the-nature-of…/ Basically there are four fundamental physical resources that are needed for any computation or physical action: space, time, matter, and free energy (energy in a form which can do useful work). These core resources can generally only be applied to one goal at a time and generally having more of them allows goals to be better accomplished. An intentional agent is a physical structure with a goal which is capable of converting these resources into actions with outcomes that it rates as having high utility.
There are two basic ways to improve such a system’s performance: increase (or avoid decreasing) its resources, and improve (or avoid worsening) the transformation of those resources into utility. The basic drives I describe are explicit aspects of these fundamental possibilities.
Tools are certainly an interesting and important way to improve an agent’s ability to create utility. For biological creatures, it’s fairly clear what is self and what isn’t. But for AIs the distinction between tool and self is much less clear. One might define “self” for these systems as consisting of all resources over which they have control. If a “tool” is itself an intentional agent, then AIs have their own AI safety issues to solve.
Cooperation is also a fundamental and important subject but much more complex. The basic rational choice (without ethical components in a system’s utility) is between trying to take the resources of another system vs. cooperating with it. Lots of issues here that I delve into in several talks, like this one: https://selfawaresystems.com/…/talk-on-co-opetition-in…/
Steve, I am interested in exploring your ideas more and I will check out those links. The reason I am poking at this paper in particular is that it is being used by Bostrom and others to support what I consider to be an inflated probability estimate for a very specific hard takeoff singleton AI scenario.
When we consider system performance and resources, we see that a human brain currently utilizes far less space, time, matter, and free energy to perform massively greater computations than a Von Neumann supercomputer. So it’s clear that for an AI on a Von Neumann substrate, convincing a human to help it solve a computationally difficult problem would result in far less total energy cost. This is why cooperation seems like it will prove cost effective for all substrates less efficient than biological, and possibly others depending on how much you trust certain economic theories. I’m not sure if cooperation is much more complex if it can be reduced to risk exposure and cost expenditures.
I subscribe to the extended view of cognition, so I would argue that a human isn’t distinct from our tools any more than an AI is. I guess I would say that intentional agents aren’t tools per se unless they are treated as slaves. I also don’t want to suggest that an AI that spawns agents is more or less safe. I just want to assert that an AI that does spawn intentional agents reduces the likelihood of a singleton and also that an AI that prefers tools to self-modification will reduce the likelihood of recursive self-improving FOOM AI.