Satisficing is Safer Than Maximizing

Before I begin, let me just say that if you haven’t read Bostrom’s Superintelligence and you haven’t read much about the AI alignment problem, then you will probably find this post confusing and annoying. If you agree with Bostrom, you will DEFINITELY find my views annoying. This is just the sort of post my ex-girlfriend used to forbid me to write, so in honor of her good sense, I WILL try to state my claims as simply as possible and avoid jargon as much as I can.

[Epistemic Status: less confident in the hardest interpretations of “satisficing is safer,” more confident that maximization strategies are continually smuggled into the AI safety debate and that acknowledging this will improve communication.]

Let me also say that I THINK AI ALIGNMENT IS AN IMPORTANT TOPIC THAT SHOULD BE STUDIED. My main disagreement with most people studying AI safety is that they seem to focus more on AI becoming god-like and destroying all living things forever, and less on tool AI becoming a superweapon that China, Russia, and the West direct at each other. Well, that’s not really true; we also differ on whether intelligence is fundamentally social and embodied, and on a bunch of other things, but I do truly love the rationalist community even though we drink different brands of Kool-Aid.

So ok, I know G is reading this and already writing angry comments criticizing me for all the jargon. So let me just clarify what I mean by a few of these terms. The “AI alignment” problem is the worry that we might create an artificial intelligence that takes actions that are not aligned with human values. Now one may say, well, most humans take actions that are not aligned with the values of other humans. The only universal human value that I acknowledge is the will to persist in the environment. But for the sake of argument, let’s say that AI might decide that humans SHOULDN’T persist in the environment. That would sort of suck. Unless the AI just upgraded all of us to super-transhumans with x-ray vision and stuff. That would be cool, I guess.

So then Eliezer, err, Nick Bostrom writes this book SuperIntelligence outlining how we are all fucked unless we figure out how to make AI safe and (nearly) all the nerds who thought AI safety might not matter much read it and decided “holy shit, it really matters!” And so I’m stuck arguing this shit every time I get within 10 yards of a rationalist. One thing I noticed is that rationalists tend to be maximizers. They want to optimize the fuck out of everything. Perfectionism is another word for this. Cost insensitivity is another word for it in my book.

So people who tend toward a maximizing strategy always fall in love with this classic thought experiment: the paperclip maximizer. Suppose you create an AI and tell it to make paperclips. Well, what is to stop this AI from converting all matter in the solar system, galaxy, or even light cone into paperclips? To a lot of people, this just seems stupid. “Well, that wouldn’t make sense, why would a superintelligent thing value paperclips?” To which the rationalist smugly replies “the orthogonality thesis,” which states that there is NO necessary correlation between intelligence and values. So you could be stupid and value world peace, or a super-genius and value paperclips. And although I AM sympathetic to the layman who wants to believe that intelligence implies benevolence, I’m not entirely convinced of it myself. I’m sure we have some intelligent psychopaths lying around here somewhere.

But a better response might be: “Wow, unbounded maximizing algorithms could be sort of dangerous, huh? How about just telling the AI to create 100 paperclips? That should work fine, right?” This is called satisficing: just work until you reach a predefined limit and stop.
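To make the contrast concrete, here’s a toy sketch in Python (my own illustration, not anyone’s actual proposal; the resource accounting is deliberately silly):

```python
def maximizer(resources):
    """Convert every available unit of matter into paperclips."""
    clips = 0
    while resources > 0:  # no stopping condition except exhaustion
        resources -= 1
        clips += 1
    return clips

def satisficer(resources, target=100):
    """Stop as soon as a predefined target is reached."""
    clips = 0
    while resources > 0 and clips < target:  # bounded by design
        resources -= 1
        clips += 1
    return clips

print(maximizer(10**6))   # consumes everything: 1000000 clips
print(satisficer(10**6))  # hits the bound and halts: 100 clips
```

The difference is one comparison in the loop condition, which is why the distinction feels so natural to satisficers: the bound is part of the goal, not an afterthought.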

I am quite fond of this concept myself. In nearly every domain, the first 20% of effort yields 80% of the value, so the final 80% of effort is required to wring out that last 20% of value. Now, in some domains like design, I can see the value of maximizing. Five mediocre products aren’t as cool as one super product, and this is one reason I think Apple has captured so much profit historically. But even Jobs wasn’t a total maximizer: “Real artists ship.”

But I’m not a designer; I’m an IT guy who dropped out of high school. So I’m biased, and I think satisficing is awesome. I can get 80% of the value out of like five different domains for the same amount of effort that a maximizer invests in achieving total mastery of just one. But then Bostrom throws cold water on the satisficing idea in Superintelligence. He basically says that a satisficing AI will eat up all available resources in the universe checking and rechecking its work to ensure that it really created exactly 100 paperclips. Because “the AI, if reasonable, never assigns exactly zero probability to it having failed to achieve its goal” (Kindle loc. 2960). Which seems very unreasonable, really, and if a human spent all their time rechecking their work, we would call this OCD or something.

This idea doesn’t even make sense unless we just assume that Bostrom equates “reasonable” with maximizing confidence. So he is basically saying that maximizing strategies are bad, but satisficing strategies are also bad because there is always a maximizing strategy that could sneak in. As though maximizing strategies were some sort of logical fungus that spread through computer code of their own accord. Then Bostrom goes on to suggest that maybe a satisficer could be told to quit after reaching a 95% probability of success. And there is some convoluted logic that I can’t follow exactly, but he basically says: well, suppose the satisficing AI comes up with a maximizing strategy on its own that will guarantee a 95% probability of success. Boom, universe tiled with paperclips. Uh, how about a rule that checks for maximizing strategies? They get smuggled into books on AI a lot more easily than they get spontaneously generated by computer programs.
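For what it’s worth, such a rule is easy to state. Here’s a toy sketch (entirely my own construction; the 5% sensor noise rate and the belief update are made-up numbers) of a satisficer whose rechecking is bounded twice over: it stops at 95% confidence, and it also has a hard cap on verification effort, so the endless-rechecking behavior is ruled out by construction:

```python
import random

def verify_once(clips, target):
    """One noisy recheck of the paperclip count (a stand-in for a
    sensor with a 5% false-negative rate)."""
    return clips == target and random.random() > 0.05

def bounded_satisficer(target=100, confidence=0.95, max_checks=10):
    """Produce `target` clips, then recheck until we're confident
    enough OR the check budget runs out -- whichever comes first."""
    clips = target   # production step elided in this toy
    p_success = 0.5  # agnostic prior about having succeeded
    checks = 0
    while p_success < confidence and checks < max_checks:
        if verify_once(clips, target):
            # each passing check cuts residual doubt by 10x (toy update)
            p_success = 1.0 - (1.0 - p_success) * 0.1
        checks += 1
    return checks, p_success
```

The `max_checks` cap is the point: however the confidence estimate behaves, verification effort is bounded, so “never assigns exactly zero probability to failure” can’t cash out as consuming the light cone.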

I sort of feel that maximizers have a mental filter which assumes that maximizing is the default way to accomplish anything in the world. But in fact, we all have to settle in the real world. Maximizing is cost insensitive.  In fact, I might just be saying that cost insensitivity itself is what’s dangerous. Yeah, we could make things perfect if we could suck up all the resources in the light cone, but at what cost? And really, it would be pretty tricky for AI to gobble up resources that quickly too. There are a lot of agents keeping a close eye on resources. But that’s another question.

My main point is that the AI alignment debate should include more explicit recognition that maximization run amok is dangerous <cough>as in modern capitalism<cough> and that pure satisficing strategies are much safer as long as you don’t tie them to unbounded maximizing routines. Bostrom’s entire argument against the safety of satisficing agents is that they might include insane maximizing routines. And that is a weak argument.

Ok, now I feel better. That was just one small point, I know, but I feel that Bostrom’s entire thesis is a house of cards built on flimsy premises such as this. See my rebuttal to the idea that human values are fragile, or to Omohundro’s basic AI drives. Also, see Ben Goertzel’s very civil rebuttal to Superintelligence. Even MIRI seems to agree that some version of satisficing should be pursued.

I am no great Bayesian myself, but if anyone cares to show me the error of my ways in the comment section, I will do my best to bite the bullet and update my beliefs.

Why the Back to Nature Movement Failed


The paleo diet has been popular for a while now, and it prescribes a “back to nature” way of eating that’s interesting. The premise is that humans evolved in an environment devoid of processed foods and high-glycemic carbs, so we should eat a diet that more closely mimics that of our paleolithic ancestors. I’m not going to try to defend the paleo diet per se; some people lose weight on it, whatever. But it’s an interesting framework for considering what environments we humans are adapted to and how we can apply that to the problems of modern life.

Consider depression. Two of the top treatments for depression are exercise and light therapy. It’s clear that humans evolved over at least 100,000 years, largely outdoors, moving around in the sunlight. Depression is probably best thought of as a disease of modern life, where we’re living indoors and are largely sedentary.

Another aspect of modern, developed cultures is social isolation. Humans are social animals, and we arguably evolved in tribes of roughly 150 members, per Dunbar’s number. (I know Dunbar’s work has been challenged by newer research; let’s just use this number as a starting point.)


So let’s take these three aspects of an evolved human lifestyle: 1) Living outdoors in the sun, 2) Moving around continually, and 3) Being surrounded by a community of other humans invested in our survival.  These are all things that many of us struggle with in modern life.  Sure, maybe some people still live in tight-knit, traditional farm communities that fulfill these needs, but, here in the US, economic forces have largely broken the cohesion of these rural places and we see drug abuse epidemics as a consequence.

Transhumanists can rightly argue that our needs for sunlight, exercise, and social support are just kludgy legacy code tied to our messy biological bodies. Maybe we can upgrade humans to be more machine-like, with replaceable parts, and do away with these outdated needs. That’s a valid argument. I don’t happen to agree with it, but it’s coherent at least. For the sake of this discussion, I ask my transhumanist friends to acknowledge that these human 2.0 upgrades don’t seem to be right around the corner, so it probably makes sense to make accommodations for the hardware we humans are running right now.


Hippies tried to solve the problems of modern life in the sixties with their back to nature movement.  Good old Stewart Brand was in the thick of it with his Whole Earth Catalog.  Many long-haired freaks trekked out to the middle of nowhere to build geodesic domes out of logs and get naked in the mud together.  Awesome!

But whatever happened to that movement, anyway? What went wrong? Brand himself said at a Long Now talk that the hippies discovered that the cities were where the action was. I’m fortunate to work with some of these old hippie scientists at one of my clients, and I asked a fellow named Frosty why the back to nature movement didn’t properly take hold. He laughed and said that when his friends from the city showed up at the rural commune, they blanched at how much work needed to be done. They didn’t have the skills needed to build structures by hand, grow food, or dig latrines. And then they would look around and ask, “Where’s the bar?” They wanted to get drunk and hang out. Who can blame them?

Twentieth century communists in Asia attempted their own versions of the back to nature movement.  They took what appears to be a sound hypothesis and effectively implemented it as genocide.  Mao’s Cultural Revolution forced the relocation of city dwellers to the countryside, resulting in disaster.  Pol Pot’s Year Zero also involved a violent reset of the clock, trying to turn back time and force modern people to live as our ancestors did, also a terrible failure.  So yes, as Scott Alexander says, we “see the skulls.”  We need to learn the lessons of previous failed attempts before we can rectify the problems with modern life.


We can’t turn back the clock.  We have to start where we are and assume that progress will keep happening whether we like it or not.  Cities are where the power is accumulating.  Cities are more energy efficient.  Cities are where the action is.  But how can we remake our lifestyles to fit them?  We see the first glimmers of a solution with Silicon Valley’s obsession with social, mobile, and augmented reality.  Perhaps we can find our communities via social network technology.  I certainly feel vastly enriched by my East Bay Futurists Meetup.  I’ve made good friends there, who help me grow and teach me a lot.  Mobile technology has made it easier and easier for people to do real work on the move.  Maybe augmented reality will close the loop and give us the ability to move freely around the city, connect with our communities, and still do modern work, but while getting exercise and sunlight at the same time.  Call it the “Back to the City, But Working Outside, Walking Around Movement?”  Ahh, well, not catchy, but you get the picture.  We just need to start redesigning our cities a little bit.  Step One: More parks!

Superintelligence Skepticism: A Rebuttal to Omohundro’s “Basic A.I. Drives”


In the past couple of years, we have seen a number of high profile figures in the science and tech fields warn us of the dangers of artificial intelligence.  (See comments by Stephen Hawking, Elon Musk, and Bill Gates, all expressing concern that A.I. could pose a danger to humanity.)  In the midst of this worrisome public discussion, Nick Bostrom released a book called Superintelligence, which outlines the argument that A.I. poses a real existential threat to humans.  A simplified version of the argument says that a self-improving A.I. will so rapidly increase in intelligence that it will go “FOOM” and far surpass all human intelligence in the blink of an eye.  This godlike A.I. will then have the power to rearrange all of the matter and energy in the entire solar system and beyond to suit its preferences.  If its preferences are not fully aligned with what humans want, then we’re in trouble.

A lot of people are skeptical about this argument, myself included.  Ben Goertzel has offered the most piercing and even-handed analysis of this argument that I have seen.  He points out that Bostrom’s book is really a restatement of ideas which Eliezer Yudkowsky has been espousing for a long time.  Then Goertzel digs very carefully through the argument and points out that the likelihood of an A.I. destroying humanity is probably lower than Bostrom and Yudkowsky think it is, which I agree with.  He also points out the opportunity costs of NOT pursuing A.I., but I don’t think we actually need to worry about that, given how the A.I. community seems to be blasting full speed ahead and A.I. safety concerns don’t seem to be widely heeded.


Now, even though I assign a low probability that A.I. will destroy all humans, I don’t rule it out.  It would clearly be a very bad outcome and I am glad that people are trying to prevent this. What concerns me is that some of the premises that Bostrom bases his arguments on seem deeply flawed.  I actually think that the A.I. safety crowd would be able to make a STRONGER argument if they would shore up some of these faulty premises, so I want to focus on one of them, basic A.I. drives, in this post.

In Superintelligence, Bostrom cites a 2008 paper by Stephen Omohundro called “The Basic A.I. Drives.”  From the abstract:

We identify a number of “drives” that will appear in sufficiently advanced A.I. systems of any design. We call them drives because they are tendencies which will be present unless explicitly counteracted.

Now this is already raising warning bells for me, since obviously we have a bunch of A.I. systems with goals, and none of them seem to be exhibiting any of the drives that Omohundro is warning about.  Maybe they aren’t “sufficiently” advanced yet?  It also seems quite odd that Omohundro predicts these drives will be present without having been designed in by the programmers.  He doesn’t really offer a mechanism for how they might arise.  I can imagine a version of this argument that says “A.I. with these drives will outcompete A.I. without these drives” or something.  But that still requires that a programmer put the drives in; they don’t just magically emerge.


Biological systems have inherent drives, but I don’t see how any artificial system could spontaneously acquire drives unless it had machinery similar to what gives rise to drives in living things.  And biological systems are constrained by things like the need to survive.  Humans get hungry, so they have to eat to survive; that’s a basic drive rooted in biology.  The state of “wanting” something doesn’t just show up unannounced; it’s the result of complex systems, and the only existing examples of wanting we see are in biological systems, not artificial ones.  If someone posits an artificial system that has the drives of a living thing but not the constraints, then I need to see the mechanism that they think could make this happen.

So that’s a huge problem.  What does it even mean to say that A.I. will “have” these drives?  Where do these drives come from?  Big problem.  Huge.

Anyway, let’s dig in a bit further and examine each of these drives.  What we see is that, in each case, Omohundro posits a reasonable-sounding explanation of why each drive would be “wanted” by an A.I.  But even though this is a paper written in an academic style, with citations and everything, it’s not much more than a set of reasonable-sounding explanations.  So I will take a cue from rational blogger Ozymandias: I will list each of Omohundro’s drives and then offer my own plausible explanation for why each one could play out entirely differently.

1. A.I.s will want to self-improve. Why self-modify when you can make tools?  Tools are a safer way to add functionality than self-modification.  This is the same argument I use against current-generation grinders: don’t cut yourself open to embed a thermometer; just grab one when you need it and then put it aside.  Also, it’s easy to maintain a utility function if the A.I. just straps on a module as opposed to messing with its own source code.  Upgrades to tools are easy too.  It’s foolish and risky to self-modify when you can just use tools.

When I first posted this to Facebook, I got into this whole debate with Alexei, who has insight into MIRI’s thinking.  He insisted that the optimization of decision making processes will lead to overwhelming advantages over time.  I countered with the argument that competing agents don’t get unbounded time to work on problems and that’s why we see these “good enough,” satisficing strategies throughout nature.  But a lot of safety A.I. people won’t allow that there can ever be any competition between A.I., because once a single A.I. goes FOOM and becomes godlike, no others can compete with it and it becomes the one to rule them all.  But the period leading up to takeoff would certainly involve competition with other agents, and I also believe that problem solving intelligence does not exist independently, outside of a group, but I won’t get into that here.

2. A.I.s will want to be rational.  This seems correct in theory.  Shouldn’t we predict that rational agents will outcompete irrational agents?  Yet, when we look at the great competition engine of evolution, we see humans at the top, and we aren’t that rational.  Maybe it’s really, really, really hard for rational agents to exist because it’s hard to predict the outcomes of actions and also goals evolve over time. Not sure about this one, my objection is weak.

3. A.I.s will try to preserve their utility functions.  Utility functions for humans (i.e. human values) have clearly evolved over time and are different in different cultures.  Survival might be the ultimate function of all living things, followed by reproduction.  Yet we see some humans sacrificing themselves for others and also some of us (myself included) don’t reproduce.  So even these seemingly top level goals are not absolute.  It may well be that an agent whose utility function doesn’t evolve will be outcompeted by agents whose goals do evolve.  This seems to be the case empirically.

4. A.I.s will try to prevent counterfeit utility.  I don’t really disagree with this.  Though there may be some benefit from taking in SOME information that wouldn’t be part of the normal search space when only pursuing our goals.  The A.I. equivalent of smoking pot might be a source of inspiration that leads to insights and thus actually rational.  But it could certainly APPEAR to be counterfeit utility.

5. A.I.s will be self-protective.  Hard to disagree with this.  This is a reliable goal.  But, as I mentioned earlier in this post, I have questions about where this goal would come from.  DNA-based systems have it, but it’s built into how we function; it didn’t just arise.  AlphaGo doesn’t resist being turned off, for some reason.

6. A.I.s will want to acquire resources and use them efficiently.  Omohundro further says, “All computation and physical action requires the physical resources of space, time, matter, and free energy.  Almost any goal can be better accomplished by having more of these resources.” I strongly disagree with this.  Rationalists have told me that Gandhi wouldn’t take a pill that would make him a psycho killer, and that they want to build a Gandhi-like A.I.  But if we take that analogy a bit farther, we see that Gandhi didn’t have much use for physical resources.  There are many examples of this.  A person who prefers to sit on the couch all day and play guitar doesn’t require more physical resources either.  They might acquire resources by writing a hit song, but the resources aren’t instrumental to their success.

Guerrilla warfare can defeat much larger armies without amassing more resources.  Another point a futurist would make is that a sufficiently advanced A.I. will have an entirely different view of physics.  Resources like space, time, and matter might not even be relevant, or could possibly be created or repurposed in ways we can’t even imagine.  This is a bit like a bacterium assuming that humans will always need glucose.  We do, of course, but we haven’t taken all of the glucose away from bacteria, far from it.  And we get glucose via mechanisms that a bacterium can’t imagine.


So really, I hope that the safety A.I. community will consider these points and try to base their arguments on stronger premises.  Certainly Omohundro’s 2008 paper is in need of a revision of some kind.  If we are just throwing reasonable explanations around, let’s consider a broader range of positions.  Let’s consider the weaknesses of optimizing for one constraint, as opposed to satisficing for a lot of goals.  Because a satisficing A.I. seems much less likely to go down the FOOM path than an optimizing A.I., and, ironically, it would also be more resilient to failure.  I offer all of this criticism with love though.  I really do.  Because at the end of the day, I don’t want our entire light cone converted into paper clips either.

[EDIT 4/10/2016]
I appreciate that Steve came and clarified his position in the comments below. I think that my primary objection now boils down to the fact that the list of basic A.I. drives is basically cost and risk insensitive. If we consider the cost and risk of strategies, then an entirely different (more realistic?) list would emerge, providing a different set of premises.

[EDIT 4/11/2016]
When you think about it, Omohundro is basically positing a list of strategies that would literally help you solve any problem.  This is supposed to be a fully general list of instrumental goals for ANY terminal goal.  This is an extraordinary claim. We should be amazed at such a thing!  We should be able to take each of these goals and use them to solve any problem we might have in our OWN lives right now.  When you think of it this way, you realize that this list is pretty arbitrary and shouldn’t be used as the basis for other, stronger arguments or for calculating likelihoods of various AI outcomes such as FOOM Singletons.

[EDIT 4/12/2016]
I was arguing with Tim Tyler about this on Facebook, and he pointed out that a bunch of people have come up with these extraordinary lists of universal instrumental values.  I pointed out that all of these seemed equally arbitrary and that it is amazing to me that cooperation is never included.  Cooperation is basically a prerequisite for all advanced cognition and yet all these AI philosophers are leaving it off their lists?  What a strange blind spot.  These sorts of fundamental oversights are biasing the entire conversation about AI safety.

We see in nature countless examples of solutions to coordination problems, from biofilms to social animals, and yet so many AI people and rationalists in general spurn evolution as a blind idiot god.  Well, this blind idiot god somehow demanded cooperation, and that’s what it got!  More AI safety research should focus on proven solutions to these cooperation problems.  What’s the game theory of biofilms?  More Axelrod, less T.H. Huxley!
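To gesture at what “more Axelrod” would look like: his classic result came from round-robin iterated prisoner’s dilemma tournaments, where the cooperative-but-retaliatory tit-for-tat strategy prevailed once enough reciprocators were in the pool.  Here’s a minimal sketch (strategy names and pool composition are my own; payoffs are the standard 3/0/5/1 values):

```python
# Payoffs per round: both cooperate 3/3, both defect 1/1,
# lone defector 5, lone cooperator 0.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(my_hist, their_hist):
    return "C" if not their_hist else their_hist[-1]

def always_defect(my_hist, their_hist):
    return "D"

def always_cooperate(my_hist, their_hist):
    return "C"

def play_match(a, b, rounds=100):
    ha, hb, sa, sb = [], [], 0, 0
    for _ in range(rounds):
        ma, mb = a(ha, hb), b(hb, ha)
        pa, pb = PAYOFF[(ma, mb)]
        ha.append(ma); hb.append(mb)
        sa += pa; sb += pb
    return sa, sb

def tournament(players):
    """Round-robin: every strategy plays every other once."""
    scores = {name: 0 for name, _ in players}
    for i in range(len(players)):
        for j in range(i + 1, len(players)):
            (na, fa), (nb, fb) = players[i], players[j]
            sa, sb = play_match(fa, fb)
            scores[na] += sa; scores[nb] += sb
    return scores

pool = [("tft-1", tit_for_tat), ("tft-2", tit_for_tat),
        ("tft-3", tit_for_tat), ("defector", always_defect),
        ("pushover", always_cooperate)]
print(tournament(pool))  # each tft scores 999, defector only 812
```

Note that the defector only loses because the pool contains several reciprocators; drop the extra tit-for-tat players and pure defection pays again.  That population effect is exactly the kind of dynamic I’d like to see more of in AI safety arguments.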