Before I begin, let me just say that if you haven’t read Bostrom’s Superintelligence and you haven’t read much about the AI Alignment problem, then you will probably find this post confusing and annoying. If you agree with Bostrom, you will DEFINITELY find my views annoying. This is just the sort of post my ex-girlfriend used to forbid me to write, so in honor of her good sense, I WILL try to state my claims as simply as possible and avoid jargon as much as I can.
[Epistemic Status: less confident in the hardest interpretations of “satisficing is safer,” more confident that maximization strategies are continually smuggled into the debate of AI safety and that acknowledging this will improve communication.]
Let me also say that I THINK AI ALIGNMENT IS AN IMPORTANT TOPIC THAT SHOULD BE STUDIED. My main disagreement with most people studying AI safety is that they seem to be focusing more on AI becoming god-like and destroying all living things forever and less on tool AI becoming a super weapon that China, Russia, and the West direct at each other. Well, that’s not really true; we tend to differ on whether intelligence is fundamentally social and embodied, and on a bunch of other things really, but I do truly love the rationalist community even though we drink different brands of Kool-Aid.
So ok, I know G is reading this and already writing angry comments criticizing me for all the jargon. So let me just clarify what I mean by a few of these terms. The “AI Alignment” problem is the idea that we might be able to create an Artificial Intelligence that takes actions that are not aligned with human values. Now one may say, well, most humans take actions that are not aligned with the values of other humans. The only universal human value that I acknowledge is the will to persist in the environment. But for the sake of argument, let’s say that AI might decide that humans SHOULDN’T persist in the environment. That would sort of suck. Unless the AI just upgraded all of us to super-transhumans with X-ray vision and stuff. That would be cool I guess.
So then Eliezer, err, Nick Bostrom writes this book Superintelligence outlining how we are all fucked unless we figure out how to make AI safe and (nearly) all the nerds who thought AI safety might not matter much read it and decided “holy shit, it really matters!” And so I’m stuck arguing this shit every time I get within 10 yards of a rationalist. One thing I noticed is that rationalists tend to be maximizers. They want to optimize the fuck out of everything. Perfectionism is another word for this. Cost insensitivity is another word for it in my book.
So people who tend toward a maximizing strategy always fall in love with this classic thought experiment: the paper clip maximizer. Suppose you create an AI and tell it to make paper clips. Well, what is to stop this AI from converting all matter in the solar system, galaxy, or even light cone into paper clips? To a lot of people, this just seems stupid. “Well that wouldn’t make sense, why would a superintelligent thing value paper clips?” To which the rationalist smugly replies “the orthogonality thesis,” which states that intelligence and final goals are independent: more or less any level of intelligence can be combined with more or less any goal. So you could be stupid and value world peace or a super-genius and value paper clips. And although I AM sympathetic to the layman who wants to believe that intelligence implies benevolence, I’m not entirely convinced that it does. I’m sure we have some intelligent psychopaths lying around here somewhere.
But a better response might be: “Wow, unbounded maximizing algorithms could be sort of dangerous, huh? How about just telling the AI to create 100 paper clips? That should work fine, right?” This is called satisficing. Just work till you reach a predefined limit and stop.
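To make the distinction concrete, here is a minimal toy sketch in Python. Nothing in it comes from Bostrom's book: the function name `make_paperclips`, the one-unit-per-clip conversion, and the `target` parameter are all illustrative assumptions. The only difference between the two agents is whether a stopping rule exists.

```python
def make_paperclips(resources, target=None):
    """Turn units of `resources` into paper clips, one unit per clip.

    Hypothetical toy model. target=None stands in for an unbounded
    maximizer: it stops only when resources run out. An integer target
    stands in for a satisficer: it stops once the target is met.
    """
    clips = 0
    while resources > 0:
        if target is not None and clips >= target:
            break  # satisficer: predefined limit reached, stop working
        resources -= 1
        clips += 1
    return clips, resources


# Maximizer: eats the whole budget.
print(make_paperclips(1000))              # (1000, 0)
# Satisficer: makes 100 clips and leaves the rest alone.
print(make_paperclips(1000, target=100))  # (100, 900)
```

The sketch deliberately leaves out the part Bostrom worries about, which is what the agent does after it believes it has hit the target.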
I am quite fond of this concept myself. As a rule of thumb, the first 20% of effort yields 80% of the value in nearly every domain. So the final 80% of effort is required to wring out that final 20% of value. Now in some domains like design, I can see the value of this. Five mediocre products aren’t as cool as one super product, and this is one reason I think Apple has captured so much profit historically. But even Jobs wasn’t a total maximizer: “Real artists ship.”
But I’m not a designer, I’m an IT guy who dropped out of high school. So I’m biased, and I think satisficing is awesome. I can get 80% of the value out of like five different domains for the same amount of effort that a maximizer invests in achieving total mastery of just one domain. But then Bostrom throws cold water on the satisficing idea in Superintelligence. He basically says that the satisficing AI will eat up all available resources in the universe checking and rechecking its work to ensure that it really created exactly 100 paper clips. Because “the AI, if reasonable, never assigns exactly zero probability to it having failed to achieve its goal.” (Kindle loc. 2960) Which seems very unreasonable really, and if a human spent all their time rechecking their work, we would call this OCD or something.
This idea doesn’t even make sense unless we just assume that Bostrom equates “reasonable” with maximizing confidence. So he is basically saying that maximizing strategies are bad, but satisficing strategies are also bad because there is always a maximizing strategy that could sneak in. As though maximizing strategies were some sort of logical fungus that spread through computer code of their own accord. Then Bostrom goes on to suggest, well, maybe a satisficer could be told to quit after a 95% probability of success. And there is some convoluted logic that I can’t follow exactly, but he basically says, well suppose the satisficing AI comes up with a maximizing strategy on its own that will guarantee 95% probability of success. Boom, Universe tiled with paper clips. Uh, how about a rule that checks for maximizing strategies? They get smuggled into books on AI a lot easier than they get spontaneously generated by computer programs.
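The disagreement over rechecking can be sketched with a toy Bayesian model. Everything here is an assumption made up for illustration, not something from the book: the function name `checks_needed`, the 50% prior that the count is right, and a recount that wrongly passes a bad count 10% of the time. Exact fractions are used so that the probability can never be rounded up to 1.

```python
from fractions import Fraction


def checks_needed(threshold, prior=Fraction(1, 2),
                  false_pass=Fraction(1, 10), max_checks=1000):
    """Count passing rechecks until P(count is right) >= threshold.

    Hypothetical model: a correct count always passes a recheck, and a
    wrong count passes with probability `false_pass`. Returns None if
    the threshold is not reached within `max_checks` rechecks.
    """
    wrong = 1 - prior  # probability mass on "the count is wrong"
    for n in range(max_checks + 1):
        # Bayes' rule after n consecutive passing rechecks.
        posterior = prior / (prior + wrong * false_pass ** n)
        if posterior >= threshold:
            return n
    return None


# A 95% confidence target is met after two rechecks.
print(checks_needed(Fraction(95, 100)))             # 2
# Demanding certainty is never satisfied, no matter how many rechecks.
print(checks_needed(Fraction(1), max_checks=200))   # None
```

In this toy model the endless rechecking only appears when the agent is told to push its confidence all the way to 1, which is itself a maximizing goal. The sketch does not address Bostrom's further worry that an agent might choose a resource-hungry plan as its way of reaching 95%.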
I sort of feel that maximizers have a mental filter which assumes that maximizing is the default way to accomplish anything in the world. But in fact, we all have to settle in the real world. Maximizing is cost insensitive. In fact, I might just be saying that cost insensitivity itself is what’s dangerous. Yeah, we could make things perfect if we could suck up all the resources in the light cone, but at what cost? And really, it would be pretty tricky for AI to gobble up resources that quickly too. There are a lot of agents keeping a close eye on resources. But that’s another question.
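One way to picture cost sensitivity is as a stopping rule: keep working only while the next unit of effort is worth more than it costs. The sketch below is a toy illustration under invented assumptions, namely the helper names `effort_spent` and `halving_returns`, returns that halve with every step, and a flat cost per step.

```python
def halving_returns(step):
    """Hypothetical diminishing returns: each step is worth half the last."""
    return 100 * 0.5 ** step


def effort_spent(marginal_value, unit_cost, max_steps):
    """Work until the next step is no longer worth its cost.

    `max_steps` stands in for "all the resources available".
    """
    steps = 0
    while steps < max_steps and marginal_value(steps) > unit_cost:
        steps += 1
    return steps


# Cost-sensitive agent: quits once a step is worth less than 5.
print(effort_spent(halving_returns, unit_cost=5, max_steps=50))  # 5
# Cost-insensitive agent: treats effort as free and uses everything.
print(effort_spent(halving_returns, unit_cost=0, max_steps=50))  # 50
```

With a cost of zero, any positive gain justifies another step, so the only thing that stops the agent is running out of resources.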
My main point is that the AI Alignment debate should include more explicit recognition that maximization run amok is dangerous <cough>as in modern capitalism<cough> and that pure satisficing strategies are much safer as long as you don’t tie them to unbounded maximizing routines. Bostrom’s entire argument against the safety of satisficing agents is that they might include insane maximizing routines. And that is a weak argument.
Ok, now I feel better. That was just one small point, I know, but I feel that Bostrom’s entire thesis is a house of cards built on flimsy premises such as this. See my rebuttals to the idea that human values are fragile and to Omohundro’s basic AI drives. Also, see Ben Goertzel’s very civil rebuttal to Superintelligence. Even MIRI seems to agree that some version of satisficing should be pursued.
I am no great Bayesian myself, but if anyone cares to show me the error of my ways in the comment section, I will do my best to bite the bullet and update my beliefs.