Before I begin, let me just say that if you haven’t read Bostrom’s Superintelligence and you haven’t read much about the AI Alignment problem, then you will probably find this post confusing and annoying. If you agree with Bostrom, you will DEFINITELY find my views annoying. This is just the sort of post my ex-girlfriend used to forbid me to write, so in honor of her good sense, I WILL try to state my claims as simply as possible and avoid jargon as much as I can.
[Epistemic Status: less confident in the hardest interpretations of “satisficing is safer,” more confident that maximizing strategies are continually smuggled into the AI safety debate and that acknowledging this will improve communication.]
Let me also say that I THINK AI ALIGNMENT IS AN IMPORTANT TOPIC THAT SHOULD BE STUDIED. My main disagreement with most people studying AI safety is that they seem to focus more on AI becoming god-like and destroying all living things forever, and less on tool AI becoming a super weapon that China, Russia, and the West direct at each other. Well, that’s not entirely true; we also differ on whether intelligence is fundamentally social and embodied, and on a bunch of other things, but I do truly love the rationalist community even though we drink different brands of kool-aid.
So ok, I know G is reading this and already writing angry comments criticizing me for all the jargon. So let me just clarify what I mean by a few of these terms. The “AI Alignment” problem is the worry that we might create an Artificial Intelligence that takes actions that are not aligned with human values. Now one may say, well, most humans take actions that are not aligned with the values of other humans. The only universal human value that I acknowledge is the will to persist in the environment. But for the sake of argument, let’s say that AI might decide that humans SHOULDN’T persist in the environment. That would sort of suck. Unless the AI just upgraded all of us to super-transhumans with xray vision and stuff. That would be cool I guess.
So then Eliezer, err, Nick Bostrom writes this book Superintelligence outlining how we are all fucked unless we figure out how to make AI safe, and (nearly) all the nerds who thought AI safety might not matter much read it and decided “holy shit, it really matters!” And so I’m stuck arguing this shit every time I get within 10 yards of a rationalist. One thing I noticed is that rationalists tend to be maximizers. They want to optimize the fuck out of everything. Perfectionism is another word for this. Cost insensitivity is another word for it in my book.
So people who tend toward a maximizing strategy always fall in love with this classic thought experiment: the paperclip maximizer. Suppose you create an AI and tell it to make paperclips. Well, what is to stop this AI from converting all matter in the solar system, galaxy, or even light cone into paperclips? To a lot of people, this just seems stupid. “Well that wouldn’t make sense, why would a superintelligent thing value paperclips?” To which the rationalist smugly replies “the orthogonality thesis,” which states that there is NO correlation between intelligence and values. So you could be stupid and value world peace or a super-genius and value paperclips. And although I AM sympathetic to the layman who wants to believe that intelligence implies benevolence, I’m not entirely convinced of this. I’m sure we have some intelligent psychopaths lying around here somewhere.
But a better response might be: “Wow, unbounded maximizing algorithms could be sort of dangerous, huh? How about just telling the AI to create 100 paper clips? That should work fine, right?” This is called satisficing. Just work until you reach a predefined limit and stop.
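To make the contrast concrete, here’s a toy sketch of the two stopping rules (my own illustration, not anything from Bostrom’s book; `make_paperclip` and `resources_remain` are made-up stand-ins for whatever the agent actually does and senses):

```python
# Toy illustration only: make_paperclip() and resources_remain() are
# hypothetical stand-ins, not a real AI architecture.

def maximizer(resources_remain, make_paperclip):
    """Keeps converting resources into paperclips for as long as it can."""
    count = 0
    while resources_remain():        # no internal stopping condition
        make_paperclip()
        count += 1
    return count

def satisficer(resources_remain, make_paperclip, target=100):
    """Works until a predefined limit is reached, then stops."""
    count = 0
    while count < target and resources_remain():
        make_paperclip()
        count += 1
    return count                     # never more than `target`
```

Same building blocks, but only one of them has a built-in place to stop.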
I am quite fond of this concept myself. The first 20% of effort yields 80% of the value in nearly every domain, so the final 80% of effort is required to wring out that last 20% of value. Now in some domains, like design, I can see the value of chasing that last 20%. Five mediocre products aren’t as cool as one super product, and this is one reason I think Apple has captured so much profit historically. But even Jobs wasn’t a total maximizer: “Real artists ship.”
But I’m not a designer, I’m an IT guy who dropped out of high school. So I’m biased, and I think satisficing is awesome. I can get 80% of the value out of, like, five different domains for the same amount of effort that a maximizer invests in achieving total mastery of just one domain. But then Bostrom throws cold water on the satisficing idea in Superintelligence. He basically says that a satisficing AI will eat up all available resources in the universe checking and rechecking its work to ensure that it really created exactly 100 paper clips. Because “the AI, if reasonable, never assigns exactly zero probability to it having failed to achieve its goal.” (Kindle loc. 2960) Which seems very unreasonable really, and if a human spent all their time rechecking their work, we would call this OCD or something.
This idea doesn’t even make sense unless we assume that Bostrom equates “reasonable” with maximizing confidence. So he is basically saying that maximizing strategies are bad, but satisficing strategies are also bad because there is always a maximizing strategy that could sneak in. As though maximizing strategies were some sort of logical fungus that spreads through computer code of its own accord. Then Bostrom goes on to suggest that maybe a satisficer could be told to quit after reaching a 95% probability of success. And there is some convoluted logic that I can’t follow exactly, but he basically says: well, suppose the satisficing AI comes up with a maximizing strategy on its own that will guarantee a 95% probability of success. Boom, universe tiled with paperclips. Uh, how about a rule that checks for maximizing strategies? They get smuggled into books on AI a lot more easily than they get spontaneously generated by computer programs.
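And for what it’s worth, the bounded version I’m gesturing at is easy to write down. Here’s a sketch (my own construction, not Bostrom’s and not anything MIRI has proposed; the sensor model is a made-up toy) of a satisficer whose verification step is itself capped, so there is no open-ended “maximize certainty” loop for the fungus to grow in:

```python
def bounded_satisficer(make_paperclip, count_pile, target=100,
                       confidence_threshold=0.95, max_rechecks=5,
                       sensor_error_rate=0.1):
    """Make `target` paperclips, then do a *bounded* amount of verification."""
    made = 0
    while made < target:                 # make the clips, counting as we go
        make_paperclip()
        made += 1

    # Verification stops at the confidence threshold OR the recheck budget,
    # whichever comes first. There is no open-ended certainty chase.
    confidence, checks = 0.0, 0
    while confidence < confidence_threshold and checks < max_rechecks:
        if count_pile() == target:       # noisy sensor reading of the pile
            confidence = 1.0 - (1.0 - confidence) * sensor_error_rate
        checks += 1
    return made, confidence
```

With a 10% sensor error rate, two clean recounts already clear the 95% bar and the agent halts; even with garbage sensors it halts after five checks.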
I sort of feel that maximizers have a mental filter which assumes that maximizing is the default way to accomplish anything in the world. But in fact, we all have to settle in the real world. Maximizing is cost insensitive. Really, I might just be saying that cost insensitivity itself is what’s dangerous. Yeah, we could make things perfect if we could suck up all the resources in the light cone, but at what cost? And it would be pretty tricky for an AI to gobble up resources that quickly anyway; there are a lot of agents keeping a close eye on resources. But that’s another question.
My main point is that the AI Alignment debate should include more explicit recognition that maximization run amok is dangerous <cough>as in modern capitalism<cough> and that pure satisficing strategies are much safer as long as you don’t tie them to unbounded maximizing routines. Bostrom’s entire argument against the safety of satisficing agents is that they might include insane maximizing routines. And that is a weak argument.
Ok, now I feel better. That was just one small point, I know, but I feel that Bostrom’s entire thesis is a house of cards built on flimsy premises such as this. See my rebuttal to the idea that human values are fragile, or to Omohundro’s basic AI drives. Also, see Ben Goertzel’s very civil rebuttal to Superintelligence. Even MIRI seems to agree that some version of satisficing should be pursued.
I am no great Bayesian myself, but if anyone cares to show me the error of my ways in the comment section, I will do my best to bite the bullet and update my beliefs.
I don’t know enough about AI alignment issues to comment on this in an intelligent way. However, I’d like to say satisficing is my style in life. Never a perfectionist, I dabble in many areas and feel happiest when I do okay in most of them and excel at a few I truly feel passionate about. 20 for 80 sounds perfect to me.
The world is not binary, so our approach to AI shouldn’t be either. Someone once said: the secret to happiness is moderation in everything.
I think some of the points you raised were apt, especially regarding Bostrom’s dismissal of the approach where the AI stops once it is 95% confident that it has achieved its goal of producing 100 paperclips. However, since I still think that Bostrom’s main thesis is correct, I have a personal duty to point out what I see wrong in your article.
>One thing I noticed is that rationalists tend to be maximizers. They want to optimize the fuck out of everything.
Full disclosure, I’m fully guilty of this. I want to seed this universe with fun theory/utilitronium and make it as amazing as possible. Nothing short of infinity is enough.
> To which the rationalist smugly replies “the orthogonality thesis,” which states that there is NO correlation between intelligence and values.
Strictly speaking, the orthogonality thesis doesn’t state that there isn’t any correlation between intelligence and values. There might indeed be such a correlation. Instead, the thesis is a statement about agents in design space: it says that more or less any combination of goals and intelligence is, in principle, possible to build.
>Which seems very unreasonable really, and if a human spent all their time rechecking their work, we would call this OCD or something.
I think you’d already agree that anthropomorphizing the AI isn’t good. The AI doesn’t think, “Will this make me look OCD?” It only asks, “Did I really make 100 paperclips?”
>I sort of feel that maximizers have a mental filter which assumes that maximizing is the default way to accomplish anything in the world.
So herein lies the main point at which we diverge. I do not hold the assumption that maximizing is the *only* way to accomplish anything. In many ways, humans are not maximizers, so that theory would be debunked right off the bat. However, the issue, as I see it, is that creating a maximizer is a terminal attractor state.
Central to the thesis of AI alignment is that we don’t *just* need to create an AI that is safe. Creating an AI that is safe is easy. AlphaGo is safe, for instance; it just isn’t very powerful. Satisficers might also fall into this category. They are safe, sure, but they are only useful for one thing or another.
Bostrom talks about this later in the book when he discusses Genie AIs. Genie AIs just do what you tell them to do and then halt. They don’t tile the universe, they just stop. However, the fundamental problem is that AI alignment is about how to create an AI that is aligned with humans. We WANT an AI that can solve all of our problems: cancer, poverty, disease. If we had to create a new AI every time we wanted something done, eventually someone would just create a maximizer and the world would end. Maximizers are terminal end states. There’s no recovering after you’ve built one. Satisficers aren’t, but someone could always just build a maximizer after you’ve built your satisficer.
So the problem I see isn’t that satisficing is a bad short-term strategy. It’s that it isn’t a good long-term strategy. I do want to see the universe get seeded with fun theory/utilitronium. The only way to do that is with a maximizer, or at least a super-powerful satisficer that just acts like a maximizer anyway.
As an analogy, many people propose just putting the AI in a box. What could go wrong? Of course, we both know that the AI would just get out. But I don’t see that as the real issue. The actual issue is that while the USA has its AI in the super-duper-secure box that definitely isn’t getting out, China’s just going to release their paperclip maximizer and the world will end anyway. The default strategy should be to align maximizers, because satisficers just aren’t powerful enough.
I am pretty sure that the reason all the living things we see around us are satisficers is that it’s the only competitive strategy. You can’t really maximize for more than one variable, right? Each time you add a variable to pursue, it creates an opportunity cost that limits each one, and you are effectively satisficing. Or am I missing something essential here?
Although, I guess when I think about it, a cost-sensitive maximizer is not equivalent to a satisficer, but it is perhaps the thing we want.
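To spell out the distinction I’m drawing (a toy framing of my own, with made-up value and cost functions): a satisficer stops at a target, while a cost-sensitive maximizer still chases the best option, but only net of what that option costs:

```python
def satisfice(options, value, target):
    """Take the first option that clears the target; ignore anything better."""
    for option in options:
        if value(option) >= target:
            return option
    return None

def cost_sensitive_maximize(options, value, cost, weight=1.0):
    """Pick the option with the best value net of cost.

    Unlike a pure maximizer, an expensive 'perfect' option loses out to a
    cheap 'good enough' option once its cost outweighs the extra value.
    """
    return max(options, key=lambda option: value(option) - weight * cost(option))
```

They are not the same rule: the cost-sensitive maximizer still ranks every option, it just refuses to pay any price to squeeze out the last drop of value.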
And I will make a side point about the OCD comment. Bostrom is using the phrase “reasonable,” which is anthropomorphizing to start with, so calling a strategy “reasonable” when it really looks “OCD” just seems wrong on the face of it. Furthermore, my real point was that one has to assume a maximization of confidence to cause any “satisficer” to eat up resources re-checking its work. It seems simple to specify a limited and well-defined confidence level: “You know you’ve really created 100 paperclips once all your sensors confirm it with x% confidence, and they have been calibrated at y intervals to be operationally trusted with a z error rate.”
Even maximizers eventually recognize limits to their ability and settle, per Steve Jobs. And if AIs were as obsessive as the maximizers claim, wouldn’t they all still be working on their first problem, making sure it was absolutely, truly, really, totally for sure correct? And then checking again just to make sure?
Instead, the AI makes 100 paper clips, counting them as it goes. When it reaches 100, it stops. Unless a programmer specified otherwise, why would it even go back and count the pile afterwards? It knows it made 100. Even if a human walks by and takes one of them to hold a multi-page report together, so that there are now only 99 in the pile, the AI still knows it made 100.
Hello!
Thanks for sharing your thoughts! It is really important for people to start to grok that some of the currently popular approaches to AI goal systems are quite insane.
You might also be interested in my post about the useful properties of homeostasis-based agents. Homeostasis is a safer alternative that has already stood the test of time in nature itself.
https://medium.com/threelaws/making-ai-less-dangerous-2742e29797bd
Hello!
I wrote a sequel to the homeostasis-based agents post, where I propose a concrete formula. The proposed formula utilises the set-point aspect of homeostasis but also, just as importantly, an additional aspect: diminishing returns.
When both aspects are combined into one formula, one can implement any number of conjunctive goals. Goals are conjunctive when all of them need to be treated as equally important and bigger problems get exponentially higher priority, resulting in a general preference for many similarly minute problems over one huge problem surrounded by a “perfect” situation in all other aspects.
Many of these conjunctive goals can represent common-sense considerations about not messing up various other things while working towards some particular goal.
For example, the 100-paperclip scenario is easily handled by this framework, since endlessly rechecking that exactly 100 paperclips were indeed produced runs into diminishing returns.
https://medium.com/threelaws/diminishing-returns-and-conjunctive-goals-towards-corrigibility-and-interruptibility-2ec594fed75c
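For readers who want a feel for what a set-point-plus-diminishing-returns score could look like, here is one minimal way to write such a conjunctive objective. This is an assumed formulation for illustration only; the linked post proposes its own concrete formula.

```python
def conjunctive_score(state, setpoints, exponent=2):
    """Penalize deviations from each set-point, raised to a power > 1.

    Because the penalty grows faster than the deviation, one huge problem
    scores worse than many small ones, and improving a goal that is already
    close to its set-point buys less and less, while a goal that is far off
    dominates the score. There is nothing to maximize without bound.
    """
    return -sum(abs(state[k] - setpoints[k]) ** exponent for k in setpoints)

# Illustrative use: many small deviations beat one large one.
setpoints = {"paperclips": 100, "energy_used": 10, "side_effects": 0}
balanced  = {"paperclips": 99,  "energy_used": 11, "side_effects": 1}
lopsided  = {"paperclips": 100, "energy_used": 10, "side_effects": 30}
assert conjunctive_score(balanced, setpoints) > conjunctive_score(lopsided, setpoints)
```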