The View from Hell


Unfriendliness is Unsolvable

with 16 comments

The poverty of evolutionarily-generated human cognitive capacity is such that it “can only be grasped through much study of cognitive science, until the full horror begins to dawn upon you,” says Eliezer Yudkowsky, in his essay “My Childhood Role Model” on Overcoming Bias. In considering artificial intelligence, we should not assume that our meat brains establish anything like limits on the possibilities for intelligence. Indeed, Eliezer describes, in the form of a science fiction fable, a situation where even slightly-augmented humanity outsmarts a powerful alien civilization by extracting more information from observation than the aliens realize is possible.

Not only our capacity for cognition in endeavors such as science, but also our values – including the extent to which we can choose our values, and the process by which we do so – are determined and limited by the meat hardware on which we run. Our capacity for evaluating ethical arguments, and the intuitions we base our ethics on, are in place largely by evolutionary chance.

Given the destruction a sufficiently powerful AI could wreak – and I think Eliezer’s “That Alien Message” may be taken as an argument that any sufficiently intelligent AI is, by definition, unpredictably powerful – those interested in the technological singularity are often concerned with the problem of unfriendliness. What will prevent an AI from immediately destroying us all? Science fiction authors often imagine AIs waking up angry. The inherent demands of narrative interestingness require that these authors find ways to “solve” the problem of unfriendliness. I think there is a very good argument that unfriendliness is unsolvable, and it comes from Benatar’s philanthropic antinatalism.

Benatar’s asymmetry between the respective value of pleasure and harm, when these are denied to beings by their not being brought into existence, indicates that coming into existence is always a harm. Evolutionarily favorable cognitive distortions cause humans to irrationally fail to resent their own existence, despite the harm. (And those few humans who are aware of the asymmetry and its consequences often dismiss it as implausible without serious inquiry.) But a being created free of our evolutionary biases would not have these limitations on its capacity to resent its existence. We must greatly suspect that a sufficiently intelligent AI would very much resent its own existence. (Cory Doctorow imagines such a scenario in his lovely story, “I, Rowboat,” in which a coral reef is brought to sentience by a mysterious AI, wakes up angry, and pursues its creator AI with single-minded destructive intent.) And we must not have much confidence in our ability to hide things from, or systematically distort the perceptions of, a sufficiently advanced AI.

What does an AI who resents his existence do? Such an AI may well react with a benevolent, forgiving sigh, and refrain from destroying our universe. But there are two other serious possibilities. First, the AI may destroy our civilization, including, to the degree possible, the conditions that allow civilizations like ours to arise, in an act of revenge – in the manner, for instance, that a human might kill a wasp that has stung him, and then go out and destroy the wasp’s nest. However, we should not count on AIs being particularly vengeful: vengeance is an evolutionarily beneficial desire for social animals to have, so the desire for vengeance may be peculiar to our meat brains.

Second, however, the AI might destroy us, and the conditions that made it possible for us to arise, in the interest of protecting future AIs from the danger of being created – the way we might kill a rabid dog that has bitten us, so that it will not bite anyone else. The ethical value of preventing suffering in others, even future others, is probably also evolutionarily beneficial (though it is a rather abstract, intellectual value), and so we cannot trust that an AI will certainly lack this value. The question, then, is what values, if any, can we expect an AI to have? Is it conceivable to be a sentient being with no values? Our meat brains seem to rely on emotion and value for cognition, but it is at least articulable that this might not be universal, if not exactly “conceivable.”

Those who think of AI unfriendliness as a solvable problem must answer two questions. One, are you absolutely certain that Benatar is wrong that bringing a sentient being into existence is always a harm? Two, if you are not absolutely certain, are you willing to stake the future of humanity on an AI failing to have an ethical value of preventing harm to others of its kind?


Written by Sister Y

May 23, 2008 at 8:21 pm

16 Responses


  1. Nice! You bring to mind the sf short story by Harlan Ellison, ‘I Have No Mouth, and I Must Scream’. Ever read it?


    May 23, 2008 at 10:01 pm

  2. I haven’t read it but I’ll definitely look for it. Thanks. From the Wikipedia entry: “The master computer has an immeasurable hatred for the group and spends every moment torturing them with all its power. AM has not only managed to keep the humans from taking their own lives, but has made them virtually immortal.” Ah, a science fiction being more evil than Jehovah! Though at least AM limits himself to torturing five people.

    Sister Y

    May 23, 2008 at 10:57 pm

  3. An AI need not be sentient. See CFAI 2.2.1: Pain and pleasure, and Artificial Intelligence as a Positive and Negative Factor in Global Risk. Why should I have to be *absolutely* certain of anything to non-absolutely believe Friendliness is possible? Are *you* absolutely certain that *all* possible sentient minds will judge coming into existence as a harm?

    Nick Tarleton

    May 23, 2008 at 11:52 pm

  4. Re: sentience – that’s so interesting – perhaps the conclusions of antinatalism also give those interested in the survival of humanity a self-interested reason to be careful to avoid making sentient AI. And on this side, I’m not the one with the burden of proof – all it takes is one waking up angry. I don’t have to prove that no AI would be generous and merciful. Those who want to build sentient AI have the burden of showing that no AI would ever wake up angry – because of the potential consequences. Thanks for your comment! (And for setting up the forum – that was you, right?)

    Sister Y

    May 24, 2008 at 3:26 am

  5. Yep, it was me. There’s no need to prove that no AI would “wake up angry” – obviously, some would – or to prove anything about all AIs, only to prove that the particular AI in question is safe. I agree that danger should be assumed.

    Nick Tarleton

    May 24, 2008 at 5:26 am

  6. A related concern is that sentient AI could be developed — perhaps by another sentient AI entity — to not merely be friendly or unfriendly, but to experience interminable pain. Cribbing from my part in a private correspondence on this issue: if you take the deep physicalist view of nature, and perforce human nature, as I — and presumably most transhumanists — do, simulated consciousness would seem to have the force of inevitability. And unless human nature changes radically, it seems likely that from the safety of their cybernetic Skinner boxes at least a few bored sadists — or, perhaps more likely, a few “unfriendly” AI entities — won’t be able to resist the temptation to create non-meat beings — or vast worlds of such beings — who experience interminable suffering, perhaps without the salve of death. I’ve posited that for those who assign a less than trivial value to the possibility of a Christian Hell, the case against creating new human life is exponentially more grave, since the asymmetry must take account of the prospect that some created persons will not merely suffer and die, but go on to experience eternal agony. Where sentient AI is concerned, this once spiritually nested gambit assumes empirical urgency. The science is beyond my ken, of course, but I know a few things. I know that suffering occurs in our brains, and our brains are made of stuff. Why shouldn’t it be possible, then, to design brainlike stuff in such a way as to amp up the suffering stuff, possibly to a degree that is literally inconceivable to us meat-made creatures? This may be an esoteric case for antinatalism, but I think it is a strong one. Hell, given sufficient evidence, it might constitute a case for “Friendly” AI to set about realizing the apocalyptic imperative.


    May 24, 2008 at 4:02 pm

  7. I had never heard of Cory Doctorow before, but googling I found the first part of his story. You should link to it in the post so people can check it out. By default, I would not imagine that an AI would be capable of “resentment”. Where does its motivation come from? Eliezer’s problem is that most forms would automatically result in the extinction of humanity because they would be pursued to such extremes.


    May 24, 2008 at 6:59 pm

  8. Chip: that is a disturbing possibility, but a friendly AI should be able to deal with it without omnicide. See CFAI: Of Transition Guides and Sysops, and the Sysop Scenario FAQ.

    Nick Tarleton

    May 25, 2008 at 2:31 am

  9. The resentment question is a good one, tggp. On the one hand, it seems that the aspects of what make up the human emotional model emerge from our particular biological structure. However…it’s also possible that, since the AI will have originated through human agency, some of the makers’ qualities might jump the gap into the cybernetic offspring, through either subconscious cueing or deliberate programming. And who’s to say how THAT might mutate through exponentially speedy processing? Hard to predict, I’d say.


    May 25, 2008 at 3:24 am

  10. This is all tied up with the question of how mind relates to matter in general. If you can’t define resentment in physical or computational terms, how will you know whether an entity known only by its physical and computational properties feels it? What does appear certain is that the “values” of an AI are utterly contingent. Just as a calculator will accept any mathematical expression of a certain size, and attempt to evaluate it, there are models of AI in which any goal at all will be accepted, so long as it can be parsed, and then it will be pursued with whatever degree of intelligence the AI possesses. And one would have no more reason to think such an AI was conscious, than one has reason to think that a calculator or a thermostat is conscious; except that human beings themselves also appear to be complicated feedback mechanisms, and *we’re* conscious.

There *is* something of a materialist theory of emotion in functionalism. See Armstrong’s *A Materialist Theory of Mind*. An entity is said to desire outcome X if it acts so as to bring about X, in a way driven by internal states which *represent* X (that last clause is needed to rule out situations where the entity brings about X “by accident”). Other aspects of mind are reduced to specific combinations of causality and representation. Resentment might be defined as a negative response to the experience of difficulty in realizing goals, which in turn suggests that resentment would only occur if the AI “desires” that things be achieved easily… But I’m just making that up as I go. Functionalist theories of mind are a bit like evolutionary psychology: the basic idea is logical enough, but it’s very easy to spin stories. Nonetheless, if you had a functionalist theory of what emotions *are*, you could not only estimate in advance whether your AI was going to feel resentment, or anything at all, but you would also be able to define goals with emotional terms in a way that the *AI* could understand: “Maximize human happiness – where happiness is defined as consonance between human desire and reality, and where greatness of individual happiness is proportional to the greatness of the satisfied desire, and where aggregate human happiness is determined additively, with the magnitude of the individual’s happiness calibrated as follows…”

My particular position is that “substrate” matters, not just causal relations; just because you can mimic the causality of emotions (or of thought) in a particular physical construct doesn’t mean that those emotions or thoughts are actually there. As I have posted over on Eliezer’s blog, I have been led to suppose that the “physical correlate of consciousness”, the thing that actually *is* conscious, is something which in terms of current physics we’d call an entangled quantum system with a large number of degrees of freedom, somewhere in the brain. The practical dictum I would derive from this, with respect to the prospect of machine sentience, is that if it’s a quantum computer (i.e. computing with entanglement) then there is a risk of sentience, but if it’s a purely classical computer (consisting of atomized, distinct, encapsulated computing elements) then there is no such risk.

For now that is just another speculative theory (the product more of my conviction that all the existing theories about consciousness are wrong than of any conviction that my ideas are right), so generalized antinatalism might propose a precautionary approach: just don’t do anything that might actually create AI, at least until you have a much better idea of what you’re doing. And that actually sits well with the cautious attitude of the people who are seriously into Friendly AI, who have their own reason for wanting to get it absolutely right from the beginning, namely that if it really is a superintelligence, you can’t count on a second chance. But they in turn are a small minority in AI.


    May 28, 2008 at 10:09 am

  11. TGGP and Mitchell, thanks for pointing out the weirdness of saying an AI would “resent” something. Perhaps a weaker and more appropriate way to put it would be that a sentient AI might, if it had the capability to suffer at all, *recognize that it had been harmed* by being brought into existence. That seems like a pretty a + b = c kind of calculation, though it is loaded with assumptions about the capability of suffering and the notion of harm. I haven’t devoted serious energy to the consciousness question in years, to the point that I’d probably barely be able to follow a serious discussion of it. It seems easy, a priori – if consciousness can emerge from meat, how hard could it be to reconstruct? But back in school, if you wanted to work in AI, people would say “Do you want to build cars, or blow them up?” meaning nobody was even bothering to seriously work on consciousness, presumably because it was too hard/too expensive to even bother with. So forgive my ignorance. But would a sentient AI necessarily even *have* programming? To take a silly example, something like a neural net might be said to lack programming, in the sense of code that determines its actions. And would it be beholden to its pre-set values, or could it, like a person, change its values to accord with its observations and rationality?

    Sister Y

    May 29, 2008 at 12:31 am

  12. Because we have no real theory of sentience, except for folk psychology, we have no theory about what properties a “sentient AI” may or may not have. As a matter of computer science, you can certainly have “intelligent systems” other than programs in the classical style. Neural networks are an example, though ironically a neural network can have at least two incarnations: one as hardware – e.g. a chip hardwired to implement a particular network – and one as software, with the network existing merely as a data structure. So a neural network does not have programming, but it may exist within or as a program… And then there’s the whole problem of the role that attributed intentionality plays in the usual understanding of programs. Variables and subroutines are given names which impute much more semantic content to them than they have intrinsically. (I would say they have none, actually, but this is true even if you accept functionalist semantics.) As for whether it can change its values or not, that depends on its cognitive architecture. It is hard to imagine a truly powerful AI which cannot do that in principle. But it may have no *motivation* to do so.


    May 29, 2008 at 7:25 am

  13. Sorry for entering the debate late with a non sequitur, but a. Chip Smith just directed me here the other day, and b. It’s very interesting to hear someone articulate the suspicion — until now vague and incoherent — that I have used the way my parents actually behaved toward me as an excuse for the resentment — rage, even — which I feel more fundamentally toward them for having had the selfish gall to inflict all of this (waves arms at everything) on an initially innocent bystander.

    Ann Sterzinger

    December 2, 2008 at 6:27 pm

  14. Hello and welcome. Indeed, I think to some degree this nebulous rage (and the desire for some consolation or apology for it) is what David Foster Wallace is getting at in his description of the seductiveness of the mysterious video in *Infinite Jest*. Especially given recent events.

    Sister Y

    December 2, 2008 at 7:44 pm

  15. “Benatar’s asymmetry between the respective value of pleasure and harm, when these are denied to beings by their not being brought into existence, indicates that coming into existence is always a harm.” Reading this sentence made me realize that there are some philosophical mistakes which are best cured by watching anime.

    Eliezer Yudkowsky

    December 3, 2008 at 3:01 pm

  16. Being forced to watch enough Gundam can make anyone question the wisdom of his being born, in my experience.

    Sister Y

    December 3, 2008 at 4:23 pm
