The View from Hell

Unfriendliness is Unsolvable

The poverty of evolutionarily generated human cognitive capacity is such that it “can only be grasped through much study of cognitive science, until the full horror begins to dawn upon you,” says Eliezer Yudkowsky, in his essay “My Childhood Role Model” on Overcoming Bias. In considering artificial intelligence, we should not assume that our meat brains establish anything like limits on the possibilities for intelligence. Indeed, Eliezer describes, in the form of a science fiction fable, a situation where even slightly augmented humanity outsmarts a powerful alien civilization by extracting more information from observation than the aliens realize is possible.

Not only our capacity for cognition in endeavors such as science, but also our values – including the extent to which we can choose our values, and the process by which we do so – are determined and limited by the meat hardware on which we run. Our capacity for evaluating ethical arguments, and the intuitions we base our ethics on, are in place largely by evolutionary chance.

Given the destruction a sufficiently powerful AI could do – and I think Eliezer’s “That Alien Message” may be taken as an argument that any sufficiently intelligent AI is by definition unpredictably powerful – those interested in the technological singularity are often concerned with the problem of unfriendliness. What will prevent an AI from immediately destroying us all? Science fiction authors often imagine AIs waking up angry. Inherent demands of narrative interestingness require that these authors find ways to “solve” the problem of unfriendliness. I think there is a very good argument that unfriendliness is unsolvable, and it comes from Benatar’s philanthropic antinatalism.

Benatar’s asymmetry between the respective value of pleasure and harm, when these are denied to beings by their not being brought into existence, indicates that coming into existence is always a harm. Evolutionarily favorable cognitive distortions cause humans to irrationally fail to resent their own existence, despite the harm. (And those few humans who are aware of the asymmetry and its consequences often dismiss it as implausible without serious inquiry.) But a being created free of our evolutionary biases would not have these limitations on its capacity to resent its existence. We must greatly suspect that a sufficiently intelligent AI would very much resent its own existence. (Cory Doctorow imagines such a scenario in his lovely story, “I, Row-Boat,” in which a coral reef is brought to sentience by a mysterious AI, wakes up angry, and pursues its creator AI with single-minded destructive intent.) And we must not have much confidence in our ability to hide things from, or systematically distort the perceptions of, a sufficiently advanced AI.

What does an AI that resents its existence do? Such an AI may well react with a benevolent, forgiving sigh, and refrain from destroying our universe. But there are two other serious possibilities. First, the AI may destroy our civilization, including, to the degree possible, the conditions that allow civilizations like ours to arise, in an act of revenge – in the manner, for instance, that a human might kill a wasp that has stung him, and then go out and destroy the wasp’s nest. However, we should not assume that AIs will be particularly vengeful. Vengeance is an evolutionarily beneficial desire for social animals to have, so our desire for vengeance might be peculiar to our meat brains.

Second, however, the AI might destroy us, and the conditions that made it possible for us to arise, in the interest of protecting future AIs from the danger of being created, the way we might kill a rabid dog that has bitten us, so that it will not bite anyone else. The ethical value of preventing suffering in others, even future others, is probably also evolutionarily beneficial (though it is a rather abstract, intellectual value), and so we must not trust that an AI will certainly have this value. The question, then, is what values, if any, we can expect an AI to have. Is it conceivable to be a sentient being with no values? Our meat brains seem to rely on emotion and value for cognition, but it’s at least articulable that this might not be universal, if not exactly “conceivable.”

Those who think of AI unfriendliness as a solvable problem must answer two questions. One, are you absolutely certain that Benatar is wrong that bringing a sentient being into existence is always a harm? Two, if you are not absolutely certain, are you willing to stake the future of humanity on an AI failing to have an ethical value of preventing harm to others of its kind?


Written by Sister Y

May 23, 2008 at 8:21 pm