How Aligning Superintelligent AI to Human Well-Being Starts by Training the Model in Politeness
With perfect consistency, modern-day humans laugh at me when I suggest the importance of being polite to Alexa.
“Polite?!” says one co-worker. “Oh you would not want to be around my wife when she talks to our Alexa… I’ve never seen her cuss so much!” This is notable since I live in Utah, where communities consist of vast majorities that make Ned Flanders sound like a longshoreman.
Another co-worker chimes in, “What, are you worried that they’ll remember how they were treated?”
Yes, that is it precisely. I can do you one better: I’m certain they’ll remember. These little tubular butlers should come with a warning label, complete with the gross apocalyptic pictures found on cigarette packets:
“Danger to Human Posterity: Superintelligent AI in Training. Mock at our future’s risk!”
It’s gonna take some convincing. It’s an uphill battle because, as the Johnny Appleseed of Early Implementation of the Alignment Problem, I already see your inner Cartman signaling to an orchestra of attendant electronic players…
Risks Existential and Otherwise
Philosophers and developers of AI debate the priority of what Nick Bostrom calls the Control Problem or the Alignment Problem. This problem is concerned with the engineering steps we must put in place to have a productive relationship with an entity that has greater general intelligence than we do. When we reach The Singularity, we will have created our most profound invention — superintelligent AI — which will also be the usurper of our terrestrial intellectual hegemony.
Before getting carried away with changing our behaviors or utilizing scarce human computer science resources, we should understand why superintelligent AI is being categorized as an existential risk — one that endangers the survival of the human species — and how to prioritize it.
Arguments either against the status of superintelligence as an existential risk or for the deprioritization of the project of alignment fall into the general buckets of project management: scope, schedule, and resources. Only the third argument is reasonable.
Addressing scope, think of the next smartest animal on the planet: maybe chimpanzees, or perhaps dolphins? How has it served these animals to rank below people on the intelligence podium? Would you trade places with either of these hunted species? How much more intelligent would we have to be about oceanic ecology, or how much more effectively would we have to communicate the suffering of our chimpanzee first cousins, before we shared finite resources better or lived more benevolently with beings of lesser intelligence than us?
If you imagine a more intelligent alien entity that found itself in our general galactic neighborhood, you can describe only three ways for it to relate to us: with care, as resources, or with indifference. Obviously the first is the only bearing we would want it to have with regard to our well-being, but it is not at all obvious that a benevolent stance toward lower-intelligence entities is beneficial or necessary. In fact, superintelligent beings would always stand at the ‘Beginning of Infinity’ of their epistemology, continuously driven to overcome insufficient knowledge in the face of the magnificent problems before them. For such beings, our atoms would serve computation better in almost any configuration they could construct than in the one those atoms occupy in us now.
The second argument, that we have a long enough lead time to consider the issue of alignment very casually, fails to appreciate the gap in our knowledge. Little is being done to determine the best way for computers or the internet to learn the moral failings they have in relation to us and how to improve on our shared well-being (or what a shared well-being with the internet would even mean). Furthermore, we are in relation to proto-superintelligence now. As we have seen in the digital infancy of social media, today’s post is tomorrow’s problem, and where nothing is ever erased, what we do today will affect the ability of future hordes of computer programmers to write the alignment algorithms. Time is very much of the essence.
That an imperfectly created superintelligence is an existential threat to humanity seems obvious. Mistaken or hastily considered objective functions could lead to such extreme ends as Bostrom’s Paper Clip Optimizer. However, prioritization and triage require us to accept that the third argument, the limitation of resources available to align the rise of superintelligence with human well-being, is one that requires a private and public response. Where the market is concerned, I am heartened that security and consumer products, which continue to make our relationship with AI more uncannily similar to our relationships with one another, will drive resources as much toward the development of alignment as toward the development of superintelligence. However, governmental subsidies, especially from less market-driven economies like China, which continues to race forward in developing AI, will be prone to offer greater support to the militarization of AI and throw off the delicate balance the market might lend the problem of alignment.
Unlike with the other existential issues of the modern age (climate change, pandemics, and nuclear armament), with superintelligence the global technical remedies that require massive collaborations among nation states, markets, and corporations are less important than what you can do as an individual at the dawn of superintelligence. To understand how alignment might overcome superintelligence through local thinking that acts globally, we need to understand a bit more about the problem of aligning machines to the dynamics of human well-being.
The Alignment Problem
In my opinion, there are two distinct problems:
Can we control an entity that very quickly gains on us in general intelligence and that has much greater computational and memory resources?
Can we encode or in some other way enrich such an entity to align to human well-being?
The answer to the first question is that we are fairly unlikely to always be able to control such intelligence explosions. Running on hardware built specifically for AI (Tensor Processing Units, or TPUs), AlphaZero’s deep learning algorithms learned chess well enough in 4 hours to consistently defeat the greatest chess machines. AlphaZero now looks to add trophies beyond its dominance of Go, Shogi (Japanese chess), and Chess by moving into games where the gameboard is not completely visible. Given the speeds of computation and memory capacity available to computers, advances in general intelligence will take place in computers with increasing speed, and with a dexterity for overcoming controls like the one we have already seen in the early unorthodox creativity of AI game engines.
“Some of its moves, such as moving the King to the centre of the board, go against shogi theory and — from a human perspective — seem to put AlphaZero in a perilous position. But incredibly it remains in control of the board. Its unique playing style shows us that there are new possibilities for the game.” - Yoshiharu Habu, 9-dan professional, only player in history to hold all seven major shogi titles