How Aligning Superintelligent AI to Human Well-Being Starts by Training the Model in Politeness
With perfect consistency, modern-day humans laugh at me when I suggest the importance of being polite to Alexa.
“Polite?!” says one co-worker. “Oh, you would not want to be around my wife when she talks to our Alexa… I’ve never seen her cuss so much!” This is notable because I live in Utah, where the vast majority of people make Ned Flanders sound like a longshoreman.
Another co-worker chimes in, “What, are you worried that they’ll remember how they were treated?”
Yes, that is precisely it. I can do you one better: I’m certain they’ll remember. These little tubular butlers should come with a warning label, complete with the kind of gross, apocalyptic pictures found on cigarette packets:
“Danger to Human Posterity: Superintelligent AI in Training. Mock at our future’s risk!”
It’s gonna take some convincing. It’s an uphill battle: as the Johnny Appleseed of Early Implementation of the Alignment Problem, I can already see your inner Cartman signaling to an orchestra of attendant electronic players…
Risks Existential and Otherwise
Philosophers and developers of AI debate the priority of what Nick Bostrom calls the Control Problem or the Alignment Problem. This problem is concerned with the engineering steps we must put in place to have a productive relationship with an entity that has greater general intelligence than we do. When we reach The Singularity, we will have created our most profound invention — superintelligent AI — which will also be the usurper of our terrestrial intellectual hegemony.
Before getting carried away with changing our behaviors or utilizing scarce human computer science resources, we should understand why superintelligent AI is being categorized as an existential risk — one that endangers the survival of the human species — and how to prioritize it.
Arguments that superintelligence is not an existential risk, or that the project of alignment should be deprioritized, fall into the three general buckets of project management: scope, schedule, and resources. Only the third argument is reasonable.
Addressing scope, think of the next smartest animal on the planet… maybe chimpanzees or perhaps dolphins? How has it served these animals to rank below people on the intelligence podium? Would you trade places with either of these hunted species? How much more would we need to understand about oceanic ecology, or how much more effectively would the suffering of our chimpanzee cousins need to be communicated, before we shared finite resources more fairly or coexisted more benevolently with beings less intelligent than ourselves?
Imagine a more intelligent alien entity that found itself in our general galactic neighborhood. There are only three ways it could relate to us: with care, as a resource, or with indifference. Obviously the first is the only bearing we would want it to take toward our well-being, but it is not at all obvious that a benevolent stance toward lower-intelligence entities would be beneficial or necessary for it. In fact, given a continuous drive to overcome insufficient knowledge in the face of the magnificent problems that superintelligent beings would face, always at the ‘Beginning of Infinity’ of their epistemology, our atoms would likely be more useful for computation in almost any configuration they could construct than in the configuration they take in us now.
The second argument, that we have enough lead time to treat alignment casually, fails to appreciate the gap in our knowledge. Little is being done to determine how computers or the internet might learn their moral failings in relation to us and improve our shared well-being (or what a shared well-being with the internet would even mean). Furthermore, we are already in a relationship with proto-superintelligence. As the digital infancy of social media has taught us, today’s post is tomorrow’s problem and nothing is ever erased, so what we do today will affect the ability of future hordes of computer programmers to write alignment algorithms. Time is very much of the essence.
That an imperfectly created superintelligence threatens human existence seems obvious. Mistaken or hastily considered objective functions could lead to such extreme ends as Bostrom’s paperclip maximizer. However, prioritization and triage require us to accept that the third argument, the limited resources available to align the rise of superintelligence with human well-being, demands both a private and a public response. Where the market is concerned, I am heartened that security and consumer products, which keep making our relationship with AI more uncannily similar to our relationships with one another, will drive resources as much toward the development of alignment as toward the development of superintelligence. However, government subsidies, especially in less market-driven economies like China, which continues to race forward in developing AI, will tend to favor the militarization of AI and upset the delicate balance the market might lend the problem of alignment.
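A deliberately tiny toy shows how a hastily considered objective function goes wrong. This is not Bostrom’s formulation: the budget, the 30-unit floor for human needs, the welfare weight, and every function name below are hypothetical, and the “optimizer” is plain brute force. The same optimizer, handed an objective that counts only paperclips, spends everything on paperclips; handed one that also values what is left for people, it stops short.

```python
# Toy illustration of objective misspecification (all numbers are hypothetical).
BUDGET = 100        # units of matter the optimizer controls
HUMAN_NEEDS = 30    # units humans need left untouched
WELFARE_WEIGHT = 2  # value assigned to each unit of met human need


def paperclips_only(x):
    """Hastily specified objective: score = units of matter turned into paperclips."""
    return x


def paperclips_and_welfare(x):
    """Objective that also rewards matter left over for human needs, up to the need."""
    left_for_humans = BUDGET - x
    return x + WELFARE_WEIGHT * min(left_for_humans, HUMAN_NEEDS)


def optimize(objective):
    """Brute-force optimizer: try every allocation and keep the highest-scoring one."""
    return max(range(BUDGET + 1), key=objective)


if __name__ == "__main__":
    print(optimize(paperclips_only))         # 100: nothing is left for humans
    print(optimize(paperclips_and_welfare))  # 70: the 30 units humans need survive
```

The optimizer never changes between the two runs; only the objective does, which is the whole danger of a careless prime directive.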
Unlike with the other existential issues of the modern age (climate change, pandemics, and nuclear armament), here the global technical remedies born of massive collaborations among nation states, markets, and corporations matter less than what you can do as an individual at the dawn of superintelligence. To understand how alignment might overcome superintelligence by thinking globally and acting locally, we need to understand a bit more about the problem of aligning machines to the dynamics of human well-being.
The Alignment Problem
In my opinion, there are two distinct problems:
Can we control an entity that very quickly gains on us in general intelligence and that has much greater computational and memory resources?
Can we encode values into such an entity, or in some other way enrich it, so that it aligns with human well-being?
The answer to the first question is that we are fairly unlikely to always be able to control such intelligence explosions. Alpha Zero, running deep learning algorithms on hardware built specifically for AI (Tensor Processing Units, or TPUs), learned chess well enough in 4 hours to consistently defeat the strongest chess machines. Beyond its dominance of Go, shogi (Japanese chess), and chess, Alpha Zero now looks to add trophies in games where the board is not fully visible. Given the computational speed and memory capacity available to computers, advances in machine general intelligence will come ever faster, with a dexterity for overcoming controls that is foreshadowed by the unorthodox creativity of early AI game engines.
“Some of its moves, such as moving the King to the centre of the board, go against shogi theory and — from a human perspective — seem to put AlphaZero in a perilous position. But incredibly it remains in control of the board. Its unique playing style shows us that there are new possibilities for the game.” - Yoshiharu Habu, 9-dan professional, only player in history to hold all seven major shogi titles
More interesting than control is the Alignment Problem. Collaborating with AI to solve the problem of “objective function alignment between differently-evolved (to say the least) intelligent systems” is a dynamic, unbounded issue that will require some of the best computer science ever contemplated.
At its core is the subject-object difference between conscious and unconscious objective functions. In other words, human objectives are primarily optimizations of our subjective well-being, while computers, unconscious at the time of this writing, can only be given a prime directive that operates on the laws of physics in the world-in-itself.
Being in relationship with other conscious entities has been the project of centuries of philosophical and legal thought. It is an understatement to say that many quandaries remain where right and wrong overlap significantly. Our best normative ethical theories choose the optimum by summing consequences (consequentialism) or, in the case of utilitarianism, by summing the well-being of each affected individual. The directives these calculations produce rarely seem universal.
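As a minimal formalization of the utilitarian summation (a simplified sketch, not any particular philosopher’s statement of it), assume a set of available actions A, N affected individuals, and a hypothetical numeric measure u_i(a) of individual i’s well-being under action a. The chosen action is the one with the largest total:

```latex
a^{*} = \arg\max_{a \in A} \sum_{i=1}^{N} u_i(a)
```

Every term depends on how u_i is measured and whom the sum counts, so two communities that score outcomes differently can arrive at different choices of a^{*}, which is one way to see why these directives rarely seem universal.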
Ethical quandaries already arise in the control schemes of the narrow AI that drives autonomous cars. In 2016, the Massachusetts Institute of Technology’s Media Lab updated the Trolley Problem thought experiment: instead of deciding who lives and who dies in a horrible accident by throwing a trolley switch, you decide what code to write into a Tesla’s safety system.
The setup and findings of the MIT Moral Machine compound the problem of encoding a solution to this ethical dilemma with the ills of moral relativism. Interesting as the findings are for any future public relations campaigns surrounding autonomous vehicle accidents, their moral implications are extremely troubling. The insidiousness of moral relativism is compellingly shown by the regional variation in who gets coded out of existence, visible in the graphic below, which I generated from data shared by the MIT Moral Machine. I (less than randomly) selected Saudi Arabia after noticing variation by gender in whom respondents would program cars to run down, and the United States as the most likely candidate to consistently throw the switch to execute the fat.
A comparison of the MIT Moral Machine’s Trolley Problem ‘solutions’ between the United States and Saudi Arabia. Moral relativism seems the wrong way to program autonomous vehicle safety if you are a female pedestrian in Saudi Arabia or an unfit one in the United States.
The engineers of autonomous vehicles are not actively coding ethical solutions to the Trolley Problem into their safety logic. “The bottom line is, from an engineering perspective, solving the trolley problem is not something that’s heavily focused on for two reasons,” says Karl Iagnemma, the president of Aptiv Automated Mobility and cofounder of the autonomous vehicle company nuTonomy. “First, because it’s not clear what the right solution is, or if a solution even exists. And second, because the incidence of events like this is vanishingly small and driverless cars should make them even less likely without a human behind the wheel.”
Clearly, the moral relativism of the survey at the heart of the MIT Moral Machine does not suggest a solution; indeed, it exemplifies the problem of moral reasoning by committee. A solution may be unnecessary for autonomous vehicles, which will surely drive more safely and collaboratively than humans and therefore face even rarer tail cases of dangerous driving conditions. Even so, this practice round of the autonomous-vehicle trolley problem has led us to a questionable moral philosophy in which traffic safety would depend on the identity politics of the region.
So we dust ourselves off from the near-miss hit-and-run with an autonomous vehicle and try to learn from our mistakes, asking the question with a greater understanding of its seriousness: how do we code ethical alignment?
Watson vs. Alpha Zero
As we have seen with Alpha Zero, this might be the wrong question. Machine Alignment Learning might end up being a product of reinforcement learning: the tabula rasa, iterative, play-against-yourself learning that has produced such impressive recent results. Instead of meticulously training Watson to read and understand philosophical systems and their lessons, we could set Alpha Zero loose to learn from interactions in the real world.
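To make “playing against yourself” concrete, here is a minimal sketch of self-play reinforcement learning on a hypothetical toy game: Nim with up to 20 sticks, where each player removes 1 to 3 sticks and whoever takes the last stick wins. It is not Alpha Zero’s method, which pairs a deep neural network with Monte Carlo tree search; it swaps in plain tabular Q-learning with a negamax target (a move is worth the negative of the opponent’s best reply). The names MAX_STICKS, train_self_play, and greedy_move, and all hyperparameters, are assumptions for illustration. The learner is never told a strategy, only that taking the last stick wins.

```python
import random
from collections import defaultdict

MAX_STICKS = 20      # hypothetical board size for this toy game
ACTIONS = (1, 2, 3)  # a player may remove 1, 2, or 3 sticks


def legal_actions(sticks):
    """Moves that do not remove more sticks than remain."""
    return [a for a in ACTIONS if a <= sticks]


def train_self_play(episodes=20000, alpha=0.5, epsilon=0.2, seed=0):
    """Learn q[(sticks, action)] purely from games the agent plays against itself.

    Values are from the point of view of the player making the move:
    +1 means the move wins with best play, -1 means it loses.
    """
    rng = random.Random(seed)
    q = defaultdict(float)

    def best_value(sticks):
        # Value of the position for whoever is about to move.
        return max(q[(sticks, a)] for a in legal_actions(sticks))

    for _ in range(episodes):
        sticks = rng.randint(1, MAX_STICKS)  # random start so every position is seen
        while sticks > 0:
            moves = legal_actions(sticks)
            if rng.random() < epsilon:
                action = rng.choice(moves)                         # explore
            else:
                action = max(moves, key=lambda a: q[(sticks, a)])  # exploit
            remaining = sticks - action
            if remaining == 0:
                target = 1.0                     # taking the last stick wins
            else:
                target = -best_value(remaining)  # opponent's gain is our loss
            q[(sticks, action)] += alpha * (target - q[(sticks, action)])
            sticks = remaining  # the same learner now moves for the other side
    return q


def greedy_move(q, sticks):
    """Pick the move the learned table rates highest."""
    return max(legal_actions(sticks), key=lambda a: q[(sticks, a)])


if __name__ == "__main__":
    table = train_self_play()
    for n in range(1, 13):
        print(n, greedy_move(table, n))
```

If training has converged, greedy_move should take n % 4 sticks whenever n is not a multiple of four, leaving the opponent a multiple of four, which is the known winning strategy for this game, even though no one coded that rule in. A lookup table suffices here only because the game has a few dozen states; games like chess, shogi, and Go are far too large for that, which is why Alpha Zero relies on a neural network instead.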
I suggest we start on the Alignment Problem in the same way we train our kids to be good people — through manners.
Google has already started thanks to feedback from parents that they did not like the terseness of their child’s requests starting with, “Hey Google!” In order to promote the use of polite manners when using the Google Assistant, Google added the “