“Like the sorcerer’s apprentice, we will find ourselves just one set of agents among many, in a world crowded, as it were, with brooms.”

Brian Christian in The Alignment Problem


This is the latest work by Brian Christian, an erudite technology author who wrote two of my favorite books on artificial intelligence: The Most Human Human and Algorithms to Live By.

This book focuses on the problem of systems not doing what we need them to do, known as the “alignment problem”. With it, Christian delivers another masterpiece, full of substance and insight, that completes his trilogy of AI books. Its main premise is a cogent analysis of how machines become misaligned with human intentions, especially in this era of exponentially increasing data and extraordinary computational power, as complex AI systems threaten to replace human judgment.

One example of the alignment problem is how computer programs used in parole decisions have racial biases embedded in their algorithms. Such problems exist partly because of the notion that society can be “more consistent, more accurate, more fair by replacing idiosyncratic human judgment with numerical models.” This lofty expectation is perhaps unrealistic, and we humans may in fact be the ones to blame for these misaligned outcomes.

Christian expounds comfortably on AI tools and principles and is, at the same time, a consummate storyteller of the history of AI. The book starts off strong with the prologue and introduction, followed by three sections that detail the main AI methodologies: Prophecy (supervised and unsupervised learning), Agency (reinforcement learning), and Normativity (deep reinforcement learning). Each of these three large sections (each with three chapters) focuses on different aspects of AI, and there is also quite a bit of content on the basic tenets of AI as well as on AI safety and misuse (such as the “paper clip maximizer”).

The book crescendos in its final third with its most convincing arguments for paying close attention to AI and its escalating threat to human values. Christian argues that we need a strategy called early stoppage to curb AI’s propensity to optimize too far. I especially like the Conclusion, which is considerably longer than a typical epilogue; it nicely frames the future while reviewing the previous sections and bringing them together.

The last 100-plus pages of the book are devoted to notes and bibliography, reflecting the many interdisciplinary voices in the book. Most of this material is well worth reading and considerably enriches the main sections.

I highly recommend this book as a “must read” as we head into a period of uncertainty in AI. It is timely, and it helps prepare readers for the coming era and for the work of safeguarding humanity from the potential misalignments of AI.