
The Alignment Problem: Machine Learning and Human Values
by Brian Christian
Brian Christian traces the history and cutting edge of efforts to build AI systems that reliably reflect human values, drawing on hundreds of interviews with researchers in machine learning, cognitive science, and philosophy. Organized into three sections on representation, behavior, and normativity, the book shows how biased training data, misspecified reward functions, and the gap between optimization targets and human intent produce systems that diverge from their creators' goals.
- Published:
- Pages: 496



















