
Site Reliability Engineering: How Google Runs Production Systems
by Betsy Beyer, Chris Jones, Jennifer Petoff, Niall Richard Murphy
Beyer and colleagues compile essays from Google's SRE organisation explaining how the company runs planet-scale systems through error budgets, service level objectives, and a deliberate blend of software engineering and operations. The editors argue that reliability is a first-class engineering problem addressed with automation, measurement, and blameless postmortems rather than heroics.
- Published:
- Pages: 550
