
June 21, 2024

5 min

The Top Resources for Site Reliability Engineers in 2024


Written by Jorge Lainfiesta

Being an SRE requires you to know about infrastructure, systems design, and software engineering, and to have other superpowers, like being able to debug other people’s code while everything is on fire. The key to being a stellar SRE and getting better at the job is experience, experience, and more experience. However, building a strong foundation in reliability will help you advance much faster in the field.

The book Site Reliability Engineering: How Google Runs Production Systems is arguably the best-known and most widely read SRE resource out there, but there are many other great options for folks looking to expand their learning. We recently spoke to Google's Reliability Advocate, Steve McGhee, in our Humans of Reliability interview series. In addition to his interesting anecdotes on the early days of SRE at Google, and his journey to becoming a Reliability Advocate, he also shared a handful of his favorite SRE resources, which we compiled here into a list.

Becoming SRE by David Blank-Edelman

Published a few months ago, David’s new book is definitely the most comprehensive reference for becoming an SRE in 2024. The book is an introduction to reliability that lays down the fundamental concepts of the practice and provides an actionable curriculum for people who want to break into the space. It’s also a useful reference for organizations starting their journey towards reliability.


Implementing Service Level Objectives by Alex Hidalgo

Now we’re getting to the nitty-gritty of being an SRE. Reliability is not about vibes or feelings. Reliability comes with hard numbers and direct business implications. In this book, Alex gives an overview of how to define and measure Service Level Indicators (SLIs) as a foundation for setting Service Level Objectives (SLOs). The treasure here is that Alex actually dives into implementation strategies to achieve your SLOs.
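To make the "hard numbers" idea concrete, here is a minimal sketch of how an availability SLI can be compared against an SLO to see how much error budget is left. It is not taken from the book: the function names, the event counts, and the 99.9% target are all assumptions chosen for illustration.

```python
def availability_sli(good_events: int, total_events: int) -> float:
    """Fraction of events that met the success criterion."""
    if total_events == 0:
        # Assumption: with no traffic, treat the objective as fully met.
        return 1.0
    return good_events / total_events


def error_budget_remaining(sli: float, slo_target: float) -> float:
    """Share of the error budget still unspent (negative when overspent)."""
    allowed_failure = 1.0 - slo_target
    if allowed_failure <= 0:
        raise ValueError("slo_target must be below 1.0")
    actual_failure = 1.0 - sli
    return 1.0 - actual_failure / allowed_failure


# Hypothetical numbers: 1,000,000 requests in the window, 500 of them failed.
sli = availability_sli(good_events=999_500, total_events=1_000_000)
remaining = error_budget_remaining(sli, slo_target=0.999)
print(f"SLI: {sli:.4%}, error budget remaining: {remaining:.0%}")
```

With these made-up numbers, the service has used about half of the failures a 99.9% objective allows, which is the kind of figure that can feed a conversation about whether to ship features or spend time on reliability work.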

Seeking SRE by David Blank-Edelman

This book lays out the principles of reliability at scale and how organizations approach it. It’s a great general deep dive into SRE because it takes a systematic approach to every aspect of reliability. From culture and tooling to monitoring and career development, David provides an overview of theoretical principles coupled with real-life case studies.

The Practice of System and Network Administration (Second Edition) by Tom Limoncelli et al.

It doesn’t say SRE in the title, but this book is a gem for understanding the underlying concepts that make reliability possible in any system. This book is platform-agnostic and explores best practices for system and network administration that apply to desktop services, server management, and security.

Check out Steve's full interview and get to know more SREs at rootly.com/humans-of-reliability.