Building On-Call Schedules for Humans

Learn how to navigate vacations, parenthood and personal preferences to improve your reliability practice.

Written by Alexandra Chaplin

Last updated: November 17, 2025


Service degradations, third-party failures, and malicious attacks don't wait for work hours. They often strike while you're having your best night's sleep. When chaos breaks out and stress levels skyrocket, waiting until the next workday is not an option. This urgency is why many organizations keep a team ready to respond around the clock, no matter when an incident strikes.

To manage this continuous readiness, companies implement on-call rotation schedules. For example, team members like Bob and Maya might be on call during the first week of every month, ready to spring into action if an alert sounds. Their responsibility is to promptly acknowledge the alert, assess its urgency, and determine the severity of the incident. While they often resolve issues independently, complex situations might require escalating the problem and collaborating with others.

Incidents aren't just disruptions; they can dominate your focus, becoming worlds unto themselves. As a result, organizations begin to measure the maturity of their reliability practices with metrics such as Mean Time to Recovery (MTTR) and the DORA metrics. It’s only fair: as you scale up, you need abstractions to navigate the complexity.

Yet amid these metrics, the human aspect of incident management sometimes gets overlooked. When systems fail and downtime stretches into costly hours, the people on call become the unsung heroes, stepping in to mitigate disasters. Their role is pivotal, yet they are more than just problem solvers: they're people with lives outside of work.

Acknowledging the human side means understanding that team members have personal lives filled with well-deserved holidays, loved ones who get sick, and surprise broken fridges that become little emergencies of their own. Effective incident response strategies must consider these factors to maintain a resilient and responsive team.

This guide explores what "on-call for humans" really means and why it's crucial for both the organization and its employees. We'll discuss how to create rotation schedules that are equitable and sensitive to personal needs, so that all team members, including parents and new hires, feel supported and valued. This approach helps not only in retaining talent but also in fostering a work environment where everyone is empowered to perform at their best.

On-call basics

The objective of on-call scheduling is to assign shifts to staff members so that the workload is distributed fairly. You'll need to decide how to split on-call time into shifts and how often each person takes one. On-call scheduling is built around three foundational components: organizing rotations, defining escalation policies, and maintaining schedules in your tool of choice.

Rotations

Incidents can pop up at any time, so you must plan to always have at least one person on-call. Depending on the complexity of your system, you may need more than one person per shift.
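One way to check the "always at least one person" rule is to scan a planned schedule for stretches of time that nobody covers. The sketch below is illustrative only: `Shift` and `find_coverage_gaps` are hypothetical names rather than part of any scheduling tool, and it assumes each shift is a simple start/end pair with an exclusive end.

```python
from datetime import datetime
from typing import NamedTuple


class Shift(NamedTuple):
    """Hypothetical shift record: who is on call, and from when until when."""
    person: str
    start: datetime
    end: datetime  # exclusive: the shift covers [start, end)


def find_coverage_gaps(shifts, window_start, window_end):
    """Return (gap_start, gap_end) pairs inside the window where nobody is on call."""
    gaps = []
    covered_until = window_start
    for shift in sorted(shifts, key=lambda s: s.start):
        if shift.start >= window_end:
            break  # later shifts start after the window we care about
        if shift.start > covered_until:
            gaps.append((covered_until, shift.start))
        # Overlapping shifts are fine; only the furthest covered point matters.
        covered_until = max(covered_until, shift.end)
    if covered_until < window_end:
        gaps.append((covered_until, window_end))
    return gaps


# Example: Bob hands off at 21:00, but Maya only starts at 22:00.
shifts = [
    Shift("Bob", datetime(2025, 1, 6, 9), datetime(2025, 1, 6, 21)),
    Shift("Maya", datetime(2025, 1, 6, 22), datetime(2025, 1, 7, 9)),
]
gaps = find_coverage_gaps(shifts, datetime(2025, 1, 6, 9), datetime(2025, 1, 7, 9))
for gap_start, gap_end in gaps:
    print(f"Nobody on call from {gap_start} to {gap_end}")
```

A check like this only confirms that someone is scheduled. Whether one person per shift is enough still depends on the complexity of your system.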

You can’t have only a few people be on-call all the time. It’s unfair, and it also introduces a single point of failure into your strategy. Responding to incidents is a specific skill that you want to cultivate across your team. That’s why you need to organize rotations that distribute on-call duty in shifts among your team members.

Traditionally, SRE teams handled all on-call duties, bringing systems back online as needed. However, with the rise of the "you build it, you run it" mantra, the responsibility is increasingly being delegated to the teams that own and operate specific components.

Organizing rotations by team can add some admin work, but distributing the on-call load by component can help resolve incidents faster, because each incident is tackled by the people who built and maintain the affected component.

The industry uses a handful of best-practice scheduling templates, such as biweekly or "follow-the-sun" rotations, to assign shifts to people. In general, though, you'll want to ask each team member about their preferences and needs before organizing rotations.
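As a baseline to adjust once you've collected preferences, a plain round-robin rotation can be sketched in a few lines. This is a minimal sketch under stated assumptions, not a template from any particular tool: `build_rotation` and its parameters are hypothetical, shifts are whole days, and every team member is treated as equally available.

```python
from collections import Counter
from datetime import date, timedelta
from itertools import cycle


def build_rotation(team, first_day, shift_days, num_shifts):
    """Assign consecutive shifts to team members in round-robin order.

    Returns a list of (person, start, end) tuples, where end is exclusive.
    """
    if not team:
        raise ValueError("team must contain at least one person")
    schedule = []
    start = first_day
    for _, person in zip(range(num_shifts), cycle(team)):
        end = start + timedelta(days=shift_days)
        schedule.append((person, start, end))
        start = end  # the next shift begins exactly where this one ends
    return schedule


# Example: three people taking weekly shifts for six weeks.
schedule = build_rotation(["Bob", "Maya", "Anton"], date(2025, 1, 6), shift_days=7, num_shifts=6)
shifts_per_person = Counter(person for person, _, _ in schedule)
for person, start, end in schedule:
    print(f"{person}: {start} to {end}")
print(shifts_per_person)
```

Setting `shift_days=14` gives a biweekly variant. Counting shifts per person, as done with `Counter` here, is a quick way to spot an uneven load after you swap shifts around to accommodate individual needs.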

Escalation policies

Let’s say Anton is on-call right now, but he’s not answering his phone when an alert pops up. Who else should you contact? A colleague? A manager? And if the incident is of critical severity, who should be notified?

Escalation policies act as roadmaps that make sure the right people are notified at the right time when an incident breaks out. An escalation policy can consist of several layers to ensure enough redundancy in the scheduling system.
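The layered idea can be modeled as an ordered list in which each layer names its contacts and how long to wait for an acknowledgement before moving on. The following is a minimal sketch, not the data model of any real alerting product: `EscalationLayer`, `contacts_to_notify`, the wait times, and the "engineering-manager" contact are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EscalationLayer:
    """Hypothetical escalation layer."""
    contacts: Tuple[str, ...]
    wait_minutes: int  # time to wait for an acknowledgement before escalating


def contacts_to_notify(layers, minutes_unacknowledged):
    """Return everyone who should have been notified after the given wait."""
    notified = []
    threshold = 0  # minutes after which the current layer is triggered
    for layer in layers:
        if minutes_unacknowledged < threshold:
            break  # this layer and all later ones have not been reached yet
        notified.extend(layer.contacts)
        threshold += layer.wait_minutes
    return notified


# Example policy: Anton first, then Maya, then a manager (illustrative values).
policy = [
    EscalationLayer(contacts=("Anton",), wait_minutes=5),
    EscalationLayer(contacts=("Maya",), wait_minutes=10),
    EscalationLayer(contacts=("engineering-manager",), wait_minutes=15),
]

print(contacts_to_notify(policy, 0))   # only the first layer
print(contacts_to_notify(policy, 5))   # first and second layers
print(contacts_to_notify(policy, 15))  # all three layers
```

Keeping earlier contacts in the result reflects one possible choice, where escalation adds people rather than replacing them. A policy could also vary its layers by incident severity, which this sketch leaves out.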