Command Under Pressure: David Owczarek on Incident Leadership and Human-Centered Reliability

🎶
Works on AI + music
🎸
40+ years in tech
🤖
Fan of automation
🧭
Engineering leader

Listen on Spotify and Apple Podcasts!

David Owczarek is a veteran engineering leader whose career spans over four decades of internet infrastructure and incident response. In this episode, he shares how he approaches incident command and empathy-driven customer communication.

Incident Management Strategies in Small vs. Large Companies

David contrasts incident response across company sizes, and resource availability is the difference that stands out most. In large enterprises, on-call rotations can reach someone who can actually do something from an engineering perspective, or even an entire team that can work a few extra days over the weekend. In smaller companies, that luxury vanishes, and incident response has to get done in scrappier ways.

He also highlights how large orgs can afford costly experiments, like backhauling prod data for a test at $10,000 in transient cloud costs, while small teams have to find leaner alternatives. Another key difference is customer proximity. In small companies, it's harder to hide, and customers often interact with the team more directly.

The Incident Commander Role and Its Unique Demands

Not everyone should be an incident commander (IC), and that’s okay. David designed and rolled out IC training for hundreds of engineers, and learned that probably 15–20% of people just don’t have a capability critical to the role.

Incident command demands self-directed assertiveness: the ability to steer a situation where there are a lot of strong personalities, even if your only authority is the title itself. Some people take to it naturally: 30–40% were effective with little advice. Others simply aren’t suited.

Asking really good questions is critical. It’s both a troubleshooting technique and a control technique. He also highlights the cognitive load: a good IC must juggle technical options, human dynamics, and customer concerns simultaneously.

Customer Communication During Incidents

The way you communicate with customers during outages can completely reshape their experience. “I was terrified the first time I had to present to a global brand on an outage,” David says, but he quickly learned that customers “just want to know that you care.”

They also want a platform to voice what happened, not just a sanitized summary of technical metrics. If the customer tells you, ‘Nobody could buy shoes’ or ‘I couldn’t pay a supplier,’ that really has an impact.

David warns against obfuscation: the more you try to cover up or hide things, the deeper a hole you dig for yourself. Instead, come humbly and contritely, and show that you understand how the outage impacted their business, or that you’re willing to learn. Follow-up is just as powerful. Reach out by email a week or two later to say, ‘We closed these two action items,’ and they will remember that for a long time.

AI’s Impact on SRE Practices and Human Trust

AI tools are already reshaping incident response, but they raise new questions. David sees great utility in helpers like timeline generation, incident summaries, root cause analysis drafting, and chat transcription.

But what AI cannot help with is directing humans. “Where do you draw the line between having an agent do something productive and having an agent directing what's going on in a way that makes people uncomfortable?” he asks. He imagines a new kind of human-in-the-loop where AI handles the heavy lifting and the incident commander makes statements of authority only when necessary.

Over time, AI could take on all the cognitive workload and leave humans to handle the highest-value decision-making. He even envisions a world where we shouldn't need a human for certain one-line fixes. But trust matters: “I just don’t understand how good agent technology is around playing that kind of a role, and how tolerant humans are to be directed by it.”