May 27, 2022
Best practices for “SRE pioneers” – meaning engineers who are the very first SREs hired at an organization.
Site Reliability Engineers (SREs) have a considerable set of tasks to juggle no matter where they work or how long their company has had an SRE practice. But if you’re the very first SRE to join an organization – as many SREs are these days, given that the SRE trend is trickling down into smaller and smaller companies – you face a special group of challenges.
You may find it difficult to get buy-in for SRE from other technical teams. You may struggle to convince your boss that the company’s decision to hire an SRE is paying off. You probably worry that you’ll end up over-extended because you are the only SRE at the place. And so on.
We put together this list of tips for SREs who find themselves in the position of being the “SRE pioneer” at their companies. Whether you’re an experienced SRE who has just been hired at a place that had no SREs previously, or you’re brand-new to SRE and your first job happens to be at a company that is also brand-new to the SRE role, the following best practices will help you make the transition as smooth as possible for all involved.
For starters, you’ll want to be prepared to justify the business’s investment in SRE as soon as a boss or fellow engineer asks.
Do this by collecting metrics about the technical and business impact of SRE work, starting as early as you can. Your goal should be to quantify how you – the SRE – are benefiting the organization, so that you can encourage more support for yourself and any other SREs who may be hired.
On top of this, collecting SRE metrics is also important for establishing early feedback about which of your SRE strategies are succeeding and which could be improved. Since you’re building an SRE practice from scratch, getting this insight early and often is critical.
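As a rough illustration of what such metrics might look like, here is a minimal Python sketch that computes two commonly used reliability figures, mean time to resolve and availability, from a list of incident records. The `Incident` record shape, the function names and the sample data are all hypothetical. The availability calculation also assumes that incidents do not overlap. Treat it as a starting point to adapt to whatever your incident tracker exports, not as a definitive implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical record shape; adapt it to your own incident data.
    started: datetime
    resolved: datetime


def mean_time_to_resolve(incidents):
    """Average duration from incident start to resolution."""
    if not incidents:
        return timedelta(0)
    total = sum((i.resolved - i.started for i in incidents), timedelta(0))
    return total / len(incidents)


def availability(incidents, window_start, window_end):
    """Fraction of the window not covered by an incident.

    Assumes incidents do not overlap; overlapping incidents would be
    double-counted as downtime.
    """
    window = (window_end - window_start).total_seconds()
    downtime = sum(
        (min(i.resolved, window_end) - max(i.started, window_start)).total_seconds()
        for i in incidents
        if i.resolved > window_start and i.started < window_end
    )
    return 1 - downtime / window


# Made-up sample data for illustration only.
incidents = [
    Incident(datetime(2022, 5, 3, 9, 0), datetime(2022, 5, 3, 9, 30)),
    Incident(datetime(2022, 5, 17, 14, 0), datetime(2022, 5, 17, 15, 30)),
]
mttr = mean_time_to_resolve(incidents)
uptime = availability(incidents, datetime(2022, 5, 1), datetime(2022, 5, 31))
print(f"MTTR: {mttr}, availability: {uptime:.4%}")
```

Tracking even simple figures like these from your first weeks gives you a baseline to compare against later, which makes it easier to show whether your SRE strategies are working.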
As you start an SRE role, you may find some uncertainty among existing engineers about whose “side” you are on. Are you there to support developers, or IT engineers?
The answer, of course, is “both.” Although SREs tend to borrow from software engineers’ playbooks more than they do from those of IT Ops, an SRE’s goal should be to empower and support both of these teams.
So, unless you’re given explicit instructions to interface with only one group or the other, make yourself everybody’s buddy within the IT organization.
SREs spend most of their time doing technical things with technical tools. But ultimately, in most cases, they do that work to support non-technical users by improving the reliability of the systems those users rely on.
As the first SRE at a company, it may be well worth your time to get to know these non-technical stakeholders, and ask them what they see as the greatest reliability issues in the organization. Building these relationships early on will help you communicate the value of SRE across the organization. It’s also likely to come in handy during incident management routines, when you may need to interface with non-technical stakeholders as well as other engineers.
Deciding which tools to use as an SRE is an obvious early step for any SRE who is building a practice from scratch. But we didn’t put it first on this list because it’s important to get to know the rest of the organization before committing to a particular set of tools.
So, as you meet the rest of the organization, ask detailed questions about which tools they use to address needs like observability, security and incident management. Then, by factoring in feedback about which tools work well and which don’t, choose which tools you’ll deploy to support SRE. Be prepared for those tools not to match your own personal preferences perfectly. You may need to prioritize the preferences of the rest of the organization.
Unless you have been hired at a very small company, it’s likely that the business plans to hire more SREs over time (assuming that you prove the value of the position, of course).
Toward that end, plan for how the SRE team will expand. Think about things like whether to use embedded SREs or create a dedicated SRE team. Think, too, about how you’ll document your company’s SRE practice and ensure it keeps operating smoothly as individual engineers come and go.
The job of a newly hired SRE is not an easy one, especially if you’re the first person at your company to fill those shoes. But by being deliberate and strategic about how you work, which data you collect and how you introduce the company to SRE, you set yourself – and your future SRE colleagues – up for success.