August 6, 2024
6 min
Learning expert Sorrel digs into how stress inhibits our ability to learn, and what we can do about it.
While retrospectives provide a valuable pathway for learning outside of the flow of work, we also want learning to happen during an incident or unexpected event as it unfolds. This can be challenging due to the negative impact of stress on our ability to learn and navigate difficult situations.
In this article, we’ll dig into how stress inhibits our ability to learn and what we can do about it.
When we experience a stressful event, the amygdala, an area of the brain that contributes to emotional processing, sends a distress signal to the hypothalamus. This area of the brain functions like a command centre, communicating with the rest of the body to energize our fight-or-flight response (LeWine, 2024). When this happens, the part of the brain that is responsible for reasoning shuts down.
Those familiar with the work of Daniel Kahneman (2012) might recognize this as a System 1 takeover of System 2 (those not familiar can refer to the summary by Loo (2024)). Essentially, the brain’s fast, automatic response system prevents us from entering the slow, effortful, and logical thinking mode required to solve more complicated problems. This temporary loss of cognitive control can be bad news in situations where there isn’t a clear-cut solution or procedure to follow and we need to harness System 2’s learning capabilities.
Stress and anxiety can also provoke a strong desire to control aspects of our environment or other people in order to reestablish a sense of order. Again, that's all very well when the source of the problem or disorder is easily identifiable and we have the authority and capability to act appropriately, but it's not always possible where the problem space is complex or chaotic. In those situations, the need for order can lead us to control the wrong things for the wrong reasons.
It is especially important that those in a position of authority are able to manage this compulsion for control, due to the potential blocking effect it can have on others’ learning. That’s not to say leaders should never take decisive action in a crisis, just that it is important they are able to manage their stress responses in order to make better decisions about how to act. In this way they not only achieve better outcomes but also model much healthier learning behaviours for others, turning a negative event into one that is positively reinforcing.
Fortunately, there are many things we can do to get better at managing stress in difficult situations and ensure people are able to learn effectively. Here are eight to get you started.
One of the most valuable things we can all do to support learning is to work at creating an environment in which people feel safe to report mistakes and incidents and to share ideas and feedback, without fear of blame or humiliation. This willingness to engage in interpersonal risk-taking is what Edmondson (1999) refers to as psychological safety, and it is recognized by many as a cornerstone of high-performing teams. In her recent book “Right Kind of Wrong”, Edmondson (2023) invites us to explore our relationship with failure and offers practical guidance on how to “think, discuss and practice failure wisely.”
Another important way to reduce stress and cognitive load during incidents is to make sure you have well-understood procedures and strategies for responding to them. Practising and reviewing those procedures should occur routinely, from onboarding onwards. Incident response drills (Soff, 2023) and game days (Sawatsky, 2023) are some of the ways you can help prepare your teams for larger emergencies while fostering shared ownership and psychological safety.
This is mentioned here as an extension of the previous point (making sure people are adequately trained in incident response). Chaos engineering (Rosenthal and Jones, 2020) is a disciplined approach to identifying weaknesses in a system by intentionally injecting faults into it in order to test its resilience. Besides the more obvious benefit of preventing disruptions from occurring in the first place, chaos engineering also serves to strengthen important learning behaviours, like open communication and experimentation, which can then be leveraged in a real crisis.
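For readers who have not seen fault injection in practice, the sketch below shows the idea at its smallest: wrap a dependency so that some calls fail on purpose, then check whether the behaviour you care about holds up. It is an illustration only, not a description of any particular chaos engineering tool. All of the names (`with_faults`, `fetch_price`, `price_with_fallback`, `run_experiment`) and the retry-then-default behaviour are assumptions made for the example.

```python
import random


class InjectedFault(Exception):
    """Raised by the wrapper to simulate a dependency failure."""


def with_faults(func, failure_rate, rng):
    """Hypothetical helper: make a fraction of calls to func fail on purpose."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFault("simulated dependency outage")
        return func(*args, **kwargs)
    return wrapper


def fetch_price(item):
    """Stand-in for a real downstream call (assumed for this example)."""
    return {"apple": 3, "pear": 5}[item]


def price_with_fallback(fetch, item, retries=2, default=None):
    """Behaviour under test: retry a few times, then degrade gracefully."""
    for _ in range(retries + 1):
        try:
            return fetch(item)
        except InjectedFault:
            continue
    return default


def run_experiment(failure_rate, calls=1000, seed=42):
    """Return the fraction of calls that still produced a real price."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    flaky_fetch = with_faults(fetch_price, failure_rate, rng)
    ok = sum(
        price_with_fallback(flaky_fetch, "apple") is not None
        for _ in range(calls)
    )
    return ok / calls


if __name__ == "__main__":
    for rate in (0.0, 0.3, 1.0):
        print(f"failure rate {rate:.1f}: success ratio {run_experiment(rate):.3f}")
```

Stating a hypothesis before running an experiment like this (for example, "most requests should still succeed when roughly a third of dependency calls fail") and then comparing it with the result is one way a team can practise the open, experimental conversations described above in a low-stakes setting.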
In complex systems, some accidents and failures are inevitable. Surfacing threats and failures quickly enables us to build resilience into the system and is therefore something to be celebrated. Viewing incidents as a form of feedback from which we can learn may help take the fear out of dealing with them and open us up to learning. This all starts with changing how we communicate about incidents.
Depending on the nature of the problem and/or system we’re dealing with, we need different problem-solving strategies. In order to recognize the problem space, people may benefit from learning something about complexity theory and its applications in software development and organizational design (e.g. Minudel, 2021). Tools like the Cynefin framework (Kurtz and Snowden, 2003) can help here, as might a fresh look at software design principles through the lens of complexity (e.g. Ousterhout, 2018). However, choosing an appropriate learning pathway will likely depend on your role and context. Another thing worth considering is how your own learning preferences might ‘colour’ your perceptions of the problem space and influence how you respond (Harriet, 2024).
Learning is a capability, and, as such, we need to work at developing the skills associated with effective autonomous learning such as feedback and reflective practice. In many professions, making time for daily reflection on practice and the feelings we encounter in the process has been recognized as an effective strategy for increasing our skills and performance (Thejll-Madsen, 2018). For those new to reflective practice, reflective frameworks can help you structure your reflections (e.g. University of Edinburgh, 2022). Keeping a reflective journal is another popular recommendation which can not only help with the process of learning and sensemaking but may help with stress management too.
Another valuable skill we can associate with learning is that of self-awareness. In the context of incidents, self-awareness grants us the ability to recognize our stress responses so they don’t impede our ability to solve the problem. For example, a natural response in a stressful situation can be to seek solitude, which may be the wrong course of action where communication and collaboration are needed. Another common stress response, touched upon earlier, is the need to exercise control, which can leak into our behaviours in all sorts of ways. There are many tools and techniques we can use to build self-awareness. Some popular ones include the Personal SWOT Analysis (Mind Tools, 2024) and Ellis’s (1991) ABC model (summarized in Pilat and Krastev, 2024).
As an extension of this, we can also learn to recognize the signs of stress in others. This application of empathy can support learning in stressful situations. For example, if I am able to recognize John’s bad temper as a symptom of stress, I may be quicker to step in and offer my support than if I assume it’s because John doesn't like working with me.
A natural progression from self-awareness, then, is the ability to regulate unwanted emotions. Again, there are many options we can explore here, and the strategies that work for one person may not work for another. Mindfulness is a clinically proven technique that many find effective for reducing stress and anxiety (Williams and Penman, 2011), while cognitive reframing (or cognitive restructuring) is another set of techniques that focus on noticing and changing how we perceive negative situations (Ackerman, 2018). While these techniques are often practised in a therapeutic setting, many can be practised independently or with a qualified coach. Experiment to find out what works for you.
Maintaining our capacity for learning in stressful situations is a key ingredient in becoming “fail-fit”, but it isn’t something we can learn to do overnight. It takes repeated and deliberate practice to ensure our various learning muscles stay strong. An important part of this involves learning to recognize and manage our stress responses so they don’t become a blocker to our own or others' learning.
We’ve shared just a few examples here of the many ways we can help to build and maintain our learning muscles, and the resources below offer further food for thought. Let us know if you’d like further advice or practical support for managing incidents or fostering learning.