How to Improve Upon Google’s Four Golden Signals of Monitoring
The Four Golden Signals of monitoring and observability get a lot of things right. But they could be even better.
October 25, 2024
6 mins
With insights into AI and platform engineering, DORA’s 2024 report digs into what’s actually helping (and hurting) engineering organizations. In this blog post, you’ll learn how the findings may apply to SRE teams.
The DORA (DevOps Research and Assessment) team has been studying how tech practices impact organizational efficiency since 2014.
In the State of DevOps 2024, they set out to study the two main trends that have permeated our industry over the past few years: AI and Platform Engineering. Hyped by vendors and consultants, these trends are now put to the test through the rigorous methodology that has always characterized DORA.
The findings regarding organizational throughput (DORA’s bread and butter) remain largely consistent with previous years, so I won’t go into them in depth here. If you’re not familiar with Accelerate by Forsgren et al., check it out: it will help you better understand the yearly State of DevOps reports.
In this blog post, I want to highlight their findings on these trends and how they impact SRE teams.
Think about the last decade in tech. Each hype cycle has promised us a path to paradise, yet here we remain, dealing with miles of traces at 3 a.m. AI is no different (no surprise there, despite the collective utopian/dystopian dreams when ChatGPT and DALL-E went mainstream).
The 2024 DORA report takes a deep dive into how firms are using AI and the actual impact it’s achieving. Here are some interesting insights from the report:
No. AI is an evolving technology. As the hype dies down, more impactful applications are emerging. The DORA report highlights promising AI use cases that are not yet common practice but have shown success in organizations using them.
Here are some interesting applications for SRE tasks:
@rootly hey is customer XYZ affected?
and get immediate results.
If you’ve been to a recent KubeCon, you’ve probably seen every booth plastered with the words “platform engineering.” But what does that really mean? The CNCF Platforms Working Group defines a platform as “a collection of capabilities, documentation, and tools that support developing, deploying, operating, and/or managing the delivery of products and services.”
This definition doesn’t help much, to be honest. Having worked with various platform-building teams, I’d say this is because one platform can look very different from the next. It’s definitely not a simple practice. But what it promises is a ticket to fulfilling all your engineering dreams and beyond. So, is it delivering?
Surprisingly—or maybe not—it is driving more tangible results than AI. Here are some interesting insights:
The premise of a platform is to standardize processes through golden paths and consolidated practices. Ideally, this would make SRE teams feel at ease, as there would be less fragmentation across services and more guardrails, so fewer bugs and issues hit production.
However, the DORA researchers found that change stability dropped by 14%, meaning failure rates and rework increased significantly when using a platform. A consequence of this dynamic is that teams dealing with frustration caused by platforms are more likely to experience burnout.
Why is this happening? One hypothesis is that using a platform leads people to ship more confidently, even when that confidence is unfounded. There’s more to consider here, but that’s beyond the scope of this post.
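To make the stability numbers above more concrete, here is a minimal sketch of how a team might track failure rate and rework from its own deployment log. It is an illustration, not DORA’s methodology: the `Deployment` record, its `failed` and `rework` flags, and the sample data are all hypothetical, and the definitions in the comments are simplifying assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Hypothetical deployment record; field names are assumptions."""
    service: str
    failed: bool  # assumption: the change needed a rollback or hotfix
    rework: bool  # assumption: unplanned deploy made to fix a user-facing bug


def change_failure_rate(deployments: list[Deployment]) -> float:
    """Share of deployments flagged as failed (0.0 for an empty log)."""
    if not deployments:
        return 0.0
    return sum(d.failed for d in deployments) / len(deployments)


def rework_rate(deployments: list[Deployment]) -> float:
    """Share of deployments flagged as rework (0.0 for an empty log)."""
    if not deployments:
        return 0.0
    return sum(d.rework for d in deployments) / len(deployments)


def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, e.g. 0.5 means +50%."""
    if before == 0:
        raise ValueError("cannot compute a relative change from zero")
    return (after - before) / before


# Made-up sample data: a period before adopting a platform and one after.
before_platform = [
    Deployment("checkout", failed=False, rework=False),
    Deployment("checkout", failed=True, rework=False),
    Deployment("search", failed=False, rework=False),
    Deployment("search", failed=False, rework=True),
]
after_platform = before_platform + [
    Deployment("checkout", failed=True, rework=True),
    Deployment("checkout", failed=False, rework=False),
    Deployment("search", failed=True, rework=False),
    Deployment("search", failed=False, rework=True),
]

cfr_before = change_failure_rate(before_platform)
cfr_after = change_failure_rate(after_platform)
print(f"change failure rate: {cfr_before:.1%} -> {cfr_after:.1%}")
print(f"relative change: {relative_change(cfr_before, cfr_after):+.0%}")
print(f"rework rate after: {rework_rate(after_platform):.1%}")
```

Comparing the same two ratios across periods is one way to check whether confidence in a platform is matched by actual stability on your own services, rather than relying on the industry average.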
The past few years in tech have been tumultuous, to say the least. One day your team is focused on building a cool new feature; the next day, the entire product is sunsetted, and everyone is reassigned or laid off. This has led to high-stress working conditions and a lack of psychological safety for employees.
Although organizational efficiency has declined only marginally this year, average levels of burnout across organizations seem to be increasing.
Burnout is a multifaceted phenomenon with “physical, emotional, and psychological dimensions,” as well as a tangible impact on personal life. This complexity makes burnout difficult to study—and even more so, to mitigate.
According to the DORA team, burnout is characterized by “feelings of cynicism, detachment, and a lack of accomplishment.” Imagine going into an incident on a Sunday morning with this mindset. Not only will you struggle to solve the issue efficiently, but you also won’t care if you succeed or not. This only compounds the negative feelings the next day when you face work again—now featuring angry faces and threats.
The DORA researchers found that burnout is significantly higher in organizations with unstable priorities. As corporate researchers, they sought an organizational feature to mitigate burnout under these conditions—but they didn’t find one. No matter how strong your leaders are, or whether you have a great platform or AI, you cannot “counteract the effect of shifting priorities on burnout.”
Furthermore, the DORA study rules out “focusing on AI” as a strong indicator of reduced burnout. Even when the organization is moving toward an ideal vision, software delivery will suffer if priorities keep shifting within that space.
Industry research like the State of DevOps is valuable because it helps you challenge your assumptions. It can help you see more nuance in the practices you’re applying, or considering, for your organization. But industry findings do not necessarily apply to the way you and your team work.