AI at the Frontlines of Healthcare Reliability with Ryan Lockard (CVS Health)

🧠
AI-first mindset
👨‍👧
Tech-savvy parent
🛠️
Improves legacy systems
🗣️
Prompting pro

Listen on Spotify and Apple Podcasts!

Ryan Lockard, VP of Platform Engineering and AI Enablement at CVS Health, shares how AI is transforming reliability work across the healthcare tech stack, from helping teams troubleshoot legacy systems to enabling proactive engineering with LLMs and natural language interfaces. He unpacks the real-world impact of AI on incident response, developer productivity, and team culture, and explains why prompt engineering might be the next essential skill.

1. AI and LLMs in Incident Management and Legacy Systems

Ryan describes AI-driven incident response as the “near future” of reliability. Teams are already experimenting with SRE agents—autonomous bots capable of chasing anomalies, running RCAs, and even managing incidents end-to-end.

But in healthcare, as in other industries, one use case is especially interesting to Ryan: dealing with legacy systems. “You will inevitably bump up against a legacy system somewhere some consulting group built ages ago.” When incidents pop up, engineers can turn to LLMs and say, “Hey, I didn’t build this… I think it’s got something to do with networking. Can you help me out?” These models act like the consultant who has already left the company, but for free.

Ryan also notes how significant the AI features we’ve already grown accustomed to, in less than a year, have become. AI transcriptions and summaries during incidents, for example, let executives catch up without throwing the gravitational weight of their title into the meeting to get updates.

2. AI is Unlocking Proactive Reliability

Ryan emphasizes that AI isn’t just for firefighting incidents; it’s enabling his team to take on proactive reliability tasks that could not be prioritized before. At CVS Health, the goal is to “unlock AI for developers everywhere,” including infrastructure engineers, platform teams, and SREs. That means giving them the ability to move faster and smarter even outside of crisis mode. Engineers now ask AI code assistants for things like, “Hey, help me write this script, help me optimize that module.”

One striking example: the team had to rewrite a performance testing framework and leaned on AI because it was written in a language the team wasn’t comfortable with. AI made the translation process accessible. And it’s not just functional code. AI is also documenting code very well, and it’s drawing the diagrams developers never draw. Ryan sees this as a quality improver as much as a productivity boost: “We’re starting to see some gains,” he says, “and we’re just scratching the surface.”

3. Operational Efficiency Through Natural Language Interfaces and MCP

A major breakthrough, Ryan argues, lies in the natural language interface: “You could have an MCP server in front of your cloud infrastructure and then you can just ask it questions in a very natural way.” MCP (Model Context Protocol) is an open protocol for connecting LLMs to external tools and data sources, so a server placed in front of infrastructure lets engineers query it as if talking to a teammate.

Ryan explains, “You don’t have to remember what screens to go through.” Instead, engineers can just say, “Show me all of the events that are probably relevant in the past 24 hours for these two VMs.” That saves crucial time during incidents, but also improves day-to-day operational efficiency. “You recover quicker. You’ve gotten to the solution faster.” Especially in high-pressure environments like healthcare, that speed matters.
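To make that concrete, here is a minimal sketch of the kind of structured query a natural-language request like the one above might be translated into. It is plain Python, not the MCP SDK, and everything in it is an assumption for illustration: the `Event` record, the `recent_events` function, the VM names, and the sample data are hypothetical and do not come from the episode.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Event:
    """Hypothetical infrastructure event record."""
    vm: str
    timestamp: datetime
    message: str


def recent_events(events, vms, now, window_hours=24):
    """Return events for the given VMs within the last `window_hours`, newest first."""
    cutoff = now - timedelta(hours=window_hours)
    wanted = set(vms)
    matches = [e for e in events if e.vm in wanted and cutoff <= e.timestamp <= now]
    return sorted(matches, key=lambda e: e.timestamp, reverse=True)


# Made-up sample data; a fixed "now" keeps the example deterministic.
NOW = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
SAMPLE_EVENTS = [
    Event("vm-a", NOW - timedelta(hours=1), "disk latency spike"),
    Event("vm-b", NOW - timedelta(hours=30), "kernel patch applied"),
    Event("vm-b", NOW - timedelta(hours=5), "network interface reset"),
    Event("vm-c", NOW - timedelta(hours=2), "scheduled reboot"),
]

# "Show me all of the events in the past 24 hours for these two VMs."
for event in recent_events(SAMPLE_EVENTS, ["vm-a", "vm-b"], NOW):
    print(event.timestamp.isoformat(), event.vm, event.message)
```

In a setup like the one Ryan describes, a function of this shape would sit behind the natural language interface as a callable tool, with the model supplying the VM names and time window from the engineer's question.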

Sylvain compares the experience to “incident vibing”—navigating systems conversationally and fluidly, similarly to coding with Copilot.

4. Cultural Shifts and the Role of Ownership in Reliability

Ryan points out that the foundation of a reliability-first culture is ownership. The “easy answer,” he says, is making developers go on call—but the real shift comes when “the person that built the code is the one waking up in the middle of the night.” It changes the incentive structure. Instead of rebooting and punting the problem to another team, engineers fix it the next day because they’ll be the ones paged again.

But the “harder answer” is education: “You can be in peacetime or wartime with reliability,” Ryan explains. “It’s actually much easier to prioritize reliability in wartime. People will say, I’ll give you whatever funding you need, just fix this thing.” Peacetime is trickier: leaders must justify reliability investments when nothing seems broken. “What do you mean we’re approaching a limit? I don’t feel that way.” Building trust and understanding across stakeholders, especially non-technical ones, is essential.

5. AI’s Impact on Engineering Teams: From Senior Engineers to Juniors

AI tools are reshaping careers at every level. Senior engineers can now “fly in cruise” with models that respond to precise, well-framed prompts. “It’s a force multiplier,” Ryan says. “They can drain the backlog and that will be a problem if we follow outdated deployment patterns.”

But AI also empowers junior engineers: “I’ve got kids, I think about this a lot.” It’s an exciting time to enter the field. “Coding’s fun again. People feel that much closer to what they want to do.” Ryan calls prompt engineering “an underrated skill” that we’re not teaching enough yet, but believes it will become a core competency.

On concerns about “vibe coding,” Ryan is pragmatic: “People not reviewing code and shipping it to prod is a problem whether or not you’re using AI.” The solution? Focus on good practices like TDD, small batch deployment, and letting “your production synthetics be the best test.”
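As a rough illustration of what a production synthetic can look like, the sketch below probes an endpoint and passes only if the response is healthy and fast enough. The names `run_synthetic` and `CheckResult`, the one-second latency threshold, and the example URL are assumptions for illustration, not anything described in the episode. The HTTP call is injected as a function so the check itself can be written test-first.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CheckResult:
    ok: bool
    status: int
    latency_s: float


def run_synthetic(fetch: Callable[[str], int], url: str,
                  max_latency_s: float = 1.0) -> CheckResult:
    """Call `fetch(url)`, time it, and pass only on HTTP 200 within the latency budget."""
    start = time.monotonic()
    status = fetch(url)
    latency = time.monotonic() - start
    ok = status == 200 and latency <= max_latency_s
    return CheckResult(ok=ok, status=status, latency_s=latency)


# Stand-ins for a real HTTP client, so the check can be exercised without a network.
def fake_fetch_healthy(url: str) -> int:
    return 200


def fake_fetch_broken(url: str) -> int:
    return 503


healthy = run_synthetic(fake_fetch_healthy, "https://example.invalid/health")
broken = run_synthetic(fake_fetch_broken, "https://example.invalid/health")
print("healthy check passed:", healthy.ok)
print("broken check passed:", broken.ok)
```

Run on a schedule against production, a check like this exercises the same path real users take, which is the sense in which synthetics can act as a test.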