How GRAIL replaced a manual, ad hoc incident process with Rootly and cut manual effort by 80%.

"We keep labs running, so reliability and clean, auditable records aren't optional for us, they're regulated. Rootly is easy to manage, and gives us a repeatable process we can stand behind." -Danny Liu, Security Incident Commander

GRAIL is a cancer-detection company working to make early detection something people can access as part of a routine checkup.

Founded: 2016 in Menlo Park, California, USA (a spin-out from Illumina)

Size: ~1000 employees

Rootly’s Impact

80% less manual effort across incident response
Minutes to root cause analysis (RCA)
99.999% reliability

At GRAIL, the labs that turn a blood sample into a cancer-detection result depend on infrastructure the IT team keeps running, so reliability is not an abstraction here. Yet when Danny Liu was hired to mature the incident process, what he inherited was bare-bones: no incident commander, no defined roles, no real RCA, just people pinging Slack for help and then rebuilding the timeline by hand afterward, if anyone did. He replaced the chaos with structure on Rootly and cut the manual effort around incidents by roughly 80 percent. His sharpest point, though, is not about automation at all.

"Rootly’s real value is the guard rails that put the right roles and the right information in place from minute one. Everything else is icing." -Danny Liu, Security Incident Commander

From manual to automated

The post-incident side was no better. There was no real RCA or structured process, so everything had to be reconstructed by hand: the timeline, who was involved, which product broke, the impact statement. It was a lot of manual work, and it was disorganized.

The biggest gap wasn't a tool. It was the absence of a process. Danny puts the split at roughly thirty percent process, seventy percent tooling. People didn't know how to start an incident, or even that they should. So they pinged back and forth, didn't always treat it seriously, and momentum was lost before anyone took control. For an IT team in a regulated, life-adjacent environment, where the goal is a five-nines infrastructure and a mean time to resolution measured in minutes, that informality was the real risk.

Structure, from the first incident

Adoption at GRAIL had an unusual origin: leadership brought Rootly in. A senior IT director, Alex Derafshan, saw Rootly’s AI potential and championed it, so Danny didn't have to fight the internal sales battle that usually comes first with process and tooling changes of this magnitude. As he put it, the money was the easiest part; his job was to connect the plumbing and clear the internal reviews. End to end, that took a few short weeks.

The value, though, didn't wait for full rollout. It hit on the very first incident. Rootly read the Slack activity and produced an RCA that was roughly eighty percent complete the moment the incident closed, leaving Danny and the team to fill in the last twenty percent. The timeline he used to build by hand, the chain of events, who did what and when, was simply there. Work that used to consume hours, and more than half of his RCA time, was gone.

"Rootly reads everything in Slack and gives me an RCA that’s done before I touch it. Across the board, the manual effort dropped by around eighty percent." -Danny Liu, Security Incident Commander

What changed

Guard rails, not just automation

Danny is precise about where the value sits, and it's a useful corrective to the usual feature talk. The biggest win, he says, isn't the automation. It's the guard rails. Rootly imposes a process: the incident starts on a command, a severity level is set from the beginning, and the framework forces the team to get the right information in within the first few minutes. It makes people think about how to approach the incident from the start, rather than improvising. The automation, the workflows, the notifications, all of that, in his words, is icing on the cake. The cake is the discipline.

Roles that hold under pressure

The structure shows up most clearly in roles. Rootly assigns an incident commander, an AI scribe, and a tech lead at the outset, so everyone knows their job. The tech lead focuses on the fix, the scribe tracks everything and drafts communication, and the commander oversees the whole incident, keeps people informed, and prioritizes. Where an incident used to be a scramble of Slack messages, it's now calm and organized, with clear ownership from the first minute.

RCAs that write themselves

The post-incident process is where GRAIL gets the clearest return. Rootly synthesizes everything that happened in Slack, and across connected meetings, into a structured RCA. That's the engine behind the roughly eighty percent reduction in manual effort Danny cites, and it means the learning actually gets captured every time instead of depending on someone remembering to write it up.

Auditable by default, which a regulated lab needs

None of this is a nice-to-have for GRAIL. Operating in a heavily regulated environment, with documentation, data-retention, and patching obligations and a five-nines target, the team needs incident response that is repeatable and produces a clean, complete record every time. Rootly's structured process and automatic history give them exactly that: a defensible account of what happened and how it was handled, generated as a byproduct of running the incident rather than as extra work afterward.

Customer Support

While the automated guard rails and features did the heavy lifting in rebuilding GRAIL's incident process, Danny cannot say enough good things about Rootly’s support and engineering team. For him, they were the real secret weapon behind making the transition work, human to human. Danny specifically highlighted the impact of Rootly's support crew, including Alex Conrad, Owen Sheppard, JP Cheung, Alexandra Chapin, Gidon Lapshun, and Eric Manning, who went above and beyond to ensure GRAIL's success. He has complete confidence that Rootly has GRAIL's back not only during the pre-sale phase, but at any time. Whenever Danny runs into new technical requirements or unexpected challenges with incident management, he knows this all-star cast is ready to jump in and help.

Swapping out legacy infrastructure tools is usually a massive headache, especially in a highly regulated, "life-adjacent" cancer-detection environment where you can't afford to let things break. Danny notes that it’s the kind of project that normally drags on for months and gets bogged down in endless internal debates. But the Rootly team didn't just dump some documentation on him and wish him luck. They stepped in as true partners and advisors, helping him clear strict internal compliance reviews in just a few short weeks.

Danny was impressed by how responsive and hands-on they were, making what could have been a painful migration feel smooth. Having that level of engineering-led backup meant Danny and his team could confidently roll out the platform and start seeing a return on their very first incident. When your target is 99.999% reliable infrastructure, you need a partner who values uptime just as much as you do, and Danny will be the first to tell you that Rootly’s support team delivered.

Proof points (GRAIL lightning round, 1-5)

Time to first status update improved: 5
Retrospective (RCA) completion rate improved: 5
Stakeholder communications are more consistent: 5
Internal customers (the engineering team) are happier: 5
Engineering morale is in a better place: 5

Plenty of teams treat incident tooling as a pile of features. GRAIL's experience points at something simpler and more durable: the value is structure. A process that starts cleanly, assigns clear roles, forces the right information in early, and produces an auditable record on its own is what turns a chaotic scramble into a repeatable practice, and in a regulated, life-adjacent environment, repeatable is the whole game. Rootly gave GRAIL that structure, cut the manual effort around it by roughly eighty percent, and delivered on the very first incident. The automation is real, but as Danny says, it's the guard rails that matter.