How we built an OSS LLM-powered Incident Diagram Generator
Discover IncidentDiagram, an open-source CLI tool that uses LLMs to turn incident retrospectives and codebases into easy-to-understand visual diagrams.
Incidents are resolved by highly technical folks who have a deep understanding of the systems that failed. But depending on their impact, incidents may need to be understood by many more people than those who helped resolve them, from stakeholders to engineers from different teams.
However, pointing them to a lengthy postmortem document that references a codebase they’re not familiar with makes incidents difficult to grasp. That means responders are often dragged into meetings or are repeatedly asked questions about what happened.
What if responders could reduce the knowledge gap about the incident by providing a visualization of how the system was impacted? Better yet, what if responders could get these diagrams generated by an AI?
That’s the vision we worked towards on this Rootly AI Labs project, IncidentDiagram. IncidentDiagram is an open-source tool that uses several LLMs to generate incident diagrams from your postmortem document and codebase.
What is IncidentDiagram?
IncidentDiagram is a command-line tool designed to parse your incident retrospectives and your codebase to produce a visual representation of the incident's key events in a diagram.
IncidentDiagram is written in Python and relies on a series of specialized LLMs from OpenAI, Anthropic, and Google (Gemini), all orchestrated through smolagents, an LLM agent framework from Hugging Face.
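The post doesn't show how the models are wired together, so the following is only an illustration of the general idea of giving each stage its own model. It deliberately avoids smolagents and every vendor SDK: each "model" is a plain Python callable that takes a prompt and returns text. The names `Model`, `Pipeline`, `describe_codebase`, `extract_affected`, `draw_diagram` and `fake_model` are hypothetical and are not IncidentDiagram's actual code.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for an LLM client: takes a prompt, returns the reply text.
Model = Callable[[str], str]


@dataclass
class Pipeline:
    """Illustrative three-stage pipeline in which each stage has its own model."""

    codebase_model: Model    # stage 1: reads the repository
    postmortem_model: Model  # stage 2: reads the retrospective
    diagram_model: Model     # stage 3: writes the Mermaid diagram

    def describe_codebase(self, file_listing: str) -> str:
        # Stage 1: summarize the components and how they relate.
        return self.codebase_model(
            "Describe the components in this repository and how they relate:\n"
            + file_listing
        )

    def extract_affected(self, postmortem: str, components: str) -> str:
        # Stage 2: pass the component description along so the model can
        # name affected components the same way the codebase does.
        return self.postmortem_model(
            "Known components:\n" + components
            + "\n\nWhich of them does this postmortem say were affected?\n"
            + postmortem
        )

    def draw_diagram(self, components: str, affected: str) -> str:
        # Stage 3: request a Mermaid diagram that highlights affected components.
        return self.diagram_model(
            "Draw a Mermaid flowchart of these components, highlighting the affected ones.\n"
            "Components:\n" + components + "\nAffected:\n" + affected
        )

    def run(self, file_listing: str, postmortem: str) -> str:
        # Each stage's output feeds the next one.
        components = self.describe_codebase(file_listing)
        affected = self.extract_affected(postmortem, components)
        return self.draw_diagram(components, affected)


if __name__ == "__main__":
    def fake_model(name: str) -> Model:
        # Stub that reports which model handled the prompt instead of calling an API.
        return lambda prompt: f"[{name}] {prompt.splitlines()[0]}"

    pipeline = Pipeline(fake_model("codebase"), fake_model("postmortem"), fake_model("diagram"))
    print(pipeline.run("api/\nworker/", "The worker queue backed up."))
```

Keeping each stage behind a plain callable means one stage can swap providers (for example, a reasoning model for reading code) without touching the others; in a real setup, each callable might wrap a provider SDK or an agent.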
How it works + demo
We put together a short video with a demo and a walkthrough of how IncidentDiagram works.
In a nutshell, IncidentDiagram has three main stages and uses different agents at each:
1. Understand the impacted codebase. The tool fetches a GitHub repository and uses o3-mini to analyze the file structure and the code within those files, producing a description of the system's components and their relationships.
2. Understand the postmortem/incident retrospective. IncidentDiagram reads the postmortem document and uses an LLM agent to list the components that were affected, then matches those components to the ones found in the codebase.
3. Create a diagram showing the components and their relationships. The tool uses the output of the previous steps to draw a Mermaid diagram that highlights the components affected by the incident while giving the reader enough context about how they fit into the rest of the system.
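For the first stage, a model can only reason about a repository once it sees the repository's layout. The following is a minimal sketch of one way to turn a checkout into a compact, prompt-sized file listing. It assumes the repository has already been cloned locally (for example with `git clone`); the `file_tree` function, the `SKIP_DIRS` set and the `max_files` limit are hypothetical illustrations, not part of IncidentDiagram.

```python
import os
from pathlib import Path

# Assumed list of directories that rarely help a model understand architecture.
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv"}


def file_tree(root: str, max_files: int = 500) -> str:
    """Return an indented listing of the files under `root`, suitable for a prompt."""
    root_path = Path(root)
    lines: list[str] = []
    count = 0
    for dirpath, dirnames, filenames in os.walk(root_path):
        # Prune skipped directories in place and sort so the output is stable.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        rel = Path(dirpath).relative_to(root_path)
        depth = len(rel.parts)  # 0 for the repository root itself
        if depth:
            lines.append("  " * (depth - 1) + rel.name + "/")
        for name in sorted(filenames):
            if count >= max_files:
                # Stop early so very large repositories still fit in a prompt.
                lines.append("... (truncated)")
                return "\n".join(lines)
            lines.append("  " * depth + name)
            count += 1
    return "\n".join(lines)


if __name__ == "__main__":
    print(file_tree("."))
```

A listing like this is cheap to produce and gives the model the overall shape of the system; the contents of the files that look most relevant could then be sent in follow-up prompts.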
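In IncidentDiagram, the last two stages are handled by LLM agents. To make the data flow concrete, here is a deterministic sketch of the same two steps in plain Python. It assumes the earlier stages have already produced a list of codebase component names, the edges between them, and the component names mentioned in the postmortem. The fuzzy matching uses the standard library's `difflib` as a swapped-in technique, and `normalize`, `match_components` and `to_mermaid` are hypothetical helpers.

```python
import difflib
import re


def normalize(name: str) -> str:
    # Lowercase and drop punctuation and spaces so "Auth Service" ~ "auth_service".
    return re.sub(r"[^a-z0-9]", "", name.lower())


def match_components(affected: list[str], known: list[str], cutoff: float = 0.8) -> dict[str, str]:
    """Map each affected name from the postmortem to its closest codebase component."""
    by_norm = {normalize(k): k for k in known}
    matches: dict[str, str] = {}
    for name in affected:
        close = difflib.get_close_matches(normalize(name), list(by_norm), n=1, cutoff=cutoff)
        if close:  # names with no close match are left out rather than guessed
            matches[name] = by_norm[close[0]]
    return matches


def to_mermaid(edges: list[tuple[str, str]], affected: set[str]) -> str:
    """Render component edges as a Mermaid flowchart, highlighting affected nodes."""
    nodes = sorted({n for edge in edges for n in edge})
    ids = {n: f"n{i}" for i, n in enumerate(nodes)}  # safe Mermaid node IDs
    lines = ["flowchart LR"]
    for n in nodes:
        lines.append(f'    {ids[n]}["{n}"]')
    for src, dst in edges:
        lines.append(f"    {ids[src]} --> {ids[dst]}")
    lines.append("    classDef affected fill:#fdd,stroke:#c00,stroke-width:2px")
    hit = [ids[n] for n in nodes if n in affected]
    if hit:
        lines.append(f"    class {','.join(hit)} affected")
    return "\n".join(lines)


if __name__ == "__main__":
    known = ["api_gateway", "auth_service", "payments_db"]
    edges = [("api_gateway", "auth_service"), ("auth_service", "payments_db")]
    matched = match_components(["Auth Service", "billing"], known)
    print(to_mermaid(edges, set(matched.values())))
```

In this run, "Auth Service" matches `auth_service`, "billing" has no close match, and only the `auth_service` node receives the `affected` class. Generating the Mermaid text in code guarantees that it is syntactically valid, but it gives up the richer grouping and labeling that a model can produce.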
Contribute and Collaborate
IncidentDiagram is an open-source prototype, and we welcome contributions. There are many ways forward for the project. For example, there are more sources of information that can be used to build the diagram, such as specific commits mentioned in the postmortem document.
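A contributor exploring the commit idea might start by pulling candidate commit hashes out of the postmortem text. The following is a minimal heuristic sketch; `SHA_PATTERN` and `commit_candidates` are hypothetical names. It also flags long runs of plain digits, so each candidate should be checked against the repository (for example with `git cat-file -e <sha>`) before it is used.

```python
import re

# Heuristic: 7 to 40 hex characters containing at least one digit, so words made
# only of the letters a-f (such as "defaced") are not mistaken for commit hashes.
SHA_PATTERN = re.compile(r"\b(?=[0-9a-f]*[0-9])[0-9a-f]{7,40}\b", re.IGNORECASE)


def commit_candidates(postmortem: str) -> list[str]:
    """Return likely commit hashes in order of first mention, lowercased and deduplicated."""
    seen: dict[str, None] = {}  # dicts preserve insertion order
    for match in SHA_PATTERN.finditer(postmortem):
        seen.setdefault(match.group(0).lower(), None)
    return list(seen)


if __name__ == "__main__":
    text = "We rolled back 3f2a9c1 after the deploy of a1b2c3d4e5f; see 3F2A9C1."
    print(commit_candidates(text))
```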
Whether it's improving the parsing capabilities, enhancing diagram aesthetics, or integrating with other tools, your input is valuable. You can find the project on GitHub: Rootly-AI-Labs/IncidentDiagram.