

March 31, 2025

8 mins

A Guide to Evaluating AIOps and Agentic AI Tools

A practical framework for evaluating AI tools based on four core pillars: Accuracy, Transparency, Adaptability, and Agentic capabilities.

Written by Dinesh Sukhija

With so many shallow new AI announcements popping up every week, my immediate reaction as a skeptical engineer is often an eye-roll.

As an SRE leader, I’ve witnessed firsthand how rushing to adopt version-one copilot tools led to frustration within engineering teams.

Early-stage AI copilots generated enthusiasm that soon plateaued, leading to eventual disengagement. There were a few reasons for that.

  • We didn’t know how best to use these early copilots
  • The capabilities of these AIs were limited or superficial
  • The quality of the generated code wasn’t good enough

However, the most recent V2 of these copilot tools has improved significantly, with many now addressing the early gaps and introducing capabilities that go beyond code assistance.

We are now seeing a wave of excitement around AIOps tools that promise to operate as an SRE copilot or autopilot. How can you evaluate which ones will actually be useful to your team?

The Four Pillars of Agentic AI Evaluation

My experience onboarding code copilot tools led me to approach AIOps evaluation with a structured framework built on four critical pillars:

  1. Accuracy (Trust)
  2. Transparency (Reasoning)
  3. Adaptability
  4. True Agentic Promise

In this guide, I outline my approach to evaluating AIOps and agentic AI tools against these four pillars, share tactics for evaluating each, and flag what to watch out for.

1. Accuracy: The Model Has to Work

Accuracy is a fundamental requirement. 49% of organizations cite false positives as a major challenge in AIOps adoption, contributing to alert fatigue among IT teams.

If an AIOps platform promises to enhance incident detection, automate responses, or provide root cause analysis, then it must demonstrate a high success rate in achieving those goals.

How to Evaluate Accuracy

  • Introduce chaos into a controlled environment. For example, create a synthetic failure by dropping an index on a stressed database table (in a non-production setting) and see whether the agent detects the root cause correctly.
  • Observe whether the tool accurately identifies the root cause and provides meaningful recommendations.
  • Look for false positives or irrelevant alerts that could erode confidence over time.

Typically you want your “acceptance rate” for AIOps (or copilot) tools to start above 80% and improve to over 90% within six months.
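That acceptance-rate bar is simple to track. A hypothetical sketch, assuming you log each recommendation as accepted or rejected (the logging itself is an assumption, not part of any specific tool):

```python
def acceptance_rate(outcomes):
    """outcomes: list of booleans, True = recommendation accepted by an engineer."""
    if not outcomes:
        return 0.0
    return sum(outcomes) / len(outcomes)

# Example: 17 accepted out of 20 reviewed recommendations.
outcomes = [True] * 17 + [False] * 3
rate = acceptance_rate(outcomes)
print(f"acceptance rate: {rate:.0%}")  # 85%

# Compare against the bars suggested above.
meets_initial_bar = rate >= 0.80
meets_six_month_bar = rate >= 0.90
```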

2. Transparency: Building Trust Through Explainability

A recent G2 report found that AIOps software adoption is at 56%, compared to an average of 70% for other IT infrastructure and management software, with transparency cited as a key concern.

For engineers to trust an AIOps system, they must have visibility into how the tool arrives at its conclusions. A tool that offers clear, verifiable reasoning will drive better adoption and confidence among SREs.

How to Evaluate Transparency

  • Check if the tool provides a breakdown of its reasoning process.
  • Does it show why a specific recommendation was made?
  • Does it highlight which logs, metrics, or code changes contributed to its decision?
  • Can engineers verify its findings independently?
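To make that checklist concrete, here is one hypothetical shape for an “explainable” recommendation payload you might ask a vendor to expose. The field names are illustrative assumptions, not any vendor’s actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Recommendation:
    summary: str                   # what the tool suggests
    reasoning: str                 # why it reached this conclusion
    evidence_logs: list = field(default_factory=list)     # log lines cited
    evidence_metrics: list = field(default_factory=list)  # metric anomalies cited
    related_changes: list = field(default_factory=list)   # commits/deploys cited

    def is_verifiable(self):
        # A recommendation with no stated reasoning or no cited evidence
        # cannot be independently checked by an engineer.
        return bool(self.reasoning) and bool(
            self.evidence_logs or self.evidence_metrics or self.related_changes
        )

rec = Recommendation(
    summary="Roll back deploy abc123",
    reasoning="Error rate rose within 2 minutes of the deploy",
    evidence_metrics=["http_5xx_rate spike at 14:02"],
    related_changes=["deploy abc123 at 14:00"],
)
```

If a tool cannot populate something like the evidence fields above, engineers have no path to validating its conclusions.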

AIOps tools that build trust through transparency, with a developer experience that lets engineers validate recommendations, will gain wider acceptance than the rest.

3. Adaptability: Ensuring AIOps Evolves with Your Needs

AIOps is a field that continues to evolve rapidly, with a few key players competing to define the next-generation market. This presents an opportunity to select a vendor that not only meets your current needs but can adapt and evolve alongside your organization.

Choosing a vendor that treats you as a partner can maximize long-term value and adoption.

How to Evaluate Adaptability

  1. Assess the breadth of integrations the vendor supports. Can they quickly build custom integrations for your specific tools and data sources?
  2. Work with the vendor to identify rare but critical incidents. Are they willing to develop features that address these scenarios?
  3. Does the vendor allow custom workflows to match your operational processes, or are you forced to adjust to their constraints?
  4. Can your feedback shape their product development? Vendors open to co-development can help tailor the platform for better adoption and ROI.

By selecting a vendor that prioritizes adaptability, you ensure your AIOps solution remains aligned with your evolving infrastructure and operational needs.

4. True Agentic Promise: Adaptability and Learning

Agentic AI promises that the tool doesn’t just analyze past data; it learns and adapts to your specific environment. Unlike traditional AI models that provide static recommendations, agentic AI evolves, deepening its contextual understanding over time.

How to Assess Agentic Capabilities

  • Ask vendors if the tool learns from past incidents and incorporates those learnings into future predictions.
  • Evaluate its ability to understand depth (specific issues) and breadth (system-wide anomalies).
  • Assess its data privacy measures—how does it maintain security while learning from internal incidents?
  • Run a long-term evaluation (my recommendation is to ask for a 3-month trial or POC phase rather than the typical 1 month): does it improve its effectiveness over time?
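During such a trial, the improvement question reduces to a simple month-over-month check on acceptance rates. The numbers below are made-up illustration, not benchmark data:

```python
def is_improving(monthly_rates, min_gain=0.0):
    """True if each month's acceptance rate beats the previous month's."""
    return all(
        later - earlier > min_gain
        for earlier, later in zip(monthly_rates, monthly_rates[1:])
    )

# Hypothetical month 1..3 acceptance rates from a 3-month POC.
three_month_trial = [0.72, 0.81, 0.88]
print(is_improving(three_month_trial))  # True
```

A tool that plateaus or regresses over the trial is not exhibiting the agentic learning it promises.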

If a tool genuinely exhibits agentic behaviour, it should become more valuable as it continuously refines its understanding of your infrastructure.

Final Thoughts: What Are You Evaluating For?

The AIOps landscape is evolving rapidly, and evaluation criteria will continue to shift as capabilities advance. However, Accuracy, Transparency, Adaptability, and True Agentic Promise remain foundational pillars for assessing these tools today. While no framework is static, this rubric provides a good starting point for making informed decisions in an evolving space.

Rootly_logo

AI-Powered On-Call and Incident Response

Get more features at half the cost of legacy tools.

Book a demo