March 19, 2025

10 mins

The Unofficial SRE Track for KubeCon EU '25

KubeCon doesn’t have an SRE track, but we’ve gone through the 300+ sessions that’ll take place in London so you don’t have to.

Written by Jorge Lainfiesta

KubeCon EU ’25 is taking over London in two weeks, boasting 229 sessions distributed across 22 tracks, plus 89 CNCF project maintainer talks. With so many sessions to choose from, you need to plan your schedule. Going through that many entries takes time, but I’ve got you: I’ve picked the SRE-adjacent talks I think are most worth your time.

For this edition, I’ve divided my picks into four categories:

  • Cutting-edge Observability: The newest developments around observability, including using AI to find anomalies, a query language spec, and OTel recommendations for databases.
  • Building Reliable AI Systems: You probably have an AI feature deployed or on the roadmap at your company. The talks here cover how teams are establishing SLOs for AI systems and dig into observability for agents.
  • Case Studies: Reliability at Scale: One of the things I love about KubeCon is that you can see how the projects you use are being pushed to the absolute limit. In this case, there’s a keynote by eBay on how they developed dedicated algorithms to feed traces to LLMs and a talk on how Etsy operates one of the largest Prometheus instances in the industry.
  • Deep Dives: Beyond logs and traces, KubeCon can also show you new perspectives and all the complexity behind elements you take for granted. I’ve picked a fascinating deep dive into the Kubernetes API, a common source of headaches, and an overview of the world of OTel maintainers, who get headaches trying to coordinate competing vendors and users.

I’ll be at KubeCon London, so make sure to say hi! You can find me at the Rootly Booth (S780). Or, join one of the three Happy Hours we’re hosting:

  • Tuesday (Apr 1): KubeCon Kickoff, Happy Hour by Rootly, Cloudflare, CloudBees, and Chronosphere.
  • Wednesday (Apr 2): KubeCon After Dark, Happy Hour by Rootly, Checkly, DX, and Coralogix.
  • Thursday (Apr 3): KubeCon Unwind, Happy Hour by Rootly, Spotify Backstage, Infisical, and Authzed.

Cutting-edge Observability

First Day Foresight: Anomaly Detection for Observability

Prashant Gupta and Kruthika Prasanna Simha (Apple) will explore how pre-trained, unsupervised ML models can be leveraged for real-time anomaly detection from the moment your system goes live for the first time.

Using cloud-native tools like Kubeflow for model fine-tuning, they’ll walk through practical techniques to monitor application health and detect anomalies without historical data.

When: Wednesday, April 2, 2025, 11:15 am - 11:45 am BST

Where: Level 1 | Hall Entrance N10 | Room E

Add Apple’s talk to your schedule

From the Observability TAG: Designing a Common Query Language for Observability Data

Querying observability data shouldn’t be a headache. Alolita Sharma (Apple), Pereira Braga (Google), and Chris Larsen (Netflix) will walk us through the Observability TAG’s work on unifying query languages across metrics, traces, profiles, events, and logs that can also support AI applications.

This talk covers the Observability TAG QLS’s semantic query language spec proposal. The speakers will dive into design principles, trade-offs, and the challenges of balancing simplicity, expressiveness, and performance.

When: Thursday, April 3, 2025, 11:00 am - 11:30 am BST

Where: Level 3 | ICC Capital Suite 10-12

Add the Observability TAG talk to your schedule

Enhancing Database Observability with OpenTelemetry

Now that OpenTelemetry’s database semantic conventions are stabilized, Marylia Gutierrez (Grafana Labs) will walk us through how to instrument applications with OpenTelemetry SDKs to collect actionable telemetry data from databases.

Marylia will dive into the available SDK implementations across languages and databases, current gaps, and how you can contribute to missing instrumentation.

When: Wednesday, April 2, 2025, 2:30 pm - 3:00 pm BST

Where: Level 1 | Hall Entrance N10 | Room E

Add Grafana Labs’ talk to your schedule

Building Reliable AI Systems

Dashboards & Dragons: Crafting SLOs To Tame the AI Platform Chaos

Alexa Griffith and Ankita Chaudhari (Bloomberg) will share how they’ve navigated the chaos of multi-cluster AI platforms using SLIs, SLOs, and observability dashboards.

From defining meaningful reliability metrics to designing actionable SLO dashboards, they’ll break down best practices for maintaining platform resilience across cloud, on-prem, and hybrid environments. Expect real-world lessons, battle-tested strategies, and practical takeaways to help ensure your AI workloads run smoothly—even at scale.

When: Wednesday, April 2, 2025, 11:15 am - 11:45 am BST

Where: Level 1 | Hall Entrance S10 | Room B

Add Bloomberg’s talk to your schedule

Deep Dive To AI Agent Observability

Guangya Liu (IBM) and Karthik Kalyanaraman (Langtrace AI) will explore how OpenTelemetry can be extended to monitor, trace, and analyze AI-driven architectures. From tracing inference workflows to correlating AI-specific data like model performance and decision latency, they’ll break down the complexities of observability in multi-agent systems.

Expect practical demonstrations, real-world use cases, and insights into how OpenTelemetry provides transparency, reliability, and optimization for AI-driven architectures running on Kubernetes.

When: Wednesday, April 2, 2025, 3:15 pm - 3:45 pm BST

Where: Level 1 | Hall Entrance N10 | Room E

Add IBM’s talk to your schedule

How To Supercharge AI/ML Observability With OpenTelemetry and Fluent Bit

Keeping AI/ML models reliable in production is not easy. Celalettin Calis (Chronosphere) will dive into how OpenTelemetry and Fluent Bit can work together to create a powerful open-source observability stack tailored for AI/ML workloads.

You’ll learn how to log and debug models like GPT and BERT, track prompts and their results, and monitor agent performance in production.

When: Friday, April 4, 2025, 3:15 pm - 3:45 pm BST

Where: Level 0 | ICC Capital Hall | Room J

Add Chronosphere’s talk to your schedule

Case Studies: Reliability at Scale

Keynote: AI Enabled Observability ‘Explainers’ at eBay

Vijay Samuel (Principal MTS, Architect, eBay) will share how eBay’s Observability Platform team built AI-powered "Explainers" for telemetry signals. Instead of just dumping data into an LLM and hoping for the best, they combined engineered algorithms with AI to create more predictable and accurate insights.

Vijay will explore how eBay tackled trace interpretation, critical path detection, and dashboard explanations by strategically combining AI and observability.

When: Wednesday, April 2, 2025, 9:33 am - 9:48 am BST

Where: Level 0 | ICC Auditorium

Add eBay’s Keynote to your schedule

Pushing the Limits of Prometheus at Etsy

Chris Leavoy (Etsy) and Bryan Boreham (Grafana Labs) will share their journey of scaling a single Prometheus instance to its absolute limits: running on a 128-core machine with 4TB of RAM and handling up to 500 million metrics.

This talk dives into the challenges of operating one of the industry’s largest Prometheus deployments, covering key lessons in performance tuning, diagnosing bottlenecks, and optimizing metrics volume for resilience.

When: Thursday, April 3, 2025, 11:45 am - 12:15 pm BST

Where: Level 1 | Hall Entrance N10 | Room G

Add Etsy’s talk to your schedule

Deep Dives

The Life (or Death) of a Kubernetes API Request, 2025 Edition

Abu Kashem (Red Hat) and Stefan Schimanski (Upbound) will take you on a deep dive into the lifecycle of a Kubernetes API request—from its arrival at the API server to its response back to the caller. Using clear diagrams instead of code, they’ll break down the request flow, Kubernetes architecture, and the observability signals (logs, audits, metrics, and errors) that help diagnose issues.

When: Wednesday, April 2, 2025, 3:15 pm - 3:45 pm BST

Where: Level 1 | Hall Entrance S10 | Room C

Add this Kubernetes API talk to your schedule

OTel Me How To Get My Open Source Community Taken Seriously: Lessons Learned as an OTel Maintainer

Building a successful open source project takes more than just great code—it takes a thriving community. Reese Lee (New Relic) and Adriana Villela (Dynatrace) will share insights from their work as OpenTelemetry (OTel) maintainers, covering what it takes to raise awareness, drive adoption, and connect contributors with end users.

When: Thursday, April 3, 2025, 4:45 pm - 5:15 pm BST

Where: Level 1 | Hall Entrance N10 | Room H

Add this OTel Community talk to your schedule

Rootly at KubeCon EU ’25

Rootly will have a big presence at KubeCon London. Find us at our booth (S780). Once again, I’d love to catch up with you at one of our Happy Hours:

  • Tuesday (Apr 1): KubeCon Kickoff, Happy Hour by Rootly, Cloudflare, CloudBees, and Chronosphere.
  • Wednesday (Apr 2): KubeCon After Dark, Happy Hour by Rootly, Checkly, DX, and Coralogix.
  • Thursday (Apr 3): KubeCon Unwind, Happy Hour by Rootly, Spotify Backstage, Infisical, and Authzed.