Jorge Lainfiesta

The Unofficial KubeCon EU '26 SRE Track

6 talks to add to your schedule

Jorge Lainfiesta

February 25, 2026

6 mins

Alerting as Code: How Mistral AI Uses Terraform as the Source of Truth

A Terraform-first model for deterministic alerting in AI systems

Jorge Lainfiesta

January 29, 2026

7 mins

When Nothing Changes and Everything Breaks: Why Machine Learning Fails Differently

Why 50% of companies don't monitor ML and how it’s reshaping our understanding of reliability.

Jorge Lainfiesta

October 30, 2025

6 mins

The Art of Incident Management, Part I

“Art, in itself, is an attempt to bring order out of chaos.” - Stephen Sondheim

Jorge Lainfiesta

September 9, 2025

4 mins

Taming the Angry Intern: How AI is Reshaping Platform Engineering

Turning AI into a predictable, policy‑driven part of your platform engineering toolkit

Jorge Lainfiesta

August 4, 2025

4 mins

Best Site Reliability Engineering Tools DevOps Teams Swear By

From monitoring dashboards to automation workflows, discover the SRE tools DevOps teams rely on to keep systems reliable in 2025.

Jorge Lainfiesta

August 1, 2025

6 mins

The Art of Not Getting Woken Up for Nothing

Strategies from SRE leaders fighting noisy alerts in complex systems.

Jorge Lainfiesta

July 22, 2025

10 mins

Owning Reliability at Scale: Inside the Hybrid Incident Models

How should you structure your incident response team? From severity-based escalation to role-driven orchestration, hybrid models are helping teams scale reliability and balance resources.

Jorge Lainfiesta

July 10, 2025

11 mins

10 Best Incident Management Software in 2025 (Ranked by Performance)

Discover the 10 best incident management software tools of 2025 to reduce downtime, improve coordination, and speed up response efforts for your team.

Jorge Lainfiesta

June 6, 2025

8 mins

Incident Management vs. Problem Management: Key Differences and When to Use Both

Incident management restores service fast. Problem management finds the root cause. Master both approaches to build resilient IT operations.

Jorge Lainfiesta

June 5, 2025

6 mins

SLA vs KPI: Understanding the Key Differences and How to Effectively Use Both

What’s the difference between an SLA and a KPI? SLAs define service expectations, while KPIs measure performance. Learn how they relate and when to use each.

Jorge Lainfiesta

June 1, 2025

8 mins

Top Opsgenie Alternatives in 2025: Opsgenie Is Shutting Down — Don’t Just Replace, Upgrade

Opsgenie is shutting down. Don't settle for a downgrade to JSM. Explore the best Opsgenie alternatives for 2025 and find a true upgrade with modern AI and automation.

Jorge Lainfiesta

April 1, 2025

6 mins