August 4, 2025

4 mins

Taming the Angry Intern: How AI is Reshaping Platform Engineering

Turning AI into a predictable, policy‑driven part of your platform engineering toolkit

Written by

Jorge Lainfiesta

AI in platform engineering? Think of it like hiring the most ambitious, slightly chaotic intern you’ve ever met.

As Aaron Erickson from NVIDIA put it in a recent panel during PlatformCon NYC:

“I always imagine AI stands for angry intern. You’re putting an angry intern in a Docker container and giving it the ability to write its own code, and then you wonder why things could go wrong.”

This intern works fast, has endless energy, and sometimes brilliant ideas. But you don’t hand them root access on day one. Give them scoped credentials. Spell out what they’re allowed to touch. Keep an eye on them until they’ve earned your trust.

Aaron warns that it’s not just engineers joining the team now. AI opens the door for business vibe coders: people in marketing, ops, or finance who, with a few prompts, can now build and ship software. The angry intern is about to have a whole posse.

The question then becomes: how do you keep your angry interns safe? Platform engineers might hold the key to integrating AI workflows at scale.

PlatformCon panel about AI and platform engineering

AI Doesn’t Fix Bad Plumbing

AI won’t rescue a platform with shaky foundations. If your CI/CD is flaky, deployments are unpredictable, and orchestration is duct‑taped together, AI will just make the chaos faster and harder to untangle.

Rickey Zachary, Global Lead of Platform Engineering at Thoughtworks, puts it bluntly:

“Why talk about agentic models when I can’t do CICD correctly? Why talk about agentic orchestration when I can’t just orchestrate a delivery or deployment into production?”

It’s a trap many teams fall into: racing ahead with AI pilots while their core developer experience is brittle. The result? You’ve automated bad patterns at scale. AI thrives when it has a healthy platform to work with, not one where it’s trying to duct‑tape over recurring failures.

A 10x (or 100x) Ops Boost

The 10x engineer concept fell out of favor a few years ago. But now, AI may bring back the idea of an engineer with superpowers. Sylvain Kalache from Rootly sees AI as pure augmentation:

“I believe this type of tool can lead us to become a 10x or a hundred x engineer, by automating and helping you do your job better.”

One of the most immediate wins: incident triage. AI can chew through a mountain of alerts, identify which ones matter, and summarize them in plain English. No more scrolling through endless Slack alerts to figure out what’s important.
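To make the triage idea concrete, here is a minimal sketch of the deterministic pre-pass that could sit in front of an AI summarizer: collapse duplicate alerts, then rank what is left by severity. It is not how Rootly or any other product works; the `Alert` fields, the `SEVERITY_RANK` levels, and the output format are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical severity levels; real alerting systems define their own.
SEVERITY_RANK = {"critical": 0, "warning": 1, "info": 2}


@dataclass(frozen=True)
class Alert:
    """Illustrative alert shape, not taken from any real alerting tool."""
    service: str
    name: str
    severity: str


def triage(alerts):
    """Collapse duplicate alerts and return (alert, count) pairs, most severe first."""
    counts = Counter(alerts)  # frozen dataclasses are hashable, so duplicates merge
    return sorted(
        counts.items(),
        # Unknown severities sort last; ties are broken by how often the alert fired.
        key=lambda item: (SEVERITY_RANK.get(item[0].severity, len(SEVERITY_RANK)), -item[1]),
    )


def summarize(alerts):
    """Render one plain-English line per distinct alert."""
    return [f"[{a.severity}] {a.service}: {a.name} (x{n})" for a, n in triage(alerts)]
```

Feeding a model the short, ranked list instead of the raw alert stream keeps the noisy part deterministic and leaves the model to do the part it is good at: explaining what the remaining alerts mean.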

Then there’s root cause analysis. Instead of hopping between Grafana, Kibana, logging pipelines, recent merges, and your incident tracker, an AI SRE can piece together the data and surface a probable root cause in minutes.

“An investigation that may take 10 to 20 minutes… an agent system can do in a matter of two minutes.”

But the real magic happens when you move from reactive to proactive. AI can simulate chaos scenarios to stress‑test your systems, predict likely failures by spotting unusual patterns, and help you identify dangerous states before things break.

For example, Aaron’s team at NVIDIA runs time‑series transformer models across their massive GPU fleets to predict emerging issues before they become incidents. Think of it as AI’s early‑warning radar for platform health. If the angry intern can stop you getting paged at 3 a.m., that’s a keeper.
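The panel did not describe how NVIDIA's models work, so the sketch below swaps in a much simpler technique, a trailing-window z-score, purely to show what "spotting unusual patterns" looks like in code. The function name, the `window` and `threshold` parameters, and the sample readings are all assumptions.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value sits more than `threshold` standard
    deviations away from the mean of the preceding `window` readings."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change at all as unusual.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical GPU temperature readings with one obvious spike at index 7.
readings = [60, 61, 59, 60, 62, 61, 60, 95, 61, 60]
spikes = find_anomalies(readings)
```

A real early-warning system would learn patterns across many signals at once, but the shape is the same: compare what you see now against what recent history says is normal, and raise a flag before the pager does.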

Guardrails Are Non‑Negotiable

Speed without control is just a faster way to break things. Vishakha Sadhwani from Google says AI agents should be treated exactly like any other core system component:

“They have to go through the same level of scrutiny… scanned, vetted, follow the ecosystem rules of the system.”

At Google, that means scanning and validating models before they even get a chance to run in production, something they call Model Armor. The goal: block risky behavior at the source, not after it’s already shipped.

Aaron takes it further: ground AI in deterministic systems of record, your single sources of truth. Without them, AI is just guessing with a nice UI. He reminds us that even top‑tier LLMs can get basic arithmetic wrong unless you train them to know when to use a calculator.

And when you do give AI access, scope it tightly. If you really were onboarding a junior hire, you wouldn’t hand them production database credentials and wish them luck.
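One way to picture tight scoping is a deny-by-default allowlist checked before every agent action. This is a minimal sketch, not a real authorization system: `AgentScope`, `run_action`, and the action and resource names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentScope:
    """Hypothetical allowlist of (action, resource) pairs for one agent."""
    name: str
    allowed: frozenset = field(default_factory=frozenset)

    def permits(self, action, resource):
        return (action, resource) in self.allowed


def run_action(scope, action, resource, handler):
    """Call `handler` only if the scope explicitly permits the action."""
    if not scope.permits(action, resource):
        # Deny by default: anything not listed is refused.
        raise PermissionError(f"{scope.name} may not {action} {resource}")
    return handler()


# An agent that can read logs and metrics, and nothing else.
triage_agent = AgentScope(
    "triage-agent",
    frozenset({("read", "logs"), ("read", "metrics")}),
)
```

With this shape, `run_action(triage_agent, "read", "logs", fetch_logs)` goes through, while any write against a production database raises `PermissionError` before the handler ever runs.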

The New Developers in the Room

AI changes who can build software. “Developer” used to mean someone who lived in an IDE or a terminal. Now it can be anyone who can write a coherent prompt.

“Your definition and your user base of an internal developer platform just became many, many more people.”

Aaron calls them “business vibe coders.” Rickey puts the risk plainly: what happens when a marketing intern can deploy a new app to production on day one?

Sylvain’s advice: don’t block them, channel them. Build golden‑path workflows that make it impossible to deploy something unsafe. If an intern asks for “a web app for this campaign,” the platform should turn that into a well‑tested, production‑ready template automatically.
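A golden path can be sketched as a template where the platform team locks the safety-critical settings and exposes only a few safe knobs. Everything here is hypothetical: the `LOCKED` and `ADJUSTABLE` settings, the naming rule, and the shape of the returned spec.

```python
import re

# Hypothetical settings the platform team locks for every generated app.
LOCKED = {"tls": True, "public_database": False, "ci_tests_required": True}

# Hypothetical settings a requester may adjust, with safe defaults.
ADJUSTABLE = {"replicas": 2, "region": "us-east"}


def golden_path_web_app(name, **overrides):
    """Build an app spec from the template, rejecting unsafe requests."""
    if not re.fullmatch(r"[a-z][a-z0-9-]{2,30}", name):
        raise ValueError(f"invalid app name: {name!r}")
    unknown = set(overrides) - set(ADJUSTABLE)
    if unknown:
        raise ValueError(f"not adjustable on the golden path: {sorted(unknown)}")
    # Locked settings are merged last so nothing can override them.
    return {"name": name, **ADJUSTABLE, **overrides, **LOCKED}
```

Calling `golden_path_web_app("spring-campaign", replicas=3)` returns a spec with TLS on and tests required, while asking for `tls=False` fails immediately. The requester never sees a way to turn the safe defaults off, whether the request comes from a person or from an AI acting on a prompt.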

This isn’t just about safety; it’s about speed. The more you can safely hand over to less‑technical teammates, the faster your whole organization can move without tripping over itself.

Platforms for AI, AI for Platforms

Vishakha sees it as a two‑way street:

“There is AI for platform and then there’s platform for AI. Both have to go together.”

On one side, AI makes platforms smarter: helping with provisioning, anomaly detection, incident response, and deploy orchestration. On the other, platforms need to evolve to run AI workloads reliably and securely.

Aaron thinks the building blocks themselves will change. Today, most AI workflows deal in big Duplo‑block‑style components. Soon, we’ll see more Lego Technic‑level services: smaller, more precise, more powerful. That means more flexibility, and a lot more ways to break things.

Context Means Ground Truth + Rules of the Road

In AI land, “context” isn’t a buzzword. It’s the instructions and domain knowledge you give your angry intern so it doesn’t make clever but wrong decisions.

Sylvain’s example: the Model Context Protocol (MCP), which lets developers interact with multiple systems from inside their IDE. There’s no constant context‑switching to web dashboards. You just chat with your AI in place, with the right access and data.

Aaron’s cautionary tale: someone asked their diagnostic agent, “Where are the zombie nodes?” The AI answered confidently: “Nodes with no network.” Logical. Also completely wrong. In NVIDIA’s world, “zombie node” has a much more specific meaning, one the AI didn’t know because no one had taught it.

That’s the heart of context engineering:

Define your terms (“zombie node” means exactly this).
Tell AI where to get ground truth (observability stack, config management, ticket system).
Make it use trusted tools for critical steps (calculators, policy checkers, security scanners).
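The three steps above can be sketched as a small context builder plus one trusted tool. All names are hypothetical, and the glossary entry is deliberately a placeholder because the panel did not give NVIDIA's actual definition of a zombie node. The calculator uses Python's standard `ast` module to evaluate plain arithmetic without executing arbitrary code.

```python
import ast
import operator

# Step 1: define your terms. The definition is a placeholder on purpose.
GLOSSARY = {"zombie node": "<your team's exact definition goes here>"}

# Step 2: name the ground-truth sources (the list mirrors the article).
GROUND_TRUTH = ["observability stack", "config management", "ticket system"]


def build_context(question):
    """Prepend matching glossary definitions and allowed sources to a question."""
    lines = [
        f'Definition: "{term}" means {meaning}'
        for term, meaning in GLOSSARY.items()
        if term in question.lower()
    ]
    lines.append("Answer only from: " + ", ".join(GROUND_TRUTH))
    lines.append("Question: " + question)
    return "\n".join(lines)


# Step 3: a trusted tool for a critical step, here a deterministic calculator.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression):
    """Evaluate +, -, *, / on numeric literals; reject everything else."""
    def evaluate(node):
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        raise ValueError("unsupported expression")
    return evaluate(ast.parse(expression, mode="eval").body)
```

The point of the split is that the model never has to guess: definitions and sources arrive with the question, and arithmetic goes to code that cannot hallucinate.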

How to Keep the Intern in Line

The angry intern is here to stay. Managed well, it’s the teammate who catches problems before you do, takes boring work off your plate, and helps your platform run like it’s been given superpowers.

The rules are simple:

Nail the fundamentals before you add AI.
Augment your team, don’t replace it.
Use guardrails and deterministic systems as your base.
Adapt for a bigger, less‑technical developer population.
Get serious about context.

Get those right, and your angry intern stops being a liability and starts being the best hire you’ve ever made.
