Table Of Contents

  • The Great DevOps Awakening (And Why We’re Due For Another)
  • The Three Ways (Now With More Dimensions)
  • The Patterns Are Already Here
  • Code Is Still Liability (It’s Just Not Your Liability Anymore)
  • The Phoenix Project Rises Again
  • What Good Looks Like
  • The Road Ahead

In 2008, Nuance hired me to join their Healthcare Speech Recognition team as a “Release Engineer.” DevOps wasn’t a thing yet — Patrick Debois and Andrew Shafer wouldn’t hold their first “DevOpsDays” until 2009.

But I was lucky: at Nuance, “Release Engineer” meant being a “jack of all trades” who wrote Makefiles, bash scripts, Perl, and Java to build and release code to a fleet of hundreds of on-premise Linux machines.

I worked closely with our Ops team to make all of this happen. No, I wasn’t at the meetups in Belgium where DevOps was being born. But I was doing DevOps all the same, living the principles before they had a name.

Sometimes fortune smiles on us. I look back fondly on those early DevOps days because I crammed decades of learning into a few years. Every deployment failure at 2 a.m., every “works on my machine” mystery, every hand-crafted server configuration — they were all lessons in why systems thinking matters more than tools.

Now, watching teams adopt AI with the same chaotic enthusiasm we once had for cloud computing, I see history preparing to teach the same lessons — just with much bigger bills.

Related: Read CloudZero’s report on The State of AI Costs in 2025

The Great DevOps Awakening (And Why We’re Due For Another)

Remember when “DevOps” was just the radical idea that maybe — just maybe — the people who write code should talk to the people who run it? Revolutionary stuff. It took us from the stone age of throwing code over the wall to the modern era of continuous delivery, infrastructure as code, and blameless postmortems.

But here’s what everyone forgets: DevOps wasn’t just about tools. It was about systems thinking. About feedback loops. About creating a culture where failure was a teacher, not a career-limiting event.

As I’ve been preaching in my talks: 

We must deliberately extend the same principles to govern AI usage, infrastructure, and costs. 

This isn’t optional. It’s survival.


The Three Ways (Now With More Dimensions)

Gene Kim introduced the Three Ways of DevOps in his seminal book, “The Phoenix Project”:

The First Way: systems thinking

In 2017, when I joined CloudZero as a founding engineer, one of my first principles was “Understand the Domain.” You can’t optimize what you don’t understand. You can’t govern what you can’t see.

With AI, systems thinking becomes multidimensional:

  • Traditional DevOps: Follow the code from commit to production. 
  • AI-Augmented DevOps: Follow the code from prompt to production, through model selection, token consumption, and hallucination detection.

The problem isn’t just technical — it’s about systems thinking. AI agents struggle to determine when to use tools versus when to rely on their internal knowledge, especially when working with multiple tools in complex environments. 

This complexity is expensive: reasoning through problems requires multiple calls to the underlying LLM, which quickly adds up in terms of dollars spent. Teams are discovering what happens when AI lacks proper constraints: instead of a stack overflow, you get a credit card overflow.
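
A constraint doesn’t have to be elaborate to stop a credit card overflow. Here’s a minimal sketch of a per-task spending cap in Python; the model names and per-token prices are illustrative assumptions, not anyone’s real rate card:

```python
# Illustrative prices only; substitute your provider's real rate card.
PRICE_PER_1K_TOKENS = {"fast-model": 0.0005, "reasoning-model": 0.015}

class BudgetExceeded(Exception):
    """Raised before a call that would blow past the task's ceiling."""

class BudgetGuard:
    def __init__(self, ceiling_usd: float):
        self.ceiling_usd = ceiling_usd
        self.spent_usd = 0.0

    def charge(self, model: str, tokens: int) -> None:
        """Meter one model call; refuse it if the ceiling would be crossed."""
        cost = PRICE_PER_1K_TOKENS[model] * tokens / 1000
        if self.spent_usd + cost > self.ceiling_usd:
            raise BudgetExceeded(
                f"${self.spent_usd + cost:.4f} would exceed the "
                f"${self.ceiling_usd:.2f} ceiling for this task"
            )
        self.spent_usd += cost

# Usage: every LLM call inside the agent loop gets metered.
guard = BudgetGuard(ceiling_usd=2.00)
guard.charge("reasoning-model", tokens=4_000)     # within budget
# guard.charge("reasoning-model", tokens=500_000) # raises BudgetExceeded
```

The point isn’t the arithmetic. It’s that the loop halts itself before your finance team has to.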

The Second Way: amplifying feedback loops

Here’s the thing: AI is like a super-intelligent Golden Retriever. Context is everything, but its memory and focus are… SQUIRREL! Without structured feedback loops, you’re trying to maximize productivity while minimizing hallucinations with a brilliant assistant that has the attention span of a caffeinated pre-teen playing Roblox while watching YouTube.

At CloudZero, we’ve developed a structured development workflow that creates feedback loops at every stage:

Requirements → Tasks → Implementation

Instead of throwing a vague prompt at AI and hoping for the best, we break it down:

  1. Human prompts with a one-sentence requirement
  2. AI asks clarifying questions (feedback loop #1)
  3. Human answers, creating shared context
  4. AI generates a Product Requirements Document
  5. Human reviews, amends, approves (feedback loop #2)

This pattern repeats through task generation and implementation. Each stage has a human checkpoint — not because we don’t trust AI, but because we’ve learned that AI without feedback is like a Ferrari without brakes. Sure, it goes fast. But eventually, you’re going to hit something expensive.
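
Stripped to its control flow, the checkpoint pattern is almost embarrassingly small. This is a sketch, not our production tooling; `ask_llm` and `human_review` are hypothetical stand-ins for a model client and a review step:

```python
def ask_llm(prompt: str) -> str:
    """Stand-in for a real model call."""
    return f"[draft generated from: {prompt!r}]"

def human_review(artifact: str) -> tuple[bool, str]:
    """Stand-in for a human checkpoint; returns (approved, feedback)."""
    return True, ""

def checkpoint(prompt: str, max_rounds: int = 3) -> str:
    """Generate, have a human review, and revise until approved."""
    artifact = ask_llm(prompt)
    for _ in range(max_rounds):
        approved, feedback = human_review(artifact)
        if approved:
            return artifact
        artifact = ask_llm(f"Revise per this feedback {feedback!r}: {artifact}")
    raise RuntimeError("No approval after max rounds: escalate to a human")

# Each stage gates on a human before the next one starts.
requirement = "One-sentence requirement from a human"
prd = checkpoint(f"Ask clarifying questions, then draft a PRD for: {requirement}")
tasks = checkpoint(f"Break this PRD into ordered tasks: {prd}")
code = checkpoint(f"Implement the tasks one at a time: {tasks}")
```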

The magic isn’t in the structure itself. It’s in what the structure prevents:

  • No more wandering AI that starts building a login system and ends up implementing a blockchain
  • No more discovering after deployment that your AI interpreted “user-friendly” as “requires a PhD”
  • No more code that works perfectly but solves the wrong problem

The teams succeeding with AI aren’t trying to eliminate the SQUIRREL moments. They’re building systems that catch them before they matter. Because in the end, a feedback loop is just a leash for your very smart, very enthusiastic, very expensive Golden Retriever.

The Third Way: culture of experimentation

As Adam, Jerod, and guest Abi Noda discussed on a recent episode of The Changelog, team safety and happiness are still among the best predictors of productivity.

But “move fast and break things” hits different when “things” includes your AWS budget. We still need experimentation, just with guardrails that would make a bowling alley jealous.

The teams succeeding with AI aren’t the ones who locked it down completely. They’re the ones who created safe spaces for experimentation. Think of it as a sandbox, but instead of sand, it’s filled with Monopoly money for API credits.

The Patterns Are Already Here

During my time at Nuance/Microsoft, I lived through the microservices revolution. The patterns we learned then apply directly to AI:

  • Service governance then: Service discovery, API versioning, contract testing 
  • AI governance now: Model versioning, prompt engineering standards, output validation
  • Cost management then: Reserved instances, spot pricing, right-sizing 
  • Cost management now: Token optimization, model selection, batch processing
  • Observability then: Distributed tracing, log aggregation, metrics dashboards 
  • Observability now: Prompt tracking, token usage analytics, hallucination rates

It’s the same game with new players. And just like microservices, the teams that win are the ones who build the boring stuff — governance, observability, cost controls — before they need it. Let me repeat that: build the boring stuff before you need it.
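
If you want a concrete picture of “boring,” it can be as small as a provenance log: every generation records its prompt hash, model version, and token counts, so cost and audit questions are answerable later. A sketch, with assumed field names and stdout standing in for a real metrics pipeline:

```python
import hashlib
import json
import time

def log_generation(prompt: str, model: str, model_version: str,
                   input_tokens: int, output_tokens: int, feature: str) -> dict:
    """Record one generation's provenance and cost signals."""
    record = {
        "ts": time.time(),
        "feature": feature,  # lets FinOps attribute cost per feature, not per invoice
        "model": model,
        "model_version": model_version,
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
    }
    print(json.dumps(record))  # in practice: ship to your metrics pipeline
    return record

log_generation("Summarize this invoice...", "fast-model", "2025-06-01",
               input_tokens=1_200, output_tokens=300, feature="invoice-summaries")
```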

Code Is Still Liability (It’s Just Not Your Liability Anymore)

One of my core principles has always been “Code is Liability.” Every line you write is something you have to maintain, debug, and eventually replace. AI doesn’t change this — it just shifts the liability around like a shell game.

When AI generates code, you’re trading direct liability (I wrote this, I own it) for indirect liability (AI wrote this, but I deployed it, so I still own it). It’s like hiring a contractor who works at superhuman speed but occasionally builds doors that open into walls.

The systems we need aren’t just about managing AI — they’re about managing the intersection of human judgment and AI capability. Because at the end of the day, when production goes down at 3 a.m., the AI isn’t getting paged. You are.

The Phoenix Project Rises Again

Every few years, our industry rediscovers that the technical problems were never the hard part. The hard part is the human systems around the technology.

When “The Phoenix Project” introduced DevOps to the masses, it wasn’t really about deployment pipelines or configuration management. It was about breaking down silos, creating shared ownership, and building systems that could evolve.

AI is forcing us to evolve again. But evolution doesn’t mean throwing away everything we’ve learned. It means adapting proven patterns to new challenges:

  • Build Once, Deploy Many becomes Train Once, Inference Everywhere 
  • Infrastructure as Code becomes AI Behavior as Code 
  • Shift Left on Security becomes Shift Left on AI Governance
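
The middle one, AI Behavior as Code, is the most concrete. Here’s one possible shape: a version-controlled declaration of model choice, limits, and guardrails. Every field here is an illustrative assumption, not a standard:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AIBehavior:
    """Declares how an AI feature is allowed to behave, reviewed like any config."""
    model: str
    temperature: float
    max_output_tokens: int
    max_usd_per_request: float
    allowed_tools: tuple[str, ...] = ()
    blocked_topics: tuple[str, ...] = ()

# Checked into git, reviewed in a pull request, deployed like any other config.
SUPPORT_BOT = AIBehavior(
    model="fast-model",
    temperature=0.2,
    max_output_tokens=1_024,
    max_usd_per_request=0.05,
    allowed_tools=("search_docs", "create_ticket"),
    blocked_topics=("pricing_negotiation",),
)
```

The payoff is the same as infrastructure as code: behavior changes happen in a reviewed pull request, not in a late-night console edit.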

What Good Looks Like

I know we’re building the right systems when:

  • A junior developer can use AI without accidentally burning a month’s budget
  • Our AI agents have circuit breakers that prevent infinite loops of expensive API calls
  • We can trace every piece of generated code back to its prompt and model version
  • The phrase “AI did something weird” triggers a runbook, not a panic attack
  • Our FinOps dashboard shows AI costs per feature, not just per invoice

This isn’t some far-off utopia. Teams are starting to build these systems today. They’re just doing it quietly, methodically, and with a healthy respect for Murphy’s Law.
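
Take the circuit breaker from that list. The pattern is decades old; the only new part is that a tripped breaker now saves dollars as well as uptime. A sketch with assumed thresholds and a hypothetical model client:

```python
import time

class CircuitOpen(Exception):
    pass

class CircuitBreaker:
    """Fail fast after repeated failures instead of retrying an expensive call forever."""

    def __init__(self, max_failures: int = 3, cooldown_s: float = 60.0):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.failures >= self.max_failures:
            if time.time() - self.opened_at < self.cooldown_s:
                raise CircuitOpen("Circuit open: not spending more until cooldown passes")
            # Half-open: allow one probe call; another failure re-opens immediately.
            self.failures = self.max_failures - 1
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            self.opened_at = time.time()
            raise
        self.failures = 0
        return result

breaker = CircuitBreaker(max_failures=3, cooldown_s=60.0)
# breaker.call(llm_client.generate, prompt)  # llm_client is a hypothetical model client
```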

The Road Ahead

As I write from my home in Massachusetts, I’m reminded of the stone walls that crisscross New England. They were built slowly, deliberately, one rock at a time. They’ve lasted centuries because the builders understood that good systems take time.

The choice is ours. But remember: AI won’t wait for us to figure it out. It’s already generating code, making decisions, and running up bills. The only question is whether we’ll build systems to govern it or let it govern us.
