Table Of Contents
Question 1: What Services Are Affected?
Question 2: What’s Still Up And Running?
Question 3: What’s Burning Cash?
Outage Anxiety? Time For A Chill Pill

On Oct. 20, the Internet woke up and seemingly chose violence. Amazon Web Services (AWS) went down for more than 12 hours. From banking platforms to hospital communications to mobile ordering apps, digital services came to a screeching halt.

The cause? Two programs tried to write the same DNS entry simultaneously, failed, and left the entry blank. Thus began the incredibly costly failure cascade. According to Mehdi Daoudi, CEO of Catchpoint, the cost of the outage could “easily reach into the hundreds of billions”.

And if that wasn’t bad enough, cloud catastrophe was back for a second round on Oct. 29 as Azure took a hit. This time, it was a patch gone awry in Azure Front Door, Microsoft’s global content and application delivery network.

These incidents are a timely (and terrifying) reminder that while cloud services are essential for operations, they’re ultimately black boxes: Customers see what goes in and know what comes out, but are in the dark about what’s going on behind the curtain.

When outages hit, $%^# hits the fan. Suddenly, teams need to know exactly what’s going on. What’s up, what’s down, and what they can do to triage the damage.  

What’s more, executives, stakeholders, and finance teams come looking for answers — and IT is responsible for getting them. 

Not sure if you’re ready for the next cloud crash? Take our outage anxiety test: Three questions, 10 minutes. Let’s go.

  1. Which services are affected?
  2. What’s still up and running?
  3. What’s burning cash?

So, how did it go? For most teams, the first two questions aren’t so bad, especially if they’re using cloud observability tools and have solid backups in place. 

The third one gets tricky because offline services cause a knock-on effect for those still up and running. When CEOs and CFOs come asking, however, IT teams need answers.

Let’s take a closer look.

Question 1: What Services Are Affected?

This is the (relatively) easy one. Users complain, IT teams check, and the list basically writes itself.

Common applications and services that go offline when clouds take an unexpected break include:

Websites

This could include some or all of your webpages, depending on how they’re stored and whether you run backups.

E-commerce platforms

Most e-commerce platforms depend on the cloud — if they fail, you’re not making any sales.

Social media platforms

Facebook, Snapchat, and Instagram all rely on cloud services. For example, Snapchat uses both Google Cloud and AWS. Facebook relies on Azure, and Instagram leverages AWS for storage. 

Cloud-based development environments

Examples include Amazon CodeCatalyst, Azure Pipelines, or Google Cloud Workstations. 

Mobile applications

When AWS went down, customers couldn’t order Starbucks on their mobile apps. Given that 64% of users now prefer mobile apps to business websites, this is no small problem.

Streaming services

As noted by TechRadar, streaming services like Netflix and Spotify were directly impacted by the AWS failure for just 70 minutes. In that time, estimated Netflix losses were $4.5 million, while Spotify may have lost $2 million. 
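To put those figures in per-minute terms, here is a quick back-of-the-envelope sketch. It uses only the estimates quoted above and assumes, for simplicity, that losses accrue evenly across the 70 minutes. The function name is ours, for illustration only.

```python
# Figures quoted above: estimated losses over a 70-minute outage window.
OUTAGE_MINUTES = 70
estimated_losses = {"Netflix": 4_500_000, "Spotify": 2_000_000}


def loss_per_minute(total_loss: float, minutes: float) -> float:
    """Average loss per minute, assuming losses accrue evenly (a simplification)."""
    return total_loss / minutes


for name, loss in estimated_losses.items():
    print(f"{name}: ~${loss_per_minute(loss, OUTAGE_MINUTES):,.0f} per minute")
```

Under that even-accrual assumption, the Netflix estimate works out to roughly $64,000 per minute and the Spotify estimate to roughly $29,000 per minute.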

(Fun fact: Did you know that shutting down the internet for just one hour in the United States alone can lead to nearly half a billion dollars in financial impact? Now you know.)

Question 2: What’s Still Up And Running?

Some apps and services will weather the storm. These include third-party applications that leverage more than one cloud, and in-house tools that fail over to local or connected backups.

While backups offer a way to ensure continuity of service, they also come with a cost: Extra infrastructure you’re spinning up to compensate.

Consider a customer-facing mobile app. Given the need for rapid resource and bandwidth scaling, public clouds make sense as mobile software hosts. If your cloud provider goes dark, however, it’s worth having a failover plan in place.

The caveat? Replicating mobile environments and providing seamless service to users is resource- and cash-intensive. The longer outages last, the more you’re spending on backups — while still on the hook for your monthly cloud spend.
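A rough sketch of that double spend might look like the following. Every number and name here is hypothetical: it assumes a flat monthly bill for the primary cloud, a flat hourly rate for the failover environment, and no SLA credits.

```python
def outage_overlap_cost(
    monthly_cloud_spend: float,
    failover_cost_per_hour: float,
    outage_hours: float,
    hours_in_month: float = 730,  # common approximation of hours per month
) -> dict:
    """Estimate what you pay during an outage: the idle primary plus the failover."""
    # Share of the monthly bill that covers hours when the primary cloud was down.
    primary_still_billed = monthly_cloud_spend * outage_hours / hours_in_month
    # Extra infrastructure spun up to keep the service available.
    failover = failover_cost_per_hour * outage_hours
    return {
        "primary_still_billed": primary_still_billed,
        "failover": failover,
        "total": primary_still_billed + failover,
    }


# Hypothetical example: $73,000/month cloud bill, $250/hour failover, 12-hour outage.
costs = outage_overlap_cost(73_000, 250, 12)
print(costs)
```

In this made-up scenario, the 12-hour outage costs $3,000 in failover capacity on top of $1,200 of primary cloud spend that delivered nothing.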

Question 3: What’s Burning Cash?

When finance teams come knocking, they want the big picture. What’s down (and costing you money), what’s still up (and costing you money), and what else is burning cash?

For many IT teams, this feels like a trick question. How are they supposed to know where every dollar is going and how it’s being spent, especially if they’ve successfully petitioned the C-suite to invest in multiple cloud services?

In practice, three components play a role in cash calculations.

1. System architecture

Architecture includes both systems themselves and how these systems are interconnected and interdependent. For example, in-house apps often leverage data stored in the cloud since storage is more cost-effective. 

Edge devices connected to IoT sensors, meanwhile, may relay data directly to solutions such as ERP or CRM, but comprehensive analysis may be offloaded to cloud analytics platforms.

Companies may also rely on containerized applications that are environment-agnostic. In this case, location doesn’t matter — these apps can operate anywhere. Not all locations, however, come with the same compute costs. 
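One way to make those interdependencies concrete is a simple dependency map. The sketch below is illustrative only: the service names are hypothetical, and it assumes a service is affected whenever anything it depends on (directly or indirectly) is down, which ignores failover.

```python
from collections import deque

# Hypothetical dependency map: each service lists what it depends on.
DEPENDS_ON = {
    "mobile-app": ["orders-api"],
    "orders-api": ["cloud-db"],
    "analytics": ["cloud-storage", "edge-gateway"],
    "erp": ["edge-gateway"],
    "edge-gateway": [],
}


def affected_by(failed: set, depends_on: dict) -> set:
    """Return every service that directly or indirectly depends on a failed component."""
    # Invert the map so we can walk from a failed component to its dependents.
    dependents = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    affected = set()
    queue = deque(failed)
    while queue:
        node = queue.popleft()
        for service in dependents.get(node, ()):
            if service not in affected:
                affected.add(service)
                queue.append(service)
    return affected


print(sorted(affected_by({"cloud-db", "cloud-storage"}, DEPENDS_ON)))
```

Here the mobile app never touches the cloud database directly, yet it still lands on the affected list because its API does. That is the knock-on effect in miniature.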

2. Service-level agreements (SLAs)

Teams also need access to SLA data for cloud providers, SaaS, PaaS, and any other as-a-service solutions. While most SLAs offer some provisions for prorated service costs in the case of an outage, these terms vary significantly by provider and by the type of service you purchase. 

For businesses with multiple clouds and hundreds (or thousands) of as-a-service tools, these SLAs can represent a significant cost sink, especially when clouds are out of commission.
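As a sketch of how a prorated credit might be estimated, consider the tiered model below. The tiers are invented for illustration and do not reflect any provider’s actual SLA; real agreements differ in thresholds, credit percentages, and how downtime is measured.

```python
# Hypothetical credit tiers: (minimum monthly uptime %, credit as % of monthly fee).
# These values are assumptions for illustration, not any provider's real terms.
CREDIT_TIERS = [(99.9, 0), (99.0, 10), (95.0, 25), (0.0, 100)]

MINUTES_IN_MONTH = 30 * 24 * 60  # assumes a 30-day billing month


def uptime_percent(downtime_minutes: float) -> float:
    """Monthly uptime percentage, clamped so it never drops below zero."""
    return max(0.0, 100 * (1 - downtime_minutes / MINUTES_IN_MONTH))


def sla_credit(monthly_fee: float, downtime_minutes: float) -> float:
    """Return the credit owed under the first tier whose uptime floor is met."""
    uptime = uptime_percent(downtime_minutes)
    for min_uptime, credit_pct in CREDIT_TIERS:
        if uptime >= min_uptime:
            return monthly_fee * credit_pct / 100
    return 0.0


# Hypothetical example: a $10,000/month service that was down for 12 hours.
print(sla_credit(10_000, 12 * 60))
```

Under these made-up tiers, a 12-hour outage drops monthly uptime to about 98.3% and yields a $2,500 credit. Multiply that exercise across hundreds of tools, each with different terms, and the tracking problem becomes clear.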

3. Per-source costs

Every application and service that runs on outside resources comes with a cost. From the general costs of storage and compute across Azure, AWS, and GCP, to more focused spend on solutions such as Snowflake, Kubernetes, or Anthropic, every source has a unique cost profile that changes based on current conditions.
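To answer the “what’s burning cash” question, those per-source costs need to be grouped by what each source is doing during the outage. The following is a minimal sketch with hypothetical sources and hourly rates; it assumes flat hourly pricing, which real cost profiles rarely follow.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CostSource:
    name: str
    hourly_cost: float
    status: str  # "down", "up", or "failover" (labels chosen for this sketch)


def burn_by_status(sources: list, outage_hours: float) -> dict:
    """Total spend over the outage window, grouped by each source's status."""
    totals = defaultdict(float)
    for source in sources:
        totals[source.status] += source.hourly_cost * outage_hours
    return dict(totals)


# Hypothetical sources and rates, for illustration only.
sources = [
    CostSource("primary-compute", 40.0, "down"),
    CostSource("primary-storage", 10.0, "down"),
    CostSource("data-warehouse", 25.0, "up"),
    CostSource("backup-region", 30.0, "failover"),
]

print(burn_by_status(sources, outage_hours=12))
```

With these invented numbers, a 12-hour outage means $600 spent on resources that are down, $300 on services still running, and $360 on failover capacity.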

Outage Anxiety? Time For A Chill Pill

Cloud outages are stressful. Sudden service disruptions hurt business performance, damage customer perception, and wreak havoc on IT environments.

With IT operations now part of business strategies, however, finance teams and C-suites want to know exactly how much a cloud collapse could cost, and how much it takes to keep things up and running.

This starts with a complete analysis of what you have, what’s affected, and what’s still running. It’s bolstered by architecture and SLA data, but often goes off track when stakeholders come looking for precise pricing data.

CloudZero is your cloud cost chill pill. Allocate all your spend — no tagging required — automatically detect cost spikes, and measure unit costs to get a complete picture of where you spend, and how much you’re spending.

Reduce your cloud cost anxiety with a free assessment from CloudZero. Start here.
