When cloud costs spike, compute is often the culprit. Using Azure Spot Instances could cut your compute costs by up to 90%. But Spot VMs come with trade-offs, including unpredictable evictions and capacity constraints. And that makes them tricky to use without the right strategy and visibility.
In this guide, we will share how to make them work for you. We’ll cover how Azure spot pricing actually works, the best use cases (and when to avoid them), and how to turn Spot VM savings into long-term gains.
Stick around, and we’ll also show you how innovative teams at companies like Moody’s, New Relic, and Skyscanner view, understand, and optimize their spot VMs without sacrificing reliability.
What Are Azure Spot Instances?
Spot isn’t a special type of virtual machine. It’s one of several pricing options you apply to standard Azure VMs. The other two main ones are:
- Pay As You Go or On-Demand VMs, which offer full availability and flexibility at standard rates, and
- Azure Reserved Instances (Azure RIs) and Azure Savings Plans, which require you to commit to a one- or three-year term in exchange for predictable savings, making them ideal for stable, long-term workloads.
See also: Azure Savings Plans 101: Tips, Tactics, And Best Practices To Apply Now
Microsoft Azure continuously monitors unused compute capacity across its global network of data centers.

Image: Azure Availability Zones
When spare capacity is available, the cloud provider offers it at a discounted price through Spot VMs.
These instances are identical in performance to regular VMs. They use the same underlying infrastructure and come in many of the same sizes and types as standard Azure virtual machines.
However, while Spot pricing offers the biggest discounts, it comes with no SLA or availability guarantees.
So, Azure Spot Instances refer to a pricing option that lets you run standard Azure Virtual Machines (VMs) using unused Azure compute capacity — at up to 90% less than the standard (pay-as-you-go) rate.
Yet, there are gotchas and gems you need to know here.
How Azure Spot Pricing Works
Pricing for Azure Spot instances is dynamic. The price you pay can change based on supply and demand across Azure’s global infrastructure. When demand is low, prices drop. When demand rises, prices increase, and so does the risk of eviction.
Here’s a breakdown of how that really works so you can take full advantage of Spot VMs on Azure:
1. You can set a max price (or let Azure decide)
When deploying a Spot VM, you have two pricing options:
- Default price (max price set to -1): You pay the current Spot price, which fluctuates with available capacity, but never more than the pay-as-you-go rate. Your VM won't be evicted for price reasons, only when Azure needs the capacity back.
- Max price: You set the maximum hourly price you're willing to pay. If the current Spot price rises above it, Azure evicts your VM according to your eviction policy.
Setting a max price gives you control, but keep in mind that even if the current rate is low today, it can climb past your cap when capacity tightens.
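To make the two options concrete, here is a minimal Python model of the pricing rules above. It assumes Azure's documented behavior: a max price of -1 bills the current Spot price or the pay-as-you-go price, whichever is lower, and never triggers a price-based eviction, while any other max price acts as a hard cap. The function name and the prices are hypothetical, and capacity-based evictions are deliberately left out.

```python
from typing import Optional

def hourly_charge(spot_price: float, payg_price: float, max_price: float) -> Optional[float]:
    """Return the hourly rate billed, or None if the VM would be evicted for price.

    Assumption: max_price == -1 selects the default behavior (never evicted for
    price, billed at min(spot, pay-as-you-go)); any other value is a hard cap.
    Capacity-based evictions are not modeled here.
    """
    if max_price == -1:
        return min(spot_price, payg_price)
    if spot_price > max_price:
        return None  # Spot price rose above your cap: price-based eviction
    return spot_price

print(hourly_charge(0.08, 0.40, -1))    # default: pays the Spot price
print(hourly_charge(0.12, 0.40, 0.10))  # capped: evicted for price
```

The first call prints `0.08` and the second prints `None`, which is the trade-off in a nutshell: a cap protects your budget at the cost of availability.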
2. Evictions can happen at any time
Azure can evict your Spot VM at any point if:
- Azure no longer has spare capacity in the region or zone you’re using, or
- The current market price exceeds your max price (if you’ve set one)
You’ll typically get a 30-second eviction notice before Azure reclaims your Spot VM. Depending on your configuration, the instance will either be deallocated (stopped and preserved) or deleted entirely.
If you’re familiar with AWS Spot Instances, the concept is similar, though AWS provides a longer 120-second notice.
You decide what happens on eviction by choosing one of two eviction policies:
- Deallocate: The VM is stopped and its disks are kept (and still billed), so you can restart it later if capacity becomes available.
- Delete: The VM and its underlying disks are removed completely.
Both approaches require you to design your workloads to tolerate interruptions.
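On the VM itself, Azure surfaces the eviction notice through Scheduled Events in the Azure Instance Metadata Service (IMDS), where Spot evictions appear as events of type `Preempt`. The sketch below polls that endpoint using only Python's standard library. Treat the `api-version` value, the polling interval, and the `on_preempt` callback as assumptions to adapt; the endpoint is only reachable from inside an Azure VM.

```python
import json
import time
import urllib.request

# IMDS Scheduled Events endpoint (reachable only from inside an Azure VM).
# The api-version is an assumption; check the IMDS docs for current versions.
EVENTS_URL = "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"

def fetch_events() -> dict:
    """Fetch the current Scheduled Events document from IMDS."""
    req = urllib.request.Request(EVENTS_URL, headers={"Metadata": "true"})
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)

def preempt_events(doc: dict) -> list:
    """Keep only Spot eviction (Preempt) events from a Scheduled Events document."""
    return [e for e in doc.get("Events", []) if e.get("EventType") == "Preempt"]

def watch(on_preempt, interval_s: float = 2.0) -> None:
    """Poll IMDS and call the hypothetical on_preempt(event) hook once an eviction is scheduled.

    Keep interval_s well below the roughly 30-second notice window.
    """
    while True:
        events = preempt_events(fetch_events())
        if events:
            on_preempt(events[0])  # e.g. write a checkpoint, stop taking work, deregister
            return
        time.sleep(interval_s)
```

Filtering is split from fetching so the parsing logic can be tested without an Azure VM.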
3. Availability and pricing vary by region and VM size
Spot capacity isn’t consistent across the board. Some Azure regions, availability zones, and VM sizes offer more spot capacity than others. It pays to experiment with and monitor where your workloads can run most cost-effectively.
Pros And Cons Of Azure Spot Instances (And What It Means For You)
From Azure’s perspective, it’s a smart way to monetize spare capacity that would otherwise sit idle.
For cost-conscious engineering teams, it’s an opportunity to run dev/test environments, batch jobs, container workloads, and AI/ML training jobs at a fraction of the normal cost.
Let’s say you’re running a machine learning training job using a standard D4s_v3 VM. On pay-as-you-go, it might cost $0.40/hour. As a Spot VM, the price could drop to $0.08/hour or lower. That’s 80% off.
But if Azure suddenly needs that capacity back, your job is interrupted, and unless you've built in checkpointing, failover, or some other mitigation, the work in progress is lost. You get as little as 30 seconds to act, after all.
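A back-of-the-envelope model shows why checkpointing matters. The Python sketch below assumes, hypothetically, that each eviction loses half a checkpoint interval of work on average plus a fixed restart overhead. It reuses the illustrative D4s_v3 prices above; the job length, eviction count, and overhead are made-up inputs, not Azure data.

```python
def spot_job_cost(job_hours: float, spot_rate: float, evictions: int,
                  checkpoint_interval_h: float, restart_overhead_h: float) -> float:
    """Estimate the Spot cost of a job, including work redone after evictions.

    Assumption: an eviction lands at a random point within a checkpoint interval,
    so on average half an interval of work is lost, plus a fixed restart overhead.
    """
    lost_per_eviction = checkpoint_interval_h / 2 + restart_overhead_h
    billed_hours = job_hours + evictions * lost_per_eviction
    return billed_hours * spot_rate

payg_cost = 10 * 0.40  # 10-hour job at the illustrative pay-as-you-go rate
spot_cost = spot_job_cost(10, 0.08, evictions=3,
                          checkpoint_interval_h=0.5, restart_overhead_h=0.1)
print(f"pay-as-you-go: ${payg_cost:.2f}, spot: ${spot_cost:.2f}")
```

With 30-minute checkpoints, three evictions add only about an hour of rework, so the job costs roughly $0.88 on Spot versus $4.00 on pay-as-you-go. Without checkpoints, each eviction could throw away everything done so far.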
Spot VMs are standard Azure VMs running on the same infrastructure. You’re not sacrificing CPU, memory, or networking performance. You’re just paying less for temporary capacity.
Jobs that can be resumed from checkpoints or tolerate interruptions, such as CI/CD pipelines, containerized microservices, or dev/test environments, are perfect candidates for Spot.
Better yet, you can deploy Spot VMs at scale using Azure Virtual Machine Scale Sets (VMSS), Kubernetes node pools in AKS, or automation tools like Terraform and Bicep. This lets you scale up compute power when it’s cheap, and fall back gracefully when capacity dries up.
So, what are some ideal use cases for Azure Spot instances (and when should you avoid them)?
When To Use Azure Spot VMs (Without Regret)
Spot is a strong fit when speed is optional, but savings are critical. The best scenarios often share a few traits: stateless, interruptible, and distributed. Consider these:
- CI/CD pipelines: Run builds, tests, and deployment jobs that you can retry or resume without disrupting production. If a Spot VM gets evicted mid-build, you can just retry on another node.
- Containerized applications: In orchestrated environments like Azure Kubernetes Service (AKS), you can mix Spot and on-demand nodes in the same cluster. This gives you elasticity on the cheap, especially for horizontal autoscaling.
- Batch processing and rendering jobs: Video rendering, data transformation, ETL pipelines, and other parallel jobs can be distributed across Spot VMs with minimal impact from evictions.
- Machine Learning (ML) model training: Training jobs, including GPU workloads, can run on Spot VMs to cut your cost per model and fund more experimentation, especially when they checkpoint regularly.
- Dev/test environments: Let your devs test new features or environments without the overhead of always-on infrastructure. Spot is perfect for disposable resources.
There are also times when the risk outweighs the reward.
When To Avoid Spot VMs
Spot instances are not ideal for anything stateful, sensitive, or latency-dependent, including:
- Production databases or stateful apps: These require high availability and data consistency, neither of which Spot VMs guarantee. Eviction could cause serious data loss or downtime.
- Customer-facing APIs and services: Anything with tight SLAs or real-time performance requirements is better off on Reserved or on-demand instances. Spot capacity can vanish mid-request.
- Monolithic apps without recovery logic: If your application can't recover gracefully from a termination signal, don't run it on Spot. A 30-second notice leaves little time to "save state" once eviction begins.
In short, the right workloads for Spot are the ones you’d happily let fail, and then rerun without consequence.
By now, you’ve picked up several actionable best practices for running Azure Spot VMs. You’ve seen the trade-offs and the opportunities. Up next, as promised, we bring it all together.
How To Use Azure Spot Instances Right (And Save Without Losing Track Of Costs)
This is where your Azure Spot instance strategy meets real-world execution, with the tools and visibility to make it work for your team, workloads, and bottom line. Here’s how high-performing teams get Spot right from day one.
1. Use VM Scale Sets to manage Spot capacity like the pros do
Azure Virtual Machine Scale Sets (VMSS) give you control over how Spot VMs are deployed, evicted, and replaced. Here's how:
- Mix Spot and pay-as-you-go VMs in the same scale set with Spot Priority Mix (available in Flexible orchestration mode). Let Spot handle the bulk of compute needs while standard instances maintain critical baseline performance.
- Rely on Try & restore, which automatically tries to restore evicted Spot VMs when capacity becomes available again, so the scale set works back toward its target instance count.
- Plan for Spot shortfalls. Size the regular-priority baseline so critical work keeps running when Spot capacity dries up. You'll avoid full-blown outages this way.
Overall, VMSS gives you an autoscaling mechanism that’s both cost-effective and engineered for graceful degradation.
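To reason about how a mixed scale set is composed, here is a small Python model of Spot Priority Mix. It assumes the two knobs Azure exposes in the scale set's priority mix policy (`baseRegularPriorityCount` and `regularPriorityPercentageAboveBase`): a base count of regular VMs, then a percentage of regular VMs above that base. The rounding rule (regular share rounded up) is our own assumption for illustration; check Azure's documentation for the exact behavior.

```python
import math

def priority_mix(total_vms: int, base_regular: int, regular_pct_above_base: int) -> tuple[int, int]:
    """Split a scale set's instance count into (regular, spot) VMs.

    The first base_regular VMs are regular priority, and regular_pct_above_base
    percent of the remainder are regular too; everything else runs on Spot.
    Assumption: the regular share above the base is rounded up.
    """
    base = min(base_regular, total_vms)
    above = total_vms - base
    regular = base + math.ceil(above * regular_pct_above_base / 100)
    return regular, total_vms - regular

print(priority_mix(10, base_regular=2, regular_pct_above_base=25))
```

Here a 10-VM scale set keeps 4 regular VMs (the base of 2, plus 25% of the remaining 8) and runs 6 on Spot. Raising the base protects critical capacity; raising the percentage trades savings for steadier scale-out.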
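2. Run interruption-tolerant workloads on Spot node pools in AKS
For containerized workloads, AKS supports dedicated Spot node pools alongside regular ones. AKS taints Spot nodes with kubernetes.azure.com/scalesetpriority=spot:NoSchedule, so use taints, tolerations, and node affinity to: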
- Assign non-critical jobs, like test environments, background jobs, or non-SLA workloads, to Spot nodes.
- Keep mission-critical services on standard VMs with full availability guarantees.
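- Gracefully handle evictions by running multiple replicas spread across nodes and letting autoscaling replace lost capacity. (Pod disruption budgets govern voluntary disruptions such as node drains, not the eviction itself.)
Enable the cluster autoscaler on your Spot node pool so AKS can add nodes back when capacity returns, while Kubernetes reschedules evicted pods onto available nodes.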
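For example, a pod meant for Spot nodes needs a toleration for the AKS Spot taint and, to keep it off regular nodes, a required node affinity on the matching `kubernetes.azure.com/scalesetpriority=spot` label. The Python sketch below builds that fragment of a pod spec as a plain dictionary and prints it as JSON. How you merge it into your manifests (Helm, Kustomize, or a Kubernetes client library) is up to you; the function name is hypothetical.

```python
import json

SPOT_KEY = "kubernetes.azure.com/scalesetpriority"

def spot_scheduling() -> dict:
    """Pod spec fragment that tolerates the AKS Spot taint and requires Spot nodes."""
    return {
        "tolerations": [
            {"key": SPOT_KEY, "operator": "Equal", "value": "spot", "effect": "NoSchedule"}
        ],
        "affinity": {
            "nodeAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": {
                    "nodeSelectorTerms": [
                        {"matchExpressions": [
                            {"key": SPOT_KEY, "operator": "In", "values": ["spot"]}
                        ]}
                    ]
                }
            }
        },
    }

print(json.dumps(spot_scheduling(), indent=2))
```

The toleration only *allows* scheduling on Spot nodes; the affinity *requires* it. If you'd rather let pods fall back to regular nodes when Spot capacity is gone, switch to `preferredDuringSchedulingIgnoredDuringExecution`.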
3. Set smart max prices, and don’t “set it and forget it”
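Set your max price too high, and your VMs keep running even as the Spot price climbs toward the pay-as-you-go rate, eroding your savings. Set it too low, and your VMs may be evicted often or never start. Here's how to strike the right balance:
- Start with Azure's default (a max price of -1, which pays the current Spot price) to gather baseline data.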
- Monitor how pricing behaves over time in your chosen regions and VM sizes.
- Then adjust the max prices based on workload importance and urgency.
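One way to turn that monitoring into a number, sketched in Python under our own assumptions: take the Spot prices you observed for a size and region, pick a percentile that matches the workload's importance (a higher percentile means fewer price-based evictions), and never cap above the pay-as-you-go rate. This percentile heuristic is ours, not Azure guidance, and the sample prices are made up.

```python
import statistics

def suggest_max_price(observed: list[float], payg: float, percentile: int) -> float:
    """Suggest a Spot max price from observed hourly prices.

    Heuristic (an assumption, not Azure guidance): cap at the given percentile of
    observed prices, but never above the pay-as-you-go rate.
    Needs at least two observations on older Python versions.
    """
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be between 1 and 99")
    # n=100 yields 99 cut points; "inclusive" treats the samples as the full range seen.
    cuts = statistics.quantiles(observed, n=100, method="inclusive")
    return min(cuts[percentile - 1], payg)

prices = [0.06, 0.07, 0.08, 0.09, 0.10]  # hypothetical hourly Spot prices
print(suggest_max_price(prices, payg=0.40, percentile=50))  # a batch job that can wait
print(suggest_max_price(prices, payg=0.40, percentile=95))  # a more urgent job
```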
For finance, you’ll also want to pair spot pricing trends with unit cost metrics. This will help you calculate when Spot is saving money, and when it’s not worth the risk. Yet, not all cloud cost management platforms offer proper unit cost metrics.
Check out our guide to Cloud Unit Economics here for immediately actionable tips, including what to track and which platform does it best for your specific goals.
4. Design for eviction (because it will happen)
Evictions aren’t bugs. They are built into the Spot model, and inevitable. So, design your workloads accordingly:
- Checkpoint long-running jobs so they can resume after an eviction (great for ML model training, batch ETL, or rendering).
- Use durable queues, such as Azure Service Bus or Azure Queue Storage, to requeue failed jobs.
- Separate stateful from stateless workloads. Keep state in resilient storage (like Azure Files or Blob Storage), not on the VM.
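Here is a minimal checkpoint-and-resume loop in Python, assuming a job that processes numbered work items and records its progress in a small JSON file. In practice you'd write the checkpoint to durable storage such as Blob Storage or Azure Files rather than the VM's own disk; the file name, the checkpoint cadence, and the `process_item` placeholder are all hypothetical.

```python
import json
import os
import tempfile

def load_checkpoint(path: str) -> int:
    """Return the index of the next item to process (0 if no checkpoint exists)."""
    try:
        with open(path) as f:
            return json.load(f)["next_item"]
    except FileNotFoundError:
        return 0

def save_checkpoint(path: str, next_item: int) -> None:
    """Write to a temp file, then rename, so a crash mid-write leaves the old checkpoint intact."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"next_item": next_item}, f)
    os.replace(tmp, path)  # atomic replace on the same filesystem

def process_item(i: int) -> None:
    """Placeholder for real work: one training step, one file, one batch."""
    pass

def run(total_items: int, path: str = "job.ckpt", every: int = 100) -> None:
    start = load_checkpoint(path)  # resume where the evicted VM left off
    for i in range(start, total_items):
        process_item(i)
        if (i + 1) % every == 0 or i + 1 == total_items:
            save_checkpoint(path, i + 1)
```

The `every` parameter is the trade-off knob: checkpointing more often costs a little I/O but shrinks the work you redo after each eviction.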
Your engineers can codify these patterns into Helm charts, Terraform modules, or Bicep templates to make Spot a default, not an exception.
5. Track trends across regions and sizes
Smart teams treat Spot availability as a data problem. And you can, too:
- Use the Spot pricing history and eviction rates shown in the Azure portal, or query them with Azure Resource Graph, to compare Spot price and availability by region and size.
- Test different VM SKUs for the same job and benchmark success rates.
- Distribute workloads across multiple availability zones or VM families to improve resiliency.
One more thing. Running jobs in lower-demand zones can reduce both price and eviction risk (two birds with one Spot!).
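Treating this as a data problem can start with scoring each candidate. The Python sketch below ranks region and size combinations by an expected cost that inflates the Spot price by the rework evictions cause. The `Candidate` fields, the rework factor, and every number in the example are hypothetical; you'd replace them with your own measurements.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    region: str
    vm_size: str
    spot_price: float     # observed $/hour
    eviction_rate: float  # observed fraction of runs evicted, 0.0 to 1.0

def expected_cost(c: Candidate, rework_factor: float = 0.5) -> float:
    """Spot price inflated by expected rework.

    Assumption: an evicted run redoes rework_factor of its work on average.
    """
    return c.spot_price * (1 + c.eviction_rate * rework_factor)

def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Cheapest expected cost first."""
    return sorted(candidates, key=expected_cost)

options = [  # made-up figures for illustration only
    Candidate("eastus", "Standard_D4s_v3", 0.080, 0.20),
    Candidate("westus2", "Standard_D4s_v3", 0.085, 0.05),
    Candidate("eastus", "Standard_D4as_v4", 0.078, 0.30),
]
for c in rank(options):
    print(c.region, c.vm_size, round(expected_cost(c), 4))
```

With these inputs, the nominally pricier westus2 option ranks first because it is evicted far less often, which is the point of looking beyond the sticker price.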
6. Monitor spot usage and efficiency continuously
Cost savings are only real if they last. To help you ensure this, track metrics like:
- Eviction frequency per workload
- Number of job retries or reschedules
- Actual vs. projected savings over time
- Spot vs on-demand fallback rates
This enables your team to calculate effective savings and understand trade-offs in real-world scenarios.
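As a worked example of that calculation, the Python sketch below compares what you actually paid (Spot hours, including hours lost to evictions and retries, plus on-demand fallback hours) with what the same useful work would have cost entirely on pay-as-you-go. The function name and inputs are hypothetical; in practice they'd come from your billing data and job logs.

```python
def effective_savings(useful_hours: float, spot_hours: float, fallback_hours: float,
                      spot_rate: float, payg_rate: float) -> float:
    """Return effective savings as a percentage of the all-pay-as-you-go baseline.

    spot_hours includes hours wasted on evicted or retried work;
    fallback_hours are hours that ran at the pay-as-you-go rate instead of on Spot.
    """
    baseline = useful_hours * payg_rate
    actual = spot_hours * spot_rate + fallback_hours * payg_rate
    return 100 * (baseline - actual) / baseline

# 1,000 useful hours: 900 ran on Spot but took 990 billed Spot hours (10% rework),
# and 100 hours fell back to pay-as-you-go. Rates are the illustrative D4s_v3 ones.
print(f"{effective_savings(1000, 990, 100, 0.08, 0.40):.1f}%")
```

With these inputs, the headline 80% discount shrinks to 70.2% effective savings, and that gap is exactly what eviction and fallback metrics help you see.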
Overall, when you implement them with engineering foresight and financial accountability, Azure Spot Instances become a high-leverage tool for saving money and scaling intelligently.
Your Next Move: From Spot Usage To Real Business Impact — Connect The Dots With CloudZero
Azure Spot Instances can deliver huge savings. But without visibility, they can also introduce risk. Your engineers may lose track of where Spot is (or isn’t) being used. Finance might struggle to quantify savings or tie compute costs back to the business.
CloudZero turns your Spot usage into real-time, business-aligned intelligence.
With CloudZero, you can:
- See exactly which teams are using Spot Instances, how efficiently, and for which services or environments. No manual tagging required.
- Understand how Spot VMs support your cost-to-serve, gross margin, and R&D goals — without hiring a dedicated FinOps analyst.
- Get proactive alerts when Spot usage drops, on-demand fallback kicks in, or retried jobs spike due to evictions.
And yes, you'll catch it as soon as someone ships a new pipeline without Spot enabled, not weeks later when the bloated invoice shows up.
Don’t just take our word for it. Leading teams at Moody’s, Expedia, and Skyscanner rely on CloudZero to track Spot savings, benchmark usage across engineering teams, and measure real ROI.
You can, too, risk-free. We just helped Upstart cut over $20 million from their AWS bill.
FAQs About Azure Spot Instances
Are Azure Spot Instances always cheaper than on-demand VMs?
Per hour, you never pay more than the pay-as-you-go rate, because the Spot price is capped at it, and discounts can reach 90%. However, the exact savings fluctuate based on regional demand and available capacity, and evictions and retries can eat into them.
What happens when an Azure Spot VM is evicted?
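Azure gives you at least 30 seconds' notice, delivered through Scheduled Events, before it evicts your Spot VM, either because it needs the capacity back or because the Spot price rose above your max price. Your eviction policy decides whether the VM is deallocated (stopped, with its disks preserved) or deleted entirely.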
Can I use Spot Instances in Azure Kubernetes Service (AKS)?
Yes. AKS supports Spot nodes via dedicated node pools. You can assign stateless or flexible workloads to these pools using Kubernetes taints, tolerations, and affinity rules—while keeping critical workloads on standard nodes.
What kinds of workloads are best for Azure Spot VMs?
Spot is ideal for flexible, fault-tolerant workloads such as CI/CD pipelines, containerized microservices, batch jobs, dev/test environments, and machine learning model training, especially when checkpointing is in place.
How can I monitor Azure Spot pricing and availability?
Azure shows Spot pricing history and eviction rates for VM sizes and regions in the portal, and you can query the same data with Azure Resource Graph. For deeper yet easy-to-digest cost monitoring, CloudZero gives you real-time insight into usage, savings, and anomalies.
Is it worth using Spot Instances for production?
Not usually. They come with no SLA and can be evicted at any time. Use them to supplement production infrastructure, not replace it.