Table Of Contents
  • How Cloud GPU Pricing Actually Works
  • H100 GPU Pricing: AWS Vs Azure Vs GCP (2026)
  • A100 GPU Pricing: AWS Vs Azure Vs GCP (2026)
  • Why Pricing Pages Don't Show Your Real GPU Cost
  • Cost Per Outcome: The Only GPU Metric That Matters
  • Real-World GPU Cost Scenarios
  • The GPU Pricing Market Is Shifting Fast
  • When To Choose Each Cloud For GPU Workloads
  • How To Track Real GPU Costs Across Clouds
  • Cloud GPU Pricing Comparison FAQs

Quick Answer

Cloud GPU pricing varies across AWS, Google Cloud, and Azure and changes often. As of early 2026, H100 8-GPU instances are commonly priced around $55 to $60 per hour on AWS, about $80 to $90 on Google Cloud, and close to $98 per hour on Azure in U.S. regions. Prices vary by region, configuration, and availability. Discounts, commitments, and spot pricing can reduce costs by 50% or more. Idle GPUs, data transfer, and storage also add up, so the cheapest GPU is not always the cheapest workload.

How Cloud GPU Pricing Actually Works

If you’re evaluating cloud GPU cost for AI workloads, the first thing to understand is this: pricing isn’t just about hourly rates. It’s about how each provider meters, discounts, and commits usage.

AWS bills GPU instances per second (with a one-minute minimum). There are no automatic discounts for sustained usage. To reduce costs, you have to actively use Savings Plans, Reserved Instances, or Spot pricing. This model rewards planning and penalizes reactive usage.
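To make that metering model concrete, here's a minimal sketch of how per-second billing with a one-minute minimum plays out. The $98.32/hr figure is a hypothetical instance rate chosen for illustration, not a quote:

```python
def aws_gpu_charge(seconds_used: float, hourly_rate: float) -> float:
    """Charge for a single run: per-second metering with a 60-second minimum."""
    billable_seconds = max(seconds_used, 60)  # one-minute minimum applies
    return billable_seconds * (hourly_rate / 3600)

# A 45-second smoke test is billed as a full minute;
# a 90-minute training job is billed to the second.
print(round(aws_gpu_charge(45, 98.32), 2))       # 1.64
print(round(aws_gpu_charge(90 * 60, 98.32), 2))  # 147.48
```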

Google Cloud applies sustained-use discounts automatically for many compute workloads, reducing costs as usage increases over the month. It also offers Committed Use Discounts (CUDs) for predictable workloads. However, not all GPU types qualify for sustained-use discounts, making commitment strategies more important for AI training.

Azure prices GPU virtual machines on a pay-as-you-go basis, with optional reserved capacity for one- or three-year terms. Enterprise agreements and credits can reduce effective costs, but discounts depend on contract structure, not automatic usage-based reductions.

These differences matter. Two teams running identical GPU workloads can see major cost variation, not just from pricing, but from how efficiently they use each platform’s discount model.

Understanding pricing mechanics is the first step toward making an informed cloud cost optimization decision.

Here’s how the major discount mechanisms compare:

| Discount type | AWS | GCP | Azure |
|---|---|---|---|
| Automatic sustained use | No | Yes (up to ~30%) | No |
| 1-year commitment | ~25–45% typical (up to ~72%) | ~25–45% typical | ~25–45% typical (up to ~72%) |
| 3-year commitment | ~40–70% (up to ~72%) | ~40–70% | ~40–70% (up to ~72%) |
| Spot / preemptible | Up to ~90% off | ~60–91% off | Up to ~80–90% off |
| Enterprise pricing | EDP + Savings Plans | Custom pricing + CUDs | Enterprise Agreements |

The takeaway: after the 2025 price cuts, on-demand rates favor AWS. Spot pricing favors AWS and GCP. Long-term commitments roughly equalize all three. Azure’s enterprise agreements introduce variables that don’t show up in any published price list.
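As a rough illustration of how these models change the effective rate, here's a sketch using approximate midpoints from the table above. All discount factors are illustrative assumptions, and the $10.00/GPU-hr base rate is hypothetical:

```python
DISCOUNTS = {
    "on_demand": 0.00,
    "1yr_commit": 0.35,   # midpoint of the ~25-45% typical band
    "3yr_commit": 0.55,   # midpoint of the ~40-70% band
    "spot": 0.70,         # conservative end of the spot/preemptible range
}

def effective_rate(on_demand_hourly: float, model: str) -> float:
    """Effective $/GPU-hr after applying a discount model."""
    return on_demand_hourly * (1 - DISCOUNTS[model])

for model in DISCOUNTS:
    print(f"{model}: ${effective_rate(10.00, model):.2f}/GPU-hr")
```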

With that pricing foundation in place, let’s look at the specific GPU models and what they cost across providers.


H100 GPU Pricing: AWS Vs Azure Vs GCP (2026)

The NVIDIA H100 is the standard GPU for large-scale AI training and high-throughput inference in 2026. Here’s how NVIDIA H100 cloud pricing compares across the three major hyperscalers.

| Provider | Instance type | GPUs | On-demand (per GPU/hr) | Spot / preemptible (est.) | 1-year reserved (est.) |
|---|---|---|---|---|---|
| AWS | p5.48xlarge | 8x H100 | ~$6.50–$7.00 | ~$2.00–$3.00 | ~$3.50–$4.50 |
| GCP | a3-highgpu-8g | 8x H100 | ~$9.00–$11.50 | ~$2.50–$4.00 | ~$5.00–$7.50 |
| Azure | ND96isr H100 v5 | 8x H100 | ~$11.00–$13.00 | ~$3.50–$6.00 | ~$6.50–$9.00 |

Sources: AWS EC2, Google Cloud, and Azure VM pricing pages. Rates reflect public on-demand pricing as of 2026. Per-GPU costs are calculated from full instance prices. Actual costs vary by region, configuration, availability, and discount model.

A few things stand out.

As of early 2026, H100 cost per hour differs meaningfully by platform and configuration, and it shifts frequently with region, availability, and discounts.

AWS cut H100 pricing sharply in 2025, moving from the most expensive hyperscaler option to the lowest on-demand rate of the three. Azure generally sits at the high end, though the gap depends on the instance: Azure’s single-GPU NC H100 v5 runs around $6.98 per hour, while its 8-GPU ND H100 v5 instance is closer to $12 per GPU per hour.

Pricing also changes by region, with some locations exceeding $9 to $10 per GPU per hour. For teams evaluating cloud GPU for AI training, on-demand pricing is only the starting point.

Commitment discounts and spot or preemptible pricing can reduce costs by 50% to 90%, making real workload cost very different from list prices.
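To show how the table's per-GPU figures are derived from full instance prices, and what that 50% to 90% discount band does to them, here's a minimal sketch. The $55.04/hr instance price is a hypothetical example, not a published rate:

```python
INSTANCE_HOURLY = 55.04      # hypothetical 8x H100 instance price, $/hr
GPUS_PER_INSTANCE = 8

per_gpu = INSTANCE_HOURLY / GPUS_PER_INSTANCE
print(f"on-demand: ${per_gpu:.2f}/GPU-hr")  # $6.88/GPU-hr

# Applying the 50-90% band from commitments and spot pricing:
for discount in (0.50, 0.90):
    print(f"{discount:.0%} off: ${per_gpu * (1 - discount):.2f}/GPU-hr")
```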

A100 GPU Pricing: AWS Vs Azure Vs GCP (2026)

The NVIDIA A100 remains a strong choice for mid-scale training, fine-tuning, and inference workloads where the H100’s raw performance isn’t necessary. A100 cloud pricing has dropped considerably as the market shifts toward H100 and H200 instances.

| Provider | Instance type | GPUs | On-demand (per GPU/hr) | Notes |
|---|---|---|---|---|
| AWS | p4d.24xlarge | 8x A100 40GB | ~$2.70–$2.80 | 8-GPU-only configuration |
| GCP | a2-highgpu-1g | 1x A100 40GB | ~$3.60–$3.70 | Single-GPU option available |
| Azure | NC A100 v4 | 1–4x A100 80GB | ~$3.50–$4.20 | Flexible GPU configurations |

Sources: AWS EC2 pricing page, GCP Compute pricing, Azure VM pricing.

The A100 is worth considering when your workload doesn’t demand the H100’s FP8 support or HBM3 memory bandwidth. For models under 13 billion parameters, the A100 often delivers better cost efficiency because the H100’s advantages in transformer training don’t fully materialize at smaller scales.

One important detail: AWS only offers the A100 in an 8-GPU configuration (p4d.24xlarge), so you’re paying for all eight GPUs even if your workload only uses one or two. GCP and Azure offer single-GPU A100 options, which can be more cost-effective for smaller jobs. This is a meaningful difference when comparing cloud GPU instances across providers.
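Here's a quick sketch of that difference, using the approximate rates cited elsewhere in this article ($22/hour total for AWS's 8-GPU A100 pack, $3.67/hour for a single GCP A100):

```python
import math

def effective_per_gpu_cost(instance_hourly: float, instance_gpus: int,
                           gpus_needed: int) -> float:
    """Whole instances are billed, so unused GPUs inflate the effective rate."""
    instances = math.ceil(gpus_needed / instance_gpus)
    return instances * instance_hourly / gpus_needed

# A single-GPU job:
print(effective_per_gpu_cost(22.00, 8, 1))  # 22.0  -- AWS 8x A100 pack
print(effective_per_gpu_cost(3.67, 1, 1))   # 3.67  -- GCP single A100
```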

Why Pricing Pages Don’t Show Your Real GPU Cost

This is where most cloud GPU pricing comparison articles stop. They list hourly rates, add a table, and call it a day. But hourly pricing is the least useful metric for understanding what your AI workloads actually cost.

Pricing pages show rates. They don’t show reality.

Here’s what they miss:

  • GPU idle time is the single largest hidden cost in AI workloads. An H100 instance running idle overnight costs the same as one running a training job. CloudZero’s billing data shows that AI spending averaged just 2.5% of total cloud spend across its customer base in late 2025, even as organizations planned 36% budget increases. Much of that gap is idle or underutilized GPU capacity that never shows up as “AI cost” on a cloud bill.
  • Data transfer fees add up fast when training datasets live in a different region or provider than your GPU instances. Moving 1TB of model weights or training data can cost $80-$120 in egress fees alone. Multi-region and multi-cloud AI architectures compound this.
  • Storage costs for training datasets, checkpoints, and model artifacts can quietly outpace compute costs over time. A single large language model (LLM) training run can generate terabytes of checkpoint data that persists long after the training job completes.
  • Spot instance interruptions can waste hours of training progress if your checkpointing strategy isn’t solid. The cost of restarting a failed training run isn’t reflected in the hourly GPU rate, but it hits your budget just the same.
  • Multi-service pipeline overhead is another blind spot. Most AI workloads span compute, storage, networking, and often managed services like vector databases or inference endpoints. The GPU is just one line item in a much larger AI cost stack.

When you add these together, total AI workload cost routinely exceeds the GPU hourly rate by a wide margin. A team comparing providers purely on GPU as a service pricing will miss the cost drivers that determine if a project stays on budget.
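A back-of-the-envelope model makes the gap visible. Every input below is an illustrative assumption (the ~$0.09/GB egress rate roughly matches the $80-$120 per TB cited above):

```python
def total_workload_cost(productive_gpu_hours: float, gpu_rate: float,
                        idle_fraction: float, egress_gb: float,
                        storage_gb_months: float) -> float:
    compute = productive_gpu_hours * gpu_rate
    # If idle_fraction of all paid hours do no work, the idle surcharge is:
    idle = compute * idle_fraction / (1 - idle_fraction)
    egress = egress_gb * 0.09            # ~$0.09/GB, a typical egress rate
    storage = storage_gb_months * 0.10   # ~$0.10/GB-month, hypothetical tier
    return compute + idle + egress + storage

# 1,000 productive GPU-hours at $7/hr, 30% idle, 1 TB egress, 5 TB-months:
print(total_workload_cost(1000, 7.00, 0.30, 1000, 5000))  # 10590.0 vs 7000 "list"
```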

As Erik Peterson, co-founder and CTO of CloudZero, put it: “AI experiments in your business are icebergs.” The visible spend — a few hundred dollars on Bedrock or Vertex — is the surface. The real cost sits below the waterline in shared compute, storage, and networking that never gets tagged as AI.

This is why static GPU pricing comparisons create a false sense of clarity. They answer the least important question (“what’s the hourly rate?”) while ignoring the one that actually matters.

Cost Per Outcome: The Only GPU Metric That Matters

The only GPU metric that matters for AI workloads is cost per outcome — cost per training run, cost per million inference tokens, or cost per AI-powered feature — not cost per GPU hour. Hourly pricing tells you almost nothing about what your AI workload will actually cost to run, or what value it will produce.

This framing, which CloudZero calls AI unit economics (the practice of connecting infrastructure spend to the business value it produces), answers the question hourly rates never can: was the spend worth it?

Consider a concrete example: training a mid-sized LLM on A100 GPUs might cost $50,000-$150,000 depending on cluster size, training duration, data transfer, and GPU usage efficiency. The same training job on H100 GPUs might carry a higher hourly rate but finish in half the time, making the total cost lower.
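Here's that trade-off as a minimal sketch, using the approximate on-demand per-GPU rates from this article's tables and assuming a 3x speedup, the low end of the range cited in the list below:

```python
def run_cost(gpus: int, rate_per_gpu_hr: float, hours: float) -> float:
    return gpus * rate_per_gpu_hr * hours

a100_total = run_cost(gpus=8, rate_per_gpu_hr=2.75, hours=72)      # 3-day job
h100_total = run_cost(gpus=8, rate_per_gpu_hr=6.75, hours=72 / 3)  # 3x faster

print(f"A100: ${a100_total:,.0f}")  # A100: $1,584
print(f"H100: ${h100_total:,.0f}")  # H100: $1,296 -- cheaper despite the rate
```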

This reframing changes GPU decisions at every level:

  • For training workloads, a faster GPU at a higher hourly rate often delivers lower total cost. An H100 completes transformer-based training jobs 3-6x faster than an A100. If your training job takes three days on A100s but 12 hours on H100s, the H100 path may cost less overall, even at a higher per-hour rate.
  • For inference workloads, cost per request or cost per 1,000 tokens matters more than cost per GPU hour. A GPU running at high throughput and serving thousands of inference requests per second delivers a very different cost-per-outcome than one sitting at 20% usage.
  • For batch processing, the economics of Spot or preemptible instances change the math entirely. A job that tolerates interruptions can run at 50-70% lower cost, making the “most expensive” provider on an on-demand basis potentially the cheapest for your specific use case.
  • For product teams, the most powerful question isn’t “how much does this GPU cost?” It’s “what’s our cost per AI feature, per customer, and does the margin justify the investment?” If you can’t answer that, it doesn’t matter how cheap your GPU rate is. You’re flying blind.

Teams that track cost per outcome, not just cost per hour, consistently make better GPU purchasing decisions. This is a core principle in modern FinOps practice and the foundation of responsible AI spending.
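As a worked example of inference unit economics, here's a sketch of cost per million tokens. The ~$6.75/hr rate comes from this article's H100 table; the throughput figures are illustrative assumptions, not benchmarks:

```python
def cost_per_million_tokens(gpu_rate_hr: float, tokens_per_sec: float) -> float:
    tokens_per_hour = tokens_per_sec * 3600
    return gpu_rate_hr / tokens_per_hour * 1_000_000

# One H100 at ~$6.75/hr, fully loaded at an assumed 2,500 tokens/sec:
print(f"${cost_per_million_tokens(6.75, 2500):.2f} per 1M tokens")  # $0.75

# The same GPU at 20% usage effectively serves 500 tokens/sec:
print(f"${cost_per_million_tokens(6.75, 500):.2f} per 1M tokens")   # $3.75
```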

Real-World GPU Cost Scenarios

Abstract pricing comparisons only go so far. Here’s what cloud GPU cost looks like in three common AI workloads, with real numbers.

Scenario 1: Training a 70B parameter LLM

A 70B parameter model training run on 64 H100 GPUs takes roughly 10-14 days of continuous compute, or about 15,000-21,500 GPU-hours. At AWS’s on-demand rate of roughly $6.75 per GPU per hour, that’s approximately $104,000-$145,000 in raw GPU cost alone. On GCP at $9.00-$11.50, the same job runs about $138,000-$247,000. On Azure at $11.00-$13.00, it climbs to roughly $169,000-$280,000.

But the real cost includes data transfer for loading training data across regions (potentially $5,000-$15,000), storage for checkpoints saved every 30-60 minutes ($3,000-$8,000 in high-performance storage), and the risk of a failed run that needs to restart from the last checkpoint. Using Spot instances on GCP or AWS can cut the GPU portion by 40-50%, but adds the overhead of checkpoint management and potential interruptions.
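Whether Spot still wins after interruptions is a simple expected-value calculation. In this sketch, the 45% discount matches the range above and the checkpoint interval comes from the 30-60 minute cadence mentioned earlier; the interruption count and on-demand total are illustrative assumptions:

```python
def spot_run_cost(on_demand_total: float, spot_discount: float,
                  interruptions: int, checkpoint_interval_hr: float,
                  cluster_rate_hr: float) -> float:
    spot_total = on_demand_total * (1 - spot_discount)
    # Each interruption rewinds, on average, half a checkpoint interval:
    redone_work = interruptions * (checkpoint_interval_hr / 2) * cluster_rate_hr
    return spot_total + redone_work

cluster_rate = 64 * 6.75   # 64 H100s at ~$6.75/GPU-hr
print(spot_run_cost(on_demand_total=145_000, spot_discount=0.45,
                    interruptions=20, checkpoint_interval_hr=0.75,
                    cluster_rate_hr=cluster_rate))  # ~83,000 -- still well ahead
```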

Scenario 2: Running inference at scale

A production inference service handling 10 million requests per day on four H100 GPUs runs approximately $7,200-$16,800 per month on demand, depending on provider. At that volume, cost per request is $0.000024-$0.000056.

The cost driver here isn’t the GPU rate. It’s throughput efficiency. A well-tuned inference pipeline on a single H100 can handle the same traffic that a poorly configured setup handles on four. The difference between a $1,800/month inference bill and a $16,800/month bill often comes down to batching strategy, model quantization, and cost allocation visibility, not which cloud you chose.
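A sketch of that point: the bill scales with GPU count, not traffic. The $5.75/GPU-hr effective rate is an assumption chosen to reproduce the scenario's ~$16,800 upper bound:

```python
HOURS_PER_MONTH = 730
REQUESTS_PER_MONTH = 10_000_000 * 30   # 10M requests/day, from the scenario

def monthly_cost(gpus: int, rate_per_gpu_hr: float) -> float:
    return gpus * rate_per_gpu_hr * HOURS_PER_MONTH

for label, gpus in (("tuned (1 GPU)", 1), ("untuned (4 GPUs)", 4)):
    cost = monthly_cost(gpus, 5.75)
    print(f"{label}: ${cost:,.0f}/mo, ${cost / REQUESTS_PER_MONTH:.8f}/request")
```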

Scenario 3: Fine-tuning a 7B model

Fine-tuning a 7B parameter model on a single A100 GPU usually takes 4-8 hours, depending on dataset size and training configuration. At GCP’s $3.67 per hour, that’s $15-$29 per fine-tuning run. On AWS (where A100s only come in 8-GPU packs at $22/hour total post-cut), you’d pay $88-$176 for the same job unless you can run eight fine-tuning jobs in parallel.

This is where instance configuration matters as much as the per-GPU price. GCP and Azure’s single-GPU A100 options are 3-6x cheaper for small-scale fine-tuning compared to AWS’s 8-GPU-only A100 instances. Teams that fine-tune frequently should factor this into their provider selection.

The right question for any GPU workload isn’t “which provider has the lowest rate?” It’s “which provider delivers the lowest total cost for my specific workload?” For a 70B parameter training run, that means accounting for data transfer ($5,000–$15,000), checkpoint storage ($3,000–$8,000), and Spot interruption risk alongside the GPU hourly rate. For inference, it means measuring cost per request, not cost per hour.

The GPU Pricing Market Is Shifting Fast

Before choosing a provider, it helps to understand where GPU pricing is headed. Three forces are reshaping the market in 2026.

1. Hyperscaler price wars have arrived

AWS’s 44% H100 price cut in June 2025 was one of the most aggressive GPU discounts in cloud history, and it followed GCP and Azure making their own adjustments. This pattern mirrors what happened with general-purpose compute a decade ago: as supply catches up with demand, prices compress. Teams that locked into long-term reservations at pre-cut rates may now be paying more than on-demand customers.

2. Next-generation hardware is entering the market

NVIDIA’s H200 and B200 GPUs are already available on select platforms, with 2-2.5x the performance of H100 for LLM training. As these newer instances gain availability, H100 pricing will likely drop further. A100 instances have already fallen below $1/GPU-hour on some marketplace providers.

3. Custom silicon is gaining traction

AWS Trainium instances (trn1) cost roughly $1.34 per chip per hour, about 65% less than an equivalent H100 setup for compatible training workloads. Google TPU v5p shows similar economics. These alternatives won’t replace NVIDIA GPUs entirely, but they’re changing the math for teams willing to invest in framework compatibility.

When To Choose Each Cloud For GPU Workloads

No single provider is best for every AI workload. The right choice depends on your workload profile, commitment tolerance, and existing cloud footprint.

Choose AWS 

If you need the broadest GPU selection and global availability. AWS offers the widest range of GPU instance types, from T4-based inference instances to H100 and the newer B200 (P6) instances. The mid-2025 price cuts made AWS far more competitive, and Spot Instance availability is generally deeper than on other platforms. AWS also supports custom silicon (Trainium, Inferentia) that can cut training and inference costs by 30-50% for compatible workloads.

Choose GCP

If you run sustained, long-duration training jobs. GCP’s automatic sustained-use discounts and competitive H100 pricing make it the lowest-cost option for workloads that run continuously. Google Cloud GPU pricing also benefits from tight integration with TPU instances, which offer an alternative accelerator path for teams willing to move beyond NVIDIA hardware.

Choose Azure

If your organization already operates in the Microsoft ecosystem. Azure’s strength is enterprise integration: co-termed agreements, Azure credits, Microsoft Entra ID, and native connections to Azure Machine Learning. The on-demand rates are higher, but organizations with existing enterprise agreements often pay substantially less than list price.

Consider custom silicon regardless of provider

AWS Trainium and Inferentia, Google TPUs, and Azure’s Maia accelerators are all positioned as cost-effective alternatives to NVIDIA GPUs for specific workloads. These don’t appear in standard NVIDIA GPU cloud pricing comparisons, but they represent some of the largest cost-saving opportunities in GPU compute today.

The decision isn’t just about price. It’s about how each platform shapes your AI cost model over time.

Related: GPU cost attribution for Kubernetes is here. See how CloudZero attributes Kubernetes GPU reservation costs to the workloads that caused them.

How To Track Real GPU Costs Across Clouds

Every article in this space ends with “choose the right instance for your workload.” That’s obvious advice. The harder question is what happens after you choose: how do you know if your GPU investment is paying off, who’s consuming the capacity, and which workloads are burning money while sitting idle?

This is what CloudZero calls the AI attribution problem. GPU and inference costs land in shared compute pools with no tags, no allocation, and no connection to the product, team, or customer that generated them. It’s the root cause of most AI cost overruns, and no pricing page or billing dashboard will surface it.

CloudZero closes that gap by mapping GPU infrastructure spend to the business dimensions that drive decisions: cost per model, cost per customer, cost per feature, cost per deployment, across AWS, Azure, GCP, and third-party AI APIs.

The platform allocates 100% of spend without depending on perfect tagging, which matters because GPU workloads in practice are messy, shared, and fast-moving.

One global SaaS platform with over 40 million users was managing costs across 50+ LLMs with no way to track spend by product, customer, or team. After implementing CloudZero, the team uncovered $1M+ in immediate savings from inference and token caching, and achieved a 50%+ reduction in compute spend. 

See how teams like Duolingo, Toyota, Grammarly, and Moody’s manage AI costs across clouds. You can also get a free cloud cost assessment to find out where your AI spend is going and where the opportunities are.

Cloud GPU Pricing Comparison FAQs
