Table Of Contents
  • How Cloud GPU Pricing Actually Works
  • H100 GPU Pricing: AWS Vs Azure Vs GCP (2026)
  • A100 GPU Pricing: AWS Vs Azure Vs GCP (2026)
  • Why Pricing Pages Don't Show Your Real GPU Cost
  • Cost Per Outcome: The Only GPU Metric That Matters
  • Real-World GPU Cost Scenarios
  • The GPU Pricing Market Is Shifting Fast
  • When To Choose Each Cloud For GPU Workloads
  • How To Track Real GPU Costs Across Clouds
  • Cloud GPU Pricing Comparison FAQs

Quick Answer

Cloud GPU pricing varies across AWS, Google Cloud, and Azure and changes often. As of early 2026, H100 8-GPU instances are commonly priced around $55 to $60 per hour on AWS, about $80 to $90 on Google Cloud, and close to $98 per hour on Azure in U.S. regions. Prices vary by region, configuration, and availability. Discounts, commitments, and spot pricing can reduce costs by 50% or more. Idle GPUs, data transfer, and storage also add up, so the cheapest GPU is not always the cheapest workload.

How Cloud GPU Pricing Actually Works

If you’re evaluating cloud GPU cost for AI workloads, the first thing to understand is this: pricing isn’t just about hourly rates. It’s about how each provider meters, discounts, and commits usage.

AWS bills GPU instances per second (with a one-minute minimum). There are no automatic discounts for sustained usage. To reduce costs, you have to actively use Savings Plans, Reserved Instances, or Spot pricing. This model rewards planning and penalizes reactive usage.
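To make that metering model concrete, here's a minimal sketch of how per-second billing with a one-minute minimum plays out. The $98.32/hr figure is a hypothetical instance rate chosen for illustration, not a quote:

```python
def aws_gpu_charge(seconds_used: float, hourly_rate: float) -> float:
    """Charge for a single run: per-second metering with a 60-second minimum."""
    billable_seconds = max(seconds_used, 60)  # one-minute minimum applies
    return billable_seconds * (hourly_rate / 3600)

# A 45-second smoke test is billed as a full minute;
# a 90-minute training job is billed to the second.
print(round(aws_gpu_charge(45, 98.32), 2))       # 1.64
print(round(aws_gpu_charge(90 * 60, 98.32), 2))  # 147.48
```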

Google Cloud applies sustained-use discounts automatically for many compute workloads, reducing costs as usage increases over the month. It also offers Committed Use Discounts (CUDs) for predictable workloads. However, not all GPU types qualify for sustained-use discounts, making commitment strategies more important for AI training.

Azure prices GPU virtual machines on a pay-as-you-go basis, with optional reserved capacity for one- or three-year terms. Enterprise agreements and credits can reduce effective costs, but discounts depend on contract structure, not automatic usage-based reductions.

These differences matter. Two teams running identical GPU workloads can see major cost variation, not just from pricing, but from how efficiently they use each platform’s discount model.

Understanding pricing mechanics is the first step toward making an informed cloud cost optimization decision.

Here’s how the major discount mechanisms compare:

| Discount type | AWS | GCP | Azure |
|---|---|---|---|
| Automatic sustained use | No | Yes (up to ~30%) | No |
| 1-year commitment | ~25–45% typical (up to ~72%) | ~25–45% typical | ~25–45% typical (up to ~72%) |
| 3-year commitment | ~40–70% (up to ~72%) | ~40–70% | ~40–70% (up to ~72%) |
| Spot / preemptible | Up to ~90% off | ~60–91% off | Up to ~80–90% off |
| Enterprise pricing | EDP + Savings Plans | Custom pricing + CUDs | Enterprise Agreements |

The takeaway: after the 2025 price cuts, on-demand rates favor AWS. Spot pricing favors AWS and GCP. Long-term commitments roughly equalize all three. Azure’s enterprise agreements introduce variables that don’t show up in any published price list.
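As a rough illustration of how these models change the effective rate, here's a sketch using approximate midpoints from the table above. All discount factors are illustrative assumptions, and the $10.00/GPU-hr base rate is hypothetical:

```python
DISCOUNTS = {
    "on_demand": 0.00,
    "1yr_commit": 0.35,   # midpoint of the ~25-45% typical band
    "3yr_commit": 0.55,   # midpoint of the ~40-70% band
    "spot": 0.70,         # conservative end of the spot/preemptible range
}

def effective_rate(on_demand_hourly: float, model: str) -> float:
    """Effective $/GPU-hr after applying a discount model."""
    return on_demand_hourly * (1 - DISCOUNTS[model])

for model in DISCOUNTS:
    print(f"{model}: ${effective_rate(10.00, model):.2f}/GPU-hr")
```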

With that pricing foundation in place, let’s look at the specific GPU models and what they cost across providers.


H100 GPU Pricing: AWS Vs Azure Vs GCP (2026)

The NVIDIA H100 is the standard GPU for large-scale AI training and high-throughput inference in 2026. Here’s how NVIDIA H100 cloud pricing compares across the three major hyperscalers.

| Provider | Instance type | GPUs | On-demand (per GPU/hr) | Spot / preemptible (est.) | 1-year reserved (est.) |
|---|---|---|---|---|---|
| AWS | p5.48xlarge | 8x H100 | ~$6.50–$7.00 | ~$2.00–$3.00 | ~$3.50–$4.50 |
| GCP | a3-highgpu-8g | 8x H100 | ~$9.00–$11.50 | ~$2.50–$4.00 | ~$5.00–$7.50 |
| Azure | ND96isr H100 v5 | 8x H100 | ~$11.00–$13.00 | ~$3.50–$6.00 | ~$6.50–$9.00 |

Sources: AWS EC2, Google Cloud, and Azure VM pricing pages. Rates reflect public on-demand pricing as of 2026. Per-GPU costs are calculated from full instance prices. Actual costs vary by region, configuration, availability, and discount model.

A few things stand out.

As of early 2026, H100 cost per hour differs meaningfully by platform and configuration, and it shifts frequently with region, availability, and discounts.

AWS cut H100 pricing sharply in 2025, moving from the most expensive hyperscaler option to the lowest on-demand rate of the three. Azure generally sits at the high end, though the gap depends on the instance: Azure’s single-GPU NC H100 v5 runs around $6.98 per hour, while its 8-GPU ND H100 v5 instance is closer to $12 per GPU per hour.

Pricing also changes by region, with some locations exceeding $9 to $10 per GPU per hour. For teams evaluating cloud GPU for AI training, on-demand pricing is only the starting point.

Commitment discounts and spot or preemptible pricing can reduce costs by 50% to 90%, making real workload cost very different from list prices.
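To show how the table's per-GPU figures are derived from full instance prices, and what that 50% to 90% discount band does to them, here's a minimal sketch. The $55.04/hr instance price is a hypothetical example, not a published rate:

```python
INSTANCE_HOURLY = 55.04      # hypothetical 8x H100 instance price, $/hr
GPUS_PER_INSTANCE = 8

per_gpu = INSTANCE_HOURLY / GPUS_PER_INSTANCE
print(f"on-demand: ${per_gpu:.2f}/GPU-hr")  # $6.88/GPU-hr

# Applying the 50-90% band from commitments and spot pricing:
for discount in (0.50, 0.90):
    print(f"{discount:.0%} off: ${per_gpu * (1 - discount):.2f}/GPU-hr")
```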

A100 GPU Pricing: AWS Vs Azure Vs GCP (2026)

The NVIDIA A100 remains a strong choice for mid-scale training, fine-tuning, and inference workloads where the H100’s raw performance isn’t necessary. A100 cloud pricing has dropped considerably as the market shifts toward H100 and H200 instances.

| Provider | Instance type | GPUs | On-demand (per GPU/hr) | Notes |
|---|---|---|---|---|
| AWS | p4d.24xlarge | 8x A100 40GB | ~$2.70–$2.80 | 8-GPU-only configuration |
| GCP | a2-highgpu-1g | 1x A100 40GB | ~$3.60–$3.70 | Single-GPU option available |
| Azure | NC A100 v4 | 1–4x A100 80GB | ~$3.50–$4.20 | Flexible GPU configurations |

Sources: AWS EC2 pricing page, GCP Compute pricing, Azure VM pricing.

The A100 is worth considering when your workload doesn’t demand the H100’s FP8 support or HBM3 memory bandwidth. For models under 13 billion parameters, the A100 often delivers better cost efficiency because the H100’s advantages in transformer training don’t fully materialize at smaller scales.

One important detail: AWS only offers the A100 in an 8-GPU configuration (p4d.24xlarge), so you’re paying for all eight GPUs even if your workload only uses one or two. GCP and Azure offer single-GPU A100 options, which can be more cost-effective for smaller jobs. This is a meaningful difference when comparing cloud GPU instances across providers.
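Here's a quick sketch of that difference, using the approximate rates cited elsewhere in this article ($22/hour total for AWS's 8-GPU A100 pack, $3.67/hour for a single GCP A100):

```python
import math

def effective_per_gpu_cost(instance_hourly: float, instance_gpus: int,
                           gpus_needed: int) -> float:
    """Whole instances are billed, so unused GPUs inflate the effective rate."""
    instances = math.ceil(gpus_needed / instance_gpus)
    return instances * instance_hourly / gpus_needed

# A single-GPU job:
print(effective_per_gpu_cost(22.00, 8, 1))  # 22.0  -- AWS 8x A100 pack
print(effective_per_gpu_cost(3.67, 1, 1))   # 3.67  -- GCP single A100
```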

Why Pricing Pages Don’t Show Your Real GPU Cost

This is where most cloud GPU pricing comparison articles stop. They list hourly rates, add a table, and call it a day. But hourly pricing is the least useful metric for understanding what your AI workloads actually cost.

Pricing pages show rates. They don’t show reality.

Here’s what they miss:

  • GPU idle time is the single largest hidden cost in AI workloads. An H100 instance running idle overnight costs the same as one running a training job. CloudZero’s billing data shows that AI spending averaged just 2.5% of total cloud spend across its customer base in late 2025, even as organizations planned 36% budget increases. Much of that gap is idle or underutilized GPU capacity that never shows up as “AI cost” on a cloud bill.
  • Data transfer fees add up fast when training datasets live in a different region or provider than your GPU instances. Moving 1TB of model weights or training data can cost $80-$120 in egress fees alone. Multi-region and multi-cloud AI architectures compound this.
  • Storage costs for training datasets, checkpoints, and model artifacts can quietly outpace compute costs over time. A single large language model (LLM) training run can generate terabytes of checkpoint data that persists long after the training job completes.
  • Spot instance interruptions can waste hours of training progress if your checkpointing strategy isn’t solid. The cost of restarting a failed training run isn’t reflected in the hourly GPU rate, but it hits your budget just the same.
  • Multi-service pipeline overhead is another blind spot. Most AI workloads span compute, storage, networking, and often managed services like vector databases or inference endpoints. The GPU is just one line item in a much larger AI cost stack.

When you add these together, total AI workload cost routinely exceeds the GPU hourly rate by a wide margin. A team comparing providers purely on GPU as a service pricing will miss the cost drivers that determine if a project stays on budget.
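A back-of-the-envelope model makes the gap visible. Every input below is an illustrative assumption (the ~$0.09/GB egress rate roughly matches the $80-$120 per TB cited above):

```python
def total_workload_cost(productive_gpu_hours: float, gpu_rate: float,
                        idle_fraction: float, egress_gb: float,
                        storage_gb_months: float) -> float:
    compute = productive_gpu_hours * gpu_rate
    # If idle_fraction of all paid hours do no work, the idle surcharge is:
    idle = compute * idle_fraction / (1 - idle_fraction)
    egress = egress_gb * 0.09            # ~$0.09/GB, a typical egress rate
    storage = storage_gb_months * 0.10   # ~$0.10/GB-month, hypothetical tier
    return compute + idle + egress + storage

# 1,000 productive GPU-hours at $7/hr, 30% idle, 1 TB egress, 5 TB-months:
print(total_workload_cost(1000, 7.00, 0.30, 1000, 5000))  # 10590.0 vs 7000 "list"
```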

As Erik Peterson, co-founder and CTO of CloudZero, put it: “AI experiments in your business are icebergs.” The visible spend — a few hundred dollars on Bedrock or Vertex — is the surface. The real cost sits below the waterline in shared compute, storage, and networking that never gets tagged as AI.

This is why static GPU pricing comparisons create a false sense of clarity. They answer the least important question (“what’s the hourly rate?”) while ignoring the one that actually matters.

Cost Per Outcome: The Only GPU Metric That Matters

The only GPU metric that matters for AI workloads is cost per outcome — cost per training run, cost per million inference tokens, or cost per AI-powered feature — not cost per GPU hour. Hourly pricing tells you almost nothing about what your AI workload will actually cost to run, or what value it will produce.

This framing, which CloudZero calls AI unit economics (the practice of connecting infrastructure spend to the business value it produces), answers the question hourly rates never can: was the spend worth it?

Consider a concrete example: training a mid-sized LLM on A100 GPUs might cost $50,000-$150,000 depending on cluster size, training duration, data transfer, and GPU usage efficiency. The same training job on H100 GPUs might carry a higher hourly rate but finish in half the time, making the total cost lower.
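Here's that trade-off as a minimal sketch, using the approximate on-demand per-GPU rates from this article's tables and assuming a 3x speedup, the low end of the range cited in the list below:

```python
def run_cost(gpus: int, rate_per_gpu_hr: float, hours: float) -> float:
    return gpus * rate_per_gpu_hr * hours

a100_total = run_cost(gpus=8, rate_per_gpu_hr=2.75, hours=72)      # 3-day job
h100_total = run_cost(gpus=8, rate_per_gpu_hr=6.75, hours=72 / 3)  # 3x faster

print(f"A100: ${a100_total:,.0f}")  # A100: $1,584
print(f"H100: ${h100_total:,.0f}")  # H100: $1,296 -- cheaper despite the rate
```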

This reframing changes GPU decisions at every level:

  • For training workloads, a faster GPU at a higher hourly rate often delivers lower total cost. An H100 completes transformer-based training jobs 3-6x faster than an A100. If your training job takes three days on A100s but 12 hours on H100s, the H100 path may cost less overall, even at a higher per-hour rate.
  • For inference workloads, cost per request or cost per 1,000 tokens matters more than cost per GPU hour. A GPU running at high throughput and serving thousands of inference requests per second delivers a very different cost-per-outcome than one sitting at 20% usage.
  • For batch processing, the economics of Spot or preemptible instances change the math entirely. A job that tolerates interruptions can run at 50-70% lower cost, making the “most expensive” provider on an on-demand basis potentially the cheapest for your specific use case.
  • For product teams, the most powerful question isn’t “how much does this GPU cost?” It’s “what’s our cost per AI feature, per customer, and does the margin justify the investment?” If you can’t answer that, it doesn’t matter how cheap your GPU rate is. You’re flying blind.

Teams that track cost per outcome, not just cost per hour, consistently make better GPU purchasing decisions. This is a core principle in modern FinOps practice and the foundation of responsible AI spending.
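As a worked example of inference unit economics, here's a sketch of cost per million tokens. The ~$6.75/hr rate comes from this article's H100 table; the throughput figures are illustrative assumptions, not benchmarks:

```python
def cost_per_million_tokens(gpu_rate_hr: float, tokens_per_sec: float) -> float:
    tokens_per_hour = tokens_per_sec * 3600
    return gpu_rate_hr / tokens_per_hour * 1_000_000

# One H100 at ~$6.75/hr, fully loaded at an assumed 2,500 tokens/sec:
print(f"${cost_per_million_tokens(6.75, 2500):.2f} per 1M tokens")  # $0.75

# The same GPU at 20% usage effectively serves 500 tokens/sec:
print(f"${cost_per_million_tokens(6.75, 500):.2f} per 1M tokens")   # $3.75
```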

Real-World GPU Cost Scenarios

Abstract pricing comparisons only go so far. Here’s what cloud GPU cost looks like in three common AI workloads, with real numbers.

Scenario 1: Training a 70B parameter LLM

A 70B parameter model training run on 64 H100 GPUs takes roughly 10-14 days of continuous compute, or about 15,000-21,500 GPU-hours. At AWS’s on-demand rate of roughly $6.75 per GPU per hour, that’s approximately $104,000-$145,000 in raw GPU cost alone. On GCP at $9.00-$11.50, the same job runs about $138,000-$247,000. On Azure at $11.00-$13.00, it climbs to roughly $169,000-$280,000.

But the real cost includes data transfer for loading training data across regions (potentially $5,000-$15,000), storage for checkpoints saved every 30-60 minutes ($3,000-$8,000 in high-performance storage), and the risk of a failed run that needs to restart from the last checkpoint. Using Spot instances on GCP or AWS can cut the GPU portion by 40-50%, but adds the overhead of checkpoint management and potential interruptions.
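Whether Spot still wins after interruptions is a simple expected-value calculation. In this sketch, the 45% discount matches the range above and the checkpoint interval comes from the 30-60 minute cadence mentioned earlier; the interruption count and on-demand total are illustrative assumptions:

```python
def spot_run_cost(on_demand_total: float, spot_discount: float,
                  interruptions: int, checkpoint_interval_hr: float,
                  cluster_rate_hr: float) -> float:
    spot_total = on_demand_total * (1 - spot_discount)
    # Each interruption rewinds, on average, half a checkpoint interval:
    redone_work = interruptions * (checkpoint_interval_hr / 2) * cluster_rate_hr
    return spot_total + redone_work

cluster_rate = 64 * 6.75   # 64 H100s at ~$6.75/GPU-hr
print(spot_run_cost(on_demand_total=145_000, spot_discount=0.45,
                    interruptions=20, checkpoint_interval_hr=0.75,
                    cluster_rate_hr=cluster_rate))  # ~83,000 -- still well ahead
```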

Scenario 2: Running inference at scale

A production inference service handling 10 million requests per day on four H100 GPUs runs approximately $7,200-$16,800 per month on demand, depending on provider. At that volume, cost per request is $0.000024-$0.000056.

The cost driver here isn’t the GPU rate. It’s throughput efficiency. A well-tuned inference pipeline on a single H100 can handle the same traffic that a poorly configured setup handles on four. The difference between a $1,800/month inference bill and a $16,800/month bill often comes down to batching strategy, model quantization, and cost allocation visibility, not which cloud you chose.
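A sketch of that point: the bill scales with GPU count, not traffic. The $5.75/GPU-hr effective rate is an assumption chosen to reproduce the scenario's ~$16,800 upper bound:

```python
HOURS_PER_MONTH = 730
REQUESTS_PER_MONTH = 10_000_000 * 30   # 10M requests/day, from the scenario

def monthly_cost(gpus: int, rate_per_gpu_hr: float) -> float:
    return gpus * rate_per_gpu_hr * HOURS_PER_MONTH

for label, gpus in (("tuned (1 GPU)", 1), ("untuned (4 GPUs)", 4)):
    cost = monthly_cost(gpus, 5.75)
    print(f"{label}: ${cost:,.0f}/mo, ${cost / REQUESTS_PER_MONTH:.8f}/request")
```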

Scenario 3: Fine-tuning a 7B model

Fine-tuning a 7B parameter model on a single A100 GPU usually takes 4-8 hours, depending on dataset size and training configuration. At GCP’s $3.67 per hour, that’s $15-$29 per fine-tuning run. On AWS (where A100s only come in 8-GPU packs at $22/hour total post-cut), you’d pay $88-$176 for the same job unless you can run eight fine-tuning jobs in parallel.

This is where instance configuration matters as much as the per-GPU price. GCP and Azure’s single-GPU A100 options are 3-6x cheaper for small-scale fine-tuning compared to AWS’s 8-GPU-only A100 instances. Teams that fine-tune frequently should factor this into their provider selection.

The right question for any GPU workload isn’t “which provider has the lowest rate?” It’s “which provider delivers the lowest total cost for my specific workload?” For a 70B parameter training run, that means accounting for data transfer ($5,000–$15,000), checkpoint storage ($3,000–$8,000), and Spot interruption risk alongside the GPU hourly rate. For inference, it means measuring cost per request, not cost per hour.

The GPU Pricing Market Is Shifting Fast

Before choosing a provider, it helps to understand where GPU pricing is headed. Three forces are reshaping the market in 2026.

1. Hyperscaler price wars have arrived

AWS’s 44% H100 price cut in June 2025 was one of the most aggressive GPU discounts in cloud history, and it followed GCP and Azure making their own adjustments. This pattern mirrors what happened with general-purpose compute a decade ago: as supply catches up with demand, prices compress. Teams that locked into long-term reservations at pre-cut rates may now be paying more than on-demand customers.

2. Next-generation hardware is entering the market

NVIDIA’s H200 and B200 GPUs are already available on select platforms, with 2-2.5x the performance of H100 for LLM training. As these newer instances gain availability, H100 pricing will likely drop further. A100 instances have already fallen below $1/GPU-hour on some marketplace providers.

3. Custom silicon is gaining traction

AWS Trainium instances (trn1) cost roughly $1.34 per chip per hour, about 65% less than an equivalent H100 setup for compatible training workloads. Google TPU v5p shows similar economics. These alternatives won’t replace NVIDIA GPUs entirely, but they’re changing the math for teams willing to invest in framework compatibility.

When To Choose Each Cloud For GPU Workloads

No single provider is best for every AI workload. The right choice depends on your workload profile, commitment tolerance, and existing cloud footprint.

Choose AWS 

If you need the broadest GPU selection and global availability. AWS offers the widest range of GPU instance types, from T4-based inference instances to H100 and the newer B200 (P6) instances. The mid-2025 price cuts made AWS far more competitive, and Spot Instance availability is generally deeper than on other platforms. AWS also supports custom silicon (Trainium, Inferentia) that can cut training and inference costs by 30-50% for compatible workloads.

Choose GCP

If you run sustained, long-duration training jobs. GCP’s automatic sustained-use discounts and competitive H100 pricing make it the lowest-cost option for workloads that run continuously. Google Cloud GPU pricing also benefits from tight integration with TPU instances, which offer an alternative accelerator path for teams willing to move beyond NVIDIA hardware.

Choose Azure

If your organization already operates in the Microsoft ecosystem. Azure’s strength is enterprise integration: co-termed agreements, Azure credits, Microsoft Entra ID, and native connections to Azure Machine Learning. The on-demand rates are higher, but organizations with existing enterprise agreements often pay substantially less than list price.

Consider custom silicon regardless of provider

AWS Trainium and Inferentia, Google TPUs, and Azure’s Maia accelerators are all positioned as cost-effective alternatives to NVIDIA GPUs for specific workloads. These don’t appear in standard NVIDIA GPU cloud pricing comparisons, but they represent some of the largest cost-saving opportunities in GPU compute today.

The decision isn’t just about price. It’s about how each platform shapes your AI cost model over time.

Related: GPU cost attribution for Kubernetes is here. See how CloudZero attributes Kubernetes GPU reservation costs to the workloads that caused them.

How To Track Real GPU Costs Across Clouds

Every article in this space ends with “choose the right instance for your workload.” That’s obvious advice. The harder question is what happens after you choose: how do you know if your GPU investment is paying off, who’s consuming the capacity, and which workloads are burning money while sitting idle?

This is what CloudZero calls the AI attribution problem. GPU and inference costs land in shared compute pools with no tags, no allocation, and no connection to the product, team, or customer that generated them. It’s the root cause of most AI cost overruns, and no pricing page or billing dashboard will surface it.

CloudZero closes that gap by mapping GPU infrastructure spend to the business dimensions that drive decisions: cost per model, cost per customer, cost per feature, cost per deployment, across AWS, Azure, GCP, and third-party AI APIs.

The platform allocates 100% of spend without depending on perfect tagging, which matters because GPU workloads in practice are messy, shared, and fast-moving.

One global SaaS platform with over 40 million users was managing costs across 50+ LLMs with no way to track spend by product, customer, or team. After implementing CloudZero, the team uncovered $1M+ in immediate savings from inference and token caching, and achieved a 50%+ reduction in compute spend. 

See how teams like Duolingo, Toyota, Grammarly, and Moody’s manage AI costs across clouds. You can also get a free cloud cost assessment to find out where your AI spend is going and where the opportunities are.

Cloud GPU Pricing Comparison FAQs
