What is token-based pricing for AI?

Token-based pricing charges by the number of tokens (roughly words or word fragments) an AI model processes. OpenAI, Anthropic, and Google all charge separately for input tokens (the prompt) and output tokens (the response), with output billed at a higher rate: 4x the input rate for GPT-4o and 5x for Claude Sonnet 4.6 at the prices listed in this guide. Costs scale directly with usage volume and prompt length.

Why are AI costs so hard to predict?

AI costs depend on usage patterns that are inherently difficult to forecast: how many prompts employees send, how long those prompts are, and how quickly AI-powered features get adopted. Consumption-based pricing means a popular feature can go from $500/month to $50,000/month as usage grows.

Is it cheaper to use AI APIs or self-host open-source models?

For low-to-moderate usage, AI APIs are almost always cheaper because you avoid GPU infrastructure costs entirely. At high volumes (millions of inferences per day), self-hosting open-source models like Llama or Mistral reduces per-inference costs substantially, but you absorb GPU compute ($40,200/month for a single H100 cluster), model operations, and engineering overhead. The break-even depends on model size, request volume, and GPU utilization.
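The break-even arithmetic can be sketched roughly as follows. This is a minimal illustration, not a sizing tool: the $55.04/hour rate comes from this article, while the 730-hour month, the `api_cost_per_inference` value, and the function names are assumptions made for the example. It also ignores whether one cluster can actually serve the break-even volume.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def self_host_monthly_cost(gpu_hourly_rate: float, ops_overhead: float = 0.0) -> float:
    """Fixed monthly cost of one GPU cluster running 24/7, plus optional overhead."""
    return gpu_hourly_rate * HOURS_PER_MONTH + ops_overhead


def break_even_inferences_per_day(
    api_cost_per_inference: float,
    gpu_hourly_rate: float,
    ops_overhead: float = 0.0,
    days_per_month: int = 30,
) -> float:
    """Daily request volume at which API spend equals the fixed self-hosting cost."""
    monthly_fixed = self_host_monthly_cost(gpu_hourly_rate, ops_overhead)
    return monthly_fixed / (api_cost_per_inference * days_per_month)


# $55.04/hour is the H100 rate quoted in this article.
gpu_monthly = self_host_monthly_cost(55.04)

# Hypothetical API cost of $0.002 per inference (an assumption, not a quoted price).
threshold = break_even_inferences_per_day(
    api_cost_per_inference=0.002, gpu_hourly_rate=55.04
)

print(f"Self-hosting fixed cost: ${gpu_monthly:,.2f}/month")
print(f"Break-even volume: {threshold:,.0f} inferences/day")
```

Adding a nonzero `ops_overhead` for engineering time pushes the break-even volume higher, which is why the answer depends so heavily on utilization and staffing.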

What AI pricing strategy should enterprises use for budgeting?

Start by mapping all AI cost sources (SaaS subscriptions, API consumption, GPU infrastructure, shadow AI) into a single view. Set per-team budgets with real-time alerts for consumption spikes. Route simple tasks to cheaper models and reserve frontier models for complex reasoning. Track cost per business outcome, not aggregate spend. CloudZero's budgeting and anomaly detection provide the visibility layer that makes pricing strategies actionable for AI workloads.
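To illustrate the routing advice, here is a minimal sketch of how the blended input rate changes as more traffic goes to a cheaper model. The two rates are the GPT-4o mini and GPT-4o input prices cited in this article; the 80% routing share and the function name are hypothetical.

```python
def blended_input_rate(cheap_rate: float, frontier_rate: float, cheap_share: float) -> float:
    """Average input price per 1M tokens when cheap_share of tokens go to the cheaper model."""
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    return cheap_share * cheap_rate + (1.0 - cheap_share) * frontier_rate


# Rates per 1M input tokens as quoted in this article.
CHEAP_RATE = 0.15     # GPT-4o mini
FRONTIER_RATE = 2.50  # GPT-4o

# Assumption: 80% of token volume is simple enough for the cheaper model.
rate = blended_input_rate(CHEAP_RATE, FRONTIER_RATE, cheap_share=0.8)
savings = 1.0 - rate / FRONTIER_RATE

print(f"Blended input rate: ${rate:.2f} per 1M tokens")
print(f"Savings vs. frontier-only: {savings:.1%}")
```

The real routing share depends on how many requests a cheaper model can handle at acceptable quality, which only evaluation on your own workload can tell you.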

June 16, 2026

AI pricing explained: what AI actually costs and how providers charge for it in 2026

Q: How much does AI cost?

AI cost ranges from free (ChatGPT free tier, Gemini free tier) to hundreds of thousands of dollars per month for enterprise deployments. A ChatGPT Plus subscription costs $20/month. Microsoft Copilot Enterprise costs $30/user/month on top of a required M365 base license. API-based production applications scale with token volume: 10 million input tokens per day on GPT-4o costs roughly $750/month in input fees alone, and output tokens are billed at 4x the input rate ($10.00 versus $2.50 per 1M tokens).
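The $750/month figure can be reproduced with a short calculation. This is a sketch under stated assumptions: the GPT-4o rates are the ones quoted in this article, the 30-day month is a simplification, and the 2 million output tokens per day in the second example is a made-up volume for illustration.

```python
def monthly_token_cost(
    input_tokens_per_day: int,
    output_tokens_per_day: int,
    input_rate_per_m: float,
    output_rate_per_m: float,
    days: int = 30,
) -> float:
    """Monthly API cost in dollars, given daily token volumes and per-1M-token rates."""
    daily = (
        (input_tokens_per_day / 1_000_000) * input_rate_per_m
        + (output_tokens_per_day / 1_000_000) * output_rate_per_m
    )
    return daily * days


# GPT-4o rates quoted in this article: $2.50 input, $10.00 output per 1M tokens.
input_only = monthly_token_cost(10_000_000, 0, 2.50, 10.00)

# Hypothetical: the same workload also produces 2M output tokens per day.
with_output = monthly_token_cost(10_000_000, 2_000_000, 2.50, 10.00)

print(f"Input only:  ${input_only:,.2f}/month")
print(f"With output: ${with_output:,.2f}/month")
```

Because output is priced higher per token, the output share of the bill depends on how verbose responses are, not just on the rate difference.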

By Lyne Carolyne // AI Content Specialist

Contents

What does AI pricing actually mean?
How the major AI providers price their services
6 AI pricing models every buyer should understand
The hidden costs of AI that pricing pages do not show you
How CloudZero resolves AI pricing chaos
Frequently Asked Questions About AI Pricing

Quick Answer

AI pricing covers the cost structures and billing models providers use to charge for AI products: per-token APIs (GPT-4o at $2.50/1M input tokens), per-seat subscriptions (Copilot at $30/user/month), per-conversation billing (Agentforce at $2/conversation), and consumption-based GPU compute (H100 instances at $55.04/hour). There is no standard. The total AI cost is almost always higher than the sticker price.

OpenAI charges per token. Microsoft charges per seat. AWS charges per GPU-hour. Salesforce charges per conversation. Anthropic charges per million tokens with different rates for input and output. And the engineering team that just spun up a fine-tuning job on SageMaker is about to learn what “consumption-based” means when the invoice arrives. (Spoiler: it means “more than you thought.”)

AI pricing has no standard. Gartner forecasts worldwide AI spending will reach $2.5 trillion in 2026, and every dollar of it is priced through a different model. For companies adopting AI across the organization, the question is not just “how much does AI cost?” It is “how do I compare costs across providers when one bills by the word, another by the seat, and a third by the conversation?”

CloudZero processes more than $15 billion in managed cloud and AI spend and helps engineering and finance teams understand what AI actually costs at the unit level. This guide covers the six AI pricing models that dominate the market, what the major providers actually charge, the hidden costs pricing pages leave out, and how to manage AI cost when every vendor bills differently.

One clarification: this article covers the cost of AI, not the use of AI for product pricing optimization (sometimes called “AI-powered pricing” or “dynamic pricing”). For that, Buynomics has a comparison. This guide is about what AI costs to buy, build, and run.

What does AI pricing actually mean?

AI pricing is the collection of cost structures, billing models, and metering approaches that AI providers use to charge for their products and services. Unlike traditional software (one license, one price, one invoice), AI pricing typically combines multiple billing dimensions in a single product: tokens consumed, compute hours used, data processed, features accessed, and seats provisioned.

What makes AI pricing uniquely complex is that it spans two layers, and most guides cover only one of them:

The application layer includes SaaS subscriptions (ChatGPT Plus at $20/month), API credits (from $0.15 to $15.00 per million tokens depending on provider and model tier), per-seat fees (Copilot at $18-$30/user/month), and outcome-based charges (Salesforce Agentforce, Intercom Fin). This is the visible AI cost: the vendor invoice, the expense report, the line item someone approved.

The infrastructure layer includes GPU compute at $1-$55/hour, model training runs, data pipelines, vector databases, and self-hosted inference. This is the invisible AI cost: buried in the cloud bill under generic compute and storage. CloudZero’s ROI in the AI Era report found organizations budget 30-36% of cloud spend for AI, while AI-specific line items show up at just 2.5%. The gap between those two figures is ghost spend nobody tracks.

Generative AI pricing and AI service pricing in particular combine both layers in ways traditional software never did. Understanding where the costs live is step one. Knowing what the major providers actually charge is step two.

How the major AI providers price their services

Every major AI provider uses a different pricing model:

Provider	Pricing model	Consumer/entry	Enterprise/API	Primary cost driver
OpenAI	Per-token + subscription	ChatGPT Free / Plus $20/mo	GPT-4o: $2.50/$10.00 per 1M tokens	Token volume, model tier
Anthropic	Per-token + subscription	Claude Free / Pro $20/mo	Sonnet 4.6: $3.00/$15.00 per 1M tokens	Token volume, caching efficiency
Google	Per-token + Vertex AI	AI Studio Free / Advanced $20/mo	2.5 Pro: $1.25/$10.00 per 1M tokens	Token volume, Vertex compute
Microsoft	Per-seat subscription	Copilot Business $18/user/mo	Enterprise $30/user/mo + M365 base	Seat count (usage-independent)
AWS	Consumption-based	Bedrock per-token	SageMaker H100 at $55.04/hr	GPU type, training duration
Salesforce	Per-conversation / Flex Credits	Foundations (Free)	$2/conversation or $0.10/action	Conversation/action volume

For complete breakdowns, see CloudZero’s guides to ChatGPT pricing, Claude API pricing, OpenAI API pricing, and Gemini API pricing. For a token-by-token comparison across all major LLMs, see CloudZero’s LLM API pricing comparison.

The table makes one thing clear: AI pricing comparison across providers is not a spreadsheet exercise. It requires normalizing fundamentally different billing models into common units. This is why organizations working across multiple cloud service providers need a platform that can ingest costs from all of these sources into a single view. CloudZero does this by pulling data from Anthropic, OpenAI, AWS, GCP, Azure, and other providers into one normalized cost model.
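One way to picture normalization is to convert each billing model into cost per interaction. The sketch below is illustrative only and is not a description of how any platform does it: prices come from the table above, while the token counts per request and the 200 prompts per user per month are hypothetical usage figures.

```python
def per_token_cost_per_interaction(
    input_tokens: int, output_tokens: int, input_rate_per_m: float, output_rate_per_m: float
) -> float:
    """Cost of one request under per-token billing (rates are per 1M tokens)."""
    return (
        input_tokens / 1_000_000 * input_rate_per_m
        + output_tokens / 1_000_000 * output_rate_per_m
    )


def per_seat_cost_per_interaction(seat_price_per_month: float, interactions_per_month: int) -> float:
    """Effective cost of one interaction under a flat per-seat fee."""
    if interactions_per_month <= 0:
        raise ValueError("a seat with no usage has no meaningful per-interaction cost")
    return seat_price_per_month / interactions_per_month


# Assumed usage: 1,000 input and 500 output tokens per request; 200 prompts per seat per month.
normalized = {
    "GPT-4o (per-token)": per_token_cost_per_interaction(1_000, 500, 2.50, 10.00),
    "Copilot Enterprise (per-seat)": per_seat_cost_per_interaction(30.00, 200),
    "Agentforce (per-conversation)": 2.00,  # already priced per interaction
}

for name, cost in normalized.items():
    print(f"{name}: ${cost:.4f} per interaction")
```

These interactions are not equivalent units of work, so the numbers are a starting point for comparison rather than a ranking.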

Now that the providers are mapped, the next question is which pricing model you are dealing with and what it means for your budget.

6 AI pricing models every buyer should understand

Six pricing models dominate the AI market. Each creates a different cost management challenge, and understanding which one you are dealing with determines how accurately you can forecast spend.

Model	How you’re billed	Example	Forecasting difficulty
Per-token	Input + output tokens processed	GPT-4o $2.50/$10.00 per 1M	High — scales with usage
Per-seat	Flat monthly fee per user	Copilot $18-$30/user/mo	Low — fixed, but usage-blind
Consumption (compute)	GPU time by the hour	H100 at $55.04/hr	High — depends on uptime
Per-conversation/resolution	Each completed interaction	Agentforce $2/conversation	Medium — volume-dependent
Tiered / freemium	Free tier + paid capacity	ChatGPT Free/Plus/Pro	Medium — upgrade creep
Hybrid	Base fee + usage overages	Subscription + per-token	Highest — floor plus moving ceiling

Per-token pricing. The dominant model for AI API pricing. You pay based on input and output tokens processed. Used by OpenAI, Anthropic, Google, and most other LLM providers. Input rates range from $0.15/1M tokens (GPT-4o mini) to $3.00/1M tokens (Claude Sonnet 4.6), with output tokens billed at several times the input rate (4x for GPT-4o, 5x for Claude Sonnet 4.6). This is token-based pricing in practice: costs scale linearly with usage, which means a popular AI feature can double your bill in a month without anyone changing a setting. CloudZero’s direct Anthropic and OpenAI integrations pull token-level consumption into the same cost view as cloud infrastructure, so teams can see exactly which models drive which costs.
Per-seat subscription. Fixed monthly cost per user, regardless of usage. Microsoft Copilot at $18-$30/user/month is the canonical example. Also used by Notion AI, Grammarly, and most AI-embedded SaaS. AI subscription pricing is predictable for budgeting but disconnected from actual value: you pay the same whether the user prompts Copilot 100 times a day or lets it gather dust. As Erik Peterson, CloudZero’s founder and CTO, puts it: “The cloud should drive innovation, not headaches. But you can’t innovate efficiently on costs you can’t see.” Per-seat pricing hides the usage signal that tells you whether the investment is working.
Consumption-based (compute hours). Pay for GPU infrastructure by the hour. Used by AWS SageMaker, Google Vertex AI, and Azure ML. AI infrastructure pricing ranges from $1/hour for inference-optimized chips to $55.04/hour for NVIDIA H100 clusters. This is also how AI chatbot cost scales when running self-hosted inference. Cost drivers: GPU type, training duration, and whether you remembered to shut down the instance over the long weekend. (The instance remembers. It kept running. So did the meter.)
Per-conversation or per-resolution. Pay per completed interaction. Salesforce Agentforce at $2/conversation, Intercom Fin at $0.99/resolution. This model aligns cost with business outcomes, which sounds elegant until you try to forecast how many conversations your AI will handle next quarter. Salesforce already offers three different pricing models for Agentforce (per-conversation, Flex Credits, per-user), because even the vendor building the product could not settle on one.
Tiered pricing / freemium. Free tier with limited usage, paid tiers adding more capacity. Used by ChatGPT (Free/Plus/Pro), Claude (Free/Pro/Team), and most consumer-facing AI. This is where shadow AI begins: individual contributors start on free, usage grows, and suddenly the organization has 200 individual subscriptions on personal credit cards. Expense-based SaaS purchasing grew 267% year-over-year, according to Zylo’s 2026 SaaS Management Index. That is shadow AI entering through the expense report, one $20 subscription at a time.
Hybrid pricing. Combines a base subscription with variable usage charges. Hybrid pricing surged from 27% to 41% of B2B companies in just 12 months, according to Growth Unhinged’s 2025 State of B2B Monetization report. Example: a platform charges a monthly fee plus per-token overages above a threshold. This is the hardest model to forecast: the bill has a fixed floor and a variable ceiling, and the ceiling moves with adoption. More vendors are shifting to it as they mature.
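The "fixed floor, moving ceiling" shape of a hybrid bill is easy to see in code. This is a minimal sketch with entirely hypothetical plan terms (a $500 base fee, 50M included tokens, and $3.00 per 1M tokens of overage); no vendor's actual plan is being described.

```python
def hybrid_monthly_bill(
    base_fee: float,
    included_tokens: int,
    tokens_used: int,
    overage_rate_per_m: float,
) -> float:
    """Base subscription plus per-token overage for usage above the included allowance."""
    overage_tokens = max(0, tokens_used - included_tokens)
    return base_fee + overage_tokens / 1_000_000 * overage_rate_per_m


# Hypothetical plan: $500/month, 50M tokens included, $3.00 per 1M tokens beyond that.
PLAN = dict(base_fee=500.0, included_tokens=50_000_000, overage_rate_per_m=3.00)

quiet_month = hybrid_monthly_bill(tokens_used=40_000_000, **PLAN)
busy_month = hybrid_monthly_bill(tokens_used=120_000_000, **PLAN)

print(f"Under the allowance: ${quiet_month:,.2f}")
print(f"After adoption grows: ${busy_month:,.2f}")
```

Below the allowance the bill never drops under the base fee; above it, every additional token moves the total, which is what makes the forecast depend on adoption.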

These six models answer “how is AI priced?” But the sticker price is only the beginning. Here is what it leaves out.

The hidden costs of AI that pricing pages do not show you

The pricing table shows the part of the iceberg above the waterline. The actual cost of generative AI in production includes five categories of spend:

GPU infrastructure. Fine-tuning models requires GPU compute at $1-$55+ per hour. Even organizations using APIs for inference often train custom models or run open-source alternatives. AI server cost for a single H100 cluster running 24/7 is $40,200/month. CloudZero’s cloud GPU pricing comparison breaks this down by instance type across AWS, GCP, and Azure.
Data pipelines. Cleaning, labeling, embedding, and indexing data into vector databases often costs more than the model itself. Databricks and Snowflake costs sit under “data infrastructure” on the cloud bill. CloudZero’s ROI in the AI Era report found organizations’ reported AI spending is roughly 12x lower than actual AI-driven cloud consumption. The gap is data pipelines and compute that exist because of AI but never get attributed to it.
Shadow AI. ChatGPT is now the number-one most expensed application on corporate credit cards, according to Zylo. Eight of the top 50 most-expensed applications are AI-native. This is spending that procurement cannot see, IT cannot govern, and finance cannot forecast. It is the AI equivalent of every department buying their own printer in 1998, except each printer costs $20/month and multiplies when nobody is looking.
Integration overhead. RAG pipelines, prompt libraries, evaluation frameworks, multi-model orchestration: all require engineering time and infrastructure that never appears on a pricing page.
Surprise bills. 78% of IT leaders experienced unexpected charges from consumption-based AI pricing, according to Zylo, and 61% had to cut other projects as a result. A misconfigured pipeline, a runaway training job, or a context window nobody optimized can generate a month’s budget in a weekend. CloudZero’s anomaly detection catches these spikes within hours. Fintech company Upstart used CloudZero’s anomaly alerts to cut total cloud spend by over $16 million. Drift saved $2.4 million in annual AWS costs.
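Two of these hidden costs reduce to simple arithmetic: what a forgotten GPU instance costs, and what a GPU hour really costs once idle time is counted. The sketch below uses the $55.04/hour rate quoted in this article; the 72-hour weekend and the 40% utilization figure are assumptions chosen for illustration.

```python
def idle_cost(gpu_hourly_rate: float, idle_hours: float) -> float:
    """Spend on an instance that was left running without doing useful work."""
    return gpu_hourly_rate * idle_hours


def effective_hourly_rate(gpu_hourly_rate: float, utilization: float) -> float:
    """Cost per hour of useful work when the GPU is busy only part of the time."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in the range (0, 1]")
    return gpu_hourly_rate / utilization


H100_RATE = 55.04  # per hour, as quoted in this article

# Assumption: the instance was forgotten over a three-day weekend (72 hours).
weekend_cost = idle_cost(H100_RATE, idle_hours=72)

# Assumption: the cluster does useful work 40% of the time it is billed.
useful_hour_cost = effective_hourly_rate(H100_RATE, utilization=0.40)

print(f"Forgotten over a long weekend: ${weekend_cost:,.2f}")
print(f"Effective cost per useful hour at 40% utilization: ${useful_hour_cost:,.2f}")
```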

The hidden cost layer is where most AI budget surprises originate. Managing it requires a different approach than reading pricing pages.

How CloudZero resolves AI pricing chaos

CloudZero provides an AI ROI solution: understanding what AI costs per team, per feature, per customer, across every provider and billing model. How?

First-ever Anthropic integration. CloudZero was the first cloud cost platform to integrate directly with Anthropic’s Usage and Cost API, pulling Claude token consumption, model usage, and caching efficiency into the same view used for AWS, Azure, and GCP. Teams building with Claude see spend broken down by project, model, operation, and workspace.
OpenAI with per-user granularity. CloudZero’s OpenAI integration ingests both cost and usage data, enabling cost per user, per model, per token type. No more reconciling three billing dashboards to figure out which model costs what.
Multi-provider normalization. CloudZero ingests data from AWS, GCP, Azure, Anthropic, OpenAI, Kubernetes, Snowflake, Datadog, MongoDB, New Relic, and any other source through the AnyCost API. All normalized into one cost model. All mappable to business dimensions (teams, features, products, customers). This is AI cost management at scale.

Unit economics for AI. CloudZero’s cost per customer capability answers the question pricing pages never do: what does each AI-powered feature cost per customer interaction? The difference between “AI costs $200K/month” and “the search feature costs $0.003 per query for profitable customers and $0.14 for unprofitable ones” is the difference between a number and a decision.
AI-powered anomaly detection. CloudZero’s anomaly detection compares 36 hours of hourly spend against 12 months of history, then alerts the team that owns the affected workload. Not a monthly finance report. Not a weekly rollup. Hourly data, because at $55/hour, every hour counts.

Budgets that work for consumption pricing. CloudZero’s budgeting tracks spend against plans at the team and product level. When AI API costs trend toward a budget breach, the team that owns the spend knows before the bill arrives.
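The idea of catching a budget breach before the bill arrives can be illustrated with a naive straight-line projection. This is a generic sketch, not CloudZero's method: the function names, the linear run-rate assumption, and the dollar figures are all hypothetical.

```python
def projected_month_end_spend(spend_to_date: float, day_of_month: int, days_in_month: int) -> float:
    """Extrapolate month-end spend by assuming the average daily run rate continues."""
    if not 1 <= day_of_month <= days_in_month:
        raise ValueError("day_of_month must fall within the month")
    return spend_to_date / day_of_month * days_in_month


def will_breach_budget(
    spend_to_date: float, day_of_month: int, days_in_month: int, budget: float
) -> bool:
    """True when the straight-line projection exceeds the monthly budget."""
    return projected_month_end_spend(spend_to_date, day_of_month, days_in_month) > budget


# Hypothetical team: $6,000 spent by day 10 of a 30-day month against a $15,000 budget.
projection = projected_month_end_spend(6_000.0, day_of_month=10, days_in_month=30)
breach = will_breach_budget(6_000.0, day_of_month=10, days_in_month=30, budget=15_000.0)

print(f"Projected month-end spend: ${projection:,.2f}")
print(f"On track to breach budget: {breach}")
```

A straight-line projection lags behind sudden spikes, so it complements rather than replaces hourly anomaly alerts.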

For AI cost optimization strategies (commitment planning, model right-sizing, prompt optimization), see CloudZero’s guide to AI cost optimization. For the full AI cost management framework, see CloudZero’s AI cost management guide.

Organizations like Toyota, Duolingo, Coinbase, Upstart, Drift, and Skyscanner use CloudZero to manage more than $15 billion in cloud and AI spend. Ready to see what your AI spend actually looks like?

Frequently Asked Questions About AI Pricing

Author Spotlight

Lyne Carolyne

Lyne Carolyne has several years of experience in AI and cloud economics and brings that understanding into the content she creates. Outside work, she's an avid explorer.