Contents
What actually changed in Claude Opus 4.8? How much does Claude Opus 4.8 cost? Opus 4.8 vs. Sonnet 4.6: Should you use Opus 4.8 or Sonnet 4.6? Opus 4.8 vs. GPT-5.5: How does Opus 4.8 compare to GPT-5.5? How do you get started with Claude Opus 4.8? How does CloudZero track Claude Opus 4.8 costs in real time? Claude Opus 4.8 FAQs

Quick Answer

Claude Opus 4.8 launched May 28, 2026 at $5 input / $25 output per million tokens, same price as Opus 4.7. Four things changed: Dynamic Workflows for parallel agentic coding, Effort Control for managing reasoning depth and cost per call, a Fast Mode now 3x cheaper than Opus 4.7's, and mid-task system messages now out of beta. API model ID: claude-opus-4-8.

Anthropic shipped Claude Opus 4.8 on May 28, 2026, exactly 41 days after Opus 4.7. The SERP was empty for two days after launch. Not because nobody cared. Because engineering managers and finance teams were doing the math on whether the bill changes.

Here is the short answer: the price did not move. The bill might still. Four new features change how Claude runs in production, and some of them quietly multiply your token spend if you do not plan for them. At CloudZero, we track what AI actually costs inside real cloud bills, not what pricing pages say. This is the breakdown we would want before making a model decision.

What actually changed in Claude Opus 4.8?

Anthropic called this release “modest but tangible.” That is honest. The benchmark gains are real. No single number is a revolution. What makes 4.8 worth reading about is four changes that land differently depending on how your team uses Claude. Here is what the headline grabbed and what it buried.

The number nobody talked about: USAMO 2026

Every outlet covered SWE-bench. Almost none covered USAMO.

Claude Opus 4.8 scored 96.7% on the USAMO 2026 math benchmark, up from 69.3% on Opus 4.7. That is a 27.4 percentage point gain in a single 41-day release cycle, the biggest single-cycle math improvement in Opus history. USAMO problems require multi-step proof construction and committing to a solution path before the answer is visible. A 27-point gain in 41 days is not a small tweak. It suggests something changed structurally in how the model reasons through hard problems.

Why does that matter for a cloud cost platform? Because the same reasoning depth that helps with math olympiad problems also helps with complex cost attribution across hundreds of AWS services, financial modeling across large variable sets, and multi-layer architectural decisions. 

CloudZero customers doing this kind of work are the teams most likely to notice the improvement in practice, well before any benchmark leaderboard captures it.

Dynamic workflows: Powerful, and expensive if you do not plan

Dynamic Workflows is the biggest engineering feature in this release. One orchestrator agent breaks a task into pieces, spins up hundreds of parallel sub-agents, and merges their results inside a single Claude Code session. No external infrastructure needed. The full mechanics are covered in the llm-stats.com release breakdown.

For codebase migrations, large security reviews, and parallel test generation, this genuinely changes what a single session can do. Files that used to run sequentially now run in parallel. That is real.

What Anthropic did not say loudly enough: each sub-agent runs its own token count. A 50-agent session costs approximately 50x the tokens of a single-agent equivalent. A $50 single-agent job can become a $2,500 bill as a Dynamic Workflow. That is not a reason to avoid it. It is a reason to model your sub-agent count before you run it in production, not after the invoice arrives.

CloudZero tracks Dynamic Workflow sessions by workspace so a $2,500 token event fires an alert to the team that owns it before finance sees it. For how Claude Code agents and session costs differ from standard API calls, that guide covers it properly.

Mid-task system messages: One less workaround

Mid-conversation system message updates on the Messages API are now generally available, no beta flag required. Claude’s instructions can update during a long session without losing the context already built.

A code review that starts as “check for security issues” can pivot to “focus on performance” mid-session without resetting. For teams running Claude in multi-step pipelines with branching logic, this removes a workaround most had already built custom tooling around. One less wrapper to maintain, and one less place where a context reset quietly inflates token spend.

Fast mode: 3x cheaper than it was

Fast Mode at 2.5x speed existed on Opus 4.7. What is new in 4.8 is the price: $10 input / $50 output per million tokens, 3x cheaper than Opus 4.7’s Fast Mode. That changes the math for latency-sensitive workloads significantly.

Databricks ran Opus 4.8 Fast Mode in their Genie AI agent and reported 61% cheaper token costs versus Opus 4.7. That is an enterprise-scale production result, not a pricing page estimate. For real-time code analysis, customer-facing AI features, and high-volume document review where speed matters, Fast Mode now makes a much stronger cost argument. It is optional and not default. Enable it explicitly in the API.

Effort control: The cost dial everyone else called a quality feature

Every other outlet described Effort Control as a quality setting. CloudZero calls it what it is: a cost lever.

Four settings: low, high (default), extra/xhigh, and max:

  • High uses roughly the same tokens as Opus 4.7 default but with better output
  • Low uses fewer tokens and responds faster
  • Max uses more of both

The ability to send routine calls at low effort while saving max for genuinely complex tasks means a team can use Opus 4.8 across a wide range of work without paying full price on every call.

CloudZero thinks of this the same way we think about EC2 rightsizing. Running an m5.4xlarge on every workload regardless of what it actually needs is the expensive mistake every ops team eventually fixes. Running Opus 4.8 at max effort on a summarization task is the same mistake, priced in tokens instead of instance hours. Teams that set effort levels thoughtfully spend significantly less for the same output quality overall.

CloudZero’s Claude Code plugin shows Effort Control usage by workspace and team so when someone switches from high to max across 10,000 daily calls, the cost change is visible before the invoice arrives.

How much does Claude Opus 4.8 cost?

The pricing page held flat. Whether your actual spend holds flat depends on three things Anthropic does not print on the pricing page.

The pricing table

ModeInput per 1M tokensOutput per 1M tokensSpeed
Standard$5.00$25.00Baseline
Fast Mode (optional$10.00$50.002.5x faster

Context window: 1M input / 128K output. On by default, no API flag needed.

Three things that will affect your actual bill

Upgrading from Opus 4.6 directly? A tokenizer change introduced in 4.7 can increase effective token counts by up to 35% for some text types. Teams already on 4.7 see no change. Teams jumping from 4.6 who skip benchmarking their real prompts are the ones who discover this mid-month.

  • Effort Control defaults to high. The listed rates assume that. Move to max effort across your whole Opus setup and you spend above what the table implies, quietly, until you check. Intentional routing matters here.
  • Dynamic Workflows scale with sub-agent count. A 50-agent session does not cost $25 per million output tokens. It costs $25 per million output tokens across 50 simultaneous token streams. Plan the count before production. This really is not optional.

For the full picture on cost levers including prompt caching (up to 90% off cached input), the Batch API (50% off all requests), and model routing, the Anthropic Claude API pricing guide covers every mechanism. For Claude subscription plan pricing across Free, Pro, Max, Team, and Enterprise, the Claude pricing guide has everything.

Opus 4.8 vs. Sonnet 4.6: Should you use Opus 4.8 or Sonnet 4.6?

Opus 4.8 is better on the benchmarks that matter for complex work. That part is not in question. What is in question is whether that quality difference justifies $22 more per million output tokens for your specific tasks. Most teams answer this wrong the same way: they default to Opus because it scores higher, without checking whether the difference shows up in their actual outputs.

When Opus 4.8 makes sense

  • Serious agentic coding. Resolving 69.2% of hard real-codebase tasks autonomously means Opus 4.8 gets through nearly 7 in 10 complex engineering jobs without human intervention. For migrations, multi-file refactors, and architectural work where a bad pass creates hours of cleanup, paying more per token on the first pass is net cheaper than paying less per token on three passes. Dynamic Workflows makes this case even stronger.
  • Computer use tasks. 83.4% on OSWorld is a 7-point lead over Gemini 3.1 Pro and a 4.7-point lead over GPT-5.5. For work that involves navigating GUIs or coordinating across applications, Opus 4.8 is the strongest option at the frontier right now.
  • Complex financial and reasoning tasks. A 27-point USAMO jump is a reasoning story, not a math story. Cost attribution across hundreds of cloud services, financial modeling with many interdependent variables, architectural tradeoff analysis with unclear constraints: these are the tasks where Opus 4.8 at max effort produces conclusions that are actually different, not just more confidently stated. CloudZero’s own analysis of how enterprise teams spend on AI increasingly points to Opus for these specific tasks and Sonnet for everything else.
  • Selective routing at max effort. The best Opus 4.8 setup is not “use Opus for everything.” It is “use Opus at max effort for the 15% of tasks where that depth genuinely changes the result, and use Sonnet for the rest.” Teams that do this deliberately spend a small fraction of what teams running Opus as a blanket default spend, with nearly identical overall output quality.

When Sonnet 4.6 Is the Right Answer

Claude Sonnet 4.6 costs $3 input / $15 output per million tokens and handles content generation, summarization, standard coding help, RAG responses, classification, and data extraction at roughly 85% of Opus-level quality.

For most of what production AI systems do most of the time, that 15% quality gap does not change the business result. It changes the invoice.

Monthly OutputOpus 4.8Sonnet 4.6Monthly Difference
100M tokens$2,500$1,500$1,000
500M tokens$12,500$7,500$5,000
1B tokens$25,000$15,000$10,000

At 1 billion output tokens per month, defaulting to Opus on tasks Sonnet handles just as well costs $10,000 per month. That is not a rounding error. That is a headcount conversation in a Q3 budget review. For how session-based Claude Code costs behave differently from structured API calls, and why Sonnet is often the better default there, the Claude Code pricing guide explains it clearly.

Opus 4.8 vs. GPT-5.5: How does Opus 4.8 compare to GPT-5.5?

Opus 4.8 and GPT-5.5 are both priced at $5 input / $25 output per million tokens. That is a genuine shift. Eighteen months ago there was real price spread at the top tier across providers. Now there is not. At price parity, the model choice is entirely about which performs better on your specific tasks.

Where Opus 4.8 leads: SWE-bench Pro at 69.2% vs. 58.6%, GDPval-AA at 1,890 vs. 1,769 Elo, and OSWorld at 83.4% vs. 78.7%, per Digital Applied’s benchmark analysis. Where GPT-5.5 leads: Terminal-Bench 2.1 at 78.2% versus 74.6%. For GPT-5.5 API costs in full, our OpenAI pricing guide covers it.

Most enterprise teams in 2026 run multiple models. Opus 4.8 for complex coding and reasoning. Sonnet 4.6 for production volume. GPT-5.5 for terminal-heavy CLI tasks. Gemini 3.1 Pro at roughly $2/$12 per million tokens when cost is the primary constraint: our Gemini pricing guide covers where that value case holds. The question is never which model wins overall. It is which model wins on your task, and whether you can actually see what each one costs you per result rather than per token. And if your team is comparing Claude subscription plans against ChatGPT’s plan pricing rather than pure API costs, that is a very different comparison than the per-token math.

How do you get started with Claude Opus 4.8?

API model ID: claude-opus-4-8

Where it runs: Anthropic API, Amazon Bedrock, Google Cloud Vertex AI, GitHub Copilot , and Claude Pro, Max, Team, and Enterprise plans.

Coming from Opus 4.7? Change one line. The tokenizer is the same, so your token budgets carry over without recalibration.

Coming from Opus 4.6? Same one-line change, but run your real prompts through first. The tokenizer change introduced in 4.7 can push effective token counts up 35% for some text types. The pricing page will not warn you. Your first bill will.

Effort Control: Default is high. Low, extra/xhigh, and max are available via API parameter. Start at default, measure whether quality changes matter for your tasks, then route accordingly. Setting max universally is expensive and usually unnecessary.

Fast Mode: Off by default. Enable it explicitly. The Databricks result of 61% cheaper token costs came from thoughtful setup, not flipping a default.

Using Claude Code or Cursor? Session-based costs in Claude Code behave very differently from structured API calls. If your team uses Cursor alongside Claude Code, CloudZero’s Cursor AI pricing guide shows where each tool’s cost profile makes more sense by task type.

Running Claude through AWS? The Claude on AWS and Amazon Bedrock guide covers all three access paths: Bedrock, Claude Platform on AWS, and the direct Anthropic API, with cost and feature differences for each. For Google Cloud, the Google Vertex AI pricing guide has current rates and access details.

How does CloudZero track Claude Opus 4.8 costs in real time?

The pricing page tells you what a token costs. CloudZero tells you what your AI decisions cost.

CloudZero was the first cloud cost platform to integrate directly with Anthropic. That means Opus 4.8 spend is tracked the same way your EC2 or RDS costs are: by model version, Fast Mode status, Effort Control setting, workspace, and API key, then mapped to teams, products, features, customers, and environments. When a Dynamic Workflow session fans out to 80 sub-agents and generates a $3,000 token event, the alert goes to the team that owns it before anyone in finance sees a number.

All of that attribution flows into the same platform used for AWS, Azure, GCP, OpenAI, Databricks, Snowflake, and every other provider in your stack. AI spend stops being an opaque monthly number and starts being something you can actually act on. Unlike your EC2 fleet, your AI model mix is changing every 41 days. Teams that can see what each model costs per business outcome adjust before the quarterly review. Teams that cannot see it explain the number at the quarterly review.
Schedule a demo to see how CloudZero surfaces Opus 4.8 costs across every access path in real time.

Claude Opus 4.8 FAQs