Pricing · April 10, 2026 · 12 min read

Anthropic Claude Pricing in 2026: Every Model, Every Discount, Every Hidden Cost

From Haiku at $1/M to Opus at $75/M — the complete breakdown of what Claude actually costs, including the cache discounts and surcharges Anthropic doesn't make obvious.


A developer recently told us their Claude bill tripled after they turned on extended caching. They expected it to save money. It did — on reads. But the 2x write surcharge on the 1-hour TTL ate those savings alive.

This is the problem with Anthropic's pricing: the headline numbers are straightforward, but the actual cost depends on a maze of multipliers that most developers never see until they check the invoice.

This guide breaks down every Claude model's real cost in 2026 — base rates, cache pricing, thinking tokens, batch discounts, fast mode surcharges, and web tool fees. No marketing fluff. Just the numbers you need to budget correctly.

The Claude Model Lineup (April 2026)

Anthropic currently offers three tiers across two generations:

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context Window | Best For |
| --- | --- | --- | --- | --- |
| Claude Opus 4.6 | $5.00 | $25.00 | 200K | Complex reasoning, research, code architecture |
| Claude Opus 4.5 | $5.00 | $25.00 | 200K | Same capabilities, previous generation |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 200K | Balanced performance/cost, production workloads |
| Claude Sonnet 4.5 | $3.00 | $15.00 | 200K | Previous gen, still capable |
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K | High-volume, latency-sensitive tasks |

Legacy models (still available)

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context Window | Status |
| --- | --- | --- | --- | --- |
| Claude Opus 4.1 / 4 | $15.00 | $75.00 | 200K | Active |
| Claude 3.5 Sonnet | $3.00 | $15.00 | 200K | Active |
| Claude 3.5 Haiku | $0.80 | $4.00 | 200K | Active |
| Claude 3 Opus | $15.00 | $75.00 | 200K | Active |
| Claude 3 Haiku | $0.25 | $1.25 | 200K | Retiring April 19 |

Key insight: Claude Opus 4.6 is 3x cheaper than the original Claude 3 Opus ($5 vs $15 input, $25 vs $75 output) while being significantly more capable. If you're still on Claude 3 Opus, you're paying 3x more for less.

Haiku 3 Retirement: What You Need to Know (April 19, 2026)

Anthropic is retiring Claude 3 Haiku on April 19 — nine days from now. If your production code references claude-3-haiku-20240307, it will stop working.

The migration path is Claude Haiku 4.5, which means a 4x price increase:

| | Claude 3 Haiku (retiring) | Claude Haiku 4.5 (replacement) |
| --- | --- | --- |
| Input | $0.25/M | $1.00/M |
| Output | $1.25/M | $5.00/M |
| Max output | 4K tokens | 64K tokens |
| Extended thinking | No | Yes |

The price jump is real, but Haiku 4.5 is a significantly more capable model — longer outputs, better instruction following, and thinking support. If your Haiku 3 workload is simple classification or routing, consider whether GPT-4.1-nano ($0.10/$0.40) or Gemini 2.0 Flash ($0.10/$0.40) can handle it at a fraction of the cost instead.

Action item: Search your codebase for claude-3-haiku and claude-3-haiku-20240307 before April 19. Track the cost impact of your migration with AISpendGuard.
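That search is scriptable. A minimal audit sketch (the helper name, file-extension list, and demo directory are illustrative assumptions; point it at your real repo root and adjust extensions for your stack):

```python
# Audit sketch: scan a source tree for hard-coded references to the retiring
# model ID before the April 19 cutoff. Demoed here against a throwaway directory.
import tempfile
from pathlib import Path

def find_model_refs(root, needle="claude-3-haiku"):
    """Return (filename, line number, line) for every source line mentioning the model."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in {".py", ".ts", ".js", ".go", ".java"}:
            for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
                if needle in line:
                    hits.append((path.name, lineno, line.strip()))
    return hits

# Demo: a stand-in for your real repo root.
demo = Path(tempfile.mkdtemp())
(demo / "client.py").write_text('MODEL = "claude-3-haiku-20240307"\n')
refs = find_model_refs(demo)
# refs -> [('client.py', 1, 'MODEL = "claude-3-haiku-20240307"')]
```

Every hit is a call site that dies on April 19 unless you repoint it at claude-haiku-4-5 or another model.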

The Cache Pricing That Changes Everything

Anthropic's cache pricing is the most aggressive in the industry — and the most complex. Here's how it actually works:

Cache Read: 90% Discount

When Claude hits a cache on your prompt prefix, you pay just 10% of the normal input price.

| Model | Normal Input | Cached Read |
| --- | --- | --- |
| Opus 4.6 | $5.00/M | $0.50/M |
| Sonnet 4.6 | $3.00/M | $0.30/M |
| Haiku 4.5 | $1.00/M | $0.10/M |

That's a 0.1x multiplier — meaning cached Haiku reads cost just $0.10 per million tokens. At that price, Haiku becomes cheaper than most open-source model hosting.

Cache Write: The Surcharge Nobody Expects

Here's where it gets tricky. Anthropic charges a surcharge on cache writes — the first time your prompt prefix is stored:

| Cache TTL | Write Multiplier | Opus 4.6 Write Cost | Sonnet 4.6 Write Cost |
| --- | --- | --- | --- |
| 5 minutes (default) | 1.25x | $6.25/M | $3.75/M |
| 1 hour (extended) | 2.0x | $10.00/M | $6.00/M |

The trap: If your cache hit rate is low (below ~60%), the write surcharge can actually make caching more expensive than not caching at all. The 1-hour TTL is especially dangerous — at 2x the normal input cost, you need consistent repeat calls within that hour to break even.

When Caching Pays Off (Real Math)

Let's say you're running Sonnet 4.6 with a 4,000-token system prompt, 100 calls per hour:

Without caching:

  • 100 calls × 4,000 tokens × $3.00/M = $1.20/hour

With 5-minute TTL caching (assuming 80% hit rate):

  • 20 cache writes × 4,000 tokens × $3.75/M = $0.30
  • 80 cache reads × 4,000 tokens × $0.30/M = $0.096
  • Total: $0.396/hour (67% savings)

With 1-hour TTL caching (assuming 95% hit rate):

  • 5 cache writes × 4,000 tokens × $6.00/M = $0.12
  • 95 cache reads × 4,000 tokens × $0.30/M = $0.114
  • Total: $0.234/hour (81% savings)

The takeaway: caching is powerful, but only if your hit rate justifies the write cost. Track both metrics, not just "caching is enabled."
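The break-even logic is easy to encode. A back-of-envelope sketch, using the 0.1x read multiplier and write surcharges from the tables above (prices are per million tokens; the function name is ours):

```python
CACHE_READ_MULT = 0.10  # cache hits pay 10% of the normal input price

def prefix_cost(calls, prefix_tokens, input_price, hit_rate, write_mult):
    """Hourly cost of a cached prompt prefix, in dollars."""
    writes = calls * (1 - hit_rate)   # cache misses pay the write surcharge
    reads = calls * hit_rate          # cache hits pay the discounted read rate
    return (writes * write_mult + reads * CACHE_READ_MULT) * prefix_tokens * input_price / 1e6

# Sonnet 4.6, 4,000-token system prompt, 100 calls/hour (the example above):
no_cache = 100 * 4000 * 3.00 / 1e6                    # $1.20/hour
ttl_5min = prefix_cost(100, 4000, 3.00, 0.80, 1.25)   # $0.396/hour
ttl_1hr = prefix_cost(100, 4000, 3.00, 0.95, 2.00)    # $0.234/hour

# Break-even hit rate: caching beats no caching once
# (1 - h) * write_mult + h * 0.1 < 1, i.e. h > (write_mult - 1) / (write_mult - 0.1)
breakeven_1hr = (2.00 - 1) / (2.00 - 0.10)            # ~52.6% for the 1-hour TTL
```

Below that break-even hit rate, the 1-hour TTL costs you money on the prefix alone.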

How Claude Compares to OpenAI and Google

Cache Read Discounts: Provider Comparison

| Provider | Cache Read Multiplier | Notes |
| --- | --- | --- |
| Anthropic | 0.1x (90% off) | Best discount, but has write surcharges |
| Google | 0.1x (90% off) | Matches Anthropic, no write surcharge |
| OpenAI (GPT-4.1, o3, o4) | 0.25x (75% off) | Automatic, no write surcharge |
| OpenAI (other models) | 0.5x (50% off) | Automatic, no write surcharge |

Anthropic and Google tie on read discounts. But Anthropic's write surcharge means Google's caching is actually cheaper in practice for low-hit-rate workloads.

Tier-for-Tier Price Comparison

| Tier | Anthropic | OpenAI | Google |
| --- | --- | --- | --- |
| Flagship | Opus 4.6: $5/$25 | GPT-4.1: $2/$8 | Gemini 2.5 Pro: $1.25/$10 |
| Mid-tier | Sonnet 4.6: $3/$15 | GPT-4o: $2.50/$10 | Gemini 2.5 Flash: $0.30/$2.50 |
| Budget | Haiku 4.5: $1/$5 | GPT-4.1-nano: $0.10/$0.40 | Gemini 2.0 Flash: $0.10/$0.40 |

The honest assessment: Anthropic is the most expensive provider at every tier. Opus 4.6 costs 2.5x more than GPT-4.1 and 4x more than Gemini 2.5 Pro for input. At the budget tier, Haiku 4.5 at $1/$5 is 10x more expensive than GPT-4.1-nano or Gemini 2.0 Flash at $0.10/$0.40.

The question isn't whether Claude is more expensive — it is. The question is whether the quality difference justifies the premium for your specific use case.

Thinking Tokens: The Output Cost You Don't See

Claude's reasoning models (extended thinking) generate thinking tokens that are billed at the output token rate — and they don't appear in the final response.

For Opus 4.6 with extended thinking enabled, a response that generates 500 visible output tokens might consume 2,000 thinking tokens behind the scenes. Your actual output cost:

  • Visible output: 500 tokens × $25/M = $0.0125
  • Thinking tokens: 2,000 tokens × $25/M = $0.05
  • Real cost: $0.0625 (5x what you expected)

At $25 per million output tokens, those invisible thinking tokens add up fast. If you don't need extended thinking for a task, turn it off. There's no reason to pay for reasoning on a classification or extraction task.
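Since thinking tokens bill at the same rate as visible output, the real cost is just the sum of both counts. A quick sketch of the Opus 4.6 example above (helper name is ours; prices per million tokens):

```python
def output_cost(visible_tokens, thinking_tokens, output_price_per_m):
    """Dollar cost of a response: thinking tokens bill at the output rate."""
    return (visible_tokens + thinking_tokens) * output_price_per_m / 1e6

apparent = output_cost(500, 0, 25.00)     # $0.0125 -- what the visible response implies
actual = output_cost(500, 2000, 25.00)    # $0.0625 -- what Opus 4.6 actually bills (5x)
```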

Batch API: 50% Off for Patient Workloads

Anthropic's Batch API gives you a flat 50% discount on all token costs in exchange for async processing (results within 24 hours).

| Model | Standard Input | Batch Input | Standard Output | Batch Output |
| --- | --- | --- | --- | --- |
| Opus 4.6 | $5.00 | $2.50 | $25.00 | $12.50 |
| Sonnet 4.6 | $3.00 | $1.50 | $15.00 | $7.50 |
| Haiku 4.5 | $1.00 | $0.50 | $5.00 | $2.50 |

Batch Opus 4.6 at $2.50/$12.50 lands right next to standard GPT-4o ($2.50/$10): identical on input, only slightly higher on output. If you can tolerate the latency, batch processing is the single biggest cost lever available.

Good batch candidates:

  • Nightly data processing jobs
  • Content classification or tagging
  • Bulk document summarization
  • Test suite evaluation

Fast Mode: 6x Premium for Priority

On the opposite end, Anthropic's fast mode charges a 6x multiplier for priority processing:

| Model | Standard Output | Fast Mode Output |
| --- | --- | --- |
| Opus 4.6 | $25.00 | $150.00 |
| Sonnet 4.6 | $15.00 | $90.00 |
| Haiku 4.5 | $5.00 | $30.00 |

Fast mode Opus at $150/M output tokens is the most expensive AI API call you can make from any provider. Use it only when latency is genuinely critical — never as a default.

Web Search: The $0.01 Per-Call Fee

Claude's web search tool costs a flat $0.01 per search call ($10 per 1,000 searches), separate from token costs.

| Provider | Web Search | Web Fetch |
| --- | --- | --- |
| Anthropic | $0.010/call | Free |
| OpenAI | $0.010/call | Free |
| Google | $0.014/call | Free |
| Groq | $0.005/call | Free |

This fee is small individually but compounds in agentic workloads where every turn might trigger a search. An agent that searches 50 times per session adds $0.50 in tool fees alone.
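A quick sanity check on how the flat fee compounds (the session count here is an illustrative assumption, not a benchmark):

```python
SEARCH_FEE = 0.01  # flat $ per web search call

def monthly_search_fees(searches_per_session, sessions_per_month):
    """Tool fees alone, before a single token is billed."""
    return searches_per_session * sessions_per_month * SEARCH_FEE

fees = monthly_search_fees(50, 1_000)  # $500/month for 1,000 agent sessions
```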

Real-World Cost Scenarios

Scenario 1: Customer Support Chatbot (1M messages/month)

Average message: 200 input tokens, 300 output tokens, 2,000-token system prompt (cached).

With Haiku 4.5 + caching (85% cache hit rate):

  • System prompt writes: 150K × 2,000 tokens × $1.25/M = $375
  • System prompt reads: 850K × 2,000 tokens × $0.10/M = $170
  • User input: 1M × 200 tokens × $1.00/M = $200
  • Output: 1M × 300 tokens × $5.00/M = $1,500
  • Monthly total: ~$2,245

Same workload with GPT-4.1-nano:

  • Input: 1M × 2,200 tokens × $0.10/M = $220
  • Output: 1M × 300 tokens × $0.40/M = $120
  • Monthly total: ~$340

Haiku costs 6.6x more than GPT-4.1-nano for this workload. If Claude's response quality matters for your support experience, the premium might be worth it. If not, this is money you're leaving on the table.
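The scenario is easy to recompute with a small helper (the function is a hypothetical sketch; prices are per million tokens, and the 1.25x write multiplier is the 5-minute-TTL surcharge):

```python
M = 1_000_000

def chatbot_monthly(msgs, sys_tokens, in_tokens, out_tokens,
                    in_price, out_price, hit_rate=None, write_mult=1.25):
    """Monthly $ cost per model. hit_rate=None means no prompt caching."""
    if hit_rate is None:
        prefix = msgs * sys_tokens * in_price  # system prompt billed at full rate
    else:
        prefix = msgs * sys_tokens * in_price * (
            (1 - hit_rate) * write_mult + hit_rate * 0.10  # writes + discounted reads
        )
    return (prefix + msgs * in_tokens * in_price + msgs * out_tokens * out_price) / M

haiku = chatbot_monthly(M, 2000, 200, 300, 1.00, 5.00, hit_rate=0.85)  # ~$2,245
nano = chatbot_monthly(M, 2000, 200, 300, 0.10, 0.40)                  # ~$340
ratio = haiku / nano                                                   # ~6.6x
```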

Scenario 2: Code Review Agent (500 reviews/day)

Average: 8,000 input tokens (code context), 2,000 output tokens, extended thinking enabled (est. 3,000 thinking tokens).

With Sonnet 4.6:

  • Input: 15K × 8,000 tokens × $3.00/M = $360
  • Output: 15K × 2,000 tokens × $15.00/M = $450
  • Thinking: 15K × 3,000 tokens × $15.00/M = $675
  • Monthly total: ~$1,485

Thinking tokens are the dominant cost — 45% of the total bill. Disabling extended thinking where full reasoning isn't needed (e.g., style-only reviews) could cut costs by nearly half.

Scenario 3: Batch Document Processing (100K documents/month)

Average: 5,000 input tokens, 1,000 output tokens per document.

Sonnet 4.6 Standard:

  • Input: 100K × 5,000 × $3.00/M = $1,500
  • Output: 100K × 1,000 × $15.00/M = $1,500
  • Monthly total: $3,000

Sonnet 4.6 Batch API:

  • Input: 100K × 5,000 × $1.50/M = $750
  • Output: 100K × 1,000 × $7.50/M = $750
  • Monthly total: $1,500 (50% savings)

For bulk workloads, batch is non-negotiable. $1,500/month in savings from a single API parameter change.
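The same comparison as a two-line calculator (helper name is ours; the 0.5 multiplier is the flat Batch API discount on all token costs):

```python
def doc_processing_monthly(docs, in_tokens, out_tokens, in_price, out_price, batch=False):
    """Monthly $ cost for bulk document processing; prices per million tokens."""
    discount = 0.5 if batch else 1.0  # Batch API halves every token cost
    return docs * (in_tokens * in_price + out_tokens * out_price) * discount / 1e6

standard = doc_processing_monthly(100_000, 5000, 1000, 3.00, 15.00)             # $3,000
batched = doc_processing_monthly(100_000, 5000, 1000, 3.00, 15.00, batch=True)  # $1,500
```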

The Decision Framework

Here's how to choose the right Claude model:

Use Opus 4.6 ($5/$25) when:

  • Task requires multi-step reasoning or complex analysis
  • Accuracy matters more than cost (legal, medical, financial)
  • You're doing < 10K calls/day and quality is the differentiator

Use Sonnet 4.6 ($3/$15) when:

  • You need strong performance at production scale
  • Tasks are moderately complex (code generation, content creation)
  • Budget allows ~$3-5K/month on AI

Use Haiku 4.5 ($1/$5) when:

  • Volume is high (>100K calls/day)
  • Tasks are straightforward (classification, extraction, routing)
  • Latency matters more than nuance

Switch away from Claude when:

  • You're doing simple tasks at high volume — GPT-4.1-nano at $0.10/$0.40 is 10x cheaper than Haiku
  • Cache hit rates are below 50% — Google's caching has no write surcharge
  • Budget is the primary constraint — Anthropic is the premium option at every tier

How to Track What You're Actually Spending

The biggest Claude pricing mistake is assuming your bill matches the base rate table. Between cache write surcharges, thinking tokens, fast mode, and tool fees, the real cost per request can be 2-6x higher than the headline price.

You need visibility into:

  • Cache hit rate — are your writes paying for themselves?
  • Thinking token ratio — how many invisible tokens per visible output?
  • Cost per feature — which parts of your app are driving the Claude bill?

AISpendGuard tracks all of these automatically. Our SDK breaks down every Claude API call into regular input, cached reads, cached writes, thinking tokens, and tool fees — so you see exactly where your money goes. No prompt storage, no latency overhead. Just the cost data you need.

Start monitoring your Claude spend for free →


Prices verified against Anthropic's API pricing page and our internal model-pricing database as of April 10, 2026. Anthropic updates pricing periodically — check aispendguard.com/model-prices for real-time rates.




Want to track your AI spend automatically?

AISpendGuard detects waste patterns, breaks down costs by feature, and recommends specific changes with $/mo savings estimates.