A developer recently told us their Claude bill tripled after they turned on extended caching. They expected it to save money. It did — on reads. But the 2x write surcharge on the 1-hour TTL ate those savings alive.
This is the problem with Anthropic's pricing: the headline numbers are straightforward, but the actual cost depends on a maze of multipliers that most developers never see until they check the invoice.
This guide breaks down every Claude model's real cost in 2026 — base rates, cache pricing, thinking tokens, batch discounts, fast mode surcharges, and web tool fees. No marketing fluff. Just the numbers you need to budget correctly.
The Claude Model Lineup (April 2026)
Anthropic currently offers three tiers across two generations:
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context Window | Best For |
|---|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 | 200K | Complex reasoning, research, code architecture |
| Claude Opus 4.5 | $5.00 | $25.00 | 200K | Same capabilities, previous generation |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 200K | Balanced performance/cost, production workloads |
| Claude Sonnet 4.5 | $3.00 | $15.00 | 200K | Previous gen, still capable |
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K | High-volume, latency-sensitive tasks |
Legacy models (still available)
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context Window | Status |
|---|---|---|---|---|
| Claude Opus 4.1 / 4 | $15.00 | $75.00 | 200K | Active |
| Claude 3.5 Sonnet | $3.00 | $15.00 | 200K | Active |
| Claude 3.5 Haiku | $0.80 | $4.00 | 200K | Active |
| Claude 3 Opus | $15.00 | $75.00 | 200K | Active |
| Claude 3 Haiku | $0.25 | $1.25 | 200K | Retiring April 19 |
Key insight: Claude Opus 4.6 is 3x cheaper than the original Claude 3 Opus ($5 vs $15 input, $25 vs $75 output) while being significantly more capable. If you're still on Claude 3 Opus, you're paying 3x more for less.
Haiku 3 Retirement: What You Need to Know (April 19, 2026)
Anthropic is retiring Claude 3 Haiku on April 19 — nine days from now. If your production code references claude-3-haiku-20240307, it will stop working.
The migration path is Claude Haiku 4.5, which means a 4x price increase:
| | Claude 3 Haiku (retiring) | Claude Haiku 4.5 (replacement) |
|---|---|---|
| Input | $0.25/M | $1.00/M |
| Output | $1.25/M | $5.00/M |
| Max output | 4K tokens | 64K tokens |
| Extended thinking | No | Yes |
The price jump is real, but Haiku 4.5 is a significantly more capable model — longer outputs, better instruction following, and thinking support. If your Haiku 3 workload is simple classification or routing, consider whether GPT-4.1-nano ($0.10/$0.40) or Gemini 2.0 Flash ($0.10/$0.40) can handle it at a fraction of the cost instead.
Action item: Search your codebase for claude-3-haiku and claude-3-haiku-20240307 before April 19. Track the cost impact of your migration with AISpendGuard.
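As a sketch of that search, a small scanner like the one below can flag lingering references before the cutoff. This is illustrative, not an official migration tool; the directory layout and file contents in the demo are made up, and because `claude-3-haiku` is a prefix of the dated ID, matching either string catches both forms.

```python
import tempfile
from pathlib import Path

# Model IDs retiring on April 19.
DEPRECATED_IDS = ("claude-3-haiku-20240307", "claude-3-haiku")

def find_deprecated_refs(root: str) -> list[tuple[str, int, str]]:
    """Return (file_name, line_number, line_text) for each deprecated model reference."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if any(model_id in line for model_id in DEPRECATED_IDS):
                hits.append((path.name, lineno, line.strip()))
    return hits

# Demo on a throwaway directory; point it at your repo root in practice.
demo = tempfile.mkdtemp()
Path(demo, "app.py").write_text('MODEL = "claude-3-haiku-20240307"\n')
Path(demo, "new.py").write_text('MODEL = "claude-haiku-4-5"\n')
print(find_deprecated_refs(demo))  # flags app.py line 1 only
```

A plain `grep -rn claude-3-haiku .` does the same job if you just need a one-off check.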
The Cache Pricing That Changes Everything
Anthropic's cache pricing is the most aggressive in the industry — and the most complex. Here's how it actually works:
Cache Read: 90% Discount
When Claude hits a cache on your prompt prefix, you pay just 10% of the normal input price.
| Model | Normal Input | Cached Read |
|---|---|---|
| Opus 4.6 | $5.00/M | $0.50/M |
| Sonnet 4.6 | $3.00/M | $0.30/M |
| Haiku 4.5 | $1.00/M | $0.10/M |
That's a 0.1x multiplier — meaning cached Haiku reads cost just $0.10 per million tokens. At that price, Haiku becomes cheaper than most open-source model hosting.
Cache Write: The Surcharge Nobody Expects
Here's where it gets tricky. Anthropic charges a surcharge on cache writes — the first time your prompt prefix is stored:
| Cache TTL | Write Multiplier | Opus 4.6 Write Cost | Sonnet 4.6 Write Cost |
|---|---|---|---|
| 5 minutes (default) | 1.25x | $6.25/M | $3.75/M |
| 1 hour (extended) | 2.0x | $10.00/M | $6.00/M |
The trap: if your cache hit rate is too low, the write surcharge makes caching more expensive than not caching at all. Break-even is roughly a 22% hit rate on the 5-minute TTL and about 53% on the 1-hour TTL. The extended TTL is especially dangerous: at 2x the normal input cost, you need consistent repeat calls within that hour to come out ahead.
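The break-even point follows directly from the multipliers in the table above. Per call, a cached prefix costs `h * read + (1 - h) * write` in units of the base input price, versus 1.0 uncached; setting those equal and solving for the hit rate h gives a one-line formula:

```python
def cache_breakeven_hit_rate(write_multiplier: float, read_multiplier: float = 0.1) -> float:
    """Hit rate at which cached and uncached prefix costs are equal.

    Cached cost per call: h * read + (1 - h) * write (in units of base input price).
    Uncached cost: 1.0. Equating and solving for h: (write - 1) / (write - read).
    """
    return (write_multiplier - 1.0) / (write_multiplier - read_multiplier)

print(round(cache_breakeven_hit_rate(1.25), 3))  # 5-minute TTL -> 0.217
print(round(cache_breakeven_hit_rate(2.0), 3))   # 1-hour TTL  -> 0.526
```

Anything below those hit rates and caching is a net loss for that prefix.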
When Caching Pays Off (Real Math)
Let's say you're running Sonnet 4.6 with a 4,000-token system prompt, 100 calls per hour:
Without caching:
- 100 calls × 4,000 tokens × $3.00/M = $1.20/hour
With 5-minute TTL caching (assuming 80% hit rate):
- 20 cache writes × 4,000 tokens × $3.75/M = $0.30
- 80 cache reads × 4,000 tokens × $0.30/M = $0.096
- Total: $0.396/hour (67% savings)
With 1-hour TTL caching (assuming 95% hit rate):
- 5 cache writes × 4,000 tokens × $6.00/M = $0.12
- 95 cache reads × 4,000 tokens × $0.30/M = $0.114
- Total: $0.234/hour (81% savings)
The takeaway: caching is powerful, but only if your hit rate justifies the write cost. Track both metrics, not just "caching is enabled."
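The worked example above can be reproduced in a few lines. Prices are the Sonnet 4.6 rates per million tokens from the tables; the hit rates are the same assumptions as the scenario:

```python
TOKENS = 4_000   # system prompt size
CALLS = 100      # calls per hour
INPUT = 3.00     # Sonnet 4.6 input, $/M tokens
READ = 0.30      # cached read, $/M tokens

def hourly_cost(write_price: float, hits: int) -> float:
    """Cost of serving the prompt prefix for one hour of traffic."""
    writes = CALLS - hits
    return (writes * TOKENS * write_price + hits * TOKENS * READ) / 1e6

baseline = CALLS * TOKENS * INPUT / 1e6
print(f"no cache:   ${baseline:.3f}/hour")               # $1.200
print(f"5-min TTL:  ${hourly_cost(3.75, hits=80):.3f}")  # $0.396
print(f"1-hour TTL: ${hourly_cost(6.00, hits=95):.3f}")  # $0.234
```

Swapping in your own prompt size, call volume, and measured hit rate tells you quickly whether either TTL pays off for your traffic pattern.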
How Claude Compares to OpenAI and Google
Cache Read Discounts: Provider Comparison
| Provider | Cache Read Multiplier | Notes |
|---|---|---|
| Anthropic | 0.1x (90% off) | Best discount, but has write surcharges |
| Google (Gemini) | 0.1x (90% off) | Matches Anthropic, no write surcharge |
| OpenAI (GPT-4.1, o3, o4) | 0.25x (75% off) | Automatic, no write surcharge |
| OpenAI (other models) | 0.5x (50% off) | Automatic, no write surcharge |
Anthropic and Google tie on read discounts. But Anthropic's write surcharge means Google's caching is actually cheaper in practice for low-hit-rate workloads.
Tier-for-Tier Price Comparison
| Tier | Anthropic | OpenAI | Google |
|---|---|---|---|
| Flagship | Opus 4.6: $5/$25 | GPT-4.1: $2/$8 | Gemini 2.5 Pro: $1.25/$10 |
| Mid-tier | Sonnet 4.6: $3/$15 | GPT-4o: $2.50/$10 | Gemini 2.5 Flash: $0.30/$2.50 |
| Budget | Haiku 4.5: $1/$5 | GPT-4.1-nano: $0.10/$0.40 | Gemini 2.0 Flash: $0.10/$0.40 |
The honest assessment: Anthropic is the most expensive provider at every tier. Opus 4.6 costs 2.5x more than GPT-4.1 and 4x more than Gemini 2.5 Pro for input. At the budget tier, Haiku 4.5 at $1/$5 is 10x more expensive than GPT-4.1-nano or Gemini 2.0 Flash at $0.10/$0.40.
The question isn't whether Claude is more expensive — it is. The question is whether the quality difference justifies the premium for your specific use case.
Thinking Tokens: The Output Cost You Don't See
When extended thinking is enabled, Claude generates thinking tokens that are billed at the output token rate, even though they never appear in the final response.
For Opus 4.6 with extended thinking enabled, a response that generates 500 visible output tokens might consume 2,000 thinking tokens behind the scenes. Your actual output cost:
- Visible output: 500 tokens × $25/M = $0.0125
- Thinking tokens: 2,000 tokens × $25/M = $0.05
- Real cost: $0.0625 (5x what you expected)
At $25 per million output tokens, those invisible thinking tokens add up fast. If you don't need extended thinking for a task, turn it off. There's no reason to pay for reasoning on a classification or extraction task.
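A small sketch of that accounting, using the Opus 4.6 output price from the table and the 500/2,000 token split from the example above:

```python
OUTPUT_PRICE = 25.00  # Opus 4.6 output, $/M tokens

def output_cost(visible_tokens: int, thinking_tokens: int) -> float:
    """Thinking tokens bill at the same output rate as visible tokens."""
    return (visible_tokens + thinking_tokens) * OUTPUT_PRICE / 1e6

cost = output_cost(visible_tokens=500, thinking_tokens=2_000)
print(f"${cost:.4f}")  # $0.0625, 5x the visible-only cost
```

If you log the thinking-to-visible token ratio per request type, you can see exactly which workloads justify the mode and which are paying for reasoning they never use.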
Batch API: 50% Off for Patient Workloads
Anthropic's Batch API gives you a flat 50% discount on all token costs in exchange for async processing (results within 24 hours).
| Model | Standard Input | Batch Input | Standard Output | Batch Output |
|---|---|---|---|---|
| Opus 4.6 | $5.00 | $2.50 | $25.00 | $12.50 |
| Sonnet 4.6 | $3.00 | $1.50 | $15.00 | $7.50 |
| Haiku 4.5 | $1.00 | $0.50 | $5.00 | $2.50 |
Batch Opus 4.6 at $2.50/$12.50 lands in the same price band as standard GPT-4o ($2.50/$10): input prices match exactly, and output runs only $2.50/M higher, for a flagship-tier model. If you can tolerate the latency, batch processing is the single biggest cost lever available.
Good batch candidates:
- Nightly data processing jobs
- Content classification or tagging
- Bulk document summarization
- Test suite evaluation
Fast Mode: 6x Premium for Priority
On the opposite end, Anthropic's fast mode charges a 6x multiplier for priority processing:
| Model | Standard Output | Fast Mode Output |
|---|---|---|
| Opus 4.6 | $25.00 | $150.00 |
| Sonnet 4.6 | $15.00 | $90.00 |
| Haiku 4.5 | $5.00 | $30.00 |
Fast mode Opus at $150/M output tokens is the most expensive AI API call you can make from any provider. Use it only when latency is genuinely critical — never as a default.
Web Search: The $0.01 Per-Call Fee
Claude's web search tool costs a flat $0.01 per search call ($10 per 1,000 searches), separate from token costs.
| Provider | Web Search | Web Fetch |
|---|---|---|
| Anthropic | $0.010/call | Free |
| OpenAI | $0.010/call | Free |
| Google | $0.014/call | Free |
| Groq | $0.005/call | Free |
This fee is small individually but compounds in agentic workloads where every turn might trigger a search. An agent that searches 50 times per session adds $0.50 in tool fees alone.
Real-World Cost Scenarios
Scenario 1: Customer Support Chatbot (1M messages/month)
Average message: 200 input tokens, 300 output tokens, 2,000-token system prompt (cached).
With Haiku 4.5 + caching (85% cache hit rate):
- System prompt writes: 150K calls × 2,000 tokens × $1.25/M = $375
- System prompt reads: 850K calls × 2,000 tokens × $0.10/M = $170
- User input: 1M calls × 200 tokens × $1.00/M = $200
- Output: 1M calls × 300 tokens × $5.00/M = $1,500
- Monthly total: ~$2,245
Same workload with GPT-4.1-nano:
- Input: 1M calls × 2,200 tokens × $0.10/M = $220
- Output: 1M calls × 300 tokens × $0.40/M = $120
- Monthly total: ~$340
Haiku costs 6.6x more than GPT-4.1-nano for this workload. If Claude's response quality matters for your support experience, the premium might be worth it. If not, this is money you're leaving on the table.
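For budgeting, the same arithmetic generalizes into a small helper. Prices are the Haiku 4.5 rates and 5-minute-TTL multipliers from the tables above; the token counts and hit rate are this scenario's assumptions, so swap in your own measurements:

```python
def chatbot_monthly_cost(calls: int, hit_rate: float) -> float:
    """Monthly dollars for the cached-system-prompt chatbot scenario."""
    PROMPT, IN_TOK, OUT_TOK = 2_000, 200, 300   # tokens per call
    INPUT, OUTPUT = 1.00, 5.00                  # Haiku 4.5, $/M tokens
    WRITE, READ = INPUT * 1.25, INPUT * 0.10    # 5-min TTL cache multipliers
    misses = calls * (1 - hit_rate)             # cache writes
    hits = calls * hit_rate                     # cache reads
    return (misses * PROMPT * WRITE
            + hits * PROMPT * READ
            + calls * IN_TOK * INPUT
            + calls * OUT_TOK * OUTPUT) / 1e6

print(f"${chatbot_monthly_cost(1_000_000, 0.85):,.0f}/month")  # $2,245/month
```

Note how sensitive the total is to the hit rate: dropping from 85% to 50% adds the write surcharge back onto half your traffic.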
Scenario 2: Code Review Agent (500 reviews/day)
Average: 8,000 input tokens (code context), 2,000 output tokens, extended thinking enabled (est. 3,000 thinking tokens).
With Sonnet 4.6 (500 reviews/day ≈ 15K reviews/month):
- Input: 15K × 8,000 tokens × $3.00/M = $360
- Output: 15K × 2,000 tokens × $15.00/M = $450
- Thinking: 15K × 3,000 tokens × $15.00/M = $675
- Monthly total: ~$1,485
Thinking tokens are the dominant cost — 45% of the total bill. Disabling extended thinking where full reasoning isn't needed (e.g., style-only reviews) could cut costs by nearly half.
Scenario 3: Batch Document Processing (100K documents/month)
Average: 5,000 input tokens, 1,000 output tokens per document.
Sonnet 4.6 Standard:
- Input: 100K × 5,000 × $3.00/M = $1,500
- Output: 100K × 1,000 × $15.00/M = $1,500
- Monthly total: $3,000
Sonnet 4.6 Batch API:
- Input: 100K × 5,000 × $1.50/M = $750
- Output: 100K × 1,000 × $7.50/M = $750
- Monthly total: $1,500 (50% savings)
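The standard-versus-batch comparison is worth wiring into any cost model as a single function. Sonnet 4.6 rates are from the batch table; the flat 0.5 factor is the 50% discount described above:

```python
def doc_processing_cost(docs: int, in_tok: int, out_tok: int, batch: bool = False) -> float:
    """Monthly Sonnet 4.6 cost in dollars; batch=True applies the 50% discount."""
    INPUT, OUTPUT = 3.00, 15.00  # $/M tokens
    discount = 0.5 if batch else 1.0
    return discount * (docs * in_tok * INPUT + docs * out_tok * OUTPUT) / 1e6

standard = doc_processing_cost(100_000, 5_000, 1_000)
batched = doc_processing_cost(100_000, 5_000, 1_000, batch=True)
print(f"standard: ${standard:,.0f}, batch: ${batched:,.0f}")  # $3,000 vs $1,500
```

Because the discount applies to all tokens, the savings scale linearly with volume; there is no break-even calculation to do, only a latency question.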
For bulk workloads, batch is non-negotiable. $1,500/month in savings from a single API parameter change.
The Decision Framework
Here's how to choose the right Claude model:
Use Opus 4.6 ($5/$25) when:
- Task requires multi-step reasoning or complex analysis
- Accuracy matters more than cost (legal, medical, financial)
- You're doing < 10K calls/day and quality is the differentiator
Use Sonnet 4.6 ($3/$15) when:
- You need strong performance at production scale
- Tasks are moderately complex (code generation, content creation)
- Budget allows ~$3-5K/month on AI
Use Haiku 4.5 ($1/$5) when:
- Volume is high (>100K calls/day)
- Tasks are straightforward (classification, extraction, routing)
- Latency matters more than nuance
Switch away from Claude when:
- You're doing simple tasks at high volume — GPT-4.1-nano at $0.10/$0.40 is 10x cheaper than Haiku
- Cache hit rates are below 50% — Google's caching has no write surcharge
- Budget is the primary constraint — Anthropic is the premium option at every tier
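One way to encode the framework above is as a routing default. The thresholds are the rough ones from this section, not Anthropic guidance, and the model ID strings are illustrative placeholders (check Anthropic's docs for exact identifiers); tune all of it to your workload:

```python
def pick_model(calls_per_day: int, complexity: str, budget_constrained: bool = False) -> str:
    """Rough model router. complexity: 'simple' | 'moderate' | 'complex'."""
    # High-volume simple tasks or hard budget caps: Claude's premium rarely pays.
    if budget_constrained or (complexity == "simple" and calls_per_day > 100_000):
        return "consider-non-claude-budget-model"  # e.g. a $0.10/$0.40-class model
    # Low-volume, quality-critical work justifies the flagship.
    if complexity == "complex" and calls_per_day < 10_000:
        return "claude-opus-4-6"
    if complexity == "simple":
        return "claude-haiku-4-5"
    return "claude-sonnet-4-6"

print(pick_model(5_000, "complex"))   # claude-opus-4-6
print(pick_model(500_000, "simple"))  # consider-non-claude-budget-model
```

Even a crude router like this, applied per feature rather than per app, tends to surface the handful of call sites that actually need the expensive tier.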
How to Track What You're Actually Spending
The biggest Claude pricing mistake is assuming your bill matches the base rate table. Between cache write surcharges, thinking tokens, fast mode, and tool fees, the real cost per request can be 2-6x higher than the headline price.
You need visibility into:
- Cache hit rate — are your writes paying for themselves?
- Thinking token ratio — how many invisible tokens per visible output?
- Cost per feature — which parts of your app are driving the Claude bill?
AISpendGuard tracks all of these automatically. Our SDK breaks down every Claude API call into regular input, cached reads, cached writes, thinking tokens, and tool fees — so you see exactly where your money goes. No prompt storage, no latency overhead. Just the cost data you need.
Start monitoring your Claude spend for free →
Prices verified against Anthropic's API pricing page and our internal model-pricing database as of April 10, 2026. Anthropic updates pricing periodically — check aispendguard.com/model-prices for real-time rates.