A developer recently told us their Claude bill tripled after they turned on extended caching. They expected it to save money. It did — on reads. But the 2x write surcharge on the 1-hour TTL ate those savings alive.
This is the problem with Anthropic's pricing: the headline numbers are straightforward, but the actual cost depends on a maze of multipliers that most developers never see until they check the invoice.
This guide breaks down every Claude model's real cost in 2026 — base rates, cache pricing, thinking tokens, batch discounts, fast mode surcharges, and web tool fees. No marketing fluff. Just the numbers you need to budget correctly.
The Claude Model Lineup (April 2026)
Anthropic currently offers three tiers across two generations:
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context Window | Best For |
|---|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 | 200K | Complex reasoning, research, code architecture |
| Claude Opus 4.5 | $5.00 | $25.00 | 200K | Same capabilities, previous generation |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 200K | Balanced performance/cost, production workloads |
| Claude Sonnet 4.5 | $3.00 | $15.00 | 200K | Previous gen, still capable |
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K | High-volume, latency-sensitive tasks |
Legacy models (still available)
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context Window | Status |
|---|---|---|---|---|
| Claude Opus 4.1 / 4 | $15.00 | $75.00 | 200K | Active |
| Claude 3.5 Sonnet | $3.00 | $15.00 | 200K | Active |
| Claude 3.5 Haiku | $0.80 | $4.00 | 200K | Active |
| Claude 3 Opus | $15.00 | $75.00 | 200K | Active |
| Claude 3 Haiku | $0.25 | $1.25 | 200K | Retiring April 19 |
Key insight: Claude Opus 4.6 is 3x cheaper than the original Claude 3 Opus ($5 vs $15 input, $25 vs $75 output) while being significantly more capable. If you're still on Claude 3 Opus, you're paying 3x more for less.
Haiku 3 Retirement: What You Need to Know (April 19, 2026)
Anthropic is retiring Claude 3 Haiku on April 19 — nine days from now. If your production code references claude-3-haiku-20240307, it will stop working.
The migration path is Claude Haiku 4.5, which means a 4x price increase:
| | Claude 3 Haiku (retiring) | Claude Haiku 4.5 (replacement) |
|---|---|---|
| Input | $0.25/M | $1.00/M |
| Output | $1.25/M | $5.00/M |
| Max output | 4K tokens | 64K tokens |
| Extended thinking | No | Yes |
The price jump is real, but Haiku 4.5 is a significantly more capable model — longer outputs, better instruction following, and thinking support. If your Haiku 3 workload is simple classification or routing, consider whether GPT-4.1-nano ($0.10/$0.40) or Gemini 2.0 Flash ($0.10/$0.40) can handle it at a fraction of the cost instead.
Action item: Search your codebase for claude-3-haiku and claude-3-haiku-20240307 before April 19. Track the cost impact of your migration with AISpendGuard.
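As a sketch of that search, a small scanner like the one below can flag lingering references before the cutoff. This is illustrative, not an official migration tool; the directory layout and file contents in the demo are made up, and because `claude-3-haiku` is a prefix of the dated ID, matching either string catches both forms.

```python
import tempfile
from pathlib import Path

# Model IDs retiring on April 19.
DEPRECATED_IDS = ("claude-3-haiku-20240307", "claude-3-haiku")

def find_deprecated_refs(root: str) -> list[tuple[str, int, str]]:
    """Return (file_name, line_number, line_text) for each deprecated model reference."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if any(model_id in line for model_id in DEPRECATED_IDS):
                hits.append((path.name, lineno, line.strip()))
    return hits

# Demo on a throwaway directory; point it at your repo root in practice.
demo = tempfile.mkdtemp()
Path(demo, "app.py").write_text('MODEL = "claude-3-haiku-20240307"\n')
Path(demo, "new.py").write_text('MODEL = "claude-haiku-4-5"\n')
print(find_deprecated_refs(demo))  # flags app.py line 1 only
```

A plain `grep -rn claude-3-haiku .` does the same job if you just need a one-off check.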
The Cache Pricing That Changes Everything
Anthropic's cache pricing is the most aggressive in the industry — and the most complex. Here's how it actually works:
Cache Read: 90% Discount
When Claude hits a cache on your prompt prefix, you pay just 10% of the normal input price.
| Model | Normal Input | Cached Read |
|---|---|---|
| Opus 4.6 | $5.00/M | $0.50/M |
| Sonnet 4.6 | $3.00/M | $0.30/M |
| Haiku 4.5 | $1.00/M | $0.10/M |
That's a 0.1x multiplier — meaning cached Haiku reads cost just $0.10 per million tokens. At that price, Haiku becomes cheaper than most open-source model hosting.
Cache Write: The Surcharge Nobody Expects
Here's where it gets tricky. Anthropic charges a surcharge on cache writes — the first time your prompt prefix is stored:
| Cache TTL | Write Multiplier | Opus 4.6 Write Cost | Sonnet 4.6 Write Cost |
|---|---|---|---|
| 5 minutes (default) | 1.25x | $6.25/M | $3.75/M |
| 1 hour (extended) | 2.0x | $10.00/M | $6.00/M |
The trap: if your cache hit rate is too low, the write surcharge makes caching more expensive than not caching at all. Break-even is roughly a 22% hit rate on the 5-minute TTL and about 53% on the 1-hour TTL. The extended TTL is especially dangerous: at 2x the normal input cost, you need consistent repeat calls within that hour to come out ahead.
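The break-even point follows directly from the multipliers in the table above. Per call, a cached prefix costs `h * read + (1 - h) * write` in units of the base input price, versus 1.0 uncached; setting those equal and solving for the hit rate h gives a one-line formula:

```python
def cache_breakeven_hit_rate(write_multiplier: float, read_multiplier: float = 0.1) -> float:
    """Hit rate at which cached and uncached prefix costs are equal.

    Cached cost per call: h * read + (1 - h) * write (in units of base input price).
    Uncached cost: 1.0. Equating and solving for h: (write - 1) / (write - read).
    """
    return (write_multiplier - 1.0) / (write_multiplier - read_multiplier)

print(round(cache_breakeven_hit_rate(1.25), 3))  # 5-minute TTL -> 0.217
print(round(cache_breakeven_hit_rate(2.0), 3))   # 1-hour TTL  -> 0.526
```

Anything below those hit rates and caching is a net loss for that prefix.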
When Caching Pays Off (Real Math)
Let's say you're running Sonnet 4.6 with a 4,000-token system prompt, 100 calls per hour:
Without caching:
- 100 calls × 4,000 tokens × $3.00/M = $1.20/hour
With 5-minute TTL caching (assuming 80% hit rate):
- 20 cache writes × 4,000 tokens × $3.75/M = $0.30
- 80 cache reads × 4,000 tokens × $0.30/M = $0.096
- Total: $0.396/hour (67% savings)
With 1-hour TTL caching (assuming 95% hit rate):
- 5 cache writes × 4,000 tokens × $6.00/M = $0.12
- 95 cache reads × 4,000 tokens × $0.30/M = $0.114
- Total: $0.234/hour (81% savings)
The takeaway: caching is powerful, but only if your hit rate justifies the write cost. Track both metrics, not just "caching is enabled."
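The worked example above can be reproduced in a few lines. Prices are the Sonnet 4.6 rates per million tokens from the tables; the hit rates are the same assumptions as the scenario:

```python
TOKENS = 4_000   # system prompt size
CALLS = 100      # calls per hour
INPUT = 3.00     # Sonnet 4.6 input, $/M tokens
READ = 0.30      # cached read, $/M tokens

def hourly_cost(write_price: float, hits: int) -> float:
    """Cost of serving the prompt prefix for one hour of traffic."""
    writes = CALLS - hits
    return (writes * TOKENS * write_price + hits * TOKENS * READ) / 1e6

baseline = CALLS * TOKENS * INPUT / 1e6
print(f"no cache:   ${baseline:.3f}/hour")               # $1.200
print(f"5-min TTL:  ${hourly_cost(3.75, hits=80):.3f}")  # $0.396
print(f"1-hour TTL: ${hourly_cost(6.00, hits=95):.3f}")  # $0.234
```

Swapping in your own prompt size, call volume, and measured hit rate tells you quickly whether either TTL pays off for your traffic pattern.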
How Claude Compares to OpenAI and Google
Cache Read Discounts: Provider Comparison
| Provider | Cache Read Multiplier | Notes |
|---|---|---|
| Anthropic | 0.1x (90% off) | Best discount, but has write surcharges |
| Google (Gemini) | 0.1x (90% off) | Matches Anthropic, no write surcharge |
| OpenAI (GPT-4.1, o3, o4) | 0.25x (75% off) | Automatic, no write surcharge |
| OpenAI (other models) | 0.5x (50% off) | Automatic, no write surcharge |
Anthropic and Google tie on read discounts. But Anthropic's write surcharge means Google's caching is actually cheaper in practice for low-hit-rate workloads.
Tier-for-Tier Price Comparison
| Tier | Anthropic | OpenAI | Google |
|---|---|---|---|
| Flagship | Opus 4.6: $5/$25 | GPT-4.1: $2/$8 | Gemini 2.5 Pro: $1.25/$10 |
| Mid-tier | Sonnet 4.6: $3/$15 | GPT-4o: $2.50/$10 | Gemini 2.5 Flash: $0.30/$2.50 |
| Budget | Haiku 4.5: $1/$5 | GPT-4.1-nano: $0.10/$0.40 | Gemini 2.0 Flash: $0.10/$0.40 |
The honest assessment: Anthropic is the most expensive provider at every tier. Opus 4.6 costs 2.5x more than GPT-4.1 and 4x more than Gemini 2.5 Pro for input. At the budget tier, Haiku 4.5 at $1/$5 is 10x more expensive than GPT-4.1-nano or Gemini 2.0 Flash at $0.10/$0.40.
The question isn't whether Claude is more expensive — it is. The question is whether the quality difference justifies the premium for your specific use case.
Thinking Tokens: The Output Cost You Don't See
When extended thinking is enabled, Claude generates thinking tokens that are billed at the output token rate, even though they never appear in the final response.
For Opus 4.6 with extended thinking enabled, a response that generates 500 visible output tokens might consume 2,000 thinking tokens behind the scenes. Your actual output cost:
- Visible output: 500 tokens × $25/M = $0.0125
- Thinking tokens: 2,000 tokens × $25/M = $0.05
- Real cost: $0.0625 (5x what you expected)
At $25 per million output tokens, those invisible thinking tokens add up fast. If you don't need extended thinking for a task, turn it off. There's no reason to pay for reasoning on a classification or extraction task.
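A small sketch of that accounting, using the Opus 4.6 output price from the table and the 500/2,000 token split from the example above:

```python
OUTPUT_PRICE = 25.00  # Opus 4.6 output, $/M tokens

def output_cost(visible_tokens: int, thinking_tokens: int) -> float:
    """Thinking tokens bill at the same output rate as visible tokens."""
    return (visible_tokens + thinking_tokens) * OUTPUT_PRICE / 1e6

cost = output_cost(visible_tokens=500, thinking_tokens=2_000)
print(f"${cost:.4f}")  # $0.0625, 5x the visible-only cost
```

If you log the thinking-to-visible token ratio per request type, you can see exactly which workloads justify the mode and which are paying for reasoning they never use.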
Batch API: 50% Off for Patient Workloads
Anthropic's Batch API gives you a flat 50% discount on all token costs in exchange for async processing (results within 24 hours).
| Model | Standard Input | Batch Input | Standard Output | Batch Output |
|---|---|---|---|---|
| Opus 4.6 | $5.00 | $2.50 | $25.00 | $12.50 |
| Sonnet 4.6 | $3.00 | $1.50 | $15.00 | $7.50 |
| Haiku 4.5 | $1.00 | $0.50 | $5.00 | $2.50 |
Batch Opus 4.6 at $2.50/$12.50 lands in the same price band as standard GPT-4o ($2.50/$10): input prices match exactly, and output runs only $2.50/M higher, for a flagship-tier model. If you can tolerate the latency, batch processing is the single biggest cost lever available.
Good batch candidates:
- Nightly data processing jobs
- Content classification or tagging
- Bulk document summarization
- Test suite evaluation
Fast Mode: 6x Premium for Priority
On the opposite end, Anthropic's fast mode charges a 6x multiplier for priority processing:
| Model | Standard Output | Fast Mode Output |
|---|---|---|
| Opus 4.6 | $25.00 | $150.00 |
| Sonnet 4.6 | $15.00 | $90.00 |
| Haiku 4.5 | $5.00 | $30.00 |
Fast mode Opus at $150/M output tokens is the most expensive AI API call you can make from any provider. Use it only when latency is genuinely critical — never as a default.
Web Search: The $0.01 Per-Call Fee
Claude's web search tool costs a flat $0.01 per search call ($10 per 1,000 searches), separate from token costs.
| Provider | Web Search | Web Fetch |
|---|---|---|
| Anthropic | $0.010/call | Free |
| OpenAI | $0.010/call | Free |
| Google | $0.014/call | Free |
| Groq | $0.005/call | Free |
This fee is small individually but compounds in agentic workloads where every turn might trigger a search. An agent that searches 50 times per session adds $0.50 in tool fees alone.
Real-World Cost Scenarios
Scenario 1: Customer Support Chatbot (1M messages/month)
Average message: 200 input tokens, 300 output tokens, 2,000-token system prompt (cached).
With Haiku 4.5 + caching (85% cache hit rate):
- System prompt writes: 150K calls × 2,000 tokens × $1.25/M = $375
- System prompt reads: 850K calls × 2,000 tokens × $0.10/M = $170
- User input: 1M calls × 200 tokens × $1.00/M = $200
- Output: 1M calls × 300 tokens × $5.00/M = $1,500
- Monthly total: ~$2,245
Same workload with GPT-4.1-nano:
- Input: 1M calls × 2,200 tokens × $0.10/M = $220
- Output: 1M calls × 300 tokens × $0.40/M = $120
- Monthly total: ~$340
Haiku costs 6.6x more than GPT-4.1-nano for this workload. If Claude's response quality matters for your support experience, the premium might be worth it. If not, this is money you're leaving on the table.
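For budgeting, the same arithmetic generalizes into a small helper. Prices are the Haiku 4.5 rates and 5-minute-TTL multipliers from the tables above; the token counts and hit rate are this scenario's assumptions, so swap in your own measurements:

```python
def chatbot_monthly_cost(calls: int, hit_rate: float) -> float:
    """Monthly dollars for the cached-system-prompt chatbot scenario."""
    PROMPT, IN_TOK, OUT_TOK = 2_000, 200, 300   # tokens per call
    INPUT, OUTPUT = 1.00, 5.00                  # Haiku 4.5, $/M tokens
    WRITE, READ = INPUT * 1.25, INPUT * 0.10    # 5-min TTL cache multipliers
    misses = calls * (1 - hit_rate)             # cache writes
    hits = calls * hit_rate                     # cache reads
    return (misses * PROMPT * WRITE
            + hits * PROMPT * READ
            + calls * IN_TOK * INPUT
            + calls * OUT_TOK * OUTPUT) / 1e6

print(f"${chatbot_monthly_cost(1_000_000, 0.85):,.0f}/month")  # $2,245/month
```

Note how sensitive the total is to the hit rate: dropping from 85% to 50% adds the write surcharge back onto half your traffic.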
Scenario 2: Code Review Agent (500 reviews/day)
Average: 8,000 input tokens (code context), 2,000 output tokens, extended thinking enabled (est. 3,000 thinking tokens).
With Sonnet 4.6 (500 reviews/day ≈ 15K reviews/month):
- Input: 15K × 8,000 tokens × $3.00/M = $360
- Output: 15K × 2,000 tokens × $15.00/M = $450
- Thinking: 15K × 3,000 tokens × $15.00/M = $675
- Monthly total: ~$1,485
Thinking tokens are the dominant cost — 45% of the total bill. Disabling extended thinking where full reasoning isn't needed (e.g., style-only reviews) could cut costs by nearly half.
Scenario 3: Batch Document Processing (100K documents/month)
Average: 5,000 input tokens, 1,000 output tokens per document.
Sonnet 4.6 Standard:
- Input: 100K × 5,000 × $3.00/M = $1,500
- Output: 100K × 1,000 × $15.00/M = $1,500
- Monthly total: $3,000
Sonnet 4.6 Batch API:
- Input: 100K × 5,000 × $1.50/M = $750
- Output: 100K × 1,000 × $7.50/M = $750
- Monthly total: $1,500 (50% savings)
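The standard-versus-batch comparison is worth wiring into any cost model as a single function. Sonnet 4.6 rates are from the batch table; the flat 0.5 factor is the 50% discount described above:

```python
def doc_processing_cost(docs: int, in_tok: int, out_tok: int, batch: bool = False) -> float:
    """Monthly Sonnet 4.6 cost in dollars; batch=True applies the 50% discount."""
    INPUT, OUTPUT = 3.00, 15.00  # $/M tokens
    discount = 0.5 if batch else 1.0
    return discount * (docs * in_tok * INPUT + docs * out_tok * OUTPUT) / 1e6

standard = doc_processing_cost(100_000, 5_000, 1_000)
batched = doc_processing_cost(100_000, 5_000, 1_000, batch=True)
print(f"standard: ${standard:,.0f}, batch: ${batched:,.0f}")  # $3,000 vs $1,500
```

Because the discount applies to all tokens, the savings scale linearly with volume; there is no break-even calculation to do, only a latency question.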
For bulk workloads, batch is non-negotiable. $1,500/month in savings from a single API parameter change.
The Decision Framework
Here's how to choose the right Claude model:
Use Opus 4.6 ($5/$25) when:
- Task requires multi-step reasoning or complex analysis
- Accuracy matters more than cost (legal, medical, financial)
- You're doing < 10K calls/day and quality is the differentiator
Use Sonnet 4.6 ($3/$15) when:
- You need strong performance at production scale
- Tasks are moderately complex (code generation, content creation)
- Budget allows ~$3-5K/month on AI
Use Haiku 4.5 ($1/$5) when:
- Volume is high (>100K calls/day)
- Tasks are straightforward (classification, extraction, routing)
- Latency matters more than nuance
Switch away from Claude when:
- You're doing simple tasks at high volume — GPT-4.1-nano at $0.10/$0.40 is 10x cheaper than Haiku
- Cache hit rates are below 50% — Google's caching has no write surcharge
- Budget is the primary constraint — Anthropic is the premium option at every tier
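One way to encode the framework above is as a routing default. The thresholds are the rough ones from this section, not Anthropic guidance, and the model ID strings are illustrative placeholders (check Anthropic's docs for exact identifiers); tune all of it to your workload:

```python
def pick_model(calls_per_day: int, complexity: str, budget_constrained: bool = False) -> str:
    """Rough model router. complexity: 'simple' | 'moderate' | 'complex'."""
    # High-volume simple tasks or hard budget caps: Claude's premium rarely pays.
    if budget_constrained or (complexity == "simple" and calls_per_day > 100_000):
        return "consider-non-claude-budget-model"  # e.g. a $0.10/$0.40-class model
    # Low-volume, quality-critical work justifies the flagship.
    if complexity == "complex" and calls_per_day < 10_000:
        return "claude-opus-4-6"
    if complexity == "simple":
        return "claude-haiku-4-5"
    return "claude-sonnet-4-6"

print(pick_model(5_000, "complex"))   # claude-opus-4-6
print(pick_model(500_000, "simple"))  # consider-non-claude-budget-model
```

Even a crude router like this, applied per feature rather than per app, tends to surface the handful of call sites that actually need the expensive tier.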
How to Track What You're Actually Spending
The biggest Claude pricing mistake is assuming your bill matches the base rate table. Between cache write surcharges, thinking tokens, fast mode, and tool fees, the real cost per request can be 2-6x higher than the headline price.
You need visibility into:
- Cache hit rate — are your writes paying for themselves?
- Thinking token ratio — how many invisible tokens per visible output?
- Cost per feature — which parts of your app are driving the Claude bill?
AISpendGuard tracks all of these automatically. Our SDK breaks down every Claude API call into regular input, cached reads, cached writes, thinking tokens, and tool fees — so you see exactly where your money goes. No prompt storage, no latency overhead. Just the cost data you need.
Start monitoring your Claude spend for free →
Prices verified against Anthropic's API pricing page and our internal model-pricing database as of April 10, 2026. Anthropic updates pricing periodically — check aispendguard.com/model-prices for real-time rates.