$60 per million tokens two years ago. $2 today. Something doesn't add up.
If you've been building with AI APIs since 2024, you've watched prices fall off a cliff. GPT-4 output tokens that cost $60/M in early 2024 now cost $8/M with GPT-4.1 — and you can get comparable quality from GPT-4.1 Nano at $0.40/M.
Feels great, right? Ship more features, burn less budget, everybody wins.
Except there's a problem. You're not paying what these tokens actually cost to produce. And the companies selling them to you are hemorrhaging cash to keep it that way.
A Hacker News thread this month titled "AI API Prices Are 90% Subsidized" lit up the developer community. The argument: current API pricing is a land grab, not a business model. And when the subsidies end, your AI budget could triple overnight.
The numbers behind the subsidy
Let's look at what the major providers are actually spending:
OpenAI is projected to burn through $14 billion in 2026. Their head of ChatGPT publicly called the current pricing model "accidental" and said it will "significantly evolve." A leaked "$100/month Pro Lite" consumer tier suggests price increases are already in the pipeline.
Anthropic raised $8B+ in funding. Claude Opus 4.6 at $5/$25 per million tokens (input/output) is 66.7% cheaper than Opus 4/4.1 was at $15/$75. They're buying market share, not margins.
Google can subsidize Gemini longer than anyone — but even they just overhauled their billing system (March 23) with new prepay/postpay plans and spend caps starting April 1. That's not the behavior of a company planning to keep things cheap forever.
The pattern: Every major AI lab is pricing below cost to win developers. But VC patience has limits, and three counterforces are already pushing costs back up.
Three forces that could reverse the price decline
1. HBM memory costs are spiking
High Bandwidth Memory (HBM) — the specialized chips that make GPU inference possible — is in a supply crunch. SK Hynix and Samsung can't manufacture it fast enough. When hardware costs rise, API prices eventually follow.
2. Energy costs and new taxes
Training and serving large models consumes enormous power. Several jurisdictions are introducing AI-specific energy taxes or carbon surcharges. Data center electricity costs have risen 15-20% year-over-year in key markets.
3. Compliance mandates
The EU AI Act and similar regulations are adding compliance overhead. Audit logs, model cards, safety testing — none of this is free. These costs get baked into API pricing sooner or later.
The "model avalanche" makes it worse
Between March 10-16, six companies launched 12 distinct models in a single week. March 2026 alone saw 114 out of 483 tracked models change prices. Enterprise teams that budgeted for 3-4 major releases per year are now facing monthly pricing changes.
Here's what the current landscape actually looks like:
| Provider | Model | Input (per 1M tokens) | Output (per 1M tokens) | Context |
|---|---|---|---|---|
| OpenAI | GPT-4.1 | $2.00 | $8.00 | 1M |
| OpenAI | GPT-4.1 Mini | $0.40 | $1.60 | 1M |
| OpenAI | GPT-4.1 Nano | $0.10 | $0.40 | 1M |
| Anthropic | Claude Opus 4.6 | $5.00 | $25.00 | 200K |
| Anthropic | Claude Sonnet 4.6 | $3.00 | $15.00 | 200K |
| Anthropic | Claude Haiku 4.5 | $1.00 | $5.00 | 200K |
| Google | Gemini 2.5 Pro | $1.25 | $10.00 | 1M |
| Google | Gemini 2.5 Flash | $0.30 | $2.50 | 1M |
| Google | Gemini Flash-Lite | $0.10 | $0.40 | 1M |
| Mistral | Large | $2.00 | $6.00 | 128K |
| Mistral | Small | $0.10 | $0.30 | 128K |
These prices look amazing. But if prices rise even 30%, your $500/month AI bill becomes $650 with zero change in your usage. And if the "90% subsidized" claim is anywhere near right, a full pass-through would be far worse than that.
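To make the exposure concrete, here is a quick back-of-envelope sketch. The numbers are illustrative, not a forecast:

```python
# Illustrative sketch: how a monthly AI bill scales if providers pass
# through part (or all) of the subsidy as a uniform price increase.

def bill_after_hike(current_bill: float, price_increase_pct: float) -> float:
    """Project a monthly bill after a flat percentage price increase."""
    return current_bill * (1 + price_increase_pct / 100)

# A 30% across-the-board increase turns $500/month into $650/month.
print(bill_after_hike(500, 30))   # 650.0

# If the HN thread's "90% subsidized" claim were fully unwound (10x cost),
# the same usage would run $5,000/month.
print(bill_after_hike(500, 900))  # 5000.0
```

The point of the sketch: even a partial pass-through compounds quietly, because usage usually grows at the same time prices do.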
What the community is saying
The Hacker News post "What the 2026 AI Price Hikes Taught Me About Lean Engineering" captured the mood perfectly:
One top comment, comparing cheap models against premium ones: "The performance difference was negligible for 80% of our tasks."
Over on r/LocalLLaMA, a post with 2,400 upvotes summed up the self-hosting argument: "I was paying OpenAI $80/month for my whole team. Now I run Qwen 2.5-72B on an M4 Mac Studio. Accuracy is identical. Cost is zero."
The smartest teams aren't waiting for the subsidy cliff. They're preparing now.
Five things to do before prices go up
1. Know exactly what you're spending — and where
You can't optimize what you don't measure. Most teams have AI spend scattered across 2-3 providers with no unified view of which features, routes, or customers are driving costs.
AISpendGuard gives you a single dashboard across all providers — with attribution down to the feature level. No prompts stored, just tags and costs. You'll know exactly where your money goes before prices change.
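If you want to see what feature-level attribution involves before adopting a tool, a minimal sketch is below. It assumes you log token counts and keep your own price table; all names are hypothetical, not an AISpendGuard API:

```python
# Minimal per-feature cost attribution, assuming you record token counts
# per call and maintain a price table yourself. Prices below come from
# the comparison table in this article; everything else is hypothetical.
from collections import defaultdict

# $ per 1M tokens: (input, output)
PRICES = {
    "gpt-4.1":      (2.00, 8.00),
    "gpt-4.1-nano": (0.10, 0.40),
}

def call_cost(model: str, in_tokens: int, out_tokens: int) -> float:
    """Dollar cost of one API call from token counts and the price table."""
    p_in, p_out = PRICES[model]
    return (in_tokens * p_in + out_tokens * p_out) / 1_000_000

def attribute(events):
    """Aggregate spend by feature tag; events = (feature, model, in, out)."""
    totals = defaultdict(float)
    for feature, model, n_in, n_out in events:
        totals[feature] += call_cost(model, n_in, n_out)
    return dict(totals)

events = [
    ("search-summary", "gpt-4.1-nano", 1200, 300),
    ("code-review",    "gpt-4.1",      8000, 2000),
]
print(attribute(events))
```

Even this toy version answers the question most teams can't: which feature is actually driving the bill.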
2. Tier your models aggressively
The HN community's finding — that 80% of tasks don't need premium models — matches what we see in real usage data. A typical pattern:
| Task Type | Expensive Choice | Cheaper Alternative | Savings |
|---|---|---|---|
| Summarization | GPT-4.1 ($8/M out) | GPT-4.1 Nano ($0.40/M out) | 95% |
| Classification | Claude Sonnet ($15/M out) | Haiku 4.5 ($5/M out) | 67% |
| Code review | Claude Opus ($25/M out) | Claude Sonnet ($15/M out) | 40% |
| Simple extraction | Any frontier model | Gemini Flash-Lite ($0.40/M out) | 90%+ |
AISpendGuard's waste detection rules flag exactly these patterns — showing you which API calls use an expensive model when a cheaper one would work.
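A tiering router can be as simple as a lookup table from task type to model, with a frontier fallback for anything unclassified. The mapping below mirrors the table above, but treat it as a starting assumption to tune against your own eval results, not a recommendation:

```python
# Sketch of model tiering: route routine tasks to cheap models, reserve
# premium models for hard or unknown ones. Tier assignments are assumptions
# based on the task table above; validate them on your own workload.

TIERS = {
    "summarization":  "gpt-4.1-nano",
    "classification": "claude-haiku-4.5",
    "code_review":    "claude-sonnet-4.6",
    "extraction":     "gemini-flash-lite",
}
DEFAULT_MODEL = "gpt-4.1"  # unknown tasks fall back to a frontier model

def pick_model(task_type: str) -> str:
    """Return the cheapest model believed adequate for this task type."""
    return TIERS.get(task_type, DEFAULT_MODEL)

print(pick_model("summarization"))   # gpt-4.1-nano
print(pick_model("legal_analysis"))  # gpt-4.1 (no tier defined, use default)
```

The design choice that matters is the fallback: defaulting unknown tasks to the premium model keeps quality safe while you expand the tier map.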
3. Use caching and batch APIs
Two features that most teams underuse:
- Prompt caching: Anthropic and Google both offer 90% off cached input tokens. OpenAI offers 50-75% off depending on the model. If you're sending repeated system prompts, you're leaving money on the table.
- Batch API: OpenAI's batch processing gives a flat 50% discount on all models. If your workload isn't latency-sensitive, batch it.
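The compounding effect of these two discounts is easy to underestimate. A rough model, using the discount rates quoted above (actual rates vary by provider and model, so treat this as illustrative):

```python
# Back-of-envelope savings from prompt caching and batch discounts,
# using the discount rates quoted in the bullets above. Illustrative only;
# real cache-hit rates and discounts vary by provider and model.

def effective_input_cost(base_per_m: float, cached_fraction: float,
                         cache_discount: float = 0.90) -> float:
    """Blended $/1M input tokens when a fraction of tokens hit the cache."""
    cached_price = base_per_m * (1 - cache_discount)
    return cached_fraction * cached_price + (1 - cached_fraction) * base_per_m

# Claude Sonnet input at $3.00/M with 70% of tokens cached at 90% off:
print(round(effective_input_cost(3.00, 0.70), 3))  # 1.11

def batch_cost(cost: float, discount: float = 0.50) -> float:
    """Apply a flat batch-API discount (OpenAI quotes 50%)."""
    return cost * (1 - discount)

print(batch_cost(100.0))  # 50.0
```

With a large, stable system prompt, a 60%+ blended input discount is realistic before you change a single model choice.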
4. Set budget alerts now — not after the price hike
When OpenAI or Anthropic adjusts pricing, the change shows up on your next bill. If you don't have alerts in place, you'll find out the hard way.
Set up spend monitoring with hard limits on your free tier and soft alerts on paid tiers. AISpendGuard's budget controls let you cap spending per workspace, per feature, or per customer — so a price hike doesn't blow your budget.
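The core of tiered alerting fits in a few lines: soft warning thresholds that notify, plus a hard cap that blocks. A minimal sketch, with hypothetical thresholds:

```python
# Minimal tiered budget check: soft warning thresholds plus a hard cap.
# Threshold percentages are hypothetical defaults; tune them per workspace.

def check_budget(spent: float, budget: float,
                 soft_pcts=(0.5, 0.8)) -> str:
    """Return 'ok', 'warn', or 'block' based on spend vs. budget."""
    if spent >= budget:
        return "block"      # hard limit: reject further API calls
    for pct in sorted(soft_pcts, reverse=True):
        if spent >= budget * pct:
            return "warn"   # soft alert: notify, keep serving
    return "ok"

print(check_budget(120, 500))  # ok
print(check_budget(420, 500))  # warn  (past the 80% threshold)
print(check_budget(510, 500))  # block
```

The soft/hard split is the important part: a hard cap on a free tier protects you from runaway spend, while soft alerts on paid tiers avoid breaking production when a price hike lands mid-month.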
5. Track pricing changes automatically
With 114 model price changes in March alone, manually checking provider pricing pages isn't realistic. You need automated monitoring.
AISpendGuard tracks model price changes across all major providers, so you'll know when the models you depend on get more expensive — before it hits your bill.
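The core mechanic of price tracking is a diff between snapshots. A sketch of the idea; in practice you would fetch prices from provider pages or an API on a schedule, while the dicts here are hand-written examples:

```python
# Sketch of automated price-change detection: diff today's price snapshot
# against yesterday's. Example prices are hand-written; a real tracker
# would fetch and store snapshots on a schedule.

def price_changes(old: dict, new: dict) -> dict:
    """Return {model: (old_price, new_price)} for models whose price moved."""
    return {m: (old[m], new[m])
            for m in old.keys() & new.keys()
            if old[m] != new[m]}

yesterday = {"gpt-4.1": 8.00, "claude-sonnet-4.6": 15.00}
today     = {"gpt-4.1": 8.00, "claude-sonnet-4.6": 18.00}

print(price_changes(yesterday, today))
```

Hook the output into the same alerting channel as your budget warnings and a repriced model never reaches your invoice unannounced.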
The bottom line
The cheap AI tokens you're building on today are a temporary market condition, not a natural law. OpenAI's $14 billion burn rate, rising hardware costs, and new regulatory overhead all point in the same direction: prices will go up.
The question isn't if — it's when and by how much.
The teams that will handle it best are the ones who already know their cost baseline, have model tiering in place, and get alerts when pricing changes. The ones who'll get hurt are the ones who assumed $2/M tokens was the new normal and built accordingly.
Don't wait for the subsidy cliff to find out what your AI actually costs. Start tracking now — sign up for free and get visibility before prices move.
AISpendGuard monitors AI spend across OpenAI, Anthropic, Google, Mistral, and more — without storing your prompts. Free up to 50,000 events/month.