$60 per million tokens two years ago. $2 today. Something doesn't add up.
If you've been building with AI APIs since 2024, you've watched prices fall off a cliff. GPT-4 output tokens that cost $60/M in early 2024 now cost $8/M with GPT-4.1 — and you can get comparable quality from GPT-4.1 Nano at $0.40/M.
Feels great, right? Ship more features, burn less budget, everybody wins.
Except there's a problem. You're not paying what these tokens actually cost to produce. And the companies selling them to you are hemorrhaging cash to keep it that way.
A Hacker News thread this month titled "AI API Prices Are 90% Subsidized" lit up the developer community. The argument: current API pricing is a land grab, not a business model. And when the subsidies end, your AI budget could triple overnight.
The numbers behind the subsidy
Let's look at what the major providers are actually spending:
OpenAI is projected to burn through $14 billion in 2026. Their head of ChatGPT publicly called the current pricing model "accidental" and said it will "significantly evolve." A leaked "$100/month Pro Lite" consumer tier suggests price increases are already in the pipeline.
Anthropic raised $8B+ in funding. Claude Opus 4.6 at $5/$25 per million tokens (input/output) is 66.7% cheaper than Opus 4/4.1 was at $15/$75. They're buying market share, not margins.
Google can subsidize Gemini longer than anyone — but even they just overhauled their billing system (March 23) with new prepay/postpay plans and spend caps starting April 1. That's not the behavior of a company planning to keep things cheap forever.
The pattern: Every major AI lab is pricing below cost to win developers. But VC patience has limits, and three counterforces are already pushing costs back up.
Three forces that could reverse the price decline
1. HBM memory costs are spiking
High Bandwidth Memory (HBM) — the specialized chips that make GPU inference possible — is in a supply crunch. SK Hynix and Samsung can't manufacture it fast enough. When hardware costs rise, API prices eventually follow.
2. Energy costs and new taxes
Training and serving large models consumes enormous power. Several jurisdictions are introducing AI-specific energy taxes or carbon surcharges. Data center electricity costs have risen 15-20% year-over-year in key markets.
3. Compliance mandates
The EU AI Act and similar regulations are adding compliance overhead. Audit logs, model cards, safety testing — none of this is free. These costs get baked into API pricing sooner or later.
The "model avalanche" makes it worse
Between March 10-16, six companies launched 12 distinct models in a single week. March 2026 alone saw 114 out of 483 tracked models change prices. Enterprise teams that budgeted for 3-4 major releases per year are now facing monthly pricing changes.
Here's what the current landscape actually looks like:
| Provider | Model | Input (per 1M tokens) | Output (per 1M tokens) | Context |
|---|---|---|---|---|
| OpenAI | GPT-4.1 | $2.00 | $8.00 | 1M |
| OpenAI | GPT-4.1 Mini | $0.40 | $1.60 | 1M |
| OpenAI | GPT-4.1 Nano | $0.10 | $0.40 | 1M |
| Anthropic | Claude Opus 4.6 | $5.00 | $25.00 | 200K |
| Anthropic | Claude Sonnet 4.6 | $3.00 | $15.00 | 200K |
| Anthropic | Claude Haiku 4.5 | $1.00 | $5.00 | 200K |
| Google | Gemini 2.5 Pro | $1.25 | $10.00 | 1M |
| Google | Gemini 2.5 Flash | $0.30 | $2.50 | 1M |
| Google | Gemini Flash-Lite | $0.10 | $0.40 | 1M |
| Mistral | Large | $2.00 | $6.00 | 128K |
| Mistral | Small | $0.10 | $0.30 | 128K |
These prices look amazing. But if prices rise even 30%, your $500/month AI bill becomes $650 with zero change in your usage. And if the "90% subsidized" claim is anywhere near right, a full pass-through would be far worse than that.
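To make the exposure concrete, here is a quick back-of-envelope sketch. The numbers are illustrative, not a forecast:

```python
# Illustrative sketch: how a monthly AI bill scales if providers pass
# through part (or all) of the subsidy as a uniform price increase.

def bill_after_hike(current_bill: float, price_increase_pct: float) -> float:
    """Project a monthly bill after a flat percentage price increase."""
    return current_bill * (1 + price_increase_pct / 100)

# A 30% across-the-board increase turns $500/month into $650/month.
print(bill_after_hike(500, 30))   # 650.0

# If the HN thread's "90% subsidized" claim were fully unwound (10x cost),
# the same usage would run $5,000/month.
print(bill_after_hike(500, 900))  # 5000.0
```

The point of the sketch: even a partial pass-through compounds quietly, because usage usually grows at the same time prices do.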
What the community is saying
The Hacker News post "What the 2026 AI Price Hikes Taught Me About Lean Engineering" captured the mood perfectly:
One top comment, comparing cheap models against premium ones: "The performance difference was negligible for 80% of our tasks."
Over on r/LocalLLaMA, a post with 2,400 upvotes summed up the self-hosting argument: "I was paying OpenAI $80/month for my whole team. Now I run Qwen 2.5-72B on an M4 Mac Studio. Accuracy is identical. Cost is zero."
The smartest teams aren't waiting for the subsidy cliff. They're preparing now.
Five things to do before prices go up
1. Know exactly what you're spending — and where
You can't optimize what you don't measure. Most teams have AI spend scattered across 2-3 providers with no unified view of which features, routes, or customers are driving costs.
AISpendGuard gives you a single dashboard across all providers — with attribution down to the feature level. No prompts stored, just tags and costs. You'll know exactly where your money goes before prices change.
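If you want to see what feature-level attribution involves before adopting a tool, a minimal sketch is below. It assumes you log token counts and keep your own price table; all names are hypothetical, not an AISpendGuard API:

```python
# Minimal per-feature cost attribution, assuming you record token counts
# per call and maintain a price table yourself. Prices below come from
# the comparison table in this article; everything else is hypothetical.
from collections import defaultdict

# $ per 1M tokens: (input, output)
PRICES = {
    "gpt-4.1":      (2.00, 8.00),
    "gpt-4.1-nano": (0.10, 0.40),
}

def call_cost(model: str, in_tokens: int, out_tokens: int) -> float:
    """Dollar cost of one API call from token counts and the price table."""
    p_in, p_out = PRICES[model]
    return (in_tokens * p_in + out_tokens * p_out) / 1_000_000

def attribute(events):
    """Aggregate spend by feature tag; events = (feature, model, in, out)."""
    totals = defaultdict(float)
    for feature, model, n_in, n_out in events:
        totals[feature] += call_cost(model, n_in, n_out)
    return dict(totals)

events = [
    ("search-summary", "gpt-4.1-nano", 1200, 300),
    ("code-review",    "gpt-4.1",      8000, 2000),
]
print(attribute(events))
```

Even this toy version answers the question most teams can't: which feature is actually driving the bill.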
2. Tier your models aggressively
The HN community's finding — that 80% of tasks don't need premium models — matches what we see in real usage data. A typical pattern:
| Task Type | Expensive Choice | Cheaper Alternative | Savings |
|---|---|---|---|
| Summarization | GPT-4.1 ($8/M out) | GPT-4.1 Nano ($0.40/M out) | 95% |
| Classification | Claude Sonnet ($15/M out) | Haiku 4.5 ($5/M out) | 67% |
| Code review | Claude Opus ($25/M out) | Claude Sonnet ($15/M out) | 40% |
| Simple extraction | Any frontier model | Gemini Flash-Lite ($0.40/M out) | 90%+ |
AISpendGuard's waste detection rules flag exactly these patterns — showing you which API calls use an expensive model when a cheaper one would work.
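A tiering router can be as simple as a lookup table from task type to model, with a frontier fallback for anything unclassified. The mapping below mirrors the table above, but treat it as a starting assumption to tune against your own eval results, not a recommendation:

```python
# Sketch of model tiering: route routine tasks to cheap models, reserve
# premium models for hard or unknown ones. Tier assignments are assumptions
# based on the task table above; validate them on your own workload.

TIERS = {
    "summarization":  "gpt-4.1-nano",
    "classification": "claude-haiku-4.5",
    "code_review":    "claude-sonnet-4.6",
    "extraction":     "gemini-flash-lite",
}
DEFAULT_MODEL = "gpt-4.1"  # unknown tasks fall back to a frontier model

def pick_model(task_type: str) -> str:
    """Return the cheapest model believed adequate for this task type."""
    return TIERS.get(task_type, DEFAULT_MODEL)

print(pick_model("summarization"))   # gpt-4.1-nano
print(pick_model("legal_analysis"))  # gpt-4.1 (no tier defined, use default)
```

The design choice that matters is the fallback: defaulting unknown tasks to the premium model keeps quality safe while you expand the tier map.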
3. Use caching and batch APIs
Two features that most teams underuse:
- Prompt caching: Anthropic and Google both offer 90% off cached input tokens. OpenAI offers 50-75% off depending on the model. If you're sending repeated system prompts, you're leaving money on the table.
- Batch API: OpenAI's batch processing gives a flat 50% discount on all models. If your workload isn't latency-sensitive, batch it.
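The compounding effect of these two discounts is easy to underestimate. A rough model, using the discount rates quoted above (actual rates vary by provider and model, so treat this as illustrative):

```python
# Back-of-envelope savings from prompt caching and batch discounts,
# using the discount rates quoted in the bullets above. Illustrative only;
# real cache-hit rates and discounts vary by provider and model.

def effective_input_cost(base_per_m: float, cached_fraction: float,
                         cache_discount: float = 0.90) -> float:
    """Blended $/1M input tokens when a fraction of tokens hit the cache."""
    cached_price = base_per_m * (1 - cache_discount)
    return cached_fraction * cached_price + (1 - cached_fraction) * base_per_m

# Claude Sonnet input at $3.00/M with 70% of tokens cached at 90% off:
print(round(effective_input_cost(3.00, 0.70), 3))  # 1.11

def batch_cost(cost: float, discount: float = 0.50) -> float:
    """Apply a flat batch-API discount (OpenAI quotes 50%)."""
    return cost * (1 - discount)

print(batch_cost(100.0))  # 50.0
```

With a large, stable system prompt, a 60%+ blended input discount is realistic before you change a single model choice.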
4. Set budget alerts now — not after the price hike
When OpenAI or Anthropic adjusts pricing, the change shows up on your next bill. If you don't have alerts in place, you'll find out the hard way.
Set up spend monitoring with hard limits on your free tier and soft alerts on paid tiers. AISpendGuard's budget controls let you cap spending per workspace, per feature, or per customer — so a price hike doesn't blow your budget.
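The core of tiered alerting fits in a few lines: soft warning thresholds that notify, plus a hard cap that blocks. A minimal sketch, with hypothetical thresholds:

```python
# Minimal tiered budget check: soft warning thresholds plus a hard cap.
# Threshold percentages are hypothetical defaults; tune them per workspace.

def check_budget(spent: float, budget: float,
                 soft_pcts=(0.5, 0.8)) -> str:
    """Return 'ok', 'warn', or 'block' based on spend vs. budget."""
    if spent >= budget:
        return "block"      # hard limit: reject further API calls
    for pct in sorted(soft_pcts, reverse=True):
        if spent >= budget * pct:
            return "warn"   # soft alert: notify, keep serving
    return "ok"

print(check_budget(120, 500))  # ok
print(check_budget(420, 500))  # warn  (past the 80% threshold)
print(check_budget(510, 500))  # block
```

The soft/hard split is the important part: a hard cap on a free tier protects you from runaway spend, while soft alerts on paid tiers avoid breaking production when a price hike lands mid-month.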
5. Track pricing changes automatically
With 114 model price changes in March alone, manually checking provider pricing pages isn't realistic. You need automated monitoring.
AISpendGuard tracks model price changes across all major providers, so you'll know when the models you depend on get more expensive — before it hits your bill.
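The core mechanic of price tracking is a diff between snapshots. A sketch of the idea; in practice you would fetch prices from provider pages or an API on a schedule, while the dicts here are hand-written examples:

```python
# Sketch of automated price-change detection: diff today's price snapshot
# against yesterday's. Example prices are hand-written; a real tracker
# would fetch and store snapshots on a schedule.

def price_changes(old: dict, new: dict) -> dict:
    """Return {model: (old_price, new_price)} for models whose price moved."""
    return {m: (old[m], new[m])
            for m in old.keys() & new.keys()
            if old[m] != new[m]}

yesterday = {"gpt-4.1": 8.00, "claude-sonnet-4.6": 15.00}
today     = {"gpt-4.1": 8.00, "claude-sonnet-4.6": 18.00}

print(price_changes(yesterday, today))
```

Hook the output into the same alerting channel as your budget warnings and a repriced model never reaches your invoice unannounced.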
The bottom line
The cheap AI tokens you're building on today are a temporary market condition, not a natural law. OpenAI's $14 billion burn rate, rising hardware costs, and new regulatory overhead all point in the same direction: prices will go up.
The question isn't if — it's when and by how much.
The teams that will handle it best are the ones who already know their cost baseline, have model tiering in place, and get alerts when pricing changes. The ones who'll get hurt are the ones who assumed $2/M tokens was the new normal and built accordingly.
Don't wait for the subsidy cliff to find out what your AI actually costs. Start tracking now — sign up for free and get visibility before prices move.
AISpendGuard monitors AI spend across OpenAI, Anthropic, Google, Mistral, and more — without storing your prompts. Free up to 50,000 events/month.