Guide · Apr 18, 2026 · 9 min read

Audit Your AI Spend in 30 Minutes: A Weekend Developer Checklist

Most developers have no idea what their AI features actually cost per request. This Saturday, fix that.


Here's a question most developers can't answer: how much does each AI-powered feature in your app cost per request?

Not roughly. Not "a few cents." The actual number.

A recent survey found that 68% of engineering teams don't track AI costs at the feature level. They see a monthly bill from OpenAI or Anthropic, shrug if it looks "about right," and move on. Then one month the bill doubles — and nobody knows why.

This guide is your Saturday morning fix. Grab a coffee, open a terminal, and follow this 7-step checklist. In 30 minutes, you'll know exactly where your AI money goes and have a plan to stop wasting it.

Step 1: Screenshot Every Provider Dashboard (5 minutes)

Start with the obvious. Log into every AI provider you use and capture your current month's spend:

Write down three numbers per provider:

| Provider | This Month | Last Month | Trend |
| --- | --- | --- | --- |
| OpenAI | $______ | $______ | up/down/flat |
| Anthropic | $______ | $______ | up/down/flat |
| Google | $______ | $______ | up/down/flat |
| Total | $______ | $______ | |

Why this matters: Most teams use 2-3 providers but only check the "main" one. The secondary providers are where costs creep up unnoticed — that experimental Gemini integration you forgot about, the Claude fallback that's handling more traffic than you thought.

Step 2: List Every Model You're Using (5 minutes)

Search your codebase for model references. Run these in your project root:

# Find OpenAI model references
grep -r "model.*gpt\|model.*o1\|model.*o3\|model.*o4" --include="*.ts" --include="*.py" --include="*.js" -l

# Find Anthropic model references
grep -r "model.*claude" --include="*.ts" --include="*.py" --include="*.js" -l

# Find Google model references
grep -r "model.*gemini" --include="*.ts" --include="*.py" --include="*.js" -l

Now build your model inventory:

| Model | Where Used | Input/MTok | Output/MTok | Why This Model? |
| --- | --- | --- | --- | --- |
| gpt-4o | Chat feature | $2.50 | $10.00 | |
| gpt-4o-mini | Classify endpoint | $0.15 | $0.60 | |
| claude-sonnet-4-6 | Code review | $3.00 | $15.00 | |
| gemini-2.0-flash | Embeddings | $0.10 | $0.40 | |

Fill in the "Why This Model?" column honestly. If the answer is "because that's what the tutorial used" or "I don't know, it was there when I joined" — that's a cost optimization opportunity.

Step 3: Check for Legacy and Deprecated Models (3 minutes)

This is where quick wins hide. Providers deprecate models regularly, and the replacements are almost always cheaper and better.

Models to check for right now (April 2026):

| If You're Using... | Switch To | Savings |
| --- | --- | --- |
| gpt-4-turbo ($10/$30) | gpt-4o ($2.50/$10) | 75% cheaper |
| gpt-3.5-turbo ($0.50/$1.50) | gpt-4.1-nano ($0.10/$0.40) | 80% cheaper, smarter |
| claude-3-opus ($15/$75) | claude-opus-4-6 ($5/$25) | 67% cheaper |
| claude-3-5-sonnet ($3/$15) | claude-sonnet-4-6 ($3/$15) | Same price, better quality |
| claude-3-haiku ($0.25/$1.25) | claude-haiku-4-5 ($1/$5) | More expensive, but claude-3-haiku sunsets this month |
| gemini-1.5-pro ($1.25/$5) | gemini-2.5-pro ($1.25/$10) | Same input, higher output, but longer context |

Watch out: claude-3-haiku is sunsetting this month (April 2026). If you're still using it, migration isn't optional — it's urgent. The replacement (claude-haiku-4-5) costs 4x more, so consider whether those calls could use gpt-4o-mini ($0.15/$0.60) or gemini-2.0-flash ($0.10/$0.40) instead.

Quick action: For every deprecated model you find, create a ticket. Don't just update the string — test the replacement model's output quality first.

Step 4: Calculate Your Cost Per Request (7 minutes)

This is the step most developers skip — and the one that reveals the most. Grab your average token counts from your provider dashboard or logs.

The formula:

Cost per request = (input_tokens × input_price) + (output_tokens × output_price)

Where prices are per-token (divide per-million-token prices by 1,000,000).

Example calculation for a chat feature using GPT-4o:

Average input: 2,000 tokens
Average output: 500 tokens

Input cost:  2,000 × ($2.50 / 1,000,000) = $0.005
Output cost: 500 × ($10.00 / 1,000,000) = $0.005
Total per request: $0.01

At 10,000 requests/month: $100/month
At 100,000 requests/month: $1,000/month
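
The formula is simple enough to script once and reuse for every feature. A minimal sketch (prices are per million tokens, matching the pricing tables in this guide):

```python
def cost_per_request(input_tokens: int, output_tokens: int,
                     input_price_per_mtok: float,
                     output_price_per_mtok: float) -> float:
    """Dollar cost of one request, given per-million-token prices."""
    input_cost = input_tokens * input_price_per_mtok / 1_000_000
    output_cost = output_tokens * output_price_per_mtok / 1_000_000
    return input_cost + output_cost

# The GPT-4o chat example: 2,000 input + 500 output tokens at $2.50/$10.00
per_request = cost_per_request(2_000, 500, 2.50, 10.00)
print(f"${per_request:.3f}")           # $0.010
print(f"${per_request * 10_000:.0f}")  # $100 at 10,000 requests/month
```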

Now do this for every AI feature in your app:

| Feature | Model | Avg Input | Avg Output | Cost/Request | Monthly Volume | Monthly Cost |
| --- | --- | --- | --- | --- | --- | --- |
| Chat | gpt-4o | 2,000 | 500 | $0.010 | 10,000 | $100 |
| Classify | gpt-4o-mini | 500 | 10 | $0.00008 | 150,000 | $12 |
| Summarize | claude-sonnet-4-6 | 5,000 | 800 | $0.027 | 3,000 | $81 |
| Code review | claude-opus-4-6 | 10,000 | 2,000 | $0.10 | 500 | $50 |
| Total | | | | | | $243 |

Key insight: Output tokens typically cost 3-5x more than input tokens. If a feature generates long outputs but you only use a fraction of them — that's waste. A classify task returning a 500-token explanation when you need one word? That's 499 tokens of pure waste at output rates.

Step 5: Identify Your Top 3 Cost Drivers (3 minutes)

Sort the table from Step 4 by monthly cost. Your top 3 features are where optimization effort pays off.

For each top-cost feature, ask:

  1. Is this the right model? Could a cheaper model handle it? A $0.10/request code review might work fine with Sonnet instead of Opus.
  2. Is the output too long? Are you getting verbose responses and throwing away most of them? Add max_tokens limits or use structured output (JSON mode).
  3. Is the volume justified? Are you making AI calls that could be cached, batched, or skipped entirely?
  4. Is prompt caching available? If you send the same system prompt repeatedly, providers offer 50-90% discounts on cached input tokens.

The 80/20 rule applies: Optimizing your single most expensive feature will likely save more than tweaking everything else combined.

Step 6: Check for Easy Model Downgrades (5 minutes)

The most common waste pattern: using a premium model for a simple task.

Run through this decision tree for each feature:

Is the task classification, routing, or extraction?
  → Use gpt-4o-mini ($0.15/$0.60) or gemini-2.0-flash ($0.10/$0.40)

Is the task summarization or rewriting?
  → Use gpt-4o ($2.50/$10) or claude-sonnet-4-6 ($3/$15)

Is the task complex reasoning, code generation, or multi-step analysis?
  → Use claude-opus-4-6 ($5/$25) or gpt-4.1 ($2/$8)

Is the task simple text generation with low quality bar?
  → Use gpt-4.1-nano ($0.10/$0.40) — cheapest option that's still capable
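
The decision tree can live in code as a routing table, so the choice is explicit rather than scattered across call sites. A sketch under the assumption that you label tasks with a small taxonomy; the task names and model picks mirror the tiers above, and you'd adapt both to your own app:

```python
# Task category -> cheapest model tier that handles it (from the tree above).
MODEL_TIERS = {
    "classification": "gpt-4o-mini",
    "routing": "gpt-4o-mini",
    "extraction": "gemini-2.0-flash",
    "summarization": "gpt-4o",
    "rewriting": "claude-sonnet-4-6",
    "reasoning": "claude-opus-4-6",
    "code_generation": "claude-opus-4-6",
    "simple_generation": "gpt-4.1-nano",
}

def pick_model(task: str) -> str:
    # Default to the cheapest capable model, not the most expensive one.
    return MODEL_TIERS.get(task, "gpt-4.1-nano")
```

The useful property: upgrading or downgrading a tier is a one-line change in one file, and the "why this model?" column from Step 2 answers itself.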

Real savings example:

A team was using Claude Opus for everything — chat, classify, summarize, extract. Monthly bill: $2,400.

After model tiering:

  • Chat: kept on Opus ($5/$25) — quality matters here
  • Classify: moved to GPT-4o-mini ($0.15/$0.60) — 97% cheaper
  • Summarize: moved to GPT-4o ($2.50/$10) — 60% cheaper
  • Extract: moved to Gemini Flash ($0.10/$0.40) — 98% cheaper

New monthly bill: $680. 72% reduction, zero quality loss on the tasks that matter.

Step 7: Set Up Ongoing Monitoring (2 minutes)

An audit is worthless if it's a one-time event. You need continuous visibility so costs don't creep back up.

The minimum viable setup:

  1. Provider alerts: Set monthly budget alerts on every provider dashboard. OpenAI, Anthropic, and Google all support email alerts when spend exceeds a threshold.

  2. Per-feature attribution: Tag every AI call with the feature it belongs to. This is the difference between "my OpenAI bill is $500" and "the chat feature costs $300, classify costs $50, and summarize costs $150."

  3. Weekly check-in: Block 10 minutes every Monday morning to glance at your AI costs. Five minutes of prevention beats five hours of "why did our bill spike?"
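
Per-feature attribution can start as a few lines of in-house code before you adopt any tooling. A minimal sketch, assuming you can compute (or estimate) a cost per call; `record_call` is a hypothetical wrapper around your provider client, not a library API:

```python
from collections import defaultdict

# Running spend per feature tag; in production this would flush to your
# metrics system rather than live in process memory.
spend_by_feature: dict[str, float] = defaultdict(float)

def record_call(feature: str, cost: float) -> None:
    """Attribute one AI call's cost to the feature that made it."""
    spend_by_feature[feature] += cost

record_call("chat", 0.010)
record_call("chat", 0.010)
record_call("classify", 0.00008)
print(dict(spend_by_feature))
```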

Track your AI spend automatically with AISpendGuard — tag your calls by feature, route, and customer, then get waste detection alerts and model recommendations. The free tier covers 50,000 events per month.

The 30-Minute Audit Cheat Sheet

Here's the full checklist in one place:

  • Minute 0-5: Screenshot all provider dashboards, note month-over-month trends
  • Minute 5-10: Search codebase for all model references, build inventory
  • Minute 10-13: Check for deprecated/legacy models, flag for migration
  • Minute 13-20: Calculate cost-per-request for every AI feature
  • Minute 20-23: Rank features by cost, identify top 3 drivers
  • Minute 23-28: Evaluate model downgrades for each feature using decision tree
  • Minute 28-30: Set up provider budget alerts and tag your AI calls

What This Audit Usually Reveals

After running this checklist with dozens of development teams, here's what we consistently find:

  1. 30-50% of spend comes from one feature — usually the first one built, using the most expensive model "because it worked"
  2. At least one deprecated model still running in production, often costing more than its replacement
  3. Output tokens are the hidden killer — most teams never set max_tokens and let models generate 3-10x more output than needed
  4. No per-feature attribution — teams know their total bill but can't explain which features drive it
  5. Batch-eligible workloads running in real-time — background tasks (analysis, classification, data enrichment) paying full price when OpenAI's Batch API offers 50% off

The average savings from a first audit: 20-40%. Not from exotic optimization. Just from finding the obvious waste — wrong models, excessive output, and missing caching.

What to Do Next

You've got your audit results. Now prioritize:

  1. This weekend: Switch deprecated models. Set provider budget alerts.
  2. This week: Add max_tokens to your highest-output features. Test cheaper model alternatives for your top cost driver.
  3. This month: Set up per-feature cost attribution with tags. Start tracking cost-per-request as a metric alongside latency and error rate.

AI costs compound. A $0.01 savings per request at 100K requests/month is $1,000 a month, or $12,000 a year. The audit takes 30 minutes. The savings last forever.

Start monitoring for free: Sign up for AISpendGuard and see exactly where your AI money goes.


Running this audit on your own project? Share what you found — tag us or join the conversation. We publish new AI cost optimization guides every week.

