Your OpenAI dashboard says you spent $2,340 last month. Great. But which feature caused it?
Was it the chatbot? The document summarizer? That "quick experiment" someone shipped on Friday? The AI-powered search that nobody uses but still runs Claude Opus on every query?
You don't know. And that's the most expensive problem in AI development right now.
Provider dashboards show you what you spent. They don't show you why. And without knowing why, you can't cut costs — you can only hope they go down.
This guide shows you how to set up per-feature cost attribution in an afternoon, so every dollar of AI spend traces back to the feature that caused it.
Why Total Spend Is a Useless Metric
Here's a real scenario. A 6-person SaaS team spends $3,200/month on AI APIs across three providers. Their breakdown looks like this from the provider side:
| Provider | Monthly Cost |
|---|---|
| OpenAI | $2,100 |
| Anthropic | $850 |
| Google Gemini | $250 |
| Total | $3,200 |
This tells you nothing actionable. You can't optimize "OpenAI." You can only optimize specific features that use OpenAI.
When the same team added per-feature attribution tags, the picture changed completely:
| Feature | Model Used | Monthly Cost | % of Total |
|---|---|---|---|
| Document summarizer | GPT-4o | $1,400 | 43.8% |
| Customer chatbot | Claude Sonnet 4.6 | $720 | 22.5% |
| Email draft generator | GPT-4o | $480 | 15.0% |
| Search reranking | GPT-4.1 Mini | $340 | 10.6% |
| Content classification | Gemini 2.0 Flash | $180 | 5.6% |
| Internal admin tools | GPT-4o | $80 | 2.5% |
Now you see it: the document summarizer is 44% of the bill — and it's using GPT-4o ($2.50/$10.00 per 1M tokens) for a task that GPT-4.1 Nano ($0.10/$0.40) handles just as well. That single swap saves $1,330/month.
Without attribution, that insight is invisible.
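A quick back-of-envelope check of that swap, using the per-1M-token prices above. The token volumes here are hypothetical, picked to match the ~$1,400/month summarizer figure:

```python
# Prices in dollars per 1M tokens, taken from the article above.
GPT_4O = {"input": 2.50, "output": 10.00}
GPT_41_NANO = {"input": 0.10, "output": 0.40}

def monthly_cost(price, input_tokens_m, output_tokens_m):
    """Monthly cost in dollars, given token volumes in millions."""
    return price["input"] * input_tokens_m + price["output"] * output_tokens_m

# Assume ~400M input and ~40M output tokens per month for the summarizer.
before = monthly_cost(GPT_4O, 400, 40)       # 400*2.50 + 40*10.00 = 1400.0
after = monthly_cost(GPT_41_NANO, 400, 40)   # 400*0.10 + 40*0.40  = 56.0
savings = before - after                     # ~1344, in the ballpark of $1,330
```

The exact saving depends on your input/output token mix, which is why the figure lands near, not exactly on, the number above.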
The Three Dimensions of AI Cost Attribution
Effective cost tracking needs three dimensions, not one:
1. Feature Attribution — What triggered the call?
Every AI API call should carry a tag identifying which product feature initiated it. This is the highest-leverage dimension because it tells you where optimization effort will have the biggest payoff.
Common feature tags:
`chatbot`, `summarizer`, `search`, `classifier`, `email-generator`, `onboarding-wizard`, `content-moderation`, `code-review`
2. Route Attribution — Where in the code did it happen?
Map API calls to your application's routes or endpoints. This catches the "ghost calls" — AI requests triggered by background jobs, webhooks, or deprecated endpoints that nobody remembers exist.
Common route tags:
`api/chat`, `api/summarize`, `api/search`, `cron/daily-digest`, `webhook/stripe`
3. Customer Attribution — Who caused the cost?
Not all users cost the same. In most AI-powered apps, 5% of users generate 60%+ of AI costs. Tagging by customer tier (free, pro, enterprise) or customer segment lets you catch abuse, right-size your pricing, and identify unprofitable accounts.
Common customer tags:
`plan:free`, `plan:pro`, `plan:enterprise`, `segment:power-user`, `segment:casual`
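Concretely, a single tracked event can carry all three dimensions at once. The event shape below is illustrative, not any particular tool's schema:

```python
# One usage event carrying all three attribution dimensions.
event = {
    "provider": "openai",
    "model": "gpt-4o",
    "input_tokens": 1250,
    "output_tokens": 340,
    "tags": {
        "feature": "summarizer",   # what triggered the call
        "route": "api/summarize",  # where in the code it happened
        "plan": "pro",             # who caused the cost
    },
}

def has_all_dimensions(evt):
    """True when feature, route, and customer tags are all present."""
    return {"feature", "route", "plan"} <= evt["tags"].keys()
```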
How to Implement Attribution Tags
The implementation is simpler than you think. You're adding metadata to every API call — not changing how the calls work.
Option A: Manual Tagging (Quick Start)
Wrap your AI calls with a tracking function that records the feature, route, and cost metadata alongside the response:
```typescript
import OpenAI from "openai";

const openai = new OpenAI();

async function trackedCompletion(
  params: ChatCompletionParams,
  tags: { feature: string; route: string; plan?: string }
) {
  const start = Date.now();
  const response = await openai.chat.completions.create(params);

  // Fire-and-forget cost tracking
  trackUsage({
    provider: "openai",
    model: params.model,
    inputTokens: response.usage?.prompt_tokens,
    outputTokens: response.usage?.completion_tokens,
    tags,
    durationMs: Date.now() - start,
  });

  return response;
}

// Usage
const summary = await trackedCompletion(
  { model: "gpt-4.1-nano", messages: [...] },
  { feature: "summarizer", route: "api/summarize", plan: "pro" }
);
```
This works for small apps with a handful of AI call sites. But it doesn't scale when you have dozens of features across multiple providers.
Option B: Framework Callbacks (Recommended)
If you use LangChain, LiteLLM, or CrewAI, attribution tags plug in at the framework level — no changes to individual calls:
LangChain (Python):
```python
from aispendguard_langchain import AISpendGuardCallback

callback = AISpendGuardCallback(
    api_key="your-key",
    default_tags={
        "feature": "chatbot",
        "route": "api/chat",
        "plan": "pro",
    },
)

chain.invoke(input, config={"callbacks": [callback]})
```
LiteLLM (Python):
```python
import litellm
from aispendguard_litellm import AISpendGuardLogger

litellm.callbacks = [AISpendGuardLogger(api_key="your-key")]

response = litellm.completion(
    model="gpt-4.1",
    messages=[...],
    metadata={
        "tags": {
            "feature": "email-generator",
            "route": "api/generate-email",
        }
    },
)
```
The framework approach is better because:
- Tags are set once per chain/pipeline, not per call
- Token counting and cost calculation happen automatically
- You don't modify business logic code
Option C: SDK-Level Tracking (Fastest Setup)
A dedicated cost SDK wraps your existing provider calls and handles attribution automatically:
```typescript
import { AISpendGuard } from "@aispendguard/sdk";

const guard = new AISpendGuard({ apiKey: "your-key" });

// After any AI call, send the usage event
guard.trackUsage({
  provider: "anthropic",
  model: "claude-sonnet-4-6",
  inputTokens: 1250,
  outputTokens: 340,
  tags: {
    feature: "content-moderation",
    route: "webhook/content-review",
    task_type: "classification",
  },
});
```
Key principle: Tracking should be fire-and-forget. If your cost tracking fails, your application should still work perfectly. Never couple cost attribution to the critical path of an API call.
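A minimal sketch of that principle: the tracker call is wrapped so any failure is logged and swallowed, never raised into the request path. Here `send_event` and `complete` are placeholders for your tracking transport and your actual AI call:

```python
import logging

logger = logging.getLogger("cost-tracking")

def track_usage(event, send_event):
    """Best-effort tracking: failures are logged, never re-raised."""
    try:
        send_event(event)
    except Exception:
        # Never let tracking failures reach the critical path.
        logger.warning("cost tracking failed", exc_info=True)

def summarize(text, complete, send_event):
    response = complete(text)  # the real AI call: must always succeed or fail
                               # on its own terms, regardless of tracking
    track_usage({"feature": "summarizer"}, send_event)
    return response
```

Even if `send_event` throws, `summarize` still returns its result.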
What Your Attribution Dashboard Should Show
Once you're tagging every call, you need a dashboard that answers five questions:
1. "Which feature costs the most?"
A simple bar chart or table showing cost per feature tag, sorted descending. This is your optimization hit list — start at the top.
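One way to produce that list from raw tracked events, assuming each event carries a pre-computed cost and a feature tag:

```python
from collections import defaultdict

def cost_by_feature(events):
    """Total cost per feature tag, sorted descending."""
    totals = defaultdict(float)
    for e in events:
        totals[e["tags"]["feature"]] += e["cost_usd"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

events = [
    {"tags": {"feature": "summarizer"}, "cost_usd": 46.0},
    {"tags": {"feature": "chatbot"}, "cost_usd": 24.0},
    {"tags": {"feature": "summarizer"}, "cost_usd": 12.5},
]
# cost_by_feature(events) -> [("summarizer", 58.5), ("chatbot", 24.0)]
```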
2. "Which feature is growing fastest?"
Total cost matters less than cost trajectory. A feature that costs $50/month but grew 400% last week is a bigger problem than one that costs $500 but is flat.
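The trajectory signal itself is just week-over-week fractional growth per feature, a sketch of which is:

```python
def weekly_growth(last_week, this_week):
    """Fractional week-over-week growth; None when there is no baseline."""
    if last_week == 0:
        return None
    return (this_week - last_week) / last_week

# $50 -> $250 is 400% growth; $500 -> $510 is only 2%.
small_but_growing = weekly_growth(50, 250)   # 4.0
big_but_flat = weekly_growth(500, 510)       # 0.02
```

Sorting features by this ratio instead of absolute cost surfaces the $50-but-exploding feature first.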
3. "Which features use the wrong model?"
Cross-reference feature tags with model usage. If your classifier runs on GPT-4o when Gemini 2.0 Flash would handle the task, that's a concrete, actionable savings recommendation:
| Feature | Current Model | Cost/1M Input | Recommended | Cost/1M Input | Savings |
|---|---|---|---|---|---|
| Classifier | GPT-4o | $2.50 | Gemini 2.0 Flash | $0.10 | 96% |
| Summarizer | Claude Opus 4.6 | $5.00 | GPT-4.1 | $2.00 | 60% |
| Chatbot | GPT-4o | $2.50 | Claude Sonnet 4.6 | $3.00 | — |
| Search | GPT-4.1 Mini | $0.40 | GPT-4.1 Nano | $0.10 | 75% |
4. "Are free-tier users costing more than they should?"
Filter by customer plan tag. If free users consume 60% of your AI budget, you have a pricing problem — not a cost problem.
5. "What's my cost per user action?"
Divide feature cost by the number of user-initiated actions. If each "summarize document" costs $0.12 and you charge nothing for it, that's unsustainable at scale.
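That calculation is a one-liner, but it is worth making explicit because it is the unit-economics number the rest of the section leans on:

```python
def cost_per_action(feature_cost_usd, action_count):
    """Feature cost divided by user-initiated actions; 0 when unused."""
    if action_count == 0:
        return 0.0
    return feature_cost_usd / action_count

# e.g. a $1,400/month summarizer invoked ~11,700 times is ~$0.12 per summary
per_summary = cost_per_action(1400.0, 11_700)
```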
AISpendGuard builds exactly this dashboard for you — unified visibility across all providers, with waste detection and model recommendations out of the box. Start free with 50,000 events/month.
The Tag Taxonomy That Actually Works
Don't overthink your tag structure. Start with three required tags and expand later:
```
feature:    what product feature triggered this call
route:      which API endpoint or code path
task_type:  what kind of AI work (classification, generation,
            summarization, extraction, conversation)
```
Rules for Good Tags
- Use lowercase, hyphenated values — `email-generator`, not `EmailGenerator` or `Email Generator`
- Be specific but not granular — `chatbot` is good, `chatbot-v2-experimental-branch` is too specific
- Keep the vocabulary small — 10-20 feature tags is ideal. More than 50 and you lose signal in noise
- Add tags at the call site, not after the fact — retroactive tagging from logs is painful and inaccurate
- Never put sensitive data in tags — no user emails, API keys, or prompt content in tag values
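These rules are easy to enforce at the call site. A sketch of a validator, with a hypothetical vocabulary standing in for your own tag list:

```python
import re

# Hypothetical vocabulary; replace with your own 10-20 feature tags.
FEATURE_VOCABULARY = {"chatbot", "summarizer", "search", "classifier",
                      "email-generator", "content-moderation"}

# Lowercase alphanumeric segments joined by single hyphens.
TAG_VALUE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")

def validate_feature_tag(value):
    if not TAG_VALUE.match(value):
        raise ValueError(f"tag must be lowercase-hyphenated: {value!r}")
    if value not in FEATURE_VOCABULARY:
        raise ValueError(f"unknown feature tag: {value!r}")
    return value
```

Rejecting unknown tags at the call site is what keeps a `misc` catch-all from ever appearing in your data.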
Tag Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| `feature: "misc"` | Catch-all that hides costs | Create specific tags for each feature |
| No route tag | Can't find the code that makes the call | Add route tags to every call site |
| User ID as tag | Cardinality explosion (thousands of unique values) | Use plan tier instead (plan:free) |
| Changing tag names | Breaks historical comparisons | Treat tags as a stable API |
Real Savings From Attribution: Three Examples
Example 1: The $800/month Ghost Feature
A developer added an AI-powered "smart search" to their SaaS app six months ago. Usage analytics showed only 12 users per month actually used it. But without cost attribution, nobody knew it was making 40,000 GPT-4o calls/month via background indexing jobs.
Cost before attribution: Hidden inside $2,100/month OpenAI bill

Cost after attribution: $800/month identified, feature removed

Monthly savings: $800
Example 2: The Model Mismatch
A content platform used Claude Opus 4.6 ($5.00/$25.00 per 1M tokens) for content moderation — a binary classification task. Attribution revealed moderation was 35% of their Anthropic bill.
Switching to Claude Haiku 4.5 ($1.00/$5.00 per 1M tokens) cut moderation costs by 80% with no accuracy loss.
Monthly savings: $620
Example 3: The Free-Tier Drain
A dev tools startup offered AI code review on their free plan. Attribution by customer tier showed free users generated 72% of AI costs but 0% of revenue.
They moved AI code review to the paid tier and added a 5-review/day limit for free users.
Monthly savings: $1,450 (plus improved unit economics)
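A per-day cap like the one in this example can be sketched in a few lines, with an in-memory counter standing in for what would be Redis or a database in production:

```python
from collections import defaultdict

FREE_DAILY_LIMIT = 5
_daily_counts = defaultdict(int)  # (user_id, date) -> reviews used today

def allow_review(user_id, plan, today):
    """Paid plans are unlimited; free plans get FREE_DAILY_LIMIT per day."""
    if plan != "free":
        return True
    key = (user_id, today)
    if _daily_counts[key] >= FREE_DAILY_LIMIT:
        return False
    _daily_counts[key] += 1
    return True
```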
Getting Started in 30 Minutes
Here's the fastest path from zero to full cost attribution:
Step 1 (5 min): List every place in your codebase that makes an AI API call. Search for `openai.`, `anthropic.`, `litellm.`, or your provider's client.
Step 2 (5 min): Define your tag vocabulary. Write down 5-15 feature names, your API routes, and 3-5 task types.
Step 3 (15 min): Add tracking to each call site. Use a framework callback if you're on LangChain/LiteLLM, or wrap calls with a tracking function.
Step 4 (5 min): Connect to a cost dashboard. Either build your own (you'll need a database, cost calculator, and UI) or use a tool like AISpendGuard that handles ingestion, cost calculation, and visualization automatically.
Step 5 (ongoing): Review your attribution dashboard weekly. Look for the three signals: highest-cost features, fastest-growing features, and model mismatches.
The Bottom Line
The difference between teams that control AI costs and teams that don't isn't the models they use or the caching strategies they deploy. It's whether they know where their money goes.
Provider dashboards give you a total. Attribution gives you a map. And you can't optimize what you can't see.
The median AI-powered app has 3-5 features responsible for 80%+ of costs. Finding them takes an afternoon of tagging. Fixing them — with model swaps, usage limits, or feature deprecation — typically saves 30-60% on the next month's bill.
Stop guessing which feature is burning your budget. Start tagging, start tracking, start saving.
Ready to see where your AI money actually goes? Start tracking with AISpendGuard — free for up to 50,000 events/month. Set up in under 5 minutes with SDKs for TypeScript, LangChain, LiteLLM, and CrewAI.