Documentation

From Zero to Hero

Set up workspace, connect SDK, send events, and read savings insights.

How It Works

1. Create workspace + key. In /settings/workspace, create a workspace and an API key.

2. Instrument your app. Add tags-only tracking in your AI request handlers.

3. Send usage events. Use the SDK or direct HTTP to POST /api/ingest.

4. Run rollups. Daily rollups aggregate usage by provider/feature/route/task.

5. Act on savings. The dashboard shows where spend concentrates and what to optimize first.

Limits & Guardrails

  • Required tags: task_type, feature, route.
  • Custom tags are allowed and auto-accepted if key is lowercase snake_case.
  • Custom tag values can be string or string[].
  • Limits: max 24 tags/event, max 16 values in one array tag, max 120 chars/value.
  • Prompt/content/output-like fields are blocked by privacy guard.
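
A client-side pre-check of these guardrails can catch rejected tags before they reach the ingest endpoint. A minimal sketch, assuming the limits listed above; `validate_tags` and its constants are ours, not part of the SDK, and the server remains the source of truth:

```python
import re

REQUIRED_TAGS = {"task_type", "feature", "route"}
FORBIDDEN_KEYS = {"prompt", "message", "content", "output", "completion"}
KEY_RE = re.compile(r"^[a-z][a-z0-9_]*$")  # lowercase snake_case

MAX_TAGS, MAX_ARRAY_VALUES, MAX_VALUE_LEN = 24, 16, 120

def validate_tags(tags: dict) -> list[str]:
    """Return a list of problems; an empty list means the tags pass the guardrails."""
    problems = []
    for key in REQUIRED_TAGS - tags.keys():
        problems.append(f"missing required tag: {key}")
    if len(tags) > MAX_TAGS:
        problems.append(f"too many tags: {len(tags)} > {MAX_TAGS}")
    for key, value in tags.items():
        if key in FORBIDDEN_KEYS:
            problems.append(f"forbidden key (privacy guard): {key}")
        elif not KEY_RE.match(key):
            problems.append(f"key not lowercase snake_case: {key}")
        # Values may be a string or a list of strings
        values = value if isinstance(value, list) else [value]
        if len(values) > MAX_ARRAY_VALUES:
            problems.append(f"too many values in {key}: {len(values)} > {MAX_ARRAY_VALUES}")
        for v in values:
            if len(str(v)) > MAX_VALUE_LEN:
                problems.append(f"value too long in {key}")
    return problems
```

Note that the real ingest endpoint accepts events with best-effort coercion for most tag issues; only privacy violations hard-reject.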

task_type Reference

Pick the value that best describes what the model is being asked to produce. The right task_type is how AISpendGuard knows when you're using a more expensive model than the task actually needs.

| Value | What it does | Typical output | Best model tier | Batch-safe |
|---|---|---|---|---|
| answer | Direct Q&A, RAG-backed responses, knowledge retrieval | 100–800 tok | standard | ✗ user-facing |
| classify | Label, categorize, detect intent, route to a bucket | 1–10 tok | micro | ✓ strong |
| extract | Pull structured fields from unstructured text | 50–300 tok | micro | ✓ yes |
| summarize | Condense long content, TLDR, bullet points | 100–500 tok | standard | ✓ yes |
| generate | Write or draft new content, creative writing, ideation | 300–2000 tok | standard | ✓ yes |
| rewrite | Paraphrase, tone-adjust, edit existing text | ≈ input | standard | ✓ yes |
| translate | Translate between languages | ≈ input | micro | ✓ yes |
| code | Generate, complete, review, or explain code | 200–1500 tok | premium | ✓ yes |
| eval | LLM-as-judge, quality scoring, test assertions | 10–50 tok | micro | best candidate |
| embed | Text embedding / vector generation | fixed vector | embedding | ✓ strong |
| route | Decide which tool, agent, or path to take next | 1–20 tok | micro | ✓ yes |
| plan | Decompose a goal into subtasks, strategy reasoning | 100–500 tok | premium | ✓ yes |
| agent_step | Single step inside a multi-step agent loop | 50–800 tok | varies | usually ✗ |
| vision | Understand images, PDFs, screenshots (multimodal) | 100–600 tok | standard | ✓ yes |
| chat | Multi-turn stateful conversation (not one-shot Q&A) | 100–500 tok | standard | ✗ real-time |
| other | None of the above — avoid; reduces waste detection quality | — | — | — |

Model tiers

  • micro: haiku / gpt-4o-mini / flash-lite — short output, high volume, 80–95% cheaper than premium
  • standard: sonnet / gpt-4o / flash — versatile, best quality/cost ratio for most tasks
  • premium: opus / o1 / o3 / gpt-4-turbo — complex reasoning, nuanced code, agent planning
  • embedding: text-embedding-3-small / embed-english-v3 — vectors only, never chat models
Waste rule triggered by task_type: if task_type is classify, route, or eval and you are using a premium model with average output under 100 tokens, AISpendGuard flags this as overspend and shows the exact monthly saving from switching to the micro tier.
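
The rule reduces to a small predicate. A sketch of the documented condition, not the production detector; `flags_overspend` is a hypothetical name:

```python
# Task types cheap enough for the micro tier when output stays short
MICRO_ELIGIBLE = {"classify", "route", "eval"}

def flags_overspend(task_type: str, model_tier: str, avg_output_tokens: float) -> bool:
    """True when the documented waste rule would fire for this usage pattern."""
    return (
        task_type in MICRO_ELIGIBLE
        and model_tier == "premium"
        and avg_output_tokens < 100
    )
```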

Extended Token Fields

These optional fields enable accurate cost calculation and cost-spike detection. Provider helpers extract them automatically — pass response.usage and they are captured for you.

resolvedModel (string)
Pinned model version returned by the provider (e.g. gpt-4o-mini-2024-07-18).
If missing: silent upgrades go undetected; price lookup falls back to the alias.
Provider source: response.model (OpenAI) · message.model (Anthropic) · response.modelVersion (Gemini)

inputTokensCached (integer)
Cache read hits — a subset of inputTokens, billed cheaper (OpenAI 0.5×, Anthropic 0.1×).
If missing: spend is overstated on cached calls; cache ROI is invisible.
Provider source: prompt_tokens_details.cached_tokens (OpenAI) · cache_read_input_tokens (Anthropic) · cachedContentTokenCount (Gemini)

inputTokensCacheWrite (integer, Anthropic only)
Cache write cost — a subset of inputTokens, billed at 1.25× the base input price.
If missing: spend is understated when building the Anthropic prompt cache.
Provider source: cache_creation_input_tokens

thinkingTokens (integer)
Reasoning/thinking tokens — a subset of outputTokens; can be 3–10× the visible output on o1/o3/Gemini 2.5.
If missing: cost spikes from reasoning-heavy calls are invisible.
Provider source: completion_tokens_details.reasoning_tokens (OpenAI Chat Completions) · output_tokens_details.reasoning_tokens (OpenAI Responses) · usageMetadata.thoughtsTokenCount (Gemini)
Anthropic extended thinking note: When using claude-3-7-sonnet with thinking: { type: "enabled" }, thinking tokens are included in output_tokens but are not reported separately in the usage object. To track them, count content blocks of type "thinking" manually and pass the token count via thinkingTokens.
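
Counting those blocks manually might look like the sketch below, assuming content blocks shaped like the SDK's dict form and a crude chars/4 token heuristic; swap in a real tokenizer or the provider's count-tokens endpoint for billing-grade numbers:

```python
def estimate_thinking_tokens(content_blocks, count_tokens=lambda text: len(text) // 4):
    """Sum estimated tokens over content blocks of type "thinking".

    count_tokens defaults to a rough chars/4 heuristic (an assumption,
    not the provider's tokenizer) -- replace it for accurate accounting.
    """
    return sum(
        count_tokens(block.get("thinking", ""))
        for block in content_blocks
        if block.get("type") == "thinking"
    )
```

Pass the result as `thinkingTokens` on the event you build with `createAnthropicUsageEvent`.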

OpenAI (Real SDK Integration)

import OpenAI from "openai";
import { init, trackUsage, createOpenAIUsageEvent } from "@aispendguard/sdk";

init({
  apiKey: process.env.AISPENDGUARD_API_KEY!,
  endpoint: "https://www.aispendguard.com/api/ingest",
});

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY! });
const startedAt = Date.now();

const response = await openai.responses.create({
  model: "gpt-4o-mini",
  input: "Classify this message: 'I want to cancel my subscription'"
});

const event = createOpenAIUsageEvent({
  model: "gpt-4o-mini",
  resolvedModel: response.model,       // e.g. "gpt-4o-mini-2024-07-18"
  usage: response.usage,               // auto-extracts tokens, cache hits, reasoning tokens
  latencyMs: Date.now() - startedAt,
  costUsd: 0.0021, // optional if you have pricing calc
  tags: {
    task_type: "classify",
    feature: "ticket_triage",
    route: "POST /api/support/triage",
    customer_plan: "free",
    environment: "prod"
  }
});

await trackUsage(event);

Anthropic (Real SDK Integration)

import Anthropic from "@anthropic-ai/sdk";
import { init, trackUsage, createAnthropicUsageEvent } from "@aispendguard/sdk";

init({
  apiKey: process.env.AISPENDGUARD_API_KEY!,
  endpoint: "https://www.aispendguard.com/api/ingest",
});

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY! });
const startedAt = Date.now();

const message = await anthropic.messages.create({
  model: "claude-3-5-sonnet-latest",
  max_tokens: 200,
  messages: [{ role: "user", content: "Summarize this support case in 3 bullet points." }]
});

const event = createAnthropicUsageEvent({
  model: "claude-3-5-sonnet-latest",
  resolvedModel: message.model,        // e.g. "claude-3-5-sonnet-20241022"
  usage: message.usage,                // auto-extracts tokens, cache_read, cache_creation
  latencyMs: Date.now() - startedAt,
  costUsd: 0.0081, // optional if you have pricing calc
  tags: {
    task_type: "summarize",
    feature: "support_summary",
    route: "POST /api/support/summary",
    customer_plan: "pro",
    environment: "prod"
  }
});

await trackUsage(event);

Gemini (Real SDK Integration)

import { GoogleGenAI } from "@google/genai";
import { init, trackUsage, createGeminiUsageEvent } from "@aispendguard/sdk";

init({
  apiKey: process.env.AISPENDGUARD_API_KEY!,
  endpoint: "https://www.aispendguard.com/api/ingest",
});

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY! });
const startedAt = Date.now();

const response = await ai.models.generateContent({
  model: "gemini-2.0-flash",
  contents: [{ role: "user", parts: [{ text: "Translate 'Hello world' to French." }] }]
});

const event = createGeminiUsageEvent({
  model: "gemini-2.0-flash",
  resolvedModel: response.modelVersion,  // e.g. "gemini-2.0-flash-001"
  usage: response.usageMetadata,         // auto-extracts tokens, cachedContent, thoughts
  latencyMs: Date.now() - startedAt,
  tags: {
    task_type: "translate",
    feature: "ui_i18n",
    route: "POST /api/translate",
    environment: "prod"
  }
});

await trackUsage(event);

OpenRouter

OpenRouter is OpenAI-compatible, so the existing createOpenAIUsageEvent() helper works out of the box. Set provider: "openrouter" in tags for attribution.

Option A: SDK Integration

import OpenAI from "openai";
import { init, trackUsage, createOpenAIUsageEvent } from "@aispendguard/sdk";

init({
  apiKey: process.env.AISPENDGUARD_API_KEY!,
  endpoint: "https://www.aispendguard.com/api/ingest",
});

const openrouter = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY!,
});

const startedAt = Date.now();

const response = await openrouter.chat.completions.create({
  model: "anthropic/claude-sonnet-4-20250514",
  messages: [{ role: "user", content: "Hello" }],
});

const event = createOpenAIUsageEvent({
  model: "anthropic/claude-sonnet-4-20250514",
  resolvedModel: response.model,
  usage: response.usage,
  latencyMs: Date.now() - startedAt,
  tags: {
    provider: "openrouter",  // Override provider to "openrouter"
    task_type: "chat",
    feature: "support",
    route: "/api/chat",
  }
});

await trackUsage(event);

Option B: LiteLLM (Python)

If you use LiteLLM with the openrouter/ model prefix, our aispendguard-litellm integration auto-detects OpenRouter and tracks all calls.

import litellm
from aispendguard_litellm import AISpendGuardLogger

litellm.callbacks.append(
    AISpendGuardLogger(api_key="asg_...",
                       default_tags={"feature": "api", "route": "/chat"})
)

response = litellm.completion(
    model="openrouter/anthropic/claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello"}],
    metadata={"aispendguard_tags": {"task_type": "chat"}},
)

Option C: Broadcast Webhook (Zero-Code)

OpenRouter can push all your usage telemetry directly to AISpendGuard via its Broadcast feature. No SDK, no code changes — just configure the webhook and you're done.

// Zero-code setup — no SDK needed!
// 1. Copy your AISpendGuard API key from /settings/workspace
// 2. In OpenRouter → Settings → Broadcast → Add Webhook:
//    URL:  https://www.aispendguard.com/api/ingest/otlp
//    Auth: Authorization: Bearer asg_your_api_key
// 3. Enable privacy mode (recommended)
// 4. Done — all OpenRouter usage flows to AISpendGuard automatically

Privacy mode strips prompt/response content. Even without it, AISpendGuard automatically strips all prompt content — only cost, tokens, and metadata are stored.

Python (HTTP)

import requests
from datetime import datetime, timezone

url = "https://www.aispendguard.com/api/ingest"
api_key = "asg_your_api_key"

payload = {
    "events": [
        {
            "event_id": "evt_123",
            "provider": "openai",
            "model": "gpt-4o-mini",
            "input_tokens": 120,
            "output_tokens": 12,
            "latency_ms": 840,
            "cost_usd": 0.0021,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "tags": {
                "task_type": "classify",
                "feature": "lead_classifier",
                "route": "POST /api/ai/classify",
                "customer_plan": "free",
                "environment": "prod",
                "customer_defined_1": ["value1", "value2"],
                "customer_defined_2": ["service1", "service2"]
            }
        }
    ]
}

res = requests.post(url, json=payload, headers={"x-api-key": api_key})
print(res.status_code, res.json())

Go (HTTP)

package main

import (
  "bytes"
  "fmt"
  "net/http"
)

func main() {
  payload := []byte(`{
    "events": [{
      "event_id": "evt_123",
      "provider": "openai",
      "model": "gpt-4o-mini",
      "input_tokens": 120,
      "output_tokens": 12,
      "latency_ms": 840,
      "cost_usd": 0.0021,
      "timestamp": "2026-03-04T12:00:00Z",
      "tags": {
        "task_type": "classify",
        "feature": "lead_classifier",
        "route": "POST /api/ai/classify",
        "customer_plan": "free",
        "environment": "prod",
        "customer_defined_1": ["value1", "value2"],
        "customer_defined_2": ["service1", "service2"]
      }
    }]
  }`)

  req, _ := http.NewRequest("POST", "https://www.aispendguard.com/api/ingest", bytes.NewBuffer(payload))
  req.Header.Set("Content-Type", "application/json")
  req.Header.Set("x-api-key", "asg_your_api_key")

  resp, err := http.DefaultClient.Do(req)
  if err != nil { panic(err) }
  defer resp.Body.Close()
  fmt.Println("status:", resp.StatusCode)
}

cURL

curl -X POST https://www.aispendguard.com/api/ingest \
  -H "Content-Type: application/json" \
  -H "x-api-key: asg_your_api_key" \
  -d '{
    "events": [{
      "event_id": "evt_123",
      "provider": "openai",
      "model": "gpt-4o-mini",
      "input_tokens": 120,
      "output_tokens": 12,
      "latency_ms": 840,
      "cost_usd": 0.0021,
      "timestamp": "2026-03-04T12:00:00Z",
      "tags": {
        "task_type": "classify",
        "feature": "lead_classifier",
        "route": "POST /api/ai/classify",
        "customer_plan": "free",
        "environment": "prod",
        "customer_defined_1": ["value1", "value2"],
        "customer_defined_2": ["service1", "service2"]
      }
    }]
  }'

Auto-Wrap (Zero-Code Tracking)

Wrap your AI client once — every call is tracked automatically. No manual trackUsage() needed.

OpenAI

import OpenAI from "openai";
import { init, wrapOpenAI } from "@aispendguard/sdk";

init({
  apiKey: process.env.AISPENDGUARD_API_KEY!,
  defaultTags: {
    feature: "chatbot",
    route: "POST /api/chat",
    environment: "prod",
  },
});

const openai = wrapOpenAI(new OpenAI());

// Every call is now tracked automatically:
const res = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello" }],
  // Override tags per call (optional):
  asgTags: { task_type: "chat", customer_plan: "pro" },
});

Anthropic

import Anthropic from "@anthropic-ai/sdk";
import { init, wrapAnthropic } from "@aispendguard/sdk";

init({
  apiKey: process.env.AISPENDGUARD_API_KEY!,
  defaultTags: { feature: "support", route: "POST /api/support" },
});

const anthropic = wrapAnthropic(new Anthropic());

const msg = await anthropic.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 200,
  messages: [{ role: "user", content: "Summarize this ticket" }],
  asgTags: { task_type: "summarize" },
});

Gemini

import { GoogleGenAI } from "@google/genai";
import { init, wrapGemini } from "@aispendguard/sdk";

init({
  apiKey: process.env.AISPENDGUARD_API_KEY!,
  defaultTags: { feature: "translate", route: "POST /api/translate" },
});

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY! });

const model = wrapGemini(
  ai.models,         // pass the models object
  "gemini-2.0-flash" // model name is required
);

const res = await model.generateContent({
  contents: [{ role: "user", parts: [{ text: "Translate to French" }] }],
  asgTags: { task_type: "translate" },
});

defaultTags

Tags passed to init({ defaultTags }) are merged into every auto-wrapped call. Per-call asgTags override defaults. Use this to set feature, route, and environment once instead of repeating them.

LangChain.js Integration

The SDK includes a LangChain.js callback handler that tracks every LLM call automatically. Works with any LangChain-supported provider (OpenAI, Anthropic, Google, Mistral, Cohere, etc.).

import { ChatOpenAI } from "@langchain/openai";
import { init, AISpendGuardCallbackHandler } from "@aispendguard/sdk";

init({ apiKey: process.env.AISPENDGUARD_API_KEY! });

const handler = new AISpendGuardCallbackHandler({
  defaultTags: {
    feature: "rag_pipeline",
    route: "POST /api/ask",
  },
});

const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  callbacks: [handler],
});

// Or pass to any chain/agent:
const result = await chain.invoke(
  { input: "..." },
  { callbacks: [handler] }
);

The handler auto-detects the provider, model, and token usage from LangChain's callback data. It never reads prompts or outputs — only metadata is tracked. Events are deduplicated by LangChain run ID.

For LangChain Python, install pip install aispendguard-langchain — see the aispendguard-langchain repo.

Python SDK

Native Python SDK with batched transport, provider helpers, and the same validation rules as the TypeScript SDK.

pip install aispendguard
from aispendguard import AISpendGuard, create_openai_event
import openai, time

client = AISpendGuard(api_key="asg_your_key_here")

openai_client = openai.OpenAI()
start = time.time()

response = openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Classify: 'I want to cancel'"}],
)

event = create_openai_event(
    model="gpt-4o-mini",
    usage=response.usage,
    latency_ms=int((time.time() - start) * 1000),
    tags={
        "task_type": "classify",
        "feature": "ticket_triage",
        "route": "POST /api/classify",
    },
)
client.track(event)

Also supports create_anthropic_event and create_gemini_event. See the full docs at github.com/AISpendGuard/aispendguard-python.

OpenTelemetry (OTLP)

If you already instrument with OpenTelemetry, send GenAI traces directly — no SDK needed.

POST https://www.aispendguard.com/api/otel/v1/traces
Content-Type: application/json
Authorization: Bearer asg_your_api_key

# Standard OTLP/HTTP JSON format with GenAI semantic conventions:
#   gen_ai.system          → provider (openai, anthropic, google)
#   gen_ai.request.model   → model name
#   gen_ai.usage.input_tokens
#   gen_ai.usage.output_tokens

# AISpendGuard-specific attributes (optional):
#   asg.task_type, asg.feature, asg.route — override tags
#   asg.*                  → custom tags (prefix stripped)

# Headers for default tags:
#   x-asg-feature: my_feature
#   x-asg-route: POST /api/endpoint

Works with any OTLP-compatible instrumentation (OpenLLMetry, Traceloop, custom spans). Prompt content and model outputs are automatically stripped — only token counts and metadata are stored.
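
The attribute mapping above can be sketched as a small transform. This only illustrates the documented mapping, not the server's actual parser; `span_to_event` is a hypothetical name:

```python
def span_to_event(attrs: dict) -> dict:
    """Map GenAI semantic-convention span attributes to an ingest-style event."""
    event = {
        "provider": attrs.get("gen_ai.system"),
        "model": attrs.get("gen_ai.request.model"),
        "input_tokens": attrs.get("gen_ai.usage.input_tokens", 0),
        "output_tokens": attrs.get("gen_ai.usage.output_tokens", 0),
        "tags": {},
    }
    for key, value in attrs.items():
        if key.startswith("asg."):
            event["tags"][key[len("asg."):]] = value  # asg. prefix stripped
    return event
```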

Budget Alerts

Set a monthly USD spending cap and get email alerts at 75% and 90% thresholds. Available on the Pro plan.

Setup via Dashboard

Go to /billing → Budget panel → set your monthly limit, enable alert thresholds, and add an alert email address.

Setup via API

# Create or update budget (requires Clerk session, OWNER/ADMIN role)
POST /api/budgets
Content-Type: application/json

{
  "monthlyLimitUsd": 500,
  "alertAt75": true,
  "alertAt90": true,
  "alertEmail": "alerts@yourcompany.com"
}

# Check current budget
GET /api/budgets

# Remove budget
DELETE /api/budgets

Alerts fire in real time at ingest: as soon as your spend crosses a threshold, you get an email. Each threshold fires once per calendar month (deduplicated automatically). Spend is calculated from both rolled-up daily aggregates and unprocessed events for accuracy.
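
If you want to mirror the once-per-month threshold logic client-side, it might look like this sketch; `thresholds_to_fire` is ours, not an API:

```python
def thresholds_to_fire(spend: float, limit: float, already_fired: set[int]) -> list[int]:
    """Return alert thresholds (percent) newly crossed, at most once each per month."""
    fired = []
    for pct in (75, 90):
        if spend >= limit * pct / 100 and pct not in already_fired:
            fired.append(pct)
    return fired
```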

Ingest Response

Every POST /api/ingest call returns a JSON object with these fields:

{
  "accepted": 1,
  "duplicates": 0,
  "rejected": 0,
  "event_ids": ["cm...abc"],       // IDs of accepted events
  "warnings": [                     // Non-critical issues (event still accepted)
    "events[0].tags.task_type \"lab-benchmark\" is not recognized — coerced to \"other\"",
    "events[0].tags.my-key is not a supported tag key — stripped"
  ],
  "enforcement": {                  // Budget status (event always accepted)
    "action": "block",              // "none" | "warn" | "block"
    "reason": "workspace_budget_exceeded",
    "budget_limit": 50.00,
    "current_spend": 52.34
  },
  "dashboard_url": "https://www.aispendguard.com/events",
  "usage": {
    "eventsThisMonth": 1234,
    "monthlyLimit": 50000,
    "tier": "FREE"
  }
}

Use event_ids to verify your events were stored. Open dashboard_url to see them live.

Warnings: Tag validation issues (invalid task_type, unsupported tag keys, values too long) produce warnings — events are always accepted with best-effort coercion. Only privacy violations (forbidden keys like prompt, message) cause hard rejection.

Enforcement: When your workspace exceeds its budget, events are still tracked and accepted. The enforcement field signals the budget status so your code can decide whether to continue sending requests.

Error Reference

| Status | Meaning | Fix |
|---|---|---|
| 400 | Invalid request body or privacy violation | Check errors[] in the response — invalid JSON, forbidden keys (prompt, message, etc.), or structurally unparseable events. Tag validation issues return 200 with warnings[] instead. |
| 401 | Missing or invalid API key | Pass the x-api-key header or Authorization: Bearer <key>. Check that the key is not revoked. |
| 429 | Rate limit or monthly event limit reached | Rate limit: 120 req/min per key, wait and retry. Event limit: check usage.eventsThisMonth; upgrade to PRO for 500K/mo. |
| 500 | Server error | Retry with exponential backoff. If persistent, check the status page. |
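
A retry loop respecting this table might look like the following sketch; `post_with_retry` is a hypothetical helper that treats 429 and 5xx as retryable and everything else as final:

```python
import time

def post_with_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry transient ingest failures (429/5xx) with exponential backoff.

    send() is any callable returning an HTTP status code; sleep is
    injectable for testing.
    """
    for attempt in range(max_attempts):
        status = send()
        if status < 400 or (status != 429 and status < 500):
            return status  # success or non-retryable client error
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return status
```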

Error Response Shape (400)

{
  "accepted": 0,
  "duplicates": 0,
  "rejected": 2,
  "errors": [
    "events[0] contains forbidden field: prompt",
    "events[1] invalid provider: must be a non-empty string"
  ]
}

Troubleshooting

Events accepted but not on dashboard?

Dashboard reads from daily rollups which update on each cron run. New events appear after the next rollup cycle. Check /events page for raw events — they appear immediately.

SDK trackUsage() doesn't throw but events missing?

By default the SDK is fire-and-forget — errors are logged to console, not thrown. Use init({ strict: true }) to throw on failures, then check the error message.

Getting 401 with a valid key?

Check if the key was revoked in Settings. Generate a new one if needed. Ensure you pass the full key string (starts with asg_).

Duplicate events being skipped?

Events are deduplicated by event_id or a content hash (provider + model + tokens + timestamp). Use unique event_id values per call, or omit it to let the server generate one.
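
The fallback hash could conceptually work like this sketch. The server's actual hashing is internal; this only illustrates that identical (provider, model, tokens, timestamp) tuples collide:

```python
import hashlib

def content_hash(provider: str, model: str, input_tokens: int,
                 output_tokens: int, timestamp: str) -> str:
    """Illustrative dedup key for events sent without an event_id."""
    key = f"{provider}|{model}|{input_tokens}|{output_tokens}|{timestamp}"
    return hashlib.sha256(key.encode()).hexdigest()
```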

Cost showing as $0?

Cost is calculated server-side from model pricing. If the model isn't in our pricing database, cost will be $0. Pass cost_usd in the event to override. Check /model-prices for supported models.

Streaming Responses

With streaming, usage data arrives in the final chunk, not during the stream. Accumulate the stream, then track usage after it completes.

OpenAI Streaming

import time
start = time.time()

# Must pass stream_options to get usage in stream
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    stream_options={"include_usage": True},
)

usage = None
for chunk in stream:
    if chunk.usage:
        usage = chunk.usage
    # ... process chunk.choices[0].delta

if usage:
    event = create_openai_event(
        model="gpt-4o-mini",
        usage=usage,
        latency_ms=int((time.time() - start) * 1000),
        tags={"task_type": "chat", "feature": "assistant", "route": "POST /api/chat"},
    )
    asg.track(event)

Anthropic Streaming

start = time.time()

with anthropic_client.messages.stream(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=200,
) as stream:
    for text in stream.text_stream:
        pass  # process text chunks
    message = stream.get_final_message()

# Usage is on the final message object
event = create_anthropic_event(
    model="claude-sonnet-4-20250514",
    usage=message.usage,
    latency_ms=int((time.time() - start) * 1000),
    tags={"task_type": "chat", "feature": "assistant", "route": "POST /api/chat"},
)
asg.track(event)

TypeScript SDK (OpenAI)

const start = Date.now();

const stream = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello" }],
  stream: true,
  stream_options: { include_usage: true },
});

let usage;
for await (const chunk of stream) {
  if (chunk.usage) usage = chunk.usage;
  // ... process chunk
}

if (usage) {
  const event = createOpenAIUsageEvent({
    model: "gpt-4o-mini",
    usage,
    latencyMs: Date.now() - start,
    tags: { task_type: "chat", feature: "assistant", route: "POST /api/chat" },
  });
  await trackUsage(event);
}

Data Export API

Export your usage events as JSON or CSV for internal dashboards, reporting, or analysis. Requires Clerk authentication (session cookie).

# JSON (default)
GET /api/export?from=2026-03-01&to=2026-04-01&limit=5000

# CSV
GET /api/export?format=csv&from=2026-03-01&to=2026-04-01

# Parameters:
#   from    — start date (default: 1st of current month)
#   to      — end date (default: 1st of next month)
#   format  — "json" (default) or "csv"
#   limit   — max rows, 1–10000 (default: 1000)
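
Assembling an export URL from these parameters, sketched with a hypothetical `export_url` helper (only the documented query params are passed through):

```python
from urllib.parse import urlencode

ALLOWED = ("from", "to", "format", "limit")

def export_url(params: dict, base: str = "https://www.aispendguard.com/api/export") -> str:
    """Build an export request URL, dropping anything outside the documented params."""
    query = {k: params[k] for k in ALLOWED if k in params}
    return f"{base}?{urlencode(query)}" if query else base
```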