A Hacker News user lost $187 in 10 minutes to a runaway agent loop. A startup's CrewAI pipeline burned $4,800 in a month with no one noticing which step was responsible. These stories repeat daily across r/OpenAI, r/ClaudeAI, and the LangChain Discord.
The problem isn't AI agents. It's that few tools tell you which step costs what, when costs spike, or how to stop runaway spend before it drains your budget.
This week we shipped four features that change that.
1. Agent Runaway Detection
The problem: An agent enters a retry loop. Each retry costs $0.05–$0.50, so ten retries in three minutes already cost $0.50–$5.00, and a loop that never exits keeps spending until the budget is gone. No error. No timeout. Just a perfectly functioning agent doing what it was told, repeatedly.
What we built: AISpendGuard now detects traces with 10+ events in under 5 minutes and flags them as potential runaways. You get:
- Automatic identification of retry-loop patterns
- Monthly waste estimate ("this pattern costs you ~$X/month if it continues")
- Actionable advice: set max retries, add exponential backoff, use a cheaper retry model
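The detection rule described above (10+ events in under 5 minutes on one trace) can be sketched as a sliding window over timestamps. This is a minimal illustration, not AISpendGuard's implementation: the `TraceEvent` and `RunawayFlag` shapes and the `findRunaways` function are hypothetical names introduced here.

```typescript
// Hypothetical event shape for illustration; not the AISpendGuard schema.
interface TraceEvent {
  traceId: string;
  timestampMs: number; // epoch milliseconds
  costUsd: number;
}

interface RunawayFlag {
  traceId: string;
  eventCount: number;   // size of the densest burst found in the trace
  burstCostUsd: number; // summed cost of the events in that burst
}

// Flag every trace that has at least `minEvents` events whose first and last
// timestamps are less than `windowMs` apart.
function findRunaways(
  events: TraceEvent[],
  minEvents = 10,
  windowMs = 5 * 60 * 1000,
): RunawayFlag[] {
  // Group events by trace.
  const byTrace: Record<string, TraceEvent[]> = {};
  for (const e of events) {
    if (!byTrace[e.traceId]) byTrace[e.traceId] = [];
    byTrace[e.traceId].push(e);
  }

  const flags: RunawayFlag[] = [];
  for (const traceId of Object.keys(byTrace)) {
    const list = byTrace[traceId].slice().sort((a, b) => a.timestampMs - b.timestampMs);
    let start = 0;
    let windowCost = 0;
    let best: RunawayFlag | null = null;
    for (let end = 0; end < list.length; end++) {
      windowCost += list[end].costUsd;
      // Shrink from the left until the window spans less than windowMs.
      while (list[end].timestampMs - list[start].timestampMs >= windowMs) {
        windowCost -= list[start].costUsd;
        start++;
      }
      const count = end - start + 1;
      if (count >= minEvents && (best === null || count > best.eventCount)) {
        best = { traceId, eventCount: count, burstCostUsd: windowCost };
      }
    }
    if (best !== null) flags.push(best);
  }
  return flags;
}
```

A monthly waste estimate could then be extrapolated from `burstCostUsd` and how often the pattern recurs; how AISpendGuard computes its own estimate is not specified here.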
Why it matters: No competitor detects this. LangSmith, Langfuse, and Helicone track costs — but none of them identify runaway patterns and tell you what to do about them.
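The remediation advice above (a retry cap plus exponential backoff) might look like the following in an agent's own code. It is a generic sketch, independent of any SDK; `backoffDelaysMs` and `callWithRetries` are hypothetical helper names, and the default delays are arbitrary.

```typescript
// Delay before each retry: baseMs * 2^attempt, capped at capMs.
function backoffDelaysMs(maxRetries: number, baseMs = 500, capMs = 30000): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(Math.min(capMs, baseMs * Math.pow(2, attempt)));
  }
  return delays;
}

// Run `call` once, then retry at most `maxRetries` times with growing delays.
// After the final failure the last error is rethrown instead of looping forever.
async function callWithRetries<T>(
  call: () => Promise<T>,
  maxRetries = 3,
  baseMs = 500,
): Promise<T> {
  const delays = backoffDelaysMs(maxRetries, baseMs);
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await call();
    } catch (err) {
      lastError = err;
      if (attempt === maxRetries) break; // retry budget exhausted
      const delayMs = delays[attempt];
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```

With `maxRetries = 3` the wrapped call runs at most four times, which bounds the worst-case cost of a failing step. Adding random jitter to each delay is a common refinement that is omitted here for brevity.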
2. Agent Session Spend Caps
The problem: You want each agent run to cost at most $2. But the agent decides how many LLM calls to make. You need a kill switch — at the SDK level, not in the dashboard.
What we built: SessionBudget in the AISpendGuard SDK lets you set soft and hard limits per agent session:
```typescript
import { SessionBudget } from '@aispendguard/sdk';

const budget = new SessionBudget({
  maxBudget: 2.00,        // Hard limit: $2.00
  softLimitPercent: 75,   // Warn at 75% ($1.50)
  onSoftLimit: (info) =>
    console.warn(`Agent spent $${info.currentSpendUsd} of $${info.maxBudgetUsd}`),
  onHardLimit: (info) => {
    throw new Error(`Budget exceeded: $${info.currentSpendUsd}`);
  },
});
```
The SDK tracks cumulative cost per session and fires callbacks at your thresholds. No gateway required — it works alongside your existing code.
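To make the mechanism concrete, here is a self-contained sketch of how a cumulative tracker with soft and hard thresholds could work. It is not the SDK's source: `SketchSessionBudget` and its `record` method are hypothetical, and only the option names mirror the configuration shown above.

```typescript
interface BudgetInfo {
  currentSpendUsd: number;
  maxBudgetUsd: number;
}

interface SketchBudgetOptions {
  maxBudget: number;        // hard limit in USD
  softLimitPercent: number; // warn at this percentage of maxBudget
  onSoftLimit: (info: BudgetInfo) => void;
  onHardLimit: (info: BudgetInfo) => void;
}

// Illustrative stand-in, not the real SessionBudget class.
class SketchSessionBudget {
  private spentUsd = 0;
  private softFired = false;
  private readonly options: SketchBudgetOptions;

  constructor(options: SketchBudgetOptions) {
    this.options = options;
  }

  // Hypothetical method: add the cost of one completed LLM call.
  record(costUsd: number): void {
    this.spentUsd += costUsd;
    const info: BudgetInfo = {
      currentSpendUsd: this.spentUsd,
      maxBudgetUsd: this.options.maxBudget,
    };
    if (this.spentUsd >= this.options.maxBudget) {
      this.options.onHardLimit(info); // fires on every call at or past the cap
      return;
    }
    const softUsd = this.options.maxBudget * (this.options.softLimitPercent / 100);
    if (!this.softFired && this.spentUsd >= softUsd) {
      this.softFired = true; // warn only once per session
      this.options.onSoftLimit(info);
    }
  }
}
```

With a $2.00 cap and a 75% soft limit, three calls of $0.80 each would pass silently, trigger the warning at $1.60, and hit the hard limit at $2.40. Because the check runs after a call's cost is known, a session can overshoot the cap by up to one call in this sketch.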
Competitive context: AgentBudget, Portkey, and Lava are marketing similar capabilities. Our approach is SDK-cooperative (no proxy lock-in) and works with any provider.
3. Graduated Budget Enforcement
The problem: Binary enforcement (alert or block) is too blunt. You want to warn your team at 50%, throttle at 80%, suggest cheaper models at 95%, and only hard-block at 100%.
What we built: The first graduated enforcement ladder in the market:
| Budget % | Action | What happens |
|---|---|---|
| 50% | Warn | Dashboard + email notification |
| 80% | Throttle | SDK receives slowdown signal |
| 95% | Suggest downgrade | SDK receives model-switch suggestion |
| 100% | Block | SDK receives reject signal |
Actions are configurable per threshold. Works with both workspace budgets and tag-level budgets (per team, feature, or route).
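The ladder in the table can be read as a lookup from budget usage to the highest rung reached. The sketch below assumes that interpretation; the `Rung` type, the action names, and `actionFor` are hypothetical and do not describe the actual signal format the SDK receives.

```typescript
type LadderAction = "warn" | "throttle" | "suggest_downgrade" | "block";

interface Rung {
  thresholdPercent: number;
  action: LadderAction;
}

// Default rungs taken from the table above; each one is configurable.
const defaultLadder: Rung[] = [
  { thresholdPercent: 50, action: "warn" },
  { thresholdPercent: 80, action: "throttle" },
  { thresholdPercent: 95, action: "suggest_downgrade" },
  { thresholdPercent: 100, action: "block" },
];

// Return the action of the highest rung whose threshold has been reached,
// or null while spend is still below the first rung.
function actionFor(
  spentUsd: number,
  budgetUsd: number,
  ladder: Rung[] = defaultLadder,
): LadderAction | null {
  const sorted = ladder.slice().sort((a, b) => a.thresholdPercent - b.thresholdPercent);
  let current: LadderAction | null = null;
  for (const rung of sorted) {
    // Compare cross-multiplied values to avoid a floating-point division.
    if (spentUsd * 100 >= rung.thresholdPercent * budgetUsd) {
      current = rung.action;
    }
  }
  return current;
}
```

For a $100 budget, `actionFor(85, 100)` yields `"throttle"` and `actionFor(120, 100)` yields `"block"`. The same function applies to a workspace budget or a tag-level budget, since only the spent and budget amounts change.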
Key principle: Your dashboard numbers are always accurate. Enforcement only sends signals to the SDK (slow down, switch models, or reject the call); every event is still tracked, and none are dropped.
Competitive gap: Portkey and LiteLLM offer only alerts or hard blocks. Nobody else offers a configurable ladder.
4. Per-Span Cost Breakdown in Trace View
The problem: Your agent pipeline costs $4.80 per run. But which step is expensive — the planning call? The retrieval? The synthesis? Without per-step attribution, you're optimizing blind.
What we built: The trace view now breaks down cost per span:
- Cost distribution bar — visual breakdown showing which spans consume the most budget
- Per-span detail — model, tokens, cost, and duration for every step in the trace
- Sort by cost — find the expensive spans instantly
This is the feature agent builders have been asking for. When you see that your "planning" step costs 60% of the run while the "execution" step costs 5%, you know exactly where to optimize.
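The arithmetic behind a per-span breakdown is simple: sum the span costs, compute each span's share, and sort by cost. The sketch below assumes a hypothetical `Span` shape with the fields listed above (model, tokens, cost, duration); the sample numbers used in the note that follows are invented to match the $4.80 example.

```typescript
// Hypothetical span shape for illustration.
interface Span {
  name: string;
  model: string;
  tokens: number;
  costUsd: number;
  durationMs: number;
}

interface SpanShare extends Span {
  sharePercent: number; // this span's share of the total trace cost
}

// Attach each span's share of total cost and sort the most expensive first.
function costBreakdown(spans: Span[]): SpanShare[] {
  const totalUsd = spans.reduce((sum, s) => sum + s.costUsd, 0);
  return spans
    .map((s) => ({
      ...s,
      sharePercent: totalUsd > 0 ? (s.costUsd / totalUsd) * 100 : 0,
    }))
    .sort((a, b) => b.costUsd - a.costUsd);
}
```

For a $4.80 run split into a $2.88 planning span, a $1.68 synthesis span, and a $0.24 execution span, the shares come out to roughly 60%, 35%, and 5%, with planning sorted first.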
Competitive context: Braintrust shipped per-span cost in March 2026. We match that capability at roughly 1/13th the price (€19/mo vs. $249/mo, comparing list prices without currency conversion).
The Full Picture
These four features work together:
- Runaway detection catches loops you didn't know about
- Session caps prevent any single run from exceeding your budget
- Graduated enforcement gives you a smooth ramp from warning to blocking
- Per-span cost shows you exactly where to optimize
No other tool combines all four, and none of the four features requires a proxy gateway: our SDK is passive, adds zero latency, and works with OpenAI, Anthropic, Google, and every major provider.
Try It Now
All four features are live today.
- Free tier: 50,000 events/month, no credit card required
- Pro: €19/month for 500,000 events and full enforcement features
If you're building AI agents with LangChain, CrewAI, or custom loops, these features were built for you.
Covers features: Agent Runaway Detection (F29), Agent Session Spend Caps (F35), Graduated Budget Enforcement (F33), Per-Span Cost Breakdown (F36).