Agent Loops Are Expensive: Tracking Per-Run Costs in LangChain
Here's a number that should scare you: $12,000.
That's what one developer reported paying for a single runaway LangChain recursive chain. No monitoring, no iteration limits, no cost visibility. The agent just kept calling the LLM until the API key hit its spending limit.
Agent frameworks like LangChain, CrewAI, and AutoGen are powerful. They're also unpredictable — because the agent decides how many LLM calls to make, not you.
The Problem: Agents Control the Loop Count
In a traditional API integration, you control the cost:
1 user action → 1 API call → predictable cost
With agents, the user action triggers a thinking loop:
1 user action → agent plans → calls LLM → evaluates → calls tools
→ calls LLM again → evaluates → calls more tools → calls LLM again
→ ... → final answer
Each iteration is a separate LLM call. A simple question might take 3 iterations. A complex one might take 15. A poorly designed prompt might loop indefinitely.
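To make that concrete, here's a back-of-the-envelope cost model for an agent loop. The per-token rates and per-iteration token counts below are assumptions for illustration, not your actual bill:

```python
# Rough cost model for an agent loop. Rates are assumed GPT-4o-class
# pricing -- check your provider's pricing page for real numbers.
RATE_IN = 2.50 / 1_000_000    # assumed $ per input token
RATE_OUT = 10.00 / 1_000_000  # assumed $ per output token

def run_cost(iterations, tokens_in_per_iter=3_000, tokens_out_per_iter=500):
    """Cost grows with iteration count -- which the agent controls, not you."""
    per_iter = tokens_in_per_iter * RATE_IN + tokens_out_per_iter * RATE_OUT
    return iterations * per_iter

print(f"3 iterations:  ${run_cost(3):.2f}")
print(f"15 iterations: ${run_cost(15):.2f}")
print(f"50 iterations: ${run_cost(50):.2f}")
```

In practice it's worse than this linear estimate: each iteration's input usually includes the accumulated history, so input tokens grow as the loop runs.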
Real Cost Numbers
| Scenario | Iterations | Model | Cost per Run |
|---|---|---|---|
| Simple ReAct agent | 3-5 | GPT-4o | $0.15-0.25 |
| Research agent with tools | 8-12 | GPT-4o | $0.40-0.60 |
| Multi-agent crew (CrewAI) | 15-30 | GPT-4o | $0.75-1.50 |
| Runaway recursive chain | 50+ | GPT-4o | $2.50+ |
At 1,000 tasks per day with a multi-agent crew, that's $750-1,500 per day — $22,500-45,000 per month.
The cost is fundamentally unpredictable because the agent decides the loop count based on the input. Two similar-looking requests can cost 5x different amounts.
Step 1: Set Hard Iteration Limits
Every framework supports this. There's no reason to skip it.
LangChain
```python
from langchain.agents import AgentExecutor

agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    max_iterations=10,      # Hard stop at 10 iterations
    max_execution_time=30,  # Hard stop at 30 seconds
    early_stopping_method="generate",  # Generate a final answer, don't just stop
)
```
CrewAI
```python
from crewai import Agent

researcher = Agent(
    role="Research Analyst",
    goal="Find relevant data",
    max_iter=8,  # Max 8 iterations per task
    max_rpm=10,  # Rate limit: 10 requests per minute
)
```
Default limits are often too high or nonexistent. LangChain's default max_iterations is 15. For most use cases, 5-8 is sufficient. Set it explicitly.
Step 2: Track Cost Per Agent Run
Iteration limits prevent runaway loops, but they don't tell you what each run costs. For that, you need per-run cost tracking.
The trace_id Pattern
Assign a unique trace_id to each agent invocation. Tag every LLM call within that run with the same trace_id. After the run, sum the costs.
```python
import uuid

from langchain_core.callbacks import BaseCallbackHandler

from aispendguard import track_usage


class CostTrackingCallback(BaseCallbackHandler):
    def __init__(self, trace_id: str):
        self.trace_id = trace_id  # Unique ID for this agent run
        self.iteration_count = 0

    def on_llm_end(self, response, **kwargs):
        llm_output = response.llm_output or {}
        usage = llm_output.get("token_usage", {})
        track_usage(
            model=llm_output.get("model_name", "gpt-4o"),
            tokens_in=usage.get("prompt_tokens", 0),
            tokens_out=usage.get("completion_tokens", 0),
            tags={
                "trace_id": self.trace_id,
                "feature": "research-agent",
                "iteration": str(self.iteration_count),
            },
        )
        self.iteration_count += 1


# Generate one trace_id per agent invocation
callback = CostTrackingCallback(trace_id=str(uuid.uuid4()))
```
Now you can answer: "What did this specific agent run cost?" and "What's my average cost per run?"
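Answering both questions is then a group-by over the tracked events. A minimal sketch, assuming events are stored as dicts with a `trace_id` and a precomputed `cost` (adapt to whatever shape your tracker actually stores):

```python
# Sum cost per agent run by grouping tracked LLM-call events on trace_id.
from collections import defaultdict
from statistics import mean

events = [
    {"trace_id": "run-a", "cost": 0.04},
    {"trace_id": "run-a", "cost": 0.06},
    {"trace_id": "run-b", "cost": 0.02},
]

cost_per_run = defaultdict(float)
for event in events:
    cost_per_run[event["trace_id"]] += event["cost"]

print(dict(cost_per_run))           # cost of each specific run
print(mean(cost_per_run.values()))  # average cost per run
```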
What to Track
| Tag | Why |
|---|---|
| trace_id | Groups all LLM calls in one agent run |
| feature | Which agent/workflow triggered this |
| iteration | Which step in the loop (identifies expensive steps) |
| model | Which model was used (agents may use different models per step) |
| user_id | Cost attribution per user |
Step 3: Use Cheaper Models for Agent Reasoning
Not every agent iteration needs GPT-4o. The planning and evaluation steps often work fine with GPT-4o-mini.
Model Routing in Agents
```python
from langchain_openai import ChatOpenAI

# Planning step: cheap model is fine
planner_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Execution step with complex reasoning: expensive model
executor_llm = ChatOpenAI(model="gpt-4o", temperature=0)

# Final synthesis: cheap model is fine
synthesizer_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
```
In a 10-iteration agent run, if 7 iterations use GPT-4o-mini and 3 use GPT-4o, you cut the cost by roughly 60% compared to using GPT-4o for everything.
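To sanity-check that arithmetic: with assumed per-iteration costs where GPT-4o-mini is roughly 16x cheaper than GPT-4o (only the ratio matters, not the absolute numbers), the mixed run comes out around two-thirds cheaper, in line with the rough 60% figure:

```python
# Verify the savings claim for a 10-iteration run: 3 GPT-4o + 7 mini
# vs. 10 GPT-4o. Per-iteration costs are illustrative assumptions.
COST_4O = 0.010     # assumed cost of one GPT-4o iteration
COST_MINI = 0.0006  # assumed cost of one GPT-4o-mini iteration

all_4o = 10 * COST_4O
mixed = 3 * COST_4O + 7 * COST_MINI
savings = 1 - mixed / all_4o
print(f"Savings: {savings:.0%}")
```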
CrewAI Multi-Model Setup
```python
from crewai import Agent

# Research tasks: need the powerful model
researcher = Agent(
    role="Researcher",
    llm="gpt-4o",
    max_iter=5,
)

# Writing tasks: cheaper model works
writer = Agent(
    role="Writer",
    llm="gpt-4o-mini",
    max_iter=3,
)
```
Step 4: Monitor and Alert
Once you're tracking per-run costs, set alerts for anomalies:
- Per-run threshold: Alert if any single agent run exceeds $1.00
- Daily budget: Alert if total agent spend exceeds $50/day
- Iteration anomaly: Alert if an agent run exceeds 10 iterations (suggests a loop problem)
Budget Alert Example
```python
# After each agent run, check the cost. The helper functions here are
# illustrative placeholders -- wire them to your own tracking and alerting.
run_cost = sum_costs_by_trace_id(trace_id)

if run_cost > 1.00:
    notify_slack(f"Agent run {trace_id} cost ${run_cost:.2f} — investigate")

if run_cost > 5.00:
    disable_agent()  # Kill switch for runaway costs
    notify_pagerduty(f"Agent cost emergency: ${run_cost:.2f}")
```
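The iteration-anomaly alert from the list above works the same way, fed by the iteration counter you're already tracking. A minimal sketch, with the alert sink stubbed out as a print (the function name and threshold are illustrative, not a real API):

```python
# Flag agent runs whose iteration count suggests a loop problem.
ITERATION_LIMIT = 10

def check_iteration_anomaly(trace_id: str, iteration_count: int) -> bool:
    """Return True and alert when a run blows past the expected loop count."""
    if iteration_count > ITERATION_LIMIT:
        print(f"Agent run {trace_id} took {iteration_count} iterations "
              "-- possible loop problem")
        return True
    return False

check_iteration_anomaly("run-a", 4)   # normal run, no alert
check_iteration_anomaly("run-b", 23)  # fires the alert
```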
The Cost Optimization Checklist for Agents
| Action | Effort | Savings |
|---|---|---|
| Set max_iterations explicitly | 5 minutes | Prevents $1,000+ runaway loops |
| Track cost per run with trace_id | 30 minutes | Enables all other optimizations |
| Route cheap steps to GPT-4o-mini | 1 hour | 40-60% per run |
| Set budget alerts | 15 minutes | Prevents surprise bills |
| Cache repeated tool outputs | 1-2 hours | 20-40% on tool-heavy agents |
| Limit context window per iteration | 30 minutes | 20-30% on history-heavy agents |
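The "cache repeated tool outputs" row deserves a quick sketch, since it's the only item not shown elsewhere: memoize a tool on its arguments so identical calls within (or across) agent runs don't pay twice. This only works for deterministic, side-effect-free tools; the tool body here is a placeholder for a real API call:

```python
# Memoize an expensive tool call so repeated identical queries are free.
from functools import lru_cache

@lru_cache(maxsize=1024)
def search_tool(query: str) -> str:
    # Placeholder for an expensive external API or tool call
    return f"results for {query!r}"

search_tool("llm pricing 2026")         # real call (cache miss)
search_tool("llm pricing 2026")         # served from cache (hit)
print(search_tool.cache_info())
```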
The Bottom Line
Agent frameworks are the fastest-growing source of AI API costs. The combination of unpredictable iteration counts, expensive models, and no built-in cost visibility creates the perfect conditions for bill shock.
The fix isn't complicated:
- Set hard limits on iterations
- Track cost per run
- Use cheaper models where quality allows
- Alert on anomalies before they become invoices
We built AISpendGuard with agent cost tracking as a core use case. The LangChain and CrewAI integrations automatically track per-run costs with trace_id grouping — no custom callbacks needed.
Free tier: 50,000 events/month. No credit card required.
Start tracking agent costs at aispendguard.com
Pricing reflects OpenAI and Anthropic rates as of March 2026. Agent iteration counts are based on real-world usage patterns reported by developers on Reddit, HN, and the LangChain Discord.