guide · Mar 22, 2026 · 6 min read

Agent Loops Are Expensive: Tracking Per-Run Costs in LangChain

One user request can trigger 15+ LLM calls. Here's how to see what each agent run actually costs — and how to set limits before the bill arrives.



Here's a number that should scare you: $12,000.

That's what one developer reported paying for a single runaway LangChain recursive chain. No monitoring, no iteration limits, no cost visibility. The agent just kept calling the LLM until the API key hit its spending limit.

Agent frameworks like LangChain, CrewAI, and AutoGen are powerful. They're also unpredictable — because the agent decides how many LLM calls to make, not you.


The Problem: Agents Control the Loop Count

In a traditional API integration, you control the cost:

1 user action → 1 API call → predictable cost

With agents, the user action triggers a thinking loop:

1 user action → agent plans → calls LLM → evaluates → calls tools
→ calls LLM again → evaluates → calls more tools → calls LLM again
→ ... → final answer

Each iteration is a separate LLM call. A simple question might take 3 iterations. A complex one might take 15. A poorly designed prompt might loop indefinitely.
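To see how quickly iterations compound, here's a back-of-the-envelope estimator. The per-token prices and average token counts below are illustrative assumptions, not exact figures for any specific agent or provider:

```python
# Rough per-run cost estimator. Prices (per 1M tokens) and token counts
# are illustrative assumptions — plug in your provider's actual rates.
GPT4O_INPUT_PER_1M = 2.50    # USD per 1M input tokens (assumed)
GPT4O_OUTPUT_PER_1M = 10.00  # USD per 1M output tokens (assumed)

def estimate_run_cost(iterations, avg_tokens_in=3000, avg_tokens_out=500):
    """Each iteration is one LLM call. Context usually grows each loop,
    so a flat average understates late iterations."""
    cost_per_call = (avg_tokens_in / 1e6) * GPT4O_INPUT_PER_1M \
                  + (avg_tokens_out / 1e6) * GPT4O_OUTPUT_PER_1M
    return iterations * cost_per_call

print(f"${estimate_run_cost(3):.2f}")   # simple question, few iterations
print(f"${estimate_run_cost(15):.2f}")  # complex question, many iterations
```

Even with modest per-call token counts, a 15-iteration run costs roughly 5x a 3-iteration run, and nothing in the request itself tells you which one you're about to pay for.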

Real Cost Numbers

| Scenario | Iterations | Model | Cost per Run |
|---|---|---|---|
| Simple ReAct agent | 3-5 | GPT-4o | $0.15-0.25 |
| Research agent with tools | 8-12 | GPT-4o | $0.40-0.60 |
| Multi-agent crew (CrewAI) | 15-30 | GPT-4o | $0.75-1.50 |
| Runaway recursive chain | 50+ | GPT-4o | $2.50+ |

At 1,000 tasks per day with a multi-agent crew: $750-1,500 per day, or roughly $22,500-45,000 a month.

The cost is fundamentally unpredictable because the agent decides the loop count based on the input. Two similar-looking requests can cost 5x different amounts.


Step 1: Set Hard Iteration Limits

Every framework supports this. There's no reason to skip it.

LangChain

from langchain.agents import AgentExecutor

agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    max_iterations=10,       # Hard stop at 10 iterations
    max_execution_time=30,   # Hard stop at 30 seconds
    early_stopping_method="generate",  # Generate a final answer, don't just stop
)

CrewAI

from crewai import Agent

researcher = Agent(
    role="Research Analyst",
    goal="Find relevant data",
    max_iter=8,              # Max 8 iterations per task
    max_rpm=10,              # Rate limit: 10 requests per minute
)

Default limits are often too high or nonexistent. LangChain's default max_iterations is 15. For most use cases, 5-8 is sufficient. Set it explicitly.


Step 2: Track Cost Per Agent Run

Iteration limits prevent runaway loops, but they don't tell you what each run costs. For that, you need per-run cost tracking.

The trace_id Pattern

Assign a unique trace_id to each agent invocation. Tag every LLM call within that run with the same trace_id. After the run, sum the costs.

import uuid
from langchain_core.callbacks import BaseCallbackHandler
from aispendguard import track_usage

class CostTrackingCallback(BaseCallbackHandler):
    def __init__(self, trace_id: str):
        self.trace_id = trace_id
        self.iteration_count = 0

    def on_llm_end(self, response, **kwargs):
        # llm_output can be None for some providers — guard before .get()
        llm_output = response.llm_output or {}
        usage = llm_output.get("token_usage", {})
        track_usage(
            model=llm_output.get("model_name", "gpt-4o"),
            tokens_in=usage.get("prompt_tokens", 0),
            tokens_out=usage.get("completion_tokens", 0),
            tags={
                "trace_id": self.trace_id,
                "feature": "research-agent",
                "iteration": str(self.iteration_count),
            },
        )
        self.iteration_count += 1

# Generate a unique trace ID for this agent run and attach the callback
trace_id = str(uuid.uuid4())
agent_executor.invoke(
    {"input": question},
    config={"callbacks": [CostTrackingCallback(trace_id)]},
)

Now you can answer: "What did this specific agent run cost?" and "What's my average cost per run?"

What to Track

| Tag | Why |
|---|---|
| trace_id | Groups all LLM calls in one agent run |
| feature | Which agent/workflow triggered this |
| iteration | Which step in the loop (identifies expensive steps) |
| model | Which model was used (agents may use different models per step) |
| user_id | Cost attribution per user |
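If you're not using a vendor SDK, the per-run aggregation itself is only a few lines. Here's a minimal in-memory sketch; the event list, `track_usage` signature, and price table are assumptions for illustration, and a real setup would query your usage database instead:

```python
# Assumed per-1M-token prices — swap in your provider's current rates.
PRICES = {"gpt-4o": (2.50, 10.00), "gpt-4o-mini": (0.15, 0.60)}

events = []  # every tracked LLM call lands here as a dict

def track_usage(model, tokens_in, tokens_out, tags):
    events.append({"model": model, "tokens_in": tokens_in,
                   "tokens_out": tokens_out, **tags})

def sum_costs_by_trace_id(trace_id):
    """Sum the cost of every LLM call tagged with this agent run's trace_id."""
    total = 0.0
    for e in events:
        if e.get("trace_id") != trace_id:
            continue
        price_in, price_out = PRICES[e["model"]]
        total += e["tokens_in"] / 1e6 * price_in \
               + e["tokens_out"] / 1e6 * price_out
    return total

# Two iterations of one run, on different models:
track_usage("gpt-4o", 3000, 500, {"trace_id": "run-1", "iteration": "0"})
track_usage("gpt-4o-mini", 2000, 300, {"trace_id": "run-1", "iteration": "1"})
print(f"${sum_costs_by_trace_id('run-1'):.4f}")
```

Average cost per run is then just the mean of `sum_costs_by_trace_id` over your recent trace IDs.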

Step 3: Use Cheaper Models for Agent Reasoning

Not every agent iteration needs GPT-4o. The planning and evaluation steps often work fine with GPT-4o-mini.

Model Routing in Agents

from langchain_openai import ChatOpenAI

# Planning step: cheap model is fine
planner_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Execution step with complex reasoning: expensive model
executor_llm = ChatOpenAI(model="gpt-4o", temperature=0)

# Final synthesis: cheap model is fine
synthesizer_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

In a 10-iteration agent run, if 7 iterations use GPT-4o-mini and 3 use GPT-4o, you cut the cost by roughly 60% compared to using GPT-4o for everything.
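The arithmetic behind that estimate, assuming GPT-4o-mini costs about 6% of GPT-4o per token ($0.15 vs $2.50 input, $0.60 vs $10.00 output per 1M tokens, assumed rates) and that each iteration uses a similar number of tokens:

```python
# GPT-4o-mini is ~6% of GPT-4o's per-token price (assumed rates).
MINI_RATIO = 0.06

all_4o = 10 * 1.0                 # 10 iterations, all on GPT-4o
mixed = 3 * 1.0 + 7 * MINI_RATIO  # 3 on GPT-4o, 7 on GPT-4o-mini
savings = 1 - mixed / all_4o
print(f"{savings:.0%}")  # ≈ 66% cheaper, assuming equal tokens per call
```

The savings track the ratio of cheap to expensive iterations, so the biggest wins come from pushing the routine planning and evaluation steps onto the small model.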

CrewAI Multi-Model Setup

from crewai import Agent

# Research tasks: need the powerful model
researcher = Agent(
    role="Researcher",
    llm="gpt-4o",
    max_iter=5,
)

# Writing tasks: cheaper model works
writer = Agent(
    role="Writer",
    llm="gpt-4o-mini",
    max_iter=3,
)

Step 4: Monitor and Alert

Once you're tracking per-run costs, set alerts for anomalies:

  • Per-run threshold: Alert if any single agent run exceeds $1.00
  • Daily budget: Alert if total agent spend exceeds $50/day
  • Iteration anomaly: Alert if an agent run exceeds 10 iterations (suggests a loop problem)

Budget Alert Example

# After each agent run, check the cost.
# sum_costs_by_trace_id, notify_slack, notify_pagerduty, and
# disable_agent are placeholders for your own helpers.
run_cost = sum_costs_by_trace_id(trace_id)

if run_cost > 1.00:
    notify_slack(f"Agent run {trace_id} cost ${run_cost:.2f} — investigate")

if run_cost > 5.00:
    disable_agent()  # Kill switch for runaway costs
    notify_pagerduty(f"Agent cost emergency: ${run_cost:.2f}")

The Cost Optimization Checklist for Agents

| Action | Effort | Savings |
|---|---|---|
| Set max_iterations explicitly | 5 minutes | Prevents $1,000+ runaway loops |
| Track cost per run with trace_id | 30 minutes | Enables all other optimizations |
| Route cheap steps to GPT-4o-mini | 1 hour | 40-60% per run |
| Set budget alerts | 15 minutes | Prevents surprise bills |
| Cache repeated tool outputs | 1-2 hours | 20-40% on tool-heavy agents |
| Limit context window per iteration | 30 minutes | 20-30% on history-heavy agents |
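The tool-caching row deserves a concrete shape. A minimal memoization sketch using the standard library, assuming the tool is a pure function of its arguments and that results going briefly stale is acceptable (the `search_web` tool here is a hypothetical stand-in):

```python
import functools

@functools.lru_cache(maxsize=256)
def search_web(query: str) -> str:
    # Stand-in for a real (slow, expensive) tool call.
    return f"results for {query!r}"

# Agents often re-issue identical tool calls across iterations;
# repeats are served from the cache instead of hitting the API.
search_web("langchain pricing")      # executes the tool
search_web("langchain pricing")      # served from cache
print(search_web.cache_info().hits)  # → 1
```

`lru_cache` is per-process and keys on exact arguments; for multi-worker deployments or time-based expiry you'd want an external cache such as Redis instead.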

The Bottom Line

Agent frameworks are the fastest-growing source of AI API costs. The combination of unpredictable iteration counts, expensive models, and no built-in cost visibility creates the perfect conditions for bill shock.

The fix isn't complicated:

  1. Set hard limits on iterations
  2. Track cost per run
  3. Use cheaper models where quality allows
  4. Alert on anomalies before they become invoices

We built AISpendGuard with agent cost tracking as a core use case. The LangChain and CrewAI integrations automatically track per-run costs with trace_id grouping — no custom callbacks needed.

Free tier: 50,000 events/month. No credit card required.

Start tracking agent costs at aispendguard.com


Pricing reflects OpenAI and Anthropic rates as of March 2026. Agent iteration counts are based on real-world usage patterns reported by developers on Reddit, HN, and the LangChain Discord.


Want to track your AI spend automatically?

AISpendGuard detects waste patterns, breaks down costs by feature, and recommends specific changes with $/mo savings estimates.