guide · Mar 22, 2026 · 6 min read

Agent Loops Are Expensive: Tracking Per-Run Costs in LangChain

One user request can trigger 15+ LLM calls. Here's how to see what each agent run actually costs — and how to set limits before the bill arrives.



Here's a number that should scare you: $12,000.

That's what one developer reported paying for a single runaway LangChain recursive chain. No monitoring, no iteration limits, no cost visibility. The agent just kept calling the LLM until the API key hit its spending limit.

Agent frameworks like LangChain, CrewAI, and AutoGen are powerful. They're also unpredictable — because the agent decides how many LLM calls to make, not you.


The Problem: Agents Control the Loop Count

In a traditional API integration, you control the cost:

1 user action → 1 API call → predictable cost

With agents, the user action triggers a thinking loop:

1 user action → agent plans → calls LLM → evaluates → calls tools
→ calls LLM again → evaluates → calls more tools → calls LLM again
→ ... → final answer

Each iteration is a separate LLM call. A simple question might take 3 iterations. A complex one might take 15. A poorly designed prompt might loop indefinitely.
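To see how quickly iterations compound, here's a back-of-the-envelope estimator. The per-token prices and average token counts below are illustrative assumptions, not exact figures for any specific agent or provider:

```python
# Rough per-run cost estimator. Prices (per 1M tokens) and token counts
# are illustrative assumptions — plug in your provider's actual rates.
GPT4O_INPUT_PER_1M = 2.50    # USD per 1M input tokens (assumed)
GPT4O_OUTPUT_PER_1M = 10.00  # USD per 1M output tokens (assumed)

def estimate_run_cost(iterations, avg_tokens_in=3000, avg_tokens_out=500):
    """Each iteration is one LLM call. Context usually grows each loop,
    so a flat average understates late iterations."""
    cost_per_call = (avg_tokens_in / 1e6) * GPT4O_INPUT_PER_1M \
                  + (avg_tokens_out / 1e6) * GPT4O_OUTPUT_PER_1M
    return iterations * cost_per_call

print(f"${estimate_run_cost(3):.2f}")   # simple question, few iterations
print(f"${estimate_run_cost(15):.2f}")  # complex question, many iterations
```

Even with modest per-call token counts, a 15-iteration run costs roughly 5x a 3-iteration run, and nothing in the request itself tells you which one you're about to pay for.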

Real Cost Numbers

| Scenario | Iterations | Model | Cost per Run |
|---|---|---|---|
| Simple ReAct agent | 3-5 | GPT-4o | $0.15-0.25 |
| Research agent with tools | 8-12 | GPT-4o | $0.40-0.60 |
| Multi-agent crew (CrewAI) | 15-30 | GPT-4o | $0.75-1.50 |
| Runaway recursive chain | 50+ | GPT-4o | $2.50+ |

At 1,000 tasks per day with a multi-agent crew: $750-1,500 per day, or roughly $22,500-45,000 a month.

The cost is fundamentally unpredictable because the agent decides the loop count based on the input. Two similar-looking requests can cost 5x different amounts.


Step 1: Set Hard Iteration Limits

Every framework supports this. There's no reason to skip it.

LangChain

from langchain.agents import AgentExecutor

agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    max_iterations=10,       # Hard stop at 10 iterations
    max_execution_time=30,   # Hard stop at 30 seconds
    early_stopping_method="generate",  # Generate a final answer, don't just stop
)

CrewAI

from crewai import Agent

researcher = Agent(
    role="Research Analyst",
    goal="Find relevant data",
    max_iter=8,              # Max 8 iterations per task
    max_rpm=10,              # Rate limit: 10 requests per minute
)

Default limits are often too high or nonexistent. LangChain's default max_iterations is 15. For most use cases, 5-8 is sufficient. Set it explicitly.


Step 2: Track Cost Per Agent Run

Iteration limits prevent runaway loops, but they don't tell you what each run costs. For that, you need per-run cost tracking.

The trace_id Pattern

Assign a unique trace_id to each agent invocation. Tag every LLM call within that run with the same trace_id. After the run, sum the costs.

import uuid
from langchain_core.callbacks import BaseCallbackHandler
from aispendguard import track_usage

class CostTrackingCallback(BaseCallbackHandler):
    def __init__(self, trace_id: str):
        self.trace_id = trace_id
        self.iteration_count = 0

    def on_llm_end(self, response, **kwargs):
        # llm_output can be None for some providers — guard before .get()
        llm_output = response.llm_output or {}
        usage = llm_output.get("token_usage", {})
        track_usage(
            model=llm_output.get("model_name", "gpt-4o"),
            tokens_in=usage.get("prompt_tokens", 0),
            tokens_out=usage.get("completion_tokens", 0),
            tags={
                "trace_id": self.trace_id,
                "feature": "research-agent",
                "iteration": str(self.iteration_count),
            },
        )
        self.iteration_count += 1

# Generate a unique trace ID for this agent run and attach the callback
trace_id = str(uuid.uuid4())
agent_executor.invoke(
    {"input": question},
    config={"callbacks": [CostTrackingCallback(trace_id)]},
)

Now you can answer: "What did this specific agent run cost?" and "What's my average cost per run?"

What to Track

| Tag | Why |
|---|---|
| trace_id | Groups all LLM calls in one agent run |
| feature | Which agent/workflow triggered this |
| iteration | Which step in the loop (identifies expensive steps) |
| model | Which model was used (agents may use different models per step) |
| user_id | Cost attribution per user |
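If you're not using a vendor SDK, the per-run aggregation itself is only a few lines. Here's a minimal in-memory sketch; the event list, `track_usage` signature, and price table are assumptions for illustration, and a real setup would query your usage database instead:

```python
# Assumed per-1M-token prices — swap in your provider's current rates.
PRICES = {"gpt-4o": (2.50, 10.00), "gpt-4o-mini": (0.15, 0.60)}

events = []  # every tracked LLM call lands here as a dict

def track_usage(model, tokens_in, tokens_out, tags):
    events.append({"model": model, "tokens_in": tokens_in,
                   "tokens_out": tokens_out, **tags})

def sum_costs_by_trace_id(trace_id):
    """Sum the cost of every LLM call tagged with this agent run's trace_id."""
    total = 0.0
    for e in events:
        if e.get("trace_id") != trace_id:
            continue
        price_in, price_out = PRICES[e["model"]]
        total += e["tokens_in"] / 1e6 * price_in \
               + e["tokens_out"] / 1e6 * price_out
    return total

# Two iterations of one run, on different models:
track_usage("gpt-4o", 3000, 500, {"trace_id": "run-1", "iteration": "0"})
track_usage("gpt-4o-mini", 2000, 300, {"trace_id": "run-1", "iteration": "1"})
print(f"${sum_costs_by_trace_id('run-1'):.4f}")
```

Average cost per run is then just the mean of `sum_costs_by_trace_id` over your recent trace IDs.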

Step 3: Use Cheaper Models for Agent Reasoning

Not every agent iteration needs GPT-4o. The planning and evaluation steps often work fine with GPT-4o-mini.

Model Routing in Agents

from langchain_openai import ChatOpenAI

# Planning step: cheap model is fine
planner_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Execution step with complex reasoning: expensive model
executor_llm = ChatOpenAI(model="gpt-4o", temperature=0)

# Final synthesis: cheap model is fine
synthesizer_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

In a 10-iteration agent run, if 7 iterations use GPT-4o-mini and 3 use GPT-4o, you cut the cost by roughly 60% compared to using GPT-4o for everything.
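The arithmetic behind that estimate, assuming GPT-4o-mini costs about 6% of GPT-4o per token ($0.15 vs $2.50 input, $0.60 vs $10.00 output per 1M tokens, assumed rates) and that each iteration uses a similar number of tokens:

```python
# GPT-4o-mini is ~6% of GPT-4o's per-token price (assumed rates).
MINI_RATIO = 0.06

all_4o = 10 * 1.0                 # 10 iterations, all on GPT-4o
mixed = 3 * 1.0 + 7 * MINI_RATIO  # 3 on GPT-4o, 7 on GPT-4o-mini
savings = 1 - mixed / all_4o
print(f"{savings:.0%}")  # ≈ 66% cheaper, assuming equal tokens per call
```

The savings track the ratio of cheap to expensive iterations, so the biggest wins come from pushing the routine planning and evaluation steps onto the small model.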

CrewAI Multi-Model Setup

from crewai import Agent

# Research tasks: need the powerful model
researcher = Agent(
    role="Researcher",
    llm="gpt-4o",
    max_iter=5,
)

# Writing tasks: cheaper model works
writer = Agent(
    role="Writer",
    llm="gpt-4o-mini",
    max_iter=3,
)

Step 4: Monitor and Alert

Once you're tracking per-run costs, set alerts for anomalies:

  • Per-run threshold: Alert if any single agent run exceeds $1.00
  • Daily budget: Alert if total agent spend exceeds $50/day
  • Iteration anomaly: Alert if an agent run exceeds 10 iterations (suggests a loop problem)

Budget Alert Example

# After each agent run, check the cost.
# sum_costs_by_trace_id, notify_slack, notify_pagerduty, and
# disable_agent are placeholders for your own helpers.
run_cost = sum_costs_by_trace_id(trace_id)

if run_cost > 1.00:
    notify_slack(f"Agent run {trace_id} cost ${run_cost:.2f} — investigate")

if run_cost > 5.00:
    disable_agent()  # Kill switch for runaway costs
    notify_pagerduty(f"Agent cost emergency: ${run_cost:.2f}")

The Cost Optimization Checklist for Agents

| Action | Effort | Savings |
|---|---|---|
| Set max_iterations explicitly | 5 minutes | Prevents $1,000+ runaway loops |
| Track cost per run with trace_id | 30 minutes | Enables all other optimizations |
| Route cheap steps to GPT-4o-mini | 1 hour | 40-60% per run |
| Set budget alerts | 15 minutes | Prevents surprise bills |
| Cache repeated tool outputs | 1-2 hours | 20-40% on tool-heavy agents |
| Limit context window per iteration | 30 minutes | 20-30% on history-heavy agents |
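The tool-caching row deserves a concrete shape. A minimal memoization sketch using the standard library, assuming the tool is a pure function of its arguments and that results going briefly stale is acceptable (the `search_web` tool here is a hypothetical stand-in):

```python
import functools

@functools.lru_cache(maxsize=256)
def search_web(query: str) -> str:
    # Stand-in for a real (slow, expensive) tool call.
    return f"results for {query!r}"

# Agents often re-issue identical tool calls across iterations;
# repeats are served from the cache instead of hitting the API.
search_web("langchain pricing")      # executes the tool
search_web("langchain pricing")      # served from cache
print(search_web.cache_info().hits)  # → 1
```

`lru_cache` is per-process and keys on exact arguments; for multi-worker deployments or time-based expiry you'd want an external cache such as Redis instead.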

The Bottom Line

Agent frameworks are the fastest-growing source of AI API costs. The combination of unpredictable iteration counts, expensive models, and no built-in cost visibility creates the perfect conditions for bill shock.

The fix isn't complicated:

  1. Set hard limits on iterations
  2. Track cost per run
  3. Use cheaper models where quality allows
  4. Alert on anomalies before they become invoices

We built AISpendGuard with agent cost tracking as a core use case. The LangChain and CrewAI integrations automatically track per-run costs with trace_id grouping — no custom callbacks needed.

Free tier: 50,000 events/month. No credit card required.

Start tracking agent costs at aispendguard.com


Pricing reflects OpenAI and Anthropic rates as of March 2026. Agent iteration counts are based on real-world usage patterns reported by developers on Reddit, HN, and the LangChain Discord.


Want to track your AI spend automatically?

AISpendGuard detects waste patterns, breaks down costs by feature, and recommends specific changes with $/mo savings estimates.