Guide · Mar 22, 2026 · 7 min read

Batch API Saves 50% — Here's How to Know If Your Workload Qualifies

OpenAI's Batch API offers a flat 50% discount. But not every workload qualifies. Here's how to audit your API calls and find the easy wins.



OpenAI's Batch API charges 50% less than the standard API. Same models, same quality, same token limits. The only trade-off: results are delivered within 24 hours instead of in real-time (though in practice, most complete within minutes).

If you're spending $200/month or more on OpenAI, there's a good chance you're leaving $100/month on the table.


How the Batch API Works

Instead of sending individual requests and waiting for immediate responses, you submit a batch of requests as a JSONL file. OpenAI processes them asynchronously and returns all results when done.
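Each line of that JSONL file is a self-contained request. A minimal line looks like this (the `custom_id` is your own identifier, used to match results back to inputs later):

```json
{"custom_id": "item-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Summarize this article..."}]}}
```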

Pricing Comparison

| Model | Standard (per 1M input) | Batch (per 1M input) | Savings |
|---|---|---|---|
| GPT-4o | $2.50 | $1.25 | 50% |
| GPT-4o-mini | $0.15 | $0.075 | 50% |
| GPT-4.5 Preview | $75.00 | $37.50 | 50% |

Output tokens get the same 50% discount. The savings are flat across every model.

The Trade-Off

| | Standard API | Batch API |
|---|---|---|
| Response time | 1-30 seconds | Up to 24 hours (usually minutes) |
| Pricing | Full price | 50% off |
| Rate limits | Standard | Higher (separate pool) |
| SLA | Real-time | Best-effort within 24h |

Does Your Workload Qualify?

The qualifying question is simple: Does the user need to see the result immediately?

Qualifies for Batch API (user is NOT waiting)

| Workload | Why It Qualifies |
|---|---|
| Nightly report generation | Reports run on a schedule, not on-demand |
| Content generation pipelines | Blog posts, descriptions, summaries queued for review |
| Data enrichment/extraction | Processing CSV rows, enriching database records |
| Classification of existing data | Labeling historical records, sentiment analysis on past reviews |
| Email draft preparation | Drafts generated in advance, user reviews later |
| Test data generation | Creating synthetic test data for QA |
| Embedding generation | Batch-embedding documents for RAG pipelines |
| Translation of static content | Translating documentation, help articles, product descriptions |

Does NOT Qualify (user IS waiting)

| Workload | Why It Doesn't Qualify |
|---|---|
| Chatbot responses | User is typing and waiting for a reply |
| Real-time search/RAG | User submitted a query and expects results now |
| Live content suggestions | User is writing and expects inline suggestions |
| Interactive coding assistants | Developer expects immediate code completion |
| Real-time moderation | Content needs to be checked before it's shown |

The Gray Zone (might qualify)

| Workload | When It Qualifies |
|---|---|
| Email drafts | If user clicks "generate" and comes back later |
| Document summarization | If it's batch processing (queue of docs), not single on-demand |
| Report generation | If triggered by cron, not by "Generate Report" button |
| Analytics summaries | If daily/weekly digest, not real-time dashboard |

How to Audit Your Workloads

Step 1: Tag every API call by feature

If you're already tracking costs with tags (you should be), pull your breakdown:

Feature: chatbot         → 8,200 calls/mo → $310/mo → REAL-TIME (skip)
Feature: report-gen      → 3,100 calls/mo → $145/mo → BATCHABLE
Feature: email-drafts    → 1,200 calls/mo → $45/mo  → BATCHABLE
Feature: data-enrichment → 2,400 calls/mo → $92/mo  → BATCHABLE

In this example, $282/month is batchable → $141/month savings.
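If your tags live in a call log, a breakdown like the one above takes only a few lines to produce. A sketch, assuming a hypothetical list of per-call records with `feature` and `cost_usd` fields, plus your own list of features where a user is actively waiting:

```python
from collections import defaultdict

# Hypothetical call log: one record per API call, tagged by feature.
calls = [
    {"feature": "chatbot", "cost_usd": 0.038},
    {"feature": "report-gen", "cost_usd": 0.047},
    {"feature": "report-gen", "cost_usd": 0.046},
    {"feature": "data-enrichment", "cost_usd": 0.038},
]

# Features where a user waits on the response (your own judgment call).
REAL_TIME_FEATURES = {"chatbot"}

# Sum spend per feature tag.
totals = defaultdict(float)
for call in calls:
    totals[call["feature"]] += call["cost_usd"]

# Print the audit, largest spend first.
for feature, spend in sorted(totals.items(), key=lambda kv: -kv[1]):
    label = "REAL-TIME (skip)" if feature in REAL_TIME_FEATURES else "BATCHABLE"
    print(f"{feature:<16} ${spend:.2f}/mo  {label}")
```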

Step 2: Check if the user flow tolerates async

For each batchable workload, verify:

  1. The user doesn't see a loading spinner waiting for the result
  2. The feature works with a "processing... we'll notify you when ready" UX
  3. Results can be stored and retrieved later

Step 3: Estimate savings

Batchable monthly spend × 50% = savings

$282 × 50% = $141/month saved
$1,692/year saved
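As a reusable helper (a sketch; the 50% figure is the flat Batch API discount from the pricing table above):

```python
def estimate_batch_savings(monthly_spend_by_feature, batchable_features):
    """Estimate monthly savings from moving batchable features to the Batch API.

    Assumes the flat 50% Batch API discount on both input and output tokens.
    """
    batchable = sum(
        spend for feature, spend in monthly_spend_by_feature.items()
        if feature in batchable_features
    )
    return batchable * 0.5

# Numbers from the audit example above.
spend = {"chatbot": 310, "report-gen": 145, "email-drafts": 45, "data-enrichment": 92}
savings = estimate_batch_savings(spend, {"report-gen", "email-drafts", "data-enrichment"})
print(f"${savings:.0f}/month, ${savings * 12:.0f}/year")  # $141/month, $1692/year
```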

Implementation: Moving to Batch API

OpenAI Batch API Request

```python
import json
from openai import OpenAI

client = OpenAI()

# 1. Create a JSONL file with your requests
requests = []
for item in items_to_process:
    requests.append({
        "custom_id": str(item.id),  # custom_id must be a string
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [
                {"role": "system", "content": "Extract key information..."},
                {"role": "user", "content": item.text},
            ],
        },
    })

# Write to JSONL
with open("batch_input.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")

# 2. Upload the file
batch_file = client.files.create(
    file=open("batch_input.jsonl", "rb"),
    purpose="batch",
)

# 3. Create the batch
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# 4. Check status (or poll)
batch = client.batches.retrieve(batch.id)
print(f"Status: {batch.status}")  # validating → in_progress → completed
```
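In practice you poll until the batch reaches a terminal state. A sketch of the loop, with the retrieve call injected as a parameter so it's easy to test (production code should also add a timeout and backoff):

```python
import time

def wait_for_batch(retrieve, batch_id, poll_seconds=60, sleep=time.sleep):
    """Poll until the batch reaches a terminal state and return it.

    `retrieve` is a callable like client.batches.retrieve, injected
    here so the loop can be exercised without a live API.
    """
    terminal = {"completed", "failed", "expired", "cancelled"}
    while True:
        batch = retrieve(batch_id)
        if batch.status in terminal:
            return batch
        sleep(poll_seconds)

# Usage with the client from the example above:
# done = wait_for_batch(client.batches.retrieve, batch.id)
# if done.status == "completed":
#     output = client.files.content(done.output_file_id)
```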

When to Process Batches

| Pattern | Schedule | Use Case |
|---|---|---|
| Nightly batch | Cron at 2 AM | Reports, enrichment, classification |
| Hourly micro-batch | Every hour | Content pipelines, queue processing |
| Queue-based | When queue hits N items | Email drafts, data processing |
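The queue-based pattern can be sketched as a small accumulator that flushes when it reaches a threshold. Here `submit` is a hypothetical function of your own that writes the pending requests to JSONL and creates the batch:

```python
class BatchQueue:
    """Accumulate requests and flush as one batch once a threshold is reached.

    `submit` is your own batch-submission function (hypothetical here);
    `threshold` follows the 50-100 request guidance below.
    """

    def __init__(self, submit, threshold=100):
        self.submit = submit
        self.threshold = threshold
        self.pending = []

    def add(self, request):
        self.pending.append(request)
        if len(self.pending) >= self.threshold:
            self.flush()

    def flush(self):
        # Submit everything pending as a single batch, then reset.
        if self.pending:
            self.submit(self.pending)
            self.pending = []
```

A scheduler (cron, hourly job) can also call `flush()` so stragglers below the threshold still go out.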

Common Mistakes

1. Not batching because "it's only a few calls"

Even 1,000 calls/month at $0.03 each = $30/month standard, $15/month batch. Over a year, that's $180 saved for 30 minutes of migration work.

2. Sending tiny batches

There's overhead per batch (file upload, status polling). Batch at least 50-100 requests together for efficiency.

3. Not handling failures

Batch API can partially fail. Always check per-request status in the response and retry failed items:

```python
# Check results (refresh the batch first — output_file_id is only
# populated once the batch has completed)
batch = client.batches.retrieve(batch.id)
results = client.files.content(batch.output_file_id)
for line in results.text.strip().split("\n"):
    result = json.loads(line)
    if result.get("error"):
        # Retry this item or flag for manual review
        handle_error(result["custom_id"], result["error"])
```
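One way to structure that check is a small parser that separates successes from failures, so the failed `custom_id`s can be resubmitted in a follow-up batch. A sketch, assuming the documented output line shape (`custom_id`, `response`, `error` fields):

```python
import json

def split_results(output_text):
    """Split Batch API output JSONL into successes and failures by custom_id."""
    ok, failed = {}, []
    for line in output_text.strip().split("\n"):
        result = json.loads(line)
        if result.get("error"):
            failed.append(result["custom_id"])  # candidates for retry
        else:
            ok[result["custom_id"]] = result["response"]["body"]
    return ok, failed

# Example with fabricated output lines:
sample = (
    '{"custom_id": "a", "error": null, "response": {"body": {"choices": []}}}\n'
    '{"custom_id": "b", "error": {"message": "rate limited"}, "response": null}'
)
ok, failed = split_results(sample)
print(failed)  # ['b']
```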

4. Forgetting to track batch costs separately

Tag batch API calls differently so you can measure the actual savings:

```python
tags = {
    "feature": "report-generation",
    "api_type": "batch",  # vs "standard"
    "batch_id": batch.id,
}
```

The Decision Framework

Is the user waiting for this result right now?
├── YES → Standard API (full price)
└── NO → Can it wait 1+ hours?
    ├── YES → Batch API (50% off)
    └── NO → Standard API, but consider:
        └── Can you pre-generate and cache?
            ├── YES → Batch generate, serve from cache
            └── NO → Standard API
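The tree above reduces to a few lines of code if you want to bake it into an internal routing helper (a sketch; the argument names are illustrative):

```python
def choose_api(user_waiting, can_wait_hours, can_precompute=False):
    """Encode the decision framework above as a recommendation string."""
    if user_waiting:
        return "standard"            # user needs the result now → full price
    if can_wait_hours:
        return "batch"               # async workload → 50% off
    if can_precompute:
        return "batch + cache"       # pre-generate with batch, serve from cache
    return "standard"

print(choose_api(user_waiting=False, can_wait_hours=True))  # batch
```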

Summary

| Question | Answer |
|---|---|
| What discount? | 50% off all models (input and output tokens) |
| What's the catch? | Results within 24h (usually faster) |
| Minimum batch size? | 1 request (but batch 50+ for efficiency) |
| Which models? | All OpenAI models |
| How hard to implement? | 1-2 hours for a typical workload |
| Who should do this? | Anyone spending $100+/mo on non-real-time API calls |

The Batch API is the single easiest cost optimization available. If you have batchable workloads and aren't using it, you're paying double for no reason.


We built AISpendGuard to help you find batchable workloads automatically. Tag your API calls by feature, and our waste detection engine identifies which workloads qualify for batch processing — with estimated monthly savings.

Free tier: 50,000 events/month. No credit card required.

Start tracking at aispendguard.com


All pricing as of March 2026. Anthropic offers a similar Message Batches API, also at 50% off. Check provider docs for the latest pricing and availability.


Want to track your AI spend automatically?

AISpendGuard detects waste patterns, breaks down costs by feature, and recommends specific changes with $/mo savings estimates.