Batch API Saves 50% — Here's How to Know If Your Workload Qualifies
OpenAI's Batch API charges 50% less than the standard API. Same models, same quality, same token limits. The only trade-off: results arrive within 24 hours instead of in real time (though in practice, most batches complete within minutes).
If you're spending $200/month or more on OpenAI, there's a good chance you're leaving $100/month on the table.
How the Batch API Works
Instead of sending individual requests and waiting for immediate responses, you submit a batch of requests as a JSONL file. OpenAI processes them asynchronously and returns all results when done.
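Each line of the input file is one self-contained JSON request. A minimal sketch of a single line (the `custom_id` and prompt are illustrative, not part of any real dataset):

```python
import json

# One request per JSONL line; custom_id lets you match results back to inputs.
request = {
    "custom_id": "row-001",  # illustrative ID; must be unique per request
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "Summarize: ..."}],
    },
}

line = json.dumps(request)  # this string becomes one line of batch_input.jsonl
print(line)
```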
Pricing Comparison
| Model | Standard (per 1M input) | Batch (per 1M input) | Savings |
|---|---|---|---|
| GPT-4o | $2.50 | $1.25 | 50% |
| GPT-4o-mini | $0.15 | $0.075 | 50% |
| GPT-4.5 Preview | $75.00 | $37.50 | 50% |
Output tokens get the same 50% discount. The savings are flat across every model.
The Trade-Off
| | Standard API | Batch API |
|---|---|---|
| Response time | 1-30 seconds | Up to 24 hours (usually minutes) |
| Pricing | Full price | 50% off |
| Rate limits | Standard | Higher (separate pool) |
| SLA | Real-time | Best-effort within 24h |
Does Your Workload Qualify?
The qualifying question is simple: Does the user need to see the result immediately?
Qualifies for Batch API (user is NOT waiting)
| Workload | Why It Qualifies |
|---|---|
| Nightly report generation | Reports run on a schedule, not on-demand |
| Content generation pipelines | Blog posts, descriptions, summaries queued for review |
| Data enrichment/extraction | Processing CSV rows, enriching database records |
| Classification of existing data | Labeling historical records, sentiment analysis on past reviews |
| Email draft preparation | Drafts generated in advance, user reviews later |
| Test data generation | Creating synthetic test data for QA |
| Embedding generation | Batch-embedding documents for RAG pipelines |
| Translation of static content | Translating documentation, help articles, product descriptions |
Does NOT Qualify (user IS waiting)
| Workload | Why It Doesn't Qualify |
|---|---|
| Chatbot responses | User is typing and waiting for a reply |
| Real-time search/RAG | User submitted a query and expects results now |
| Live content suggestions | User is writing and expects inline suggestions |
| Interactive coding assistants | Developer expects immediate code completion |
| Real-time moderation | Content needs to be checked before it's shown |
The Gray Zone (might qualify)
| Workload | When It Qualifies |
|---|---|
| Email drafts | If user clicks "generate" and comes back later |
| Document summarization | If it's batch processing (queue of docs), not single on-demand |
| Report generation | If triggered by cron, not by "Generate Report" button |
| Analytics summaries | If daily/weekly digest, not real-time dashboard |
How to Audit Your Workloads
Step 1: Tag every API call by feature
If you're already tracking costs with tags (you should be), pull your breakdown:
```
Feature: chatbot          → 8,200 calls/mo → $310/mo → REAL-TIME (skip)
Feature: report-gen       → 3,100 calls/mo → $145/mo → BATCHABLE
Feature: email-drafts     → 1,200 calls/mo → $45/mo  → BATCHABLE
Feature: data-enrichment  → 2,400 calls/mo → $92/mo  → BATCHABLE
```
In this example, $282/month is batchable → $141/month savings.
Step 2: Check if the user flow tolerates async
For each batchable workload, verify:
- The user doesn't see a loading spinner waiting for the result
- The feature works with a "processing... we'll notify you when ready" UX
- Results can be stored and retrieved later
Step 3: Estimate savings
```
Batchable monthly spend × 50% = savings

$282 × 50% = $141/month saved
           = $1,692/year saved
```
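The three audit steps collapse into a few lines of code. A sketch using the example figures above (the feature names and dollar amounts are this article's example, not real data):

```python
# Monthly spend per feature, flagged batchable or not (example figures).
features = {
    "chatbot":         {"cost": 310, "batchable": False},
    "report-gen":      {"cost": 145, "batchable": True},
    "email-drafts":    {"cost": 45,  "batchable": True},
    "data-enrichment": {"cost": 92,  "batchable": True},
}

batchable_spend = sum(f["cost"] for f in features.values() if f["batchable"])
monthly_savings = batchable_spend * 0.50  # Batch API discount
annual_savings = monthly_savings * 12

print(batchable_spend, monthly_savings, annual_savings)  # 282 141.0 1692.0
```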
Implementation: Moving to Batch API
OpenAI Batch API Request
```python
import json
from openai import OpenAI

client = OpenAI()

# 1. Create a JSONL file with your requests
requests = []
for item in items_to_process:
    requests.append({
        "custom_id": str(item.id),  # must be a unique string per request
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [
                {"role": "system", "content": "Extract key information..."},
                {"role": "user", "content": item.text},
            ],
        },
    })

# Write to JSONL (one JSON request per line)
with open("batch_input.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")

# 2. Upload the file
batch_file = client.files.create(
    file=open("batch_input.jsonl", "rb"),
    purpose="batch",
)

# 3. Create the batch
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# 4. Check status (or poll)
status = client.batches.retrieve(batch.id)
print(f"Status: {status.status}")  # validating → in_progress → completed
```
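Rather than checking once, you'll usually poll until the batch reaches a terminal state. A minimal sketch (the poll interval and timeout are arbitrary choices, not API requirements):

```python
import time

def wait_for_batch(client, batch_id, poll_seconds=60, timeout_seconds=24 * 3600):
    """Poll until the batch reaches a terminal state, then return it."""
    terminal = {"completed", "failed", "expired", "cancelled"}
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        batch = client.batches.retrieve(batch_id)
        if batch.status in terminal:
            return batch
        time.sleep(poll_seconds)
    raise TimeoutError(f"Batch {batch_id} not finished after {timeout_seconds}s")
```

For production use, a cron job or a webhook-style notification beats a long-lived polling loop, but the logic is the same.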
When to Process Batches
| Pattern | Schedule | Use Case |
|---|---|---|
| Nightly batch | Cron at 2 AM | Reports, enrichment, classification |
| Hourly micro-batch | Every hour | Content pipelines, queue processing |
| Queue-based | When queue hits N items | Email drafts, data processing |
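One way to implement the queue-based pattern: accumulate requests in memory and flush them as a single batch once a threshold is reached. A sketch (the threshold and the `flush_fn` callback are illustrative, not a library API):

```python
class BatchQueue:
    """Accumulate requests; flush them as one batch once a threshold is hit."""

    def __init__(self, flush_fn, threshold=100):
        self.flush_fn = flush_fn    # called with the list of queued requests
        self.threshold = threshold  # minimum batch size worth the overhead
        self.pending = []

    def add(self, request):
        self.pending.append(request)
        if len(self.pending) >= self.threshold:
            self.flush()

    def flush(self):
        if self.pending:
            self.flush_fn(self.pending)
            self.pending = []
```

In practice you'd also flush on a timer so stragglers don't sit in the queue forever.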
Common Mistakes
1. Not batching because "it's only a few calls"
Even 1,000 calls/month at $0.03 each = $30/month standard, $15/month batch. Over a year, that's $180 saved for 30 minutes of migration work.
2. Sending tiny batches
There's overhead per batch (file upload, status polling). Batch at least 50-100 requests together for efficiency.
3. Not handling failures
Batch API can partially fail. Always check per-request status in the response and retry failed items:
```python
# Check results: each line of the output file is one per-request result
results = client.files.content(batch.output_file_id)
for line in results.text.strip().split("\n"):
    result = json.loads(line)
    if result.get("error"):
        # Retry this item or flag for manual review
        handle_error(result["custom_id"], result["error"])
```
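One concrete way to retry: collect the failed `custom_id`s from the output and rebuild a new input file from the original requests. A sketch, assuming you kept the original request list around (the function name is ours, not an SDK method):

```python
import json

def build_retry_requests(result_lines, original_requests):
    """Return the original requests whose batch results carried an error."""
    failed_ids = {
        result["custom_id"]
        for result in map(json.loads, result_lines)
        if result.get("error")
    }
    return [req for req in original_requests if req["custom_id"] in failed_ids]
```

The returned list can be written to a fresh JSONL file and submitted as a new batch, exactly like the first one.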
4. Forgetting to track batch costs separately
Tag batch API calls differently so you can measure the actual savings:
```python
tags = {
    "feature": "report-generation",
    "api_type": "batch",  # vs "standard"
    "batch_id": batch.id,
}
```
The Decision Framework
```
Is the user waiting for this result right now?
├── YES → Standard API (full price)
└── NO  → Can it wait 1+ hours?
          ├── YES → Batch API (50% off)
          └── NO  → Standard API, but consider:
                    └── Can you pre-generate and cache?
                        ├── YES → Batch generate, serve from cache
                        └── NO  → Standard API
```
Summary
| Question | Answer |
|---|---|
| What discount? | 50% off all models (input and output tokens) |
| What's the catch? | Results within 24h (usually faster) |
| Minimum batch size? | 1 request (but batch 50+ for efficiency) |
| Which models? | All current models on supported endpoints (e.g., chat completions, embeddings) |
| How hard to implement? | 1-2 hours for a typical workload |
| Who should do this? | Anyone spending $100+/mo on non-real-time API calls |
The Batch API is the single easiest cost optimization available. If you have batchable workloads and aren't using it, you're paying double for no reason.
We built AISpendGuard to help you find batchable workloads automatically. Tag your API calls by feature, and our waste detection engine identifies which workloads qualify for batch processing — with estimated monthly savings.
Free tier: 50,000 events/month. No credit card required.
Start tracking at aispendguard.com
All pricing as of March 2026. Batch API is currently available from OpenAI. Anthropic offers a similar Message Batches API with 50% off. Check provider docs for the latest availability.