Batch API Saves 50% — Here's How to Know If Your Workload Qualifies
OpenAI's Batch API charges 50% less than the standard API. Same models, same quality, same token limits. The only trade-off: results arrive within 24 hours instead of in real time (though in practice, most batches complete within minutes).
If you're spending $200/month or more on OpenAI, there's a good chance you're leaving $100/month on the table.
How the Batch API Works
Instead of sending individual requests and waiting for immediate responses, you submit a batch of requests as a JSONL file. OpenAI processes them asynchronously and returns all results when done.
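Each line of the input file is one self-contained JSON request. A minimal sketch of a single line (the `custom_id` and prompt are illustrative, not part of any real dataset):

```python
import json

# One request per JSONL line; custom_id lets you match results back to inputs.
request = {
    "custom_id": "row-001",  # illustrative ID; must be unique per request
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "Summarize: ..."}],
    },
}

line = json.dumps(request)  # this string becomes one line of batch_input.jsonl
print(line)
```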
Pricing Comparison
| Model | Standard (per 1M input) | Batch (per 1M input) | Savings |
|---|---|---|---|
| GPT-4o | $2.50 | $1.25 | 50% |
| GPT-4o-mini | $0.15 | $0.075 | 50% |
| GPT-4.5 Preview | $75.00 | $37.50 | 50% |
Output tokens get the same 50% discount. The savings are flat across every model.
The Trade-Off
| | Standard API | Batch API |
|---|---|---|
| Response time | 1-30 seconds | Up to 24 hours (usually minutes) |
| Pricing | Full price | 50% off |
| Rate limits | Standard | Higher (separate pool) |
| SLA | Real-time | Best-effort within 24h |
Does Your Workload Qualify?
The qualifying question is simple: Does the user need to see the result immediately?
Qualifies for Batch API (user is NOT waiting)
| Workload | Why It Qualifies |
|---|---|
| Nightly report generation | Reports run on a schedule, not on-demand |
| Content generation pipelines | Blog posts, descriptions, summaries queued for review |
| Data enrichment/extraction | Processing CSV rows, enriching database records |
| Classification of existing data | Labeling historical records, sentiment analysis on past reviews |
| Email draft preparation | Drafts generated in advance, user reviews later |
| Test data generation | Creating synthetic test data for QA |
| Embedding generation | Batch-embedding documents for RAG pipelines |
| Translation of static content | Translating documentation, help articles, product descriptions |
Does NOT Qualify (user IS waiting)
| Workload | Why It Doesn't Qualify |
|---|---|
| Chatbot responses | User is typing and waiting for a reply |
| Real-time search/RAG | User submitted a query and expects results now |
| Live content suggestions | User is writing and expects inline suggestions |
| Interactive coding assistants | Developer expects immediate code completion |
| Real-time moderation | Content needs to be checked before it's shown |
The Gray Zone (might qualify)
| Workload | When It Qualifies |
|---|---|
| Email drafts | If user clicks "generate" and comes back later |
| Document summarization | If it's batch processing (queue of docs), not single on-demand |
| Report generation | If triggered by cron, not by "Generate Report" button |
| Analytics summaries | If daily/weekly digest, not real-time dashboard |
How to Audit Your Workloads
Step 1: Tag every API call by feature
If you're already tracking costs with tags (you should be), pull your breakdown:
```
Feature: chatbot          → 8,200 calls/mo → $310/mo → REAL-TIME (skip)
Feature: report-gen       → 3,100 calls/mo → $145/mo → BATCHABLE
Feature: email-drafts     → 1,200 calls/mo → $45/mo  → BATCHABLE
Feature: data-enrichment  → 2,400 calls/mo → $92/mo  → BATCHABLE
```
In this example, $282/month is batchable → $141/month savings.
Step 2: Check if the user flow tolerates async
For each batchable workload, verify:
- The user doesn't see a loading spinner waiting for the result
- The feature works with a "processing... we'll notify you when ready" UX
- Results can be stored and retrieved later
Step 3: Estimate savings
```
Batchable monthly spend × 50% = savings

$282 × 50% = $141/month saved
           = $1,692/year saved
```
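The three audit steps collapse into a few lines of code. A sketch using the example figures above (the feature names and dollar amounts are this article's example, not real data):

```python
# Monthly spend per feature, flagged batchable or not (example figures).
features = {
    "chatbot":         {"cost": 310, "batchable": False},
    "report-gen":      {"cost": 145, "batchable": True},
    "email-drafts":    {"cost": 45,  "batchable": True},
    "data-enrichment": {"cost": 92,  "batchable": True},
}

batchable_spend = sum(f["cost"] for f in features.values() if f["batchable"])
monthly_savings = batchable_spend * 0.50  # Batch API discount
annual_savings = monthly_savings * 12

print(batchable_spend, monthly_savings, annual_savings)  # 282 141.0 1692.0
```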
Implementation: Moving to Batch API
OpenAI Batch API Request
```python
import json
from openai import OpenAI

client = OpenAI()

# 1. Create a JSONL file with your requests
requests = []
for item in items_to_process:
    requests.append({
        "custom_id": str(item.id),  # must be a unique string per request
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [
                {"role": "system", "content": "Extract key information..."},
                {"role": "user", "content": item.text},
            ],
        },
    })

# Write to JSONL (one JSON request per line)
with open("batch_input.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")

# 2. Upload the file
batch_file = client.files.create(
    file=open("batch_input.jsonl", "rb"),
    purpose="batch",
)

# 3. Create the batch
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# 4. Check status (or poll)
status = client.batches.retrieve(batch.id)
print(f"Status: {status.status}")  # validating → in_progress → completed
```
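Rather than checking once, you'll usually poll until the batch reaches a terminal state. A minimal sketch (the poll interval and timeout are arbitrary choices, not API requirements):

```python
import time

def wait_for_batch(client, batch_id, poll_seconds=60, timeout_seconds=24 * 3600):
    """Poll until the batch reaches a terminal state, then return it."""
    terminal = {"completed", "failed", "expired", "cancelled"}
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        batch = client.batches.retrieve(batch_id)
        if batch.status in terminal:
            return batch
        time.sleep(poll_seconds)
    raise TimeoutError(f"Batch {batch_id} not finished after {timeout_seconds}s")
```

For production use, a cron job or a webhook-style notification beats a long-lived polling loop, but the logic is the same.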
When to Process Batches
| Pattern | Schedule | Use Case |
|---|---|---|
| Nightly batch | Cron at 2 AM | Reports, enrichment, classification |
| Hourly micro-batch | Every hour | Content pipelines, queue processing |
| Queue-based | When queue hits N items | Email drafts, data processing |
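One way to implement the queue-based pattern: accumulate requests in memory and flush them as a single batch once a threshold is reached. A sketch (the threshold and the `flush_fn` callback are illustrative, not a library API):

```python
class BatchQueue:
    """Accumulate requests; flush them as one batch once a threshold is hit."""

    def __init__(self, flush_fn, threshold=100):
        self.flush_fn = flush_fn    # called with the list of queued requests
        self.threshold = threshold  # minimum batch size worth the overhead
        self.pending = []

    def add(self, request):
        self.pending.append(request)
        if len(self.pending) >= self.threshold:
            self.flush()

    def flush(self):
        if self.pending:
            self.flush_fn(self.pending)
            self.pending = []
```

In practice you'd also flush on a timer so stragglers don't sit in the queue forever.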
Common Mistakes
1. Not batching because "it's only a few calls"
Even 1,000 calls/month at $0.03 each = $30/month standard, $15/month batch. Over a year, that's $180 saved for 30 minutes of migration work.
2. Sending tiny batches
There's overhead per batch (file upload, status polling). Batch at least 50-100 requests together for efficiency.
3. Not handling failures
Batch API can partially fail. Always check per-request status in the response and retry failed items:
```python
# Check results: each line of the output file is one per-request result
results = client.files.content(batch.output_file_id)
for line in results.text.strip().split("\n"):
    result = json.loads(line)
    if result.get("error"):
        # Retry this item or flag for manual review
        handle_error(result["custom_id"], result["error"])
```
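One concrete way to retry: collect the failed `custom_id`s from the output and rebuild a new input file from the original requests. A sketch, assuming you kept the original request list around (the function name is ours, not an SDK method):

```python
import json

def build_retry_requests(result_lines, original_requests):
    """Return the original requests whose batch results carried an error."""
    failed_ids = {
        result["custom_id"]
        for result in map(json.loads, result_lines)
        if result.get("error")
    }
    return [req for req in original_requests if req["custom_id"] in failed_ids]
```

The returned list can be written to a fresh JSONL file and submitted as a new batch, exactly like the first one.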
4. Forgetting to track batch costs separately
Tag batch API calls differently so you can measure the actual savings:
```python
tags = {
    "feature": "report-generation",
    "api_type": "batch",  # vs "standard"
    "batch_id": batch.id,
}
```
The Decision Framework
```
Is the user waiting for this result right now?
├── YES → Standard API (full price)
└── NO  → Can it wait 1+ hours?
          ├── YES → Batch API (50% off)
          └── NO  → Standard API, but consider:
                    └── Can you pre-generate and cache?
                        ├── YES → Batch generate, serve from cache
                        └── NO  → Standard API
```
Summary
| Question | Answer |
|---|---|
| What discount? | 50% off all models (input and output tokens) |
| What's the catch? | Results within 24h (usually faster) |
| Minimum batch size? | 1 request (but batch 50+ for efficiency) |
| Which models? | All current models on supported endpoints (e.g., chat completions, embeddings) |
| How hard to implement? | 1-2 hours for a typical workload |
| Who should do this? | Anyone spending $100+/mo on non-real-time API calls |
The Batch API is the single easiest cost optimization available. If you have batchable workloads and aren't using it, you're paying double for no reason.
We built AISpendGuard to help you find batchable workloads automatically. Tag your API calls by feature, and our waste detection engine identifies which workloads qualify for batch processing — with estimated monthly savings.
Free tier: 50,000 events/month. No credit card required.
Start tracking at aispendguard.com
All pricing as of March 2026. Batch API is currently available from OpenAI. Anthropic offers a similar Message Batches API with 50% off. Check provider docs for the latest availability.