
Batch Mode

Run hundreds of questions against the same context. Cache is always enabled — the first question pays full cost, subsequent questions benefit from provider-level caching at up to 90% savings.

Quick start

Create a questions file (one question per line):
questions.txt
What authentication methods are supported?
How does the rate limiter work?
What database migrations exist?
# This is a comment — skipped
Where are the API routes defined?
Run it:
rlmx batch questions.txt --context ./src/
Output is JSONL — one JSON object per question, plus a final aggregate line:
{"question":"What authentication methods are supported?","answer":"JWT and OAuth2...","stats":{"iterations":2,"inputTokens":1500,"outputTokens":800,"cost":0.0045}}
{"question":"How does the rate limiter work?","answer":"Token bucket algorithm...","stats":{"iterations":3,"inputTokens":1200,"outputTokens":600,"cost":0.0008}}
{"question":"What database migrations exist?","answer":"12 migrations in src/db/...","stats":{"iterations":2,"inputTokens":1100,"outputTokens":500,"cost":0.0007}}
{"question":"Where are the API routes defined?","answer":"src/routes/ directory...","stats":{"iterations":1,"inputTokens":900,"outputTokens":400,"cost":0.0005}}
{"type":"aggregate","total_questions":4,"completed":4,"total_cost":0.0065,"cache_savings":0.0152}
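Because each line is an independent JSON object, the stream can be consumed incrementally with any JSON parser. A minimal sketch in Python, using an inlined sample in the shape shown above (the field names follow the example output; this is not RLMX library code):

```python
import json

# Sample batch output: one record per question, plus a trailing aggregate line.
sample = '''\
{"question":"How does the rate limiter work?","answer":"Token bucket...","stats":{"iterations":3,"inputTokens":1200,"outputTokens":600,"cost":0.0008}}
{"type":"aggregate","total_questions":1,"completed":1,"total_cost":0.0008,"cache_savings":0.004}
'''

def summarize(jsonl: str):
    """Split per-question records from the final aggregate record."""
    records = [json.loads(line) for line in jsonl.splitlines() if line.strip()]
    answers = [r for r in records if r.get("type") != "aggregate"]
    aggregate = next((r for r in records if r.get("type") == "aggregate"), None)
    return answers, aggregate

answers, aggregate = summarize(sample)
```

The aggregate line is distinguished only by its `"type"` field, so filtering on that is enough to separate answers from the summary.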

Questions file format

  • One question per line
  • Empty lines are skipped
  • Lines starting with # are treated as comments
What is the project structure?
How does error handling work?

# Security section
What input validation exists?
Are there any SQL injection risks?
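The three rules above are easy to replicate if you generate questions files programmatically. A small sketch of an equivalent parser (illustrative only, not RLMX's own loader):

```python
def parse_questions(text: str) -> list[str]:
    """One question per line; blank lines and '#' comment lines are skipped."""
    questions = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        questions.append(line)
    return questions

sample = """What is the project structure?
How does error handling work?

# Security section
What input validation exists?
"""
```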

Cache behavior

Batch mode always enables caching. Here’s the cost flow:
| Question | Cache status      | Cost                                           |
|----------|-------------------|------------------------------------------------|
| First    | Cache miss (cold) | Full input token cost                          |
| Second+  | Cache hit (warm)  | 50–90% cheaper (only cache-read tokens billed) |
The exact savings depend on your provider:
| Provider      | Cache discount              |
|---------------|-----------------------------|
| Google Gemini | ~90% on cached input tokens |
| Anthropic     | ~90% on cached input tokens |
| OpenAI        | ~50% on cached input tokens |

Pre-warming the cache

Warm the cache before running batch queries to ensure the first question also gets cache pricing:
# Warm the cache
rlmx cache --context ./docs/

# Now run batch — all questions hit warm cache
rlmx batch questions.txt --context ./docs/

Estimating costs

Check how much a batch run will cost before committing:
rlmx cache --context ./docs/ --estimate
Context: ./docs/ (23 files, 145KB)
Estimated tokens: 43,500
Provider limit: 1,000,000 (google)
Cache retention: long
Estimated first-query cost: $0.003
Estimated cached-query cost: $0.0003 (90% savings)
For a 100-question batch over this context: ~$0.003 (first) + 99 × $0.0003 ≈ $0.033 total.
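The arithmetic behind that estimate can be written out directly (the per-query figures come from the example `--estimate` output above):

```python
first_query = 0.003    # cold cache: full input token cost
cached_query = 0.0003  # warm cache: ~90% discount on cached input tokens
n_questions = 100

# One cold query, then the rest at the cached rate.
total = first_query + (n_questions - 1) * cached_query
```

With pre-warming (previous section), all 100 questions would run at the cached rate instead.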

Budget enforcement

Set a maximum spend to prevent runaway costs:
rlmx batch questions.txt --context ./src/ --max-cost 1.00
Cumulative cost is tracked across all questions. When the budget is exceeded, RLMX stops gracefully and reports how many questions were completed.
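The enforcement logic amounts to a running total with an early exit. A minimal sketch of that behavior (`ask` is a stand-in callable returning an answer and its cost; this illustrates the semantics, not RLMX's internal implementation):

```python
def run_batch(questions, max_cost, ask):
    """Answer questions until cumulative spend exceeds the budget.

    `ask(question)` returns (answer, cost_usd). On budget overrun we stop
    gracefully and return the partial results plus the amount spent.
    """
    spent = 0.0
    completed = []
    for question in questions:
        answer, cost = ask(question)
        spent += cost
        completed.append((question, answer))
        if spent >= max_cost:
            break  # report how many questions completed instead of overspending
    return completed, spent
```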

Batch options

| Flag                   | Default | Description                                 |
|------------------------|---------|---------------------------------------------|
| --context <path>       | —       | Context directory or file                   |
| --max-iterations <n>   | 30      | Max RLM iterations per question             |
| --max-cost <n>         | —       | Max total USD spend                         |
| --parallel <n>         | 1       | Concurrent questions                        |
| --batch-api            | false   | Use Gemini Batch API for 50% cost reduction |
| --output <mode>        | —       | Output mode                                 |
| --verbose              | false   | Show progress                               |

Gemini Batch API

For Google Gemini models, the --batch-api flag enables the Gemini Batch API, which provides an additional 50% cost reduction on top of caching:
rlmx batch questions.txt --context ./docs/ --batch-api

Cost stacking

| Mode              | Input cost (per 1M tokens) | Savings |
|-------------------|----------------------------|---------|
| Base (flash-lite) | $0.075                     | —       |
| + Context caching | ~$0.0075                   | 90%     |
| + Batch API       | ~$0.0375                   | 50%     |
| Cache + Batch     | ~$0.00375                  | 95%     |
100 queries over 500K tokens of context: under $2.00 with both cache and batch stacking.
Batch API jobs are asynchronous. Results may take longer to return compared to standard API calls, but the cost savings are significant for large runs.
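The stacked rate in the table is just the two discounts applied multiplicatively to the base price, which is where the 95% figure comes from:

```python
base = 0.075                  # flash-lite input price, $/1M tokens
cached = base * (1 - 0.90)    # context caching alone: ~$0.0075
batch = base * (1 - 0.50)     # Batch API alone: ~$0.0375
stacked = base * 0.10 * 0.50  # both applied: ~$0.00375

savings = 1 - stacked / base  # combined discount vs. base price
```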

Practical patterns

Study session over documentation

# Prepare questions
cat > study.txt << 'EOF'
What are the core abstractions?
How does the plugin system work?
What are the extension points?
How is state managed?
What patterns does the codebase use?
EOF

# Warm cache, then batch
rlmx cache --context ./docs/
rlmx batch study.txt --context ./docs/ --max-iterations 5

Code audit

cat > audit.txt << 'EOF'
Are there any hardcoded credentials?
What input validation exists?
How are SQL queries constructed?
Are there any command injection risks?
How is authentication implemented?
What error information is leaked to users?
EOF

rlmx batch audit.txt --context ./src/ --ext .ts,.js --max-cost 2.00

Codebase onboarding

cat > onboard.txt << 'EOF'
What is the project structure and architecture?
What are the main entry points?
How is the database accessed?
What external services are called?
How are tests organized?
What CI/CD pipeline is used?
EOF

rlmx batch onboard.txt --context . --ext .ts,.js,.json,.yaml --tools standard

CAG vs RLM for batch

| Approach               | When to use                                                    |
|------------------------|----------------------------------------------------------------|
| Batch + cache (CAG)    | Context fits in provider window, many questions, cost matters  |
| Batch + RLM (no cache) | Context too large for system prompt, complex navigation needed |
By default, batch mode uses CAG (cache enabled). For very large contexts that exceed provider limits, RLMX falls back to standard RLM iteration automatically.

Provider context limits

| Provider       | Max context (cached) |
|----------------|----------------------|
| Google Gemini  | 1,000,000 tokens     |
| Anthropic      | 200,000 tokens       |
| OpenAI         | 128,000 tokens       |
| Amazon Bedrock | 128,000 tokens       |
If your context exceeds the provider limit, RLMX will warn you and fall back to RLM mode where the LLM navigates the context programmatically.
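The fallback decision reduces to a comparison of estimated context tokens against the provider's limit. A sketch of that check, using the limits from the table above (provider keys are illustrative):

```python
# Max cached-context sizes per provider, from the table above.
LIMITS = {
    "google": 1_000_000,
    "anthropic": 200_000,
    "openai": 128_000,
    "bedrock": 128_000,
}

def pick_mode(provider: str, context_tokens: int) -> str:
    """CAG (cached context) when the context fits; otherwise RLM iteration."""
    return "cag" if context_tokens <= LIMITS[provider] else "rlm"
```

For example, a 500K-token context fits comfortably under Gemini's limit but forces RLM mode on OpenAI or Bedrock.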