CLI Reference

`rlmx "query"`

Run an RLM query against a context.

rlmx "query" [options]

The default command. Loads context into a Python REPL, then iterates with the LLM until it produces a final answer.

Options

Flag	Type	Default	Description
`--context <path>`	string	—	Path to context directory or file
`--output <mode>`	string	`text`	Output mode: `text`, `json`, or `stream`
`--verbose`	boolean	`false`	Show iteration progress on stderr
`--max-iterations <n>`	number	`30`	Maximum RLM iterations before forced termination
`--timeout <ms>`	number	`300000`	Timeout in milliseconds (5 minutes)
`--stats`	boolean	`false`	Emit JSON stats to stderr (or include in `--output json`)
`--log <path>`	string	—	Write structured JSONL log to file
`--tools <level>`	string	`core`	Tool level: `core`, `standard`, or `full`
`--max-cost <n>`	number	—	Maximum USD spend per run
`--max-tokens <n>`	number	—	Maximum total tokens per run
`--max-depth <n>`	number	—	Maximum recursive `rlm_query` depth
`--ext <list>`	string	`.md`	File extensions for context dirs (comma-separated)
`--thinking <level>`	string	—	Thinking level: `minimal`, `low`, `medium`, `high` (Gemini only)
`--cache`	boolean	`false`	Enable CAG (cache-augmented generation) mode: the full context is cached in the system prompt

Context loading

Input	Behavior
`--context dir/`	Recursively reads files matching `--ext` as `list[{path, content}]`
`--context file.md`	Reads as single string
`--context file.json`	Parses JSON as dict or list
stdin pipe	Reads as single string
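The following Python sketch makes the table concrete by reproducing the documented mapping from input to REPL value. It is not rlmx's own loader. Several details are assumptions: the name `load_context`, the sort order, UTF-8 decoding, and the exact `path` strings rlmx stores.

```python
import json
from pathlib import Path
from typing import Any


def load_context(path: str, ext: str = ".md") -> Any:
    """Hypothetical loader that mirrors the context-loading table (not rlmx's own code)."""
    p = Path(path)
    # --ext is a comma-separated list such as ".ts,.js"
    extensions = {e.strip() for e in ext.split(",") if e.strip()}
    if p.is_dir():
        # Directory: recurse, keep matching files, return list[{path, content}]
        return [
            {"path": str(f), "content": f.read_text(encoding="utf-8")}
            for f in sorted(p.rglob("*"))
            if f.is_file() and f.suffix in extensions
        ]
    text = p.read_text(encoding="utf-8")
    if p.suffix == ".json":
        # JSON file: parsed into a dict or list
        return json.loads(text)
    # Any other single file (for example .md): one string
    return text
```

Piped stdin corresponds to the last branch: the whole input arrives as one string, as `sys.stdin.read()` would return it.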

Examples

# Basic query with directory context
rlmx "How does IPC work?" --context ./docs/

# JSON output with stats
rlmx "Summarize this" --context paper.md --output json --stats

# Code analysis with extended file types
rlmx "Analyze code" --context ./src/ --tools full --ext .ts,.js

# Budget-limited query
rlmx "Quick question" --max-cost 0.10 --max-tokens 5000

# Piped input with logging
echo "data" | rlmx "Analyze this" --log run.jsonl

# CAG mode for repeated queries
rlmx "First question" --context ./docs/ --cache
rlmx "Follow-up question" --context ./docs/ --cache

# Gemini thinking mode
rlmx "Complex analysis" --context ./src/ --thinking high

Output formats

With `--output json`, rlmx prints a JSON object like this:

{
  "answer": "The answer to your query...",
  "references": ["docs/start/create-project.md", "docs/concept/ipc.md"],
  "usage": {
    "inputTokens": 12500,
    "outputTokens": 3200,
    "llmCalls": 5
  },
  "iterations": 3,
  "model": "google/gemini-3.1-flash-lite-preview"
}
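Scripts that consume this JSON usually need only a few fields. This Python sketch pulls them out, using the field names from the example above. The helper name `summarize_result` and the derived `total_tokens` (input plus output) are illustrative, not part of rlmx.

```python
import json


def summarize_result(raw: str) -> dict:
    """Extract headline numbers from a JSON result (field names as in the example)."""
    result = json.loads(raw)
    usage = result.get("usage", {})
    return {
        "answer": result["answer"],
        "references": len(result.get("references", [])),
        # Derived here, not a field rlmx emits
        "total_tokens": usage.get("inputTokens", 0) + usage.get("outputTokens", 0),
        "llm_calls": usage.get("llmCalls", 0),
        "iterations": result.get("iterations"),
    }
```

In a script, `raw` could come from `subprocess.run(["rlmx", "Summarize this", "--context", "paper.md", "--output", "json"], capture_output=True, text=True, check=True).stdout`. Adding `--stats` may add fields, which this sketch ignores.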

`rlmx init`

Scaffold a .rlmx/ config directory with templates.

rlmx init [--template <type>] [--dir <path>]

Flag	Type	Default	Description
`--template <type>`	string	`default`	Template type: `default` or `code`
`--dir <path>`	string	`.` (cwd)	Directory to scaffold in

Creates a .rlmx/ directory containing:

File	Purpose
`rlmx.yaml`	Main configuration (model, budget, context, storage)
`SYSTEM.md`	System prompt used by the RLM loop
`CRITERIA.md`	Output criteria for quality checks
`TOOLS.md`	Custom Python tools exposed to the RLM

rlmx v0.260331 and later use the .rlmx/ directory as the only config location. Projects that still have the old flat-file rlmx.yaml in the project root must re-run rlmx init to migrate.

Templates

default — General-purpose RLM usage with balanced system prompt and criteria
code — Code analysis template with code-focused system prompt and tool definitions

Example

# Scaffold with default template
rlmx init

# Scaffold with code analysis template
rlmx init --template code

# Scaffold in a specific directory
rlmx init --template default --dir ./my-project

`rlmx cache`

Pre-warm the provider cache for a given context, or estimate its size and cost without making any API calls.

rlmx cache --context <path> [--estimate] [options]

--context is required. Without --estimate, rlmx performs a single-iteration warmup run so the provider caches the prompt prefix for subsequent queries.

Flag	Type	Default	Description
`--context <path>`	string	required	Path to context directory or file
`--estimate`	boolean	`false`	Print token/cost estimate only — skip the warmup call
`--ext <list>`	string	from `rlmx.yaml`	File extensions when `--context` is a directory (comma-separated)
`--tools <level>`	string	`core`	Tool level: `core`, `standard`, or `full`
`--timeout <ms>`	number	`300000`	Warmup timeout
`--verbose`	boolean	`false`	Verbose stderr logging

Examples

# Estimate only — no LLM calls
rlmx cache --context ./docs/ --estimate

# Warm the provider cache for future queries
rlmx cache --context ./docs/

# Custom extensions for a source-code corpus
rlmx cache --context ./src/ --ext .ts,.js --estimate

Outputs

With --estimate — a key: value block is printed to stdout and the process exits without calling any LLM:

rlmx cache estimate
---
  context:          ./docs/
  metadata:         23 files, 145KB
  estimated tokens: 43,500
  provider limit:   1,048,576 tokens
  utilization:      4.1%
  provider:         google
  model:            gemini-3.1-flash-lite-preview
  ttl:              3600s
  estimated cost:   $0.0033
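The estimate block is plain `key: value` text, so a script can parse it to gate later runs. This Python sketch does that. The helper names are hypothetical, and the layout is taken from the sample above, so it may change between versions. The sample is consistent with utilization being estimated tokens divided by the provider limit: 43,500 / 1,048,576 ≈ 4.1%. That formula is inferred from the numbers, not documented.

```python
def parse_estimate(text: str) -> dict[str, str]:
    """Turn the `key: value` lines of `rlmx cache --estimate` into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        # The title line and the '---' separator have no "key: value" shape
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def to_int(value: str) -> int:
    # "43,500" or "1,048,576 tokens" -> integer
    return int(value.split()[0].replace(",", ""))
```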

Without --estimate — a minimal RLM loop runs to prime the provider cache. Progress and the summary are emitted to stderr:

rlmx: warming cache for ./docs/ (~43,500 tokens)
rlmx: cache warmup complete
  provider:         google
  model:            gemini-3.1-flash-lite-preview
  estimated tokens: 43,500
  ttl:              3600s
  estimated cost:   $0.0033

If the context exceeds the provider’s token limit, the command exits with a non-zero status and an error message. See Cache Mode for the full caching workflow.

`rlmx batch`

Run bulk queries from a questions file against a shared cached context. Each question is executed through the same RLM loop used by rlmx "query", with provider-level prompt caching always enabled so the first question pays full price and subsequent questions benefit from the cache.

rlmx batch <questions-file> [options]

Questions are read one per line. Blank lines and lines beginning with # are ignored.
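Before spending budget, you can check which lines will actually be sent. This Python sketch applies the rules above: one question per line, with blank lines and `#` lines skipped. One behavior is an assumption: stripping surrounding whitespace first, so an indented `#` line also counts as a comment. The name `read_questions` is hypothetical.

```python
from pathlib import Path


def read_questions(path: str) -> list[str]:
    """One question per line; blank lines and '#' comment lines are skipped."""
    questions = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        text = line.strip()  # assumption: leading/trailing whitespace is ignored
        if not text or text.startswith("#"):
            continue
        questions.append(text)
    return questions
```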

Flag	Type	Default	Description
`<questions-file>`	path	required	Path to a text file of questions (one per line)
`--context <path>`	string	—	Shared context for every question
`--max-iterations <n>`	number	`30`	Maximum RLM iterations per question
`--timeout <ms>`	number	`300000`	Per-question timeout
`--max-cost <n>`	number	—	Stop after cumulative USD cost crosses this threshold
`--max-tokens <n>`	number	—	Per-question token cap
`--max-depth <n>`	number	—	Maximum recursive `rlm_query` depth
`--parallel <n>`	number	`1`	Concurrency hint (currently executes sequentially)
`--batch-api`	boolean	`false`	Opt into the Gemini Batch API path (requires `provider: google`)
`--tools <level>`	string	`core`	Tool level: `core`, `standard`, or `full`
`--ext <list>`	string	from `rlmx.yaml`	File extensions for directory context
`--verbose`	boolean	`false`	Show per-question progress on stderr

--cache does not need to be passed — batch mode always runs with cache.enabled = true. If the context exceeds the provider’s token limit, rlmx falls back to pgserve storage mode when storage.enabled is auto or always.

Example

# Run a questions file against a cached docs corpus
rlmx batch questions.txt --context ./docs/

# Stop if total spend crosses $1.00
rlmx batch questions.txt --context ./src/ --max-cost 1.00

# Gemini Batch API for 50% input/output token discount
rlmx batch questions.txt --context ./docs/ --batch-api

Outputs

rlmx batch writes JSONL to stdout — one JSON object per question, followed by a final aggregate line:

{"question":"How does IPC work?","answer":"IPC uses...","stats":{"iterations":2,"inputTokens":42100,"outputTokens":520,"cost":0.0042}}
{"question":"Where is auth defined?","answer":"src/auth.ts...","stats":{"iterations":1,"inputTokens":820,"outputTokens":310,"cost":0.0009}}
{"type":"aggregate","total_questions":2,"completed":2,"total_cost":0.0051,"cache_savings":0.004}

Budget trips, cache fallbacks, and verbose progress are logged to stderr so the stdout stream stays valid JSONL for downstream pipelines. See Batch Mode for full details.
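Downstream consumers need to separate per-question records from the trailing aggregate line. This Python sketch does that, relying only on the `"type": "aggregate"` marker seen in the sample output. In the sample, the per-question `stats.cost` values (0.0042 + 0.0009) add up to `total_cost` (0.0051), which makes a useful sanity check. The function name is hypothetical.

```python
import json
from typing import Iterable, List, Optional, Tuple


def split_batch_output(lines: Iterable[str]) -> Tuple[List[dict], Optional[dict]]:
    """Separate per-question records from the aggregate line of `rlmx batch` JSONL."""
    answers: List[dict] = []
    aggregate: Optional[dict] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("type") == "aggregate":
            aggregate = record
        else:
            answers.append(record)
    return answers, aggregate
```

Typical use: `rlmx batch questions.txt --context ./docs/ > results.jsonl`, then `split_batch_output(open("results.jsonl", encoding="utf-8"))`.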

`rlmx stats`

Query run history and cost breakdowns from the rlmx observability database (pgserve at ~/.rlmx/data). Stats are populated automatically by every run that saves a session.

rlmx stats [options]

Flag	Type	Default	Description
`--run <id>`	string	—	Show the event timeline for a specific session id
`--costs`	boolean	`false`	Show cost breakdown grouped by model
`--tools`	boolean	`false`	Show REPL tool usage grouped by session
`--since <duration>`	string	—	Limit to the recent window (`30m`, `24h`, `7d`)
`--output json`	literal	—	Emit structured JSON instead of the terminal table
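The `--since` values read as a number followed by a unit. This Python sketch accepts only the three units that appear in the examples (`m`, `h`, `d`). It does not reproduce rlmx's own parser. The helper name is hypothetical, and other units rlmx may accept are not assumed.

```python
import re
from datetime import timedelta

# Only the units shown in the documented examples (30m, 24h, 7d)
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_since(value: str) -> timedelta:
    """Hypothetical helper: turn a --since value such as '24h' into a timedelta."""
    match = re.fullmatch(r"(\d+)([mhd])", value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})
```

Under the assumed semantics, the window covers sessions newer than `datetime.now(timezone.utc) - parse_since("24h")`.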

Without any flags, rlmx stats prints the 20 most recent sessions as a terminal table (id, query, model, iterations, cost, status, duration).

Stats require pgserve storage. If ~/.rlmx/data does not exist, the command prints "No stats yet. Run a query first." and exits cleanly. See Configuration for storage setup.

Examples

# Most recent 20 runs as a table
rlmx stats

# JSON for scripting / jq pipelines
rlmx stats --output json

# Cost by model over the last 24 hours
rlmx stats --costs --since 24h

# Tool usage over the last week
rlmx stats --tools --since 7d

# Full event timeline for a specific run
rlmx stats --run 0c3e2f1a-...-9f02

Outputs

Default (sessions table) — plain-text columns written to stdout:

ID          Query                           Model                       Iter        Cost  Status     Duration
--------------------------------------------------------------------------------------------------------------
0c3e2f1a..  How does IPC work?              google/gemini-3.1-flash...     3    $0.0042  completed       4.1s
f91d8a05..  Summarize paper.md              google/gemini-3.1-flash...     2    $0.0011  completed       1.8s

--run <id>: one row per event (llm_call, repl_exec, sub_call) with iteration, token counts, cost, duration, and kind-specific detail (model, code preview, request type).
--costs: one row per (session, model) pair with total calls, input/output tokens, cost, and average call duration.
--tools: one row per (session, request_type) with calls, errors, and average duration.
--output json: any of the above as a pretty-printed JSON array of rows.

`rlmx benchmark`

Run benchmarks that compare the RLM loop against a direct LLM call on the same question. rlmx benchmark does not accept --context — each mode ships its own dataset.

rlmx benchmark <mode> [options]

Mode	Dataset	Measures
`cost`	Built-in curated dataset (`src/benchmark-data.json`)	Tokens, cost, latency savings for RLM vs direct
`oolong`	Oolong Synth, auto-downloaded from Hugging Face	Answer quality (accuracy) plus the same cost metrics

Flags

Flag	Applies to	Default	Description
`--output json`	`cost`	table	Print JSON results to stdout instead of the table
`--samples <n>`	`oolong`	`5`	Number of samples to evaluate
`--idx <n>`	`oolong`	—	Run a specific sample index only (ignores `--samples`)
`--tools <level>`	both	`core`	Tool level used for the RLM runs

Model and provider are resolved from rlmx.yaml / ~/.rlmx/settings.json via the usual priority order.

Examples

# Cost benchmark, formatted table on stderr
rlmx benchmark cost

# Cost benchmark, machine-readable JSON on stdout
rlmx benchmark cost --output json

# Oolong quality run — 5 samples (default)
rlmx benchmark oolong

# Oolong — 20 samples
rlmx benchmark oolong --samples 20

# Oolong — just sample index 42
rlmx benchmark oolong --idx 42

Outputs

Both modes print a box-drawn comparison table to stderr with per-question rows (Direct / RLM / Savings) and a TOTALS footer covering tokens, cost, latency, and average RLM iterations:

┌───────────────┬──────────┬─────────┬──────────┬──────────┬──────┐
│ Question      │ Mode     │  Tokens │     Cost │  Latency │ Iter │
├───────────────┼──────────┼─────────┼──────────┼──────────┼──────┤
│ ipc_summary   │ Direct   │  12,400 │ $0.00093 │     820ms│   -  │
│               │ RLM      │   3,100 │ $0.00024 │   2,410ms│    3 │
│               │ Savings  │   75.0% │    74.2% │        - │      │
├───────────────┼──────────┼─────────┼──────────┼──────────┼──────┤
│ TOTALS        │ Direct   │  94,200 │ $0.00707 │   6.2s   │   -  │
│               │ RLM      │  23,500 │ $0.00182 │  18.4s   │  2.8 │
│               │ Savings  │   75.1% │    74.3% │        - │      │
└───────────────┴──────────┴─────────┴──────────┴──────────┴──────┘
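The Savings rows in the table match the relative reduction from Direct to RLM: (Direct - RLM) / Direct, shown as a percentage with one decimal. This formula is inferred from the sample numbers rather than taken from rlmx's source. The Python sketch below reproduces the sample values.

```python
def savings_pct(direct: float, rlm: float) -> float:
    """Percent saved by RLM relative to the direct call, rounded to one decimal."""
    if direct <= 0:
        raise ValueError("direct value must be positive")
    # A negative result would mean RLM used more than the direct call
    return round(100 * (direct - rlm) / direct, 1)
```

For example, `savings_pct(12400, 3100)` gives 75.0 and `savings_pct(0.00707, 0.00182)` gives 74.3, matching the ipc_summary token row and the TOTALS cost row.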

With rlmx benchmark cost --output json, the same results are emitted as a structured JSON document to stdout (timestamp, mode, model, per-question runs[], and totals). Every benchmark — table or JSON — is also persisted to ~/.rlmx/benchmarks/benchmark-<mode>-<timestamp>.json and the saved path is printed to stderr.

`rlmx config`

Manage global settings stored at ~/.rlmx/settings.json.

`rlmx config set`

rlmx config set <key> <value>

Set a configuration value. Values are type-coerced: the string "true" becomes the boolean true, and numeric strings become numbers.

rlmx config set GEMINI_API_KEY AIza...
rlmx config set model.provider google
rlmx config set model.model gemini-3.1-flash-lite-preview
rlmx config set budget.max_cost 0.50
rlmx config set gemini.thinking_level medium
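The coercion rule explains why `budget.max_cost 0.50` is stored as a number while a model ID such as `gemini-3.1-flash-lite-preview` stays a string. This Python sketch is a hypothetical approximation of that rule. Two parts are assumptions: treating `"false"` symmetrically (the text above only mentions `"true"`), and the exact definition of a numeric string.

```python
import re
from typing import Union

_INT = re.compile(r"-?\d+")
_FLOAT = re.compile(r"-?(\d+\.\d*|\.\d+)")


def coerce(raw: str) -> Union[bool, int, float, str]:
    """Approximate `rlmx config set` value coercion (hypothetical, not rlmx's code)."""
    if raw == "true":
        return True
    if raw == "false":  # assumption: handled like "true"
        return False
    if _INT.fullmatch(raw):
        return int(raw)
    if _FLOAT.fullmatch(raw):
        return float(raw)
    return raw  # everything else stays a string
```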

`rlmx config get`

rlmx config get <key>

Retrieve a setting value. API keys are masked in output.

$ rlmx config get model.provider
google

`rlmx config list`

rlmx config list

Show all configured settings. Sensitive keys (containing API_KEY, SECRET, TOKEN) are masked.
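The masking rule is a substring test on the key name. This Python sketch assumes case-sensitive matching. Under that assumption `budget.max_tokens` is not masked, whereas a case-insensitive match would mask it. The mask format (last four characters visible) is hypothetical, because rlmx's actual output format is not documented here.

```python
SENSITIVE_MARKERS = ("API_KEY", "SECRET", "TOKEN")


def is_sensitive(key: str) -> bool:
    # Case-sensitive substring match on the documented markers (an assumption)
    return any(marker in key for marker in SENSITIVE_MARKERS)


def mask(value: str, visible: int = 4) -> str:
    """Hypothetical mask: hide all but the last few characters."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]
```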

`rlmx config delete`

rlmx config delete <key>

Remove a setting.

`rlmx config path`

rlmx config path

Print the settings file path (~/.rlmx/settings.json).

Common keys

Key	Description	Example
`GEMINI_API_KEY`	Google Gemini API key	`AIza...`
`ANTHROPIC_API_KEY`	Anthropic API key	`sk-ant-...`
`OPENAI_API_KEY`	OpenAI API key	`sk-...`
`GROQ_API_KEY`	Groq API key	`gsk_...`
`XAI_API_KEY`	xAI API key	`xai-...`
`OPENROUTER_API_KEY`	OpenRouter API key	`sk-or-...`
`model.provider`	LLM provider	`google`, `anthropic`, `openai`
`model.model`	Model ID	`gemini-3.1-flash-lite-preview`
`model.sub_call_model`	Model for `llm_query()` sub-calls	`gemini-3.1-flash-lite-preview`
`budget.max_cost`	Default max USD per run	`0.50`
`budget.max_tokens`	Default max tokens per run	`100000`
`budget.max_depth`	Default max recursion depth	`3`
`tools_level`	Default tool level	`core`, `standard`, `full`
`cache.retention`	Cache TTL strategy	`short`, `long`
`gemini.thinking_level`	Default thinking level	`minimal`, `low`, `medium`, `high`
`gemini.google_search`	Enable web search battery	`true`
`gemini.code_execution`	Enable server-side Python	`true`

Priority order

Settings are resolved in this order (highest priority first):

CLI flags (--max-cost 0.10)
Project rlmx.yaml
Global ~/.rlmx/settings.json
Hardcoded defaults
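The priority order can be read as "first layer that sets the key wins." This Python sketch assumes settings merge per key, flattened to dotted names, with an unset value (`None`) falling through to the next layer. rlmx may merge nested YAML differently, and `resolve` is a hypothetical name.

```python
from typing import Any, Mapping

_MISSING = object()


def resolve(key: str, *layers: Mapping[str, Any]) -> Any:
    """Return the value from the first layer that sets `key`.

    Layers are passed highest priority first: CLI flags, project rlmx.yaml,
    global ~/.rlmx/settings.json, hardcoded defaults.
    """
    for layer in layers:
        value = layer.get(key, _MISSING)
        if value is not _MISSING and value is not None:
            return value
    raise KeyError(key)
```

With `--max-cost 0.10` on the command line and `budget.max_cost` set to `0.50` globally, the run uses 0.10. A key no layer sets falls through to the defaults.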

Tool levels

The --tools flag controls which functions are available in the REPL:

`core` (default)

Paper-faithful RLM functions:

Function	Description
`context`	Injected context variable
`llm_query(prompt)`	Single LLM completion
`llm_query_batched(prompts)`	Concurrent LLM calls
`rlm_query(prompt)`	Recursive child RLM session
`rlm_query_batched(prompts)`	Parallel child RLM sessions
`SHOW_VARS()`	List all REPL variables
`FINAL(answer)`	Terminate with answer string
`FINAL_VAR(name)`	Terminate with variable value

Plus any custom tools defined in rlmx.yaml.
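For illustration, here is the shape of REPL code that combines `context`, `llm_query_batched`, and `FINAL` on a directory context (a `list[{path, content}]`, per the context-loading table). The definitions at the top are hypothetical stand-ins so the sketch runs outside rlmx. Inside the rlmx REPL these names are already injected and should not be redefined. The return contract of `llm_query_batched` (one string per prompt, in order) is an assumption.

```python
# Stand-ins so the sketch runs outside rlmx (hypothetical bodies, not rlmx's implementations)
context = [
    {"path": "docs/concept/ipc.md", "content": "IPC uses message passing."},
    {"path": "docs/start/create-project.md", "content": "Run rlmx init first."},
]


def llm_query_batched(prompts):
    # Assumed contract: one completion string per prompt, in the same order
    return [f"(summary) {p.splitlines()[-1]}" for p in prompts]


def FINAL(answer):
    # In rlmx, FINAL(answer) terminates the loop; here it only prints
    print(answer)


# REPL code of the kind the model might write: one prompt per file, then finish
prompts = [f"Summarize {doc['path']} in one sentence:\n{doc['content']}" for doc in context]
summaries = llm_query_batched(prompts)
report = "\n".join(f"{doc['path']}: {s}" for doc, s in zip(context, summaries))
FINAL(report)
```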

`standard` (core + batteries)

All core functions plus utility batteries:

Function	Description
`describe_context()`	Metadata overview of loaded context
`preview_context()`	Content sample
`search_context(query)`	Semantic search over context
`grep_context(pattern)`	Regex search over context
`chunk_context()`	Split context into chunks
`chunk_text(text)`	Split arbitrary text by size
`map_query(fn, items)`	Distributed LLM calls
`reduce_query(fn, items)`	Aggregation queries

With the Google provider, `standard` also includes these Gemini batteries:

Function	Description
`web_search(query)`	Google web search
`fetch_url(url)`	Fetch and summarize URL content
`generate_image(prompt)`	Image generation

`full` (standard + environment)

All standard functions plus auto-injected information about available Python packages and versions in the REPL environment.

Global flags

These flags work with any command:

Flag	Description
`--help`, `-h`	Show help message
`--version`, `-v`	Show version
