CLI Reference

rlmx "query"

Run an RLM query against a context.
rlmx "query" [options]
This is the default command. It loads the context into a Python REPL, then iterates with the LLM until it produces a final answer.

Options

| Flag | Type | Default | Description |
|---|---|---|---|
| `--context <path>` | string | | Path to context directory or file |
| `--output <mode>` | string | text | Output mode: text, json, or stream |
| `--verbose` | boolean | false | Show iteration progress on stderr |
| `--max-iterations <n>` | number | 30 | Maximum RLM iterations before forced termination |
| `--timeout <ms>` | number | 300000 | Timeout in milliseconds (300000 ms = 5 minutes) |
| `--stats` | boolean | false | Emit JSON stats to stderr (or include them in --output json) |
| `--log <path>` | string | | Write a structured JSONL log to the given file |
| `--tools <level>` | string | core | Tool level: core, standard, or full |
| `--max-cost <n>` | number | | Maximum USD spend per run |
| `--max-tokens <n>` | number | | Maximum total tokens per run |
| `--max-depth <n>` | number | | Maximum recursive rlm_query depth |
| `--ext <list>` | string | .md | File extensions for context directories (comma-separated) |
| `--thinking <level>` | string | | Thinking level: minimal, low, medium, high (Gemini only) |
| `--cache` | boolean | false | Enable CAG mode (full context cached in the system prompt) |

Context loading

| Input | Behavior |
|---|---|
| `--context dir/` | Recursively reads files matching --ext into a list[{path, content}] |
| `--context file.md` | Reads the file as a single string |
| `--context file.json` | Parses the JSON as a dict or list |
| stdin pipe | Reads the piped input as a single string |
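
For a directory, the loading rule can be approximated in a few lines of Python. This is a sketch of the documented behavior, not rlmx's own loader: load_context_dir is a hypothetical helper, and reporting paths relative to the context directory and reading every file as UTF-8 are assumptions.

```python
from pathlib import Path


def load_context_dir(root: str, exts: str = ".md") -> list[dict]:
    """Hypothetical sketch of `--context dir/ --ext <list>` loading."""
    wanted = {e.strip() for e in exts.split(",") if e.strip()}
    base = Path(root)
    docs = []
    # rglob("*") walks the tree recursively; sorting gives a stable order.
    for path in sorted(base.rglob("*")):
        if path.is_file() and path.suffix in wanted:
            docs.append({
                # Assumption: paths relative to the context root, POSIX style.
                "path": path.relative_to(base).as_posix(),
                "content": path.read_text(encoding="utf-8"),
            })
    return docs
```

Calling load_context_dir("./src/", ".ts,.js") mirrors the shape of `--context ./src/ --ext .ts,.js`: a list of {path, content} dicts, one per matching file.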

Examples

# Basic query with directory context
rlmx "How does IPC work?" --context ./docs/

# JSON output with stats
rlmx "Summarize this" --context paper.md --output json --stats

# Code analysis with extended file types
rlmx "Analyze code" --context ./src/ --tools full --ext .ts,.js

# Budget-limited query
rlmx "Quick question" --max-cost 0.10 --max-tokens 5000

# Piped input with logging
echo "data" | rlmx "Analyze this" --log run.jsonl

# CAG mode for repeated queries
rlmx "First question" --context ./docs/ --cache
rlmx "Follow-up question" --context ./docs/ --cache

# Gemini thinking mode
rlmx "Complex analysis" --context ./src/ --thinking high

Output formats

With --output json, the result is printed as a single JSON object:

{
  "answer": "The answer to your query...",
  "references": ["docs/start/create-project.md", "docs/concept/ipc.md"],
  "usage": {
    "inputTokens": 12500,
    "outputTokens": 3200,
    "llmCalls": 5
  },
  "iterations": 3,
  "model": "google/gemini-3.1-flash-lite-preview"
}
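
Scripts can consume this object directly. A minimal Python sketch, assuming only the fields shown in the sample above; summarize_result is a hypothetical helper, and in practice raw would be the stdout of a command such as rlmx "How does IPC work?" --context ./docs/ --output json.

```python
import json


def summarize_result(raw: str) -> str:
    """Hypothetical helper: one-line summary of an `--output json` result."""
    result = json.loads(raw)
    usage = result["usage"]
    total = usage["inputTokens"] + usage["outputTokens"]
    refs = ", ".join(result.get("references", [])) or "none"
    return (
        f"{result['model']}: {result['iterations']} iterations, "
        f"{usage['llmCalls']} LLM calls, {total} tokens; refs: {refs}"
    )
```

For the sample object, the input and output tokens add up to 12,500 + 3,200 = 15,700.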

rlmx init

Scaffold a .rlmx/ config directory with templates.
rlmx init [--template <type>] [--dir <path>]

| Flag | Type | Default | Description |
|---|---|---|---|
| `--template <type>` | string | default | Template type: default or code |
| `--dir <path>` | string | . (cwd) | Directory to scaffold in |

Creates a .rlmx/ directory containing:

| File | Purpose |
|---|---|
| rlmx.yaml | Main configuration (model, budget, context, storage) |
| SYSTEM.md | System prompt used by the RLM loop |
| CRITERIA.md | Output criteria for quality checks |
| TOOLS.md | Custom Python tools exposed to the RLM |

Starting with RLMX v0.260331, the .rlmx/ directory is the only config location. Projects that still use the old flat rlmx.yaml in the project root must re-run rlmx init to migrate.

Templates

  • default: general-purpose RLM usage with a balanced system prompt and criteria
  • code: code-analysis template with a code-focused system prompt and tool definitions

Examples

# Scaffold with default template
rlmx init

# Scaffold with code analysis template
rlmx init --template code

# Scaffold in a specific directory
rlmx init --template default --dir ./my-project

rlmx cache

Pre-warm the provider cache for a given context, or estimate its size and cost without making any API calls.
rlmx cache --context <path> [--estimate] [options]
--context is required. Without --estimate, rlmx performs a single-iteration warmup run so the provider caches the prompt prefix for subsequent queries.

| Flag | Type | Default | Description |
|---|---|---|---|
| `--context <path>` | string | required | Path to context directory or file |
| `--estimate` | boolean | false | Print a token/cost estimate only; skip the warmup call |
| `--ext <list>` | string | from rlmx.yaml | File extensions when --context is a directory (comma-separated) |
| `--tools <level>` | string | core | Tool level: core, standard, or full |
| `--timeout <ms>` | number | 300000 | Warmup timeout in milliseconds |
| `--verbose` | boolean | false | Verbose stderr logging |

Examples

# Estimate only — no LLM calls
rlmx cache --context ./docs/ --estimate

# Warm the provider cache for future queries
rlmx cache --context ./docs/

# Custom extensions for a source-code corpus
rlmx cache --context ./src/ --ext .ts,.js --estimate

Outputs

With --estimate, a key: value block is printed to stdout and the process exits without calling any LLM:
rlmx cache estimate
---
  context:          ./docs/
  metadata:         23 files, 145KB
  estimated tokens: 43,500
  provider limit:   1,048,576 tokens
  utilization:      4.1%
  provider:         google
  model:            gemini-3.1-flash-lite-preview
  ttl:              3600s
  estimated cost:   $0.0033
Without --estimate, a minimal RLM loop runs to prime the provider cache. Progress and the summary are written to stderr:
rlmx: warming cache for ./docs/ (~43,500 tokens)
rlmx: cache warmup complete
  provider:         google
  model:            gemini-3.1-flash-lite-preview
  estimated tokens: 43,500
  ttl:              3600s
  estimated cost:   $0.0033
If the context exceeds the provider’s token limit, the command exits with a non-zero status and an error message. See Cache Mode for the full caching workflow.
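
Because the --estimate block is plain key: value text on stdout, a script can check it before paying for a warmup, for example by reading the output of rlmx cache --context ./docs/ --estimate. A sketch assuming exactly the layout shown above; parse_cache_estimate and utilization are hypothetical helpers. Utilization here is estimated tokens divided by the provider limit (43,500 / 1,048,576 ≈ 4.1%).

```python
def parse_cache_estimate(text: str) -> dict[str, str]:
    """Hypothetical helper: collect the `key: value` lines of the estimate block."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # the title line and the '---' separator contain no colon
            fields[key.strip()] = value.strip()
    return fields


def utilization(fields: dict[str, str]) -> float:
    """Estimated tokens as a fraction of the provider's token limit."""
    tokens = int(fields["estimated tokens"].replace(",", ""))
    limit = int(fields["provider limit"].split()[0].replace(",", ""))
    return tokens / limit
```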

rlmx batch

Run bulk queries from a questions file against a shared, cached context. Each question runs through the same RLM loop as rlmx "query". Provider-level prompt caching is always enabled, so the first question pays full price and later questions benefit from the cache.
rlmx batch <questions-file> [options]
Questions are read one per line. Blank lines and lines beginning with # are ignored.
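
The file rules can be expressed as a short Python sketch. read_questions is hypothetical, and trimming surrounding whitespace before the # check is an assumption rather than documented behavior.

```python
def read_questions(text: str) -> list[str]:
    """Hypothetical sketch of the questions-file rules for rlmx batch."""
    questions = []
    for line in text.splitlines():
        stripped = line.strip()  # assumption: surrounding whitespace is trimmed
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and '#' comment lines are ignored
        questions.append(stripped)
    return questions
```

A file containing a "# docs questions" header, two questions, and blank lines between them therefore yields exactly two questions.
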

| Flag | Type | Default | Description |
|---|---|---|---|
| `<questions-file>` | path | required | Path to a text file of questions (one per line) |
| `--context <path>` | string | | Shared context for every question |
| `--max-iterations <n>` | number | 30 | Maximum RLM iterations per question |
| `--timeout <ms>` | number | 300000 | Per-question timeout in milliseconds |
| `--max-cost <n>` | number | | Stop once cumulative USD cost crosses this threshold |
| `--max-tokens <n>` | number | | Per-question token cap |
| `--max-depth <n>` | number | | Maximum recursive rlm_query depth |
| `--parallel <n>` | number | 1 | Concurrency hint (questions currently execute sequentially) |
| `--batch-api` | boolean | false | Opt into the Gemini Batch API path (requires provider: google) |
| `--tools <level>` | string | core | Tool level: core, standard, or full |
| `--ext <list>` | string | from rlmx.yaml | File extensions for directory context |
| `--verbose` | boolean | false | Show per-question progress on stderr |

You do not need to pass --cache: batch mode always runs with cache.enabled = true. If the context exceeds the provider’s token limit, rlmx falls back to pgserve storage mode when storage.enabled is auto or always.

Examples

# Run a questions file against a cached docs corpus
rlmx batch questions.txt --context ./docs/

# Stop if total spend crosses $1.00
rlmx batch questions.txt --context ./src/ --max-cost 1.00

# Gemini Batch API for 50% input/output token discount
rlmx batch questions.txt --context ./docs/ --batch-api

Outputs

rlmx batch writes JSONL to stdout — one JSON object per question, followed by a final aggregate line:
{"question":"How does IPC work?","answer":"IPC uses...","stats":{"iterations":2,"inputTokens":42100,"outputTokens":520,"cost":0.0042}}
{"question":"Where is auth defined?","answer":"src/auth.ts...","stats":{"iterations":1,"inputTokens":820,"outputTokens":310,"cost":0.0009}}
{"type":"aggregate","total_questions":2,"completed":2,"total_cost":0.0051,"cache_savings":0.004}
Budget trips, cache fallbacks, and verbose progress are logged to stderr so the stdout stream stays valid JSONL for downstream pipelines. See Batch Mode for full details.
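
Downstream consumers mainly need to separate per-question records from the final aggregate line. A Python sketch, assuming (from the sample above) that only the aggregate carries "type": "aggregate"; split_batch_output is a hypothetical helper. When piping rlmx batch output into a script, pass sys.stdin as lines; because budget and progress messages go to stderr, every stdout line should parse as JSON.

```python
import json
from typing import Iterable, Optional


def split_batch_output(lines: Iterable[str]) -> tuple[list[dict], Optional[dict]]:
    """Hypothetical helper: separate per-question records from the aggregate line."""
    results: list[dict] = []
    aggregate: Optional[dict] = None
    for line in lines:
        if not line.strip():
            continue  # tolerate stray blank lines
        record = json.loads(line)
        if record.get("type") == "aggregate":
            aggregate = record
        else:
            results.append(record)
    return results, aggregate
```

With the sample stream, the two per-question costs (0.0042 + 0.0009) match the aggregate total_cost of 0.0051.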

rlmx stats

Query run history and cost breakdowns from the rlmx observability database (pgserve at ~/.rlmx/data). Stats are populated automatically by every run that saves a session.
rlmx stats [options]

| Flag | Type | Default | Description |
|---|---|---|---|
| `--run <id>` | string | | Show the event timeline for a specific session id |
| `--costs` | boolean | false | Show a cost breakdown grouped by model |
| `--tools` | boolean | false | Show REPL tool usage grouped by session |
| `--since <duration>` | string | | Limit to a recent window (30m, 24h, 7d) |
| `--output json` | literal | | Emit structured JSON instead of the terminal table |

Without any flags, rlmx stats prints the 20 most recent sessions as a terminal table (id, query, model, iterations, cost, status, duration).
Stats require pgserve storage. If ~/.rlmx/data does not exist, the command prints "No stats yet. Run a query first." and exits cleanly. See Configuration for storage setup.
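
The --since values are a number followed by a unit suffix. A sketch of how such a window maps to a time span, assuming only the m, h, and d units from the examples are supported; parse_since is hypothetical and not rlmx's parser. A session then falls inside the window if it started after datetime.now(timezone.utc) - parse_since("24h").

```python
from datetime import timedelta

# Assumption: only the unit suffixes shown in the --since examples.
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_since(value: str) -> timedelta:
    """Hypothetical: interpret a --since value such as 30m, 24h, or 7d."""
    number, unit = value[:-1], value[-1:]
    if unit not in _UNITS or not number.isdigit():
        raise ValueError(f"unsupported duration: {value!r}")
    return timedelta(**{_UNITS[unit]: int(number)})
```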

Examples

# Most recent 20 runs as a table
rlmx stats

# JSON for scripting / jq pipelines
rlmx stats --output json

# Cost by model over the last 24 hours
rlmx stats --costs --since 24h

# Tool usage over the last week
rlmx stats --tools --since 7d

# Full event timeline for a specific run
rlmx stats --run 0c3e2f1a-...-9f02

Outputs

By default, the sessions table is written to stdout as plain-text columns:
ID          Query                           Model                       Iter        Cost  Status     Duration
--------------------------------------------------------------------------------------------------------------
0c3e2f1a..  How does IPC work?              google/gemini-3.1-flash...     3    $0.0042  completed       4.1s
f91d8a05..  Summarize paper.md              google/gemini-3.1-flash...     2    $0.0011  completed       1.8s
Other views, selected by flag:
  • --run <id>: one row per event (llm_call, repl_exec, sub_call) with iteration, token counts, cost, duration, and kind-specific detail (model, code preview, request type).
  • --costs: one row per (session, model) pair with total calls, input/output tokens, cost, and average call duration.
  • --tools: one row per (session, request_type) with calls, errors, and average duration.
  • --output json: any of the above as a pretty-printed JSON array of rows.

rlmx benchmark

Run benchmarks that compare the RLM loop against a direct LLM call on the same question. rlmx benchmark does not accept --context; each mode ships its own dataset.
rlmx benchmark <mode> [options]

| Mode | Dataset | Measures |
|---|---|---|
| cost | Built-in curated dataset (src/benchmark-data.json) | Token, cost, and latency savings of RLM vs direct |
| oolong | Oolong Synth, auto-downloaded from Hugging Face | Answer quality (accuracy) plus the same cost metrics |

Flags

| Flag | Applies to | Default | Description |
|---|---|---|---|
| `--output json` | cost | table | Print JSON results to stdout instead of the table |
| `--samples <n>` | oolong | 5 | Number of samples to evaluate |
| `--idx <n>` | oolong | | Run only a specific sample index (ignores --samples) |
| `--tools <level>` | both | core | Tool level used for the RLM runs |

Model and provider are resolved from rlmx.yaml / ~/.rlmx/settings.json via the usual priority order.

Examples

# Cost benchmark, formatted table on stderr
rlmx benchmark cost

# Cost benchmark, machine-readable JSON on stdout
rlmx benchmark cost --output json

# Oolong quality run — 5 samples (default)
rlmx benchmark oolong

# Oolong — 20 samples
rlmx benchmark oolong --samples 20

# Oolong — just sample index 42
rlmx benchmark oolong --idx 42

Outputs

Both modes print a box-drawn comparison table to stderr with per-question rows (Direct / RLM / Savings) and a TOTALS footer covering tokens, cost, latency, and average RLM iterations:
┌───────────────┬──────────┬─────────┬──────────┬──────────┬──────┐
│ Question      │ Mode     │  Tokens │     Cost │  Latency │ Iter │
├───────────────┼──────────┼─────────┼──────────┼──────────┼──────┤
│ ipc_summary   │ Direct   │  12,400 │ $0.00093 │     820ms│   -  │
│               │ RLM      │   3,100 │ $0.00024 │   2,410ms│    3 │
│               │ Savings  │   75.0% │    74.2% │        - │      │
├───────────────┼──────────┼─────────┼──────────┼──────────┼──────┤
│ TOTALS        │ Direct   │  94,200 │ $0.00707 │   6.2s   │   -  │
│               │ RLM      │  23,500 │ $0.00182 │  18.4s   │  2.8 │
│               │ Savings  │   75.1% │    74.3% │        - │      │
└───────────────┴──────────┴─────────┴──────────┴──────────┴──────┘
With rlmx benchmark cost --output json, the same results are emitted as a structured JSON document to stdout (timestamp, mode, model, per-question runs[], and totals). Every benchmark — table or JSON — is also persisted to ~/.rlmx/benchmarks/benchmark-<mode>-<timestamp>.json and the saved path is printed to stderr.
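
The token and cost Savings rows in the sample table are consistent with a plain relative reduction, (direct - rlm) / direct. For the TOTALS token row, (94,200 - 23,500) / 94,200 ≈ 75.1%. This formula is inferred from the sample figures, not taken from rlmx's source; savings_pct is a hypothetical helper.

```python
def savings_pct(direct: float, rlm: float) -> float:
    """Relative reduction of the RLM figure versus the direct figure, in percent."""
    if direct == 0:
        raise ValueError("direct baseline must be non-zero")
    return (direct - rlm) / direct * 100


# Figures from the sample table (tokens and cost rows).
for label, direct, rlm in [
    ("ipc_summary tokens", 12_400, 3_100),
    ("ipc_summary cost", 0.00093, 0.00024),
    ("TOTALS tokens", 94_200, 23_500),
    ("TOTALS cost", 0.00707, 0.00182),
]:
    # prints 75.0%, 74.2%, 75.1%, 74.3% on successive lines
    print(f"{label}: {savings_pct(direct, rlm):.1f}%")
```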

rlmx config

Manage global settings stored at ~/.rlmx/settings.json.

rlmx config set

rlmx config set <key> <value>
Set a configuration value. Values are type-coerced: "true" becomes boolean, numeric strings become numbers.
rlmx config set GEMINI_API_KEY sk-abc123
rlmx config set model.provider google
rlmx config set model.model gemini-3.1-flash-lite-preview
rlmx config set budget.max_cost 0.50
rlmx config set gemini.thinking_level medium
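
The coercion rule can be sketched in Python. coerce and set_key are hypothetical helpers; treating "false" as a boolean and storing dotted keys such as model.provider as nested objects in settings.json are assumptions, not documented behavior.

```python
def coerce(value: str):
    """Sketch of the documented coercion: "true" -> bool, numeric strings -> numbers."""
    if value == "true":
        return True
    if value == "false":  # assumption: "false" is handled symmetrically
        return False
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value  # everything else, e.g. API keys, stays a string


def set_key(settings: dict, key: str, value: str) -> None:
    """Assumption: dotted keys map to nested objects in settings.json."""
    *parents, leaf = key.split(".")
    node = settings
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = coerce(value)
```

Under these assumptions, budget.max_cost 0.50 is stored as the number 0.5, while GEMINI_API_KEY sk-abc123 stays a string at the top level.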

rlmx config get

rlmx config get <key>
Retrieve a setting value. API keys are masked in output.
$ rlmx config get model.provider
google

rlmx config list

rlmx config list
Show all configured settings. Sensitive keys (containing API_KEY, SECRET, TOKEN) are masked.
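
A sketch of the masking rule. Only the three marker substrings come from the text above; the mask format (first four characters kept), the key = value display, and the case-sensitive match are assumptions, and all three functions are hypothetical.

```python
SENSITIVE_MARKERS = ("API_KEY", "SECRET", "TOKEN")


def is_sensitive(key: str) -> bool:
    # Case-sensitive substring match (assumption), so budget.max_tokens stays visible.
    return any(marker in key for marker in SENSITIVE_MARKERS)


def mask(value: str, visible: int = 4) -> str:
    """Hypothetical mask: keep a short prefix and hide the rest."""
    if len(value) <= visible:
        return "*" * len(value)
    return value[:visible] + "*" * (len(value) - visible)


def display(key: str, value) -> str:
    shown = mask(str(value)) if is_sensitive(key) else str(value)
    return f"{key} = {shown}"
```

The case-sensitive match is the interesting design choice here: a case-insensitive one would also mask budget.max_tokens, because that key contains "tokens".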

rlmx config delete

rlmx config delete <key>
Remove a setting.

rlmx config path

rlmx config path
Print the settings file path (~/.rlmx/settings.json).

Common keys

| Key | Description | Example |
|---|---|---|
| `GEMINI_API_KEY` | Google Gemini API key | AIza... |
| `ANTHROPIC_API_KEY` | Anthropic API key | sk-ant-... |
| `OPENAI_API_KEY` | OpenAI API key | sk-... |
| `GROQ_API_KEY` | Groq API key | gsk_... |
| `XAI_API_KEY` | xAI API key | xai-... |
| `OPENROUTER_API_KEY` | OpenRouter API key | sk-or-... |
| `model.provider` | LLM provider | google, anthropic, openai |
| `model.model` | Model ID | gemini-3.1-flash-lite-preview |
| `model.sub_call_model` | Model for llm_query() sub-calls | gemini-3.1-flash-lite-preview |
| `budget.max_cost` | Default max USD per run | 0.50 |
| `budget.max_tokens` | Default max tokens per run | 100000 |
| `budget.max_depth` | Default max recursion depth | 3 |
| `tools_level` | Default tool level | core, standard, full |
| `cache.retention` | Cache TTL strategy | short, long |
| `gemini.thinking_level` | Default thinking level | minimal, low, medium, high |
| `gemini.google_search` | Enable the web search battery | true |
| `gemini.code_execution` | Enable server-side Python execution | true |

Priority order

Settings are resolved in this order (highest priority first):
  1. CLI flags (--max-cost 0.10)
  2. Project .rlmx/rlmx.yaml
  3. Global ~/.rlmx/settings.json
  4. Hardcoded defaults
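
Resolution amounts to "the first layer that defines the key wins". A Python sketch, assuming each source has already been loaded into a nested mapping; how rlmx actually reads the flags, rlmx.yaml, and settings.json is not shown, and lookup, resolve, and the layer values are hypothetical.

```python
from typing import Any, Mapping, Optional

_MISSING = object()


def lookup(layer: Mapping[str, Any], dotted_key: str) -> Any:
    """Walk a nested mapping using a dotted key such as "budget.max_cost"."""
    node: Any = layer
    for part in dotted_key.split("."):
        if not isinstance(node, Mapping) or part not in node:
            return _MISSING
        node = node[part]
    return node


def resolve(dotted_key: str, *layers: Mapping[str, Any]) -> Optional[Any]:
    """Return the value from the highest-priority layer that defines the key."""
    for layer in layers:  # layers are ordered highest priority first
        value = lookup(layer, dotted_key)
        if value is not _MISSING:
            return value
    return None


# Illustrative layers (hypothetical values), highest priority first.
cli_flags = {"budget": {"max_cost": 0.10}}
project_yaml = {"budget": {"max_cost": 0.25}, "model": {"provider": "anthropic"}}
global_settings = {"model": {"provider": "google"}, "tools_level": "standard"}
defaults = {"tools_level": "core"}
layers = (cli_flags, project_yaml, global_settings, defaults)
```

Here resolve("budget.max_cost", *layers) returns the CLI value 0.10, model.provider comes from the project file, and tools_level from the global settings.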

Tool levels

The --tools flag controls which functions are available in the REPL:

core (default)

Paper-faithful RLM functions:

| Function | Description |
|---|---|
| `context` | Injected context variable |
| `llm_query(prompt)` | Single LLM completion |
| `llm_query_batched(prompts)` | Concurrent LLM calls |
| `rlm_query(prompt)` | Recursive child RLM session |
| `rlm_query_batched(prompts)` | Parallel child RLM sessions |
| `SHOW_VARS()` | List all REPL variables |
| `FINAL(answer)` | Terminate with an answer string |
| `FINAL_VAR(name)` | Terminate with a variable's value |

Plus any custom tools defined in rlmx.yaml.
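
A hypothetical REPL turn shows how the core functions can fit together on a directory context (a list of {path, content} dicts). It assumes llm_query_batched returns one string per prompt, in order, and that FINAL is called like an ordinary function. The stand-ins at the top are not rlmx code; they exist only so the sketch runs outside the REPL.

```python
# Inside rlmx, `context`, `llm_query_batched`, and `FINAL` are injected; the
# stand-ins below are NOT rlmx code and exist only so the sketch runs alone.
try:
    context
except NameError:
    context = [
        {"path": "docs/concept/ipc.md", "content": "IPC uses message passing."},
        {"path": "docs/start/create-project.md", "content": "Run the init command."},
    ]

    def llm_query_batched(prompts):
        # Stand-in "model": answers yes when the file text mentions IPC.
        return ["yes" if "IPC" in p.split("\n", 1)[1] else "no" for p in prompts]

    def FINAL(answer):
        # Stand-in: record the answer instead of ending the RLM loop.
        global final_answer
        final_answer = answer

# One focused sub-question per file, sent as a single concurrent batch.
prompts = [
    f"Does this file explain IPC? Answer yes or no.\n{doc['content']}"
    for doc in context
]
answers = llm_query_batched(prompts)

# Keep files whose answer starts with "yes", then terminate with a summary.
relevant = [
    doc["path"] for doc, ans in zip(context, answers)
    if ans.strip().lower().startswith("yes")
]
FINAL("Relevant files: " + ", ".join(relevant))
```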

standard (core + batteries)

All core functions plus utility batteries:

| Function | Description |
|---|---|
| `describe_context()` | Metadata overview of the loaded context |
| `preview_context()` | Sample of the context content |
| `search_context(query)` | Semantic search over the context |
| `grep_context(pattern)` | Regex search over the context |
| `chunk_context()` | Split the context into chunks |
| `chunk_text(text)` | Split arbitrary text by size |
| `map_query(fn, items)` | Distributed LLM calls |
| `reduce_query(fn, items)` | Aggregation queries |

With the Google provider, the standard level also includes the Gemini batteries:

| Function | Description |
|---|---|
| `web_search(query)` | Google web search |
| `fetch_url(url)` | Fetch and summarize URL content |
| `generate_image(prompt)` | Image generation |
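
To make a battery such as grep_context concrete, here is one plausible way a line-oriented regex search over a directory context could work. It is a standalone sketch, not the battery's implementation: grep_docs, its (path, line number, line) result tuples, and line-by-line matching are all assumptions.

```python
import re


def grep_docs(pattern: str, docs: list[dict]) -> list[tuple[str, int, str]]:
    """Hypothetical: regex search over a list of {path, content} documents."""
    regex = re.compile(pattern)
    hits = []
    for doc in docs:
        # Report 1-based line numbers alongside each matching line.
        for line_no, line in enumerate(doc["content"].splitlines(), start=1):
            if regex.search(line):
                hits.append((doc["path"], line_no, line))
    return hits
```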

full (standard + environment)

All standard functions plus auto-injected information about available Python packages and versions in the REPL environment.

Global flags

These flags work with any command:

| Flag | Description |
|---|---|
| `--help`, `-h` | Show help message |
| `--version`, `-v` | Show version |