
Scheduler

The scheduler daemon is the background process that claims and fires triggers stored in pgserve and tracks worker liveness through the state management layer. It runs as a persistent loop that combines real-time PostgreSQL notifications (LISTEN) with a poll-based fallback, so a trigger whose notification is missed is still picked up on the next poll.

Architecture

┌──────────────────────────────────────────────────┐
│                Scheduler Daemon                  │
│                                                  │
│  ┌─────────────┐  ┌─────────────┐  ┌──────────┐ │
│  │ LISTEN loop │  │  Poll loop  │  │Heartbeat │ │
│  │ (realtime)  │  │  (30s safe- │  │collector │ │
│  │             │  │   ty net)   │  │  (60s)   │ │
│  └──────┬──────┘  └──────┬──────┘  └────┬─────┘ │
│         │                │               │       │
│         └────────┬───────┘               │       │
│                  ▼                       │       │
│         ┌──────────────┐                 │       │
│         │ Claim trigger│                 │       │
│         │ SELECT FOR   │                 │       │
│         │ UPDATE SKIP  │                 │       │
│         │ LOCKED       │                 │       │
│         └──────┬───────┘                 │       │
│                ▼                         │       │
│         ┌──────────────┐                 │       │
│         │ Fire trigger │                 │       │
│         │ (spawn agent)│                 │       │
│         └──────────────┘                 │       │
│                                          │       │
│  ┌──────────────────┐  ┌──────────────┐  │       │
│  │ Orphan reconcile │  │   Machine    │  │       │
│  │    (every 5m)    │  │  snapshots   │←─┘       │
│  └──────────────────┘  └──────────────┘          │
└──────────────────────────────────────────────────┘
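In the diagram, the LISTEN loop and the poll loop are two separate wake-up sources that feed a single claim step. The sketch below shows one way they could share that step. It assumes a hypothetical `claim` callback that wraps the SKIP LOCKED query and a hypothetical `fire` callback that spawns the agent. `Drainer` and `wake` are illustrative names, not the daemon's API. Both the notification handler and the poll timer would call `wake()`.

```typescript
// Hypothetical callbacks: claim() returns a claimed trigger id, or null when nothing is due;
// fire() spawns the agent for that trigger and settles when the run ends.
type ClaimFn = () => Promise<string | null>;
type FireFn = (triggerId: string) => Promise<void>;

class Drainer {
  private readonly claim: ClaimFn;
  private readonly fire: FireFn;
  private readonly maxConcurrent: number;
  private draining = false; // true while a drain pass is in progress
  private rerun = false;    // set when a wake-up arrives during a pass
  private running = 0;      // runs fired but not yet settled

  constructor(claim: ClaimFn, fire: FireFn, maxConcurrent: number) {
    this.claim = claim;
    this.fire = fire;
    this.maxConcurrent = maxConcurrent;
  }

  // Called by both the LISTEN notification handler and the poll timer.
  async wake(): Promise<void> {
    if (this.draining) {
      this.rerun = true; // coalesce: the active pass will loop once more
      return;
    }
    this.draining = true;
    try {
      do {
        this.rerun = false;
        // Claim until nothing is due or the concurrency cap is reached.
        while (this.running < this.maxConcurrent) {
          const id = await this.claim();
          if (id === null) break;
          this.running++;
          void this.fire(id)
            .catch(() => undefined) // a real daemon would log and record the failure
            .finally(() => {
              this.running--;
              void this.wake(); // a freed slot may let another due trigger run
            });
        }
      } while (this.rerun);
    } finally {
      this.draining = false;
    }
  }
}
```

Coalescing means that a burst of notifications arriving mid-pass causes one extra pass rather than many overlapping ones. The cap check keeps claims from getting ahead of the available run slots.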

Configuration

interface SchedulerConfig {
  maxConcurrent: number;       // Max concurrent runs (default: 5)
  pollIntervalMs: number;      // Poll interval (default: 30,000ms)
  maxJitterMs: number;         // Batch catch-up jitter (default: 30,000ms)
  jitterThreshold: number;     // Triggers before jitter kicks in
  heartbeatIntervalMs: number; // Heartbeat collection (default: 60,000ms)
  orphanCheckIntervalMs: number; // Orphan reconciliation (default: 300,000ms)
  deadHeartbeatThreshold: number; // Missed heartbeats before marking dead (default: 2)
}
Override the concurrency cap (maxConcurrent) with the GENIE_MAX_CONCURRENT environment variable.
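The sketch below resolves a full config from the documented defaults. It assumes two things. First, jitterThreshold has no documented default, so the caller must supply it. Second, GENIE_MAX_CONCURRENT takes precedence over values passed in code, and invalid values are ignored. `resolveConfig` is a hypothetical helper.

```typescript
interface SchedulerConfig {
  maxConcurrent: number;
  pollIntervalMs: number;
  maxJitterMs: number;
  jitterThreshold: number;
  heartbeatIntervalMs: number;
  orphanCheckIntervalMs: number;
  deadHeartbeatThreshold: number;
}

// Documented defaults; jitterThreshold is omitted because no default is documented.
const DEFAULTS: Omit<SchedulerConfig, "jitterThreshold"> = {
  maxConcurrent: 5,
  pollIntervalMs: 30_000,
  maxJitterMs: 30_000,
  heartbeatIntervalMs: 60_000,
  orphanCheckIntervalMs: 300_000,
  deadHeartbeatThreshold: 2,
};

// Hypothetical helper: defaults, then explicit overrides, then GENIE_MAX_CONCURRENT.
function resolveConfig(
  overrides: Partial<SchedulerConfig> & Pick<SchedulerConfig, "jitterThreshold">,
  env: Record<string, string | undefined>,
): SchedulerConfig {
  const config: SchedulerConfig = { ...DEFAULTS, ...overrides };
  const raw = env.GENIE_MAX_CONCURRENT;
  if (raw !== undefined) {
    const parsed = Number.parseInt(raw, 10);
    // Ignore values that are not positive integers instead of crashing the daemon.
    if (Number.isInteger(parsed) && parsed > 0) config.maxConcurrent = parsed;
  }
  return config;
}
```

In a Node.js process, `process.env` would be passed as `env`.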

Trigger Lifecycle

Claiming

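Triggers are claimed with PostgreSQL's SELECT ... FOR UPDATE SKIP LOCKED, which gives atomic, lease-based claiming across multiple scheduler instances. Each instance locks the row it selects and skips rows that another instance has already locked, instead of waiting for them: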
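-- Run inside a transaction: the row lock is held until COMMIT or ROLLBACK,
-- so the status change that marks the claim belongs in the same transaction.
SELECT * FROM triggers
WHERE status = 'pending' AND due_at <= now()
ORDER BY due_at ASC
LIMIT 1
FOR UPDATE SKIP LOCKED;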
Because locked rows are skipped rather than waited on, each pending trigger is claimed by at most one scheduler daemon at a time, even when several are running.
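The sketch below extends the query above into a full claim transaction. It rests on these assumptions:

* `Queryable` is a hypothetical interface shaped loosely like node-postgres' `query(text, values)`, which resolves to `{ rows }`.
* Triggers have an `id` column.
* The claimed row is moved to `executing`, following the state flow.
* The row is given a `leased_until` deadline, the column used during reboot recovery.
* The lease length is made up.

`claimNextTrigger` is an illustrative name.

```typescript
// Hypothetical minimal query interface, loosely modelled on node-postgres.
interface Queryable {
  query(sql: string, params?: unknown[]): Promise<{ rows: Array<Record<string, unknown>> }>;
}

const LEASE_SECONDS = 300; // hypothetical lease length; not documented

async function claimNextTrigger(db: Queryable): Promise<string | null> {
  await db.query("BEGIN");
  try {
    // Lock one due row; rows locked by other daemons are skipped, not waited on.
    const { rows } = await db.query(
      `SELECT id FROM triggers
        WHERE status = 'pending' AND due_at <= now()
        ORDER BY due_at ASC
        LIMIT 1
        FOR UPDATE SKIP LOCKED`,
    );
    const row = rows[0];
    if (row === undefined) {
      await db.query("COMMIT");
      return null; // nothing due, or every due row is held by another daemon
    }
    const id = String(row.id);
    // Mark the claim while still holding the row lock, and start the lease.
    await db.query(
      `UPDATE triggers
          SET status = 'executing',
              leased_until = now() + make_interval(secs => $2)
        WHERE id = $1`,
      [id, LEASE_SECONDS],
    );
    await db.query("COMMIT");
    return id;
  } catch (err) {
    await db.query("ROLLBACK");
    throw err;
  }
}
```

With node-postgres, pass a client checked out with `await pool.connect()` rather than the pool itself. `BEGIN`, `SELECT`, `UPDATE` and `COMMIT` must all run on the same connection.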

Idempotency

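Each trigger can carry an idempotency_key. A partial unique index on this column rejects a second trigger with the same key, which prevents double-fire. Triggers without a key are not constrained: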
CREATE UNIQUE INDEX idx_triggers_idempotency
  ON triggers(idempotency_key)
  WHERE idempotency_key IS NOT NULL;
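The sketch below shows how a producer could enqueue triggers against this partial index without raising an error on duplicates. PostgreSQL's `ON CONFLICT (col) WHERE predicate` form lets the insert target a partial unique index. Assumptions: the column list (`id`, `status`, `due_at`, `idempotency_key`), the hypothetical `Queryable` interface and the `enqueueTrigger` name. None of these is the scheduler's documented API.

```typescript
// Hypothetical minimal query interface, loosely modelled on node-postgres.
interface Queryable {
  query(sql: string, params?: unknown[]): Promise<{ rows: Array<Record<string, unknown>> }>;
}

// Returns true if the trigger was inserted, false if its idempotency key already existed.
async function enqueueTrigger(
  db: Queryable,
  trigger: { id: string; dueAt: Date; idempotencyKey: string | null },
): Promise<boolean> {
  const { rows } = await db.query(
    `INSERT INTO triggers (id, status, due_at, idempotency_key)
     VALUES ($1, 'pending', $2, $3)
     ON CONFLICT (idempotency_key) WHERE idempotency_key IS NOT NULL DO NOTHING
     RETURNING id`,
    [trigger.id, trigger.dueAt, trigger.idempotencyKey],
  );
  // RETURNING yields no row when the insert was skipped as a duplicate.
  return rows.length > 0;
}
```

A NULL key never matches the index predicate, so keyless triggers are always inserted.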

State Flow

pending → executing → completed

               failed

              skipped

Cron Expressions

The cron parser supports standard 5-field expressions with extensions:
┌───────────── minute (0-59)
│ ┌─────────── hour (0-23)
│ │ ┌───────── day of month (1-31)
│ │ │ ┌─────── month (1-12)
│ │ │ │ ┌───── day of week (0-6, Sunday=0)
│ │ │ │ │
* * * * *
Supported syntax:
  • Wildcards: *
  • Ranges: 1-5
  • Steps: */10, 1-5/2
  • Lists: 1,3,5
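The supported syntax can be illustrated with a minimal field expander. It assumes the five fields and ranges shown in the diagram above. It is not the scheduler's own parser, and `expandField` and `parseCron` are illustrative names. Each comma-separated part may be a wildcard, a single value or a range, optionally followed by `/step`.

```typescript
// Strict non-negative integer parse; anything else becomes NaN and is rejected.
const toInt = (s: string | undefined): number =>
  s !== undefined && /^\d+$/.test(s) ? Number(s) : NaN;

// Expand one cron field (e.g. "1-5/2") into the sorted values it matches.
function expandField(field: string, min: number, max: number): number[] {
  const values = new Set<number>();
  for (const part of field.split(",")) {                // lists: 1,3,5
    const [rangePart = "", stepPart] = part.split("/"); // steps: */10, 1-5/2
    const step = stepPart === undefined ? 1 : toInt(stepPart);
    let lo: number;
    let hi: number;
    if (rangePart === "*") {
      lo = min;                                         // wildcard: whole range
      hi = max;
    } else if (rangePart.includes("-")) {
      const [a, b] = rangePart.split("-");              // ranges: 1-5
      lo = toInt(a);
      hi = toInt(b);
    } else {
      // Bare value. "N/step" is not in the documented syntax, so reject it here.
      if (stepPart !== undefined) throw new Error(`unsupported cron part: ${part}`);
      lo = toInt(rangePart);
      hi = lo;
    }
    if (![lo, hi, step].every(Number.isInteger) || step < 1 || lo < min || hi > max || lo > hi) {
      throw new Error(`invalid cron part: ${part}`);
    }
    for (let v = lo; v <= hi; v += step) values.add(v);
  }
  return Array.from(values).sort((x, y) => x - y);
}

// Field order and ranges from the diagram: minute, hour, day of month, month, day of week.
const FIELD_RANGES: ReadonlyArray<readonly [number, number]> = [
  [0, 59], [0, 23], [1, 31], [1, 12], [0, 6],
];

function parseCron(expr: string): number[][] {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) throw new Error(`expected 5 fields, got ${fields.length}`);
  return FIELD_RANGES.map(([min, max], i) => expandField(fields[i] ?? "", min, max));
}
```

For example, `parseCron("*/15 9-17 * * 1-5")` describes every quarter hour from 09:00 to 17:45, Monday to Friday.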
Duration strings are also supported for interval-based scheduling:
Format    Example    Milliseconds
Seconds   30s        30,000
Minutes   10m        600,000
Hours     2h, 1.5h   7,200,000 / 5,400,000
Days      1d         86,400,000
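The conversions in the table can be reproduced with a small parser. It assumes a duration is a single number followed by one of the units s, m, h or d; whether compound forms such as "1h30m" are accepted is not documented, so this sketch rejects them. `parseDuration` is an illustrative name.

```typescript
type Unit = "s" | "m" | "h" | "d";
const UNIT_MS: Record<Unit, number> = { s: 1_000, m: 60_000, h: 3_600_000, d: 86_400_000 };

// Convert a single "<number><unit>" string such as "30s" or "1.5h" to milliseconds.
function parseDuration(input: string): number {
  const match = /^(\d+(?:\.\d+)?)([smhd])$/.exec(input.trim());
  if (match === null) throw new Error(`invalid duration: ${input}`);
  const amount = Number(match[1]);
  const unit = match[2] as Unit;
  // Round so fractional inputs still yield a whole number of milliseconds.
  return Math.round(amount * UNIT_MS[unit]);
}
```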

Heartbeat Collection

Every 60 seconds, the scheduler collects heartbeats from all active workers:
  1. Pane liveness — checks if tmux panes are still alive
  2. Agent state — reads current state from the worker registry
  3. Context capture — stores pane content snapshot
Heartbeats are stored in the heartbeats table:
CREATE TABLE heartbeats (
  id TEXT PRIMARY KEY,
  worker_id TEXT NOT NULL,
  run_id TEXT REFERENCES runs(id),
  status TEXT CHECK (status IN ('alive', 'idle', 'busy', 'dead')),
  context JSONB DEFAULT '{}',
  last_seen_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
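The sketch below shows how the three collection steps could feed one row of this table. Assumptions: the `WorkerProbe` input shape and the mapping from registry state to `status` are hypothetical. A dead pane maps to `dead`, registry states "idle" and "busy" map through unchanged, and anything else maps to `alive`. The documentation does not specify the scheduler's actual mapping.

```typescript
type HeartbeatStatus = "alive" | "idle" | "busy" | "dead";

// Mirrors the columns of the heartbeats table.
interface HeartbeatRow {
  id: string;
  worker_id: string;
  run_id: string | null;
  status: HeartbeatStatus;
  context: Record<string, unknown>;
  last_seen_at: Date;
}

// Hypothetical inputs gathered by the three collection steps.
interface WorkerProbe {
  workerId: string;
  runId: string | null;
  paneAlive: boolean;        // step 1: tmux pane liveness
  agentState: string | null; // step 2: state read from the worker registry
  paneSnapshot: string;      // step 3: captured pane content
}

// Assumed mapping from probe results to the status column.
function toHeartbeat(probe: WorkerProbe, id: string, now: Date): HeartbeatRow {
  let status: HeartbeatStatus;
  if (!probe.paneAlive) status = "dead";
  else if (probe.agentState === "idle") status = "idle";
  else if (probe.agentState === "busy") status = "busy";
  else status = "alive";
  return {
    id,
    worker_id: probe.workerId,
    run_id: probe.runId,
    status,
    context: { agentState: probe.agentState, pane: probe.paneSnapshot },
    last_seen_at: now, // time of this observation
  };
}
```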

Machine Snapshots

Every 60 seconds (alongside heartbeats), the scheduler captures a machine-level snapshot:
CREATE TABLE machine_snapshots (
  id TEXT PRIMARY KEY,
  active_workers INTEGER NOT NULL DEFAULT 0,
  active_teams INTEGER NOT NULL DEFAULT 0,
  tmux_sessions INTEGER NOT NULL DEFAULT 0,
  cpu_percent REAL,
  memory_mb REAL,
  context JSONB DEFAULT '{}',
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
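The documentation does not say how cpu_percent is computed. One common approach compares two samples of cumulative per-core CPU times, taken one snapshot interval apart, as sketched below. The `CpuTimes` shape matches the `times` field of Node's `os.cpus()` entries. Using this method for cpu_percent is an assumption, and `cpuPercent` is an illustrative name.

```typescript
// Cumulative CPU times in ms, shaped like the `times` field of Node's os.cpus() entries.
interface CpuTimes { user: number; nice: number; sys: number; idle: number; irq: number; }

// Busy share of all CPU time between two samples, as a percentage (0-100).
function cpuPercent(prev: CpuTimes[], next: CpuTimes[]): number {
  let busy = 0;
  let total = 0;
  next.forEach((t, i) => {
    const p = prev[i];
    if (p === undefined) return; // core missing from the earlier sample; skip it
    const idle = t.idle - p.idle;
    const all = (t.user - p.user) + (t.nice - p.nice) + (t.sys - p.sys) + idle + (t.irq - p.irq);
    busy += all - idle;
    total += all;
  });
  return total === 0 ? 0 : (100 * busy) / total;
}
```

In Node.js the samples would come from `os.cpus().map((c) => c.times)`. A system-wide used-memory figure for memory_mb could be derived from `os.totalmem() - os.freemem()`, which is in bytes. Whether memory_mb means system or process memory is not documented.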

Orphan Reconciliation

Every 5 minutes, the scheduler scans for orphaned runs, that is, runs whose agents have stopped responding:
  1. Find runs in leased or running status
  2. Check if the worker has missed more than 2 consecutive heartbeats
  3. Mark dead runs as failed with a reconciliation reason
  4. Reclaim expired leases for retry
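Steps 1 and 2 can be sketched as a pure filter. Assumptions:

* Runs are identified by `id` and carry the statuses named in step 1.
* The latest `last_seen_at` per run is taken from the heartbeats table through its `run_id` column.
* Missed heartbeats are approximated as whole heartbeat intervals elapsed since the last one.

The "more than" comparison follows step 2. A run with no heartbeat at all is treated as orphaned here; a real implementation would probably allow a grace period after the run starts. `findOrphanedRuns` is hypothetical, and steps 3 and 4 are not shown.

```typescript
interface RunRow { id: string; status: string; }

// Flag leased/running runs whose latest heartbeat is more than `threshold` intervals old.
function findOrphanedRuns(
  runs: RunRow[],
  lastSeenByRun: Map<string, Date>, // run id -> latest heartbeats.last_seen_at
  now: Date,
  heartbeatIntervalMs = 60_000,     // documented heartbeat default
  threshold = 2,                    // documented missed-heartbeat default
): RunRow[] {
  return runs.filter((run) => {
    if (run.status !== "leased" && run.status !== "running") return false;
    const seen = lastSeenByRun.get(run.id);
    if (seen === undefined) return true; // never heartbeated: treated as orphaned
    const missed = Math.floor((now.getTime() - seen.getTime()) / heartbeatIntervalMs);
    return missed > threshold;
  });
}
```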

Reboot Recovery

On startup, the scheduler performs recovery:
  1. Reclaim expired leases — triggers where leased_until < now() are reset to pending
  2. Reconcile orphaned runs — runs without matching live workers are marked failed
  3. Resume polling — normal LISTEN + poll loop begins
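The three recovery steps run in a fixed order, which the sketch below makes explicit. Assumptions:

* Only triggers still marked `executing` are reclaimed. Reading the documented rule literally would also reset completed triggers whose lease has passed.
* Clearing `leased_until` is an illustrative choice.
* The `Queryable` interface and the `reconcileOrphans` and `startLoops` callbacks are hypothetical.

```typescript
// Hypothetical minimal query interface, loosely modelled on node-postgres.
interface Queryable {
  query(sql: string, params?: unknown[]): Promise<{ rows: Array<Record<string, unknown>> }>;
}

// Returns the number of triggers whose expired lease was reclaimed.
async function recoverOnStartup(
  db: Queryable,
  reconcileOrphans: () => Promise<void>, // e.g. one pass of the periodic orphan check
  startLoops: () => void,                // starts the LISTEN and poll loops
): Promise<number> {
  // 1. Expired leases go back to pending so any daemon can claim them again.
  const { rows } = await db.query(
    `UPDATE triggers
        SET status = 'pending', leased_until = NULL
      WHERE status = 'executing' AND leased_until < now()
      RETURNING id`,
  );
  // 2. Runs without a matching live worker are marked failed.
  await reconcileOrphans();
  // 3. Only then resume normal operation.
  startLoops();
  return rows.length;
}
```

Reclaiming before the loops start keeps the first claim pass from seeing triggers stuck in `executing` from before the reboot.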

Structured Logging

The scheduler writes structured JSON logs to ~/.genie/logs/scheduler.log:
{
  "timestamp": "2026-03-24T10:30:00.000Z",
  "level": "info",
  "event": "trigger_claimed",
  "trigger_id": "trg-abc123",
  "schedule_id": "sched-daily",
  "trace_id": "trace-xyz789"
}
Trace IDs are propagated from the trigger into the spawned agent’s environment, enabling end-to-end observability from schedule definition to agent execution.
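The sketch below shows how the log format and trace propagation could fit together. `logLine` builds one line in the shape shown above. `agentEnv` copies a trigger's trace ID into the environment handed to the spawned agent. `GENIE_TRACE_ID` is a hypothetical variable name, since the documentation does not name the variable. Both helpers are illustrative.

```typescript
type Level = "debug" | "info" | "warn" | "error";

// Build one JSON log line; event-specific fields follow the common ones.
function logLine(level: Level, event: string, fields: Record<string, string>, now = new Date()): string {
  return JSON.stringify({ timestamp: now.toISOString(), level, event, ...fields });
}

// Hypothetical: pass the trigger's trace id to the agent through its environment.
function agentEnv(
  base: Record<string, string | undefined>,
  traceId: string,
): Record<string, string | undefined> {
  return { ...base, GENIE_TRACE_ID: traceId }; // GENIE_TRACE_ID is an assumed name
}
```

In Node.js, the result of `agentEnv(process.env, traceId)` could be passed as the `env` option of `child_process.spawn`, and each `logLine` result appended, followed by a newline, to ~/.genie/logs/scheduler.log.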