Voice Gateway
Omni ships a Bun-native voice gateway that joins Discord voice channels, decrypts incoming audio, and exposes the streams over a local WebSocket so agents can listen, transcribe, and respond. The implementation lives in @omni/voice-client and is wired into the Discord channel adapter, so no external Node sidecar is required.
What it does
| Capability | Notes |
|---|---|
| Discord voice gateway v8 | UDP transport, libsodium-backed packet auth |
| DAVE E2EE | Discord Audio & Video End-to-End Encryption. Omni participates as a full DAVE client (group key exchange, sender keys, key rotation) |
| Opus codec | Decode incoming Opus frames, optionally re-encode for outbound |
| Per-user streams | Audio is demuxed per Discord user ID so transcription/STT can target a single speaker |
| Session WebSocket | A local WebSocket (omni voice stream) emits audio frames + control events for downstream consumers |
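To illustrate the per-user streams row, here is a minimal sketch of demuxing frames by Discord user ID. The `VoiceFrame` shape and the `SpeakerDemuxer` class are hypothetical names invented for this example; they are not the actual types exported by @omni/voice-client, and the real frame format may differ.

```typescript
// Hypothetical frame shape for illustration only; not the real @omni/voice-client type.
interface VoiceFrame {
  userId: string;   // Discord user ID of the speaker
  sequence: number; // per-speaker frame counter (assumed field)
  pcm: Uint8Array;  // decoded audio payload
}

// Groups incoming frames into one queue per speaker so a consumer
// (for example an STT worker) can read a single user's audio.
class SpeakerDemuxer {
  private readonly streams = new Map<string, VoiceFrame[]>();

  // Append a frame to the queue belonging to its speaker.
  push(frame: VoiceFrame): void {
    const queue = this.streams.get(frame.userId) ?? [];
    queue.push(frame);
    this.streams.set(frame.userId, queue);
  }

  // List the user IDs that currently have buffered frames.
  speakers(): string[] {
    return [...this.streams.keys()];
  }

  // Return and clear all buffered frames for one speaker, in arrival order.
  drain(userId: string): VoiceFrame[] {
    const queue = this.streams.get(userId) ?? [];
    this.streams.delete(userId);
    return queue;
  }
}
```

Keeping one queue per user ID means a transcription worker can drain a single speaker without filtering a mixed stream, which is the property the table describes.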
Lifecycle
CLI
omni voice stream exits cleanly on Ctrl+C and is safe to pipe into transcription tooling.
Discord-specific notes
- The Discord adapter manages voice per instance: one Discord bot token, one voice gateway. An instance can join multiple guild channels one after another, but it can be connected to only one channel at a time today.
- DAVE is mandatory for any guild that has it enabled. The voice client negotiates the protocol version automatically and rejects sessions that fail the handshake rather than falling back to plaintext.
- Audio frames are never persisted by the gateway itself. Use --save <dir> on omni voice stream if you need on-disk artefacts; otherwise the bus is in-memory only.
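The first two notes above can be sketched as a small state machine: one active channel per instance, and no plaintext fallback when a DAVE handshake fails. `VoiceInstanceSketch`, `HandshakeResult`, and the `join` signature are hypothetical and exist only for this example; the only name taken from Omni is the voice.handshake_failed event type. The sketch also assumes that guilds without DAVE enabled skip the handshake entirely.

```typescript
// Hypothetical handshake outcome; the real negotiation is handled inside the voice client.
type HandshakeResult =
  | { ok: true; protocolVersion: number }
  | { ok: false; reason: string };

interface VoiceEvent {
  type: string;
  channelId: string;
  reason?: string;
}

class VoiceInstanceSketch {
  private current: string | null = null;
  readonly events: VoiceEvent[] = [];

  // The channel this instance is connected to, or null when idle.
  get channel(): string | null {
    return this.current;
  }

  // Returns true when the session is established, false when it is rejected.
  join(channelId: string, daveEnabled: boolean, handshake: () => HandshakeResult): boolean {
    // One channel at a time per instance: callers must leave() first.
    if (this.current !== null) {
      throw new Error(`already connected to ${this.current}`);
    }
    if (daveEnabled) {
      const result = handshake();
      if (!result.ok) {
        // Reject loudly instead of falling back to plaintext.
        this.events.push({ type: "voice.handshake_failed", channelId, reason: result.reason });
        return false;
      }
    }
    this.current = channelId;
    return true;
  }

  leave(): void {
    this.current = null;
  }
}
```

Throwing on a second `join` makes the sequential-only constraint explicit to callers, while a failed handshake returns `false` and records an event so the failure stays observable.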
DAVE rejection is loud: when a session fails the handshake, the gateway emits a voice.handshake_failed event. Check omni events list --type voice.handshake_failed --since 1h if voice goes dark in a DAVE-enabled guild.
See also
- Voice CLI verbs: omni voice join, leave, stream and the speak/listen verbs.
- Instances: Discord bot token configuration and per-instance setup.
- Media architecture: Media storage, transcription, and batch backfills.
- Events: Stream voice.* events for live monitoring.