v2 Gemini/SORA
Discord voice agent · Gemini Live · SORA bridge helpers

Hermes Live

A self-hosted Discord voice bridge powered by Gemini Live, now documented with an accurate SORA release map.

Gemini transport · SORA diagnostics · local sidecar · manual frame feed · truth table
Status: v2 cross-exam pass. The public docs now separate working Gemini bridge features from partial or backend-dependent systems and research targets. Vapi, MCP, and Dograh are not described as bundled features unless code and tests land in this repository.

Architecture

Discord Voice → Opus Decode → 48 kHz PCM → 16 kHz mono → Gemini Live WSS
Gemini Live WSS → 24 kHz PCM → 48 kHz stereo → Discord AudioSource

Manual frame / screenshot → 127.0.0.1:18943 /frame → Gemini Live
SORA tools → preflight · grill · goal synth · redact → Hermes tools

Truth map

Area | Status | Scope
Gemini Live Discord voice | Working | Voice join/leave, audio RX/TX, Gemini Live WSS.
Sidecar API | Working | Local 127.0.0.1:18943 health, frame, say, notes, notify, stop/leave.
Manual vision | Working, with a constraint | Frames can be pushed; Discord screenshare/camera is not automatically visible to bots.
SORA bridge elements | Included | Preflight, Live Grill Mode, goal/subgoal synthesis, and redaction.
Vapi | Sibling / not bundled | Separate sibling project; not shipped inside this repo.
MCP | Research target | No first-class MCP server/client in this repo yet.
Dograh | Research target | External comparison/integration target, not bundled.
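The sidecar row lists local endpoints on 127.0.0.1:18943, and the manual vision row says frames can be pushed. The sketch below shows one way a script might talk to the sidecar with only the standard library. The exact paths (`/health`, `/frame`), the HTTP methods, and the JSON field names (`mime_type`, `data`) are assumptions for illustration, not the sidecar's documented contract; check the sidecar source before relying on them.

```python
import base64
import json
import urllib.request

# Local sidecar address from the truth table.
SIDECAR = "http://127.0.0.1:18943"


def health_request() -> urllib.request.Request:
    # Assumption: health is exposed as GET /health.
    return urllib.request.Request(f"{SIDECAR}/health", method="GET")


def frame_request(image: bytes, mime_type: str = "image/jpeg") -> urllib.request.Request:
    # Assumption: /frame accepts a JSON body with base64-encoded image bytes.
    # The field names here are hypothetical.
    body = json.dumps({
        "mime_type": mime_type,
        "data": base64.b64encode(image).decode("ascii"),
    }).encode("utf-8")
    return urllib.request.Request(
        f"{SIDECAR}/frame",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 5.0) -> int:
    """Send a request to the sidecar and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Building requests separately from sending them keeps the payload easy to inspect; with the bridge running, `send(health_request())` or `send(frame_request(screenshot_bytes))` would perform the actual calls.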

Get started

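# Clone the plugin and run its installer
git clone https://github.com/Capslockb/hermes-live-discord-agent-plugin.git
cd hermes-live-discord-agent-plugin/installer
./install.py
cd ..
# Enable the SORA bridge helpers, then byte-compile to catch syntax errors early
python3 installer/enable_sora_bridge_elements.py
python3 -m py_compile plugin/sora_bridge_elements.py plugin/__init__.py
# Restart the Hermes gateway so it picks up the plugin
systemctl --user restart hermes-gateway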

Then join a Discord voice channel and run /voice-live. Stop with /voice-live-leave.

Documentation

Quick start →

Install, restart, first Discord voice session, local health check.

Architecture →

Audio path, sidecar flow, lifecycle, and integration boundaries.

SORA bridge elements →

Preflight, transcript grilling, goal synthesis, and redaction.

Release truth table →

Working, partial, sibling, and research claims separated cleanly.

Video feeder →

Manual frame feeder and the Discord screenshare limitation.

Environment variables →

Every key, default, and optional backend configuration.

Troubleshooting →

Bridge failures, sidecar checks, logs, and common runtime errors.

Changelog →

Release history and load-bearing fixes.

Release checks

voice_live_status
voice_live_notes limit=10
sora_bridge_preflight
sora_redact text="Authorization: Bearer fake.fake.fake"
sora_live_grill text="migrate SORA bridge features into Gemini bridge"
sora_goal_synth text="migrate SORA bridge features into Gemini bridge"
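The `sora_redact` check above feeds in a fake bearer token. As a rough illustration of what such a redaction pass does, the sketch below masks secrets with regular expressions. The patterns and the `[REDACTED]` placeholder are hypothetical choices for this example; the plugin's own `sora_redact` rule set may cover different cases.

```python
import re

# Hypothetical patterns for illustration; the plugin's own rules may differ.
_PATTERNS = [
    # "Authorization: Bearer <token>": keep the header and scheme, mask the token.
    (re.compile(r"(?i)(authorization:\s*bearer\s+)\S+"), r"\1[REDACTED]"),
    # "api_key=...", "token: ...", "secret=...", "password: ..." style pairs.
    (re.compile(r"(?i)\b((?:api[_-]?key|token|secret|password)\s*[:=]\s*)\S+"), r"\1[REDACTED]"),
]


def redact(text: str) -> str:
    """Return text with matching secrets replaced by a placeholder."""
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

Keeping the key or header name and masking only the value leaves logs readable while hiding the secret, and running `redact` twice gives the same result as running it once.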

License

MIT. Free to fork, host, and extend.