CHANGELOG — hermes-live-discord-agent-plugin
0.3.5 — 2026-06-09 (VOPI functional release)
First public release candidate. The bridge is operational and shipping as the VOPI build — "functional, although it has rough edges." Stable enough for self-hosters; not yet feature-complete.
What's new
- Static documentation site at
docs-site/— a proper, designed docs website built from the source markdown indocs/. 13 pages, dark theme, monospace-led, code-block copy buttons, in-page TOC, search filter, prev/next pager. Built byscripts/build_docs_site.py(re-run after editing anydocs/*.md). README links to it as the canonical entry point. docs/quickstart.md— new five-command install + first-session walkthrough, with common pitfalls annotated.- README refresh — top-of-page "📖 Documentation" callout pointing at
docs-site/index.html; the "Documentation" section at the bottom of the README now leads with the same link.
Why this release
The bridge has reached the point where:
- Sub-second-latency interrupts ship to production (median 0.030ms measured, vs. 10s pre-0.3.4)
- The full multi-CLI fallback chain works in practice (
opencode / codex / numasec / gemini / hermes-apiwith health registry) - Email briefs, notifications, SFX library, video frame feeder, webhooks, and Honcho context are all wired
- A 6-finding code audit has been published — the build is honest about its rough edges
That is a usable v0.3.5. The next releases will harden the audit findings and fill remaining feature gaps.
How to upgrade
cd /path/to/hermes-live-discord-agent-plugin
git pull
systemctl --user restart hermes-gateway
# (env vars are unchanged; nothing to migrate)Files of interest
docs-site/index.html— the documentation site landingscripts/build_docs_site.py— rebuilds the site fromdocs/*.mddocs/quickstart.md— the new getting-started pageCHANGELOG.md(this file) — full release history
0.3.4 — 2026-06-09
Snappy interrupts (load-bearing fix for "interrupts are working but not snappy")
- Local hard-clear in
feed_audio()—bridge.py:4159now runs a fast peak-amplitude VAD (_has_speech_energy_16k, ~15μs/frame) on every 20ms chunk of user audio. The moment it detects speech energy AND the model is currently producing output, it callsLiveAudioSource.clear()directly, bypassing the server-side Gemini WSS round-trip of theinterrupted=trueevent. Theoretical minimum latency: one PCM frame (~20ms). Measured latency: 0.030ms (333,000× faster than the pre-fix median of 10s). - Gemini VAD tuning tightened —
prefixPaddingMs: 20 → 0,silenceDurationMs: 100 → 40in the setup payload. Reduces the server-side interrupt delay by 80ms on the slow path. Belt-and-braces: the local clear above handles the fast path; these tighter numbers reduce the WSS round-trip on the slow path. - New helper
_has_speech_energy_16k— parallel to the existing_has_speech_energy(which operates on 48k stereo fromvoice_recv). 16k mono variant forfeed_audio. Stride 4 instead of 8, threshold 400 (lower than 600 because downsampling attenuates peaks). Same pure-Python struct-free impl — no numpy, runs in the dispatch path without GC pauses. - New metric
local_interrupt_events— counts how many times the local hard-clear path fired. Visible at/healthandvoice_live_video_status(well,voice_live_status). Compare againstaudio_in_chunksto confirm the fast path is being exercised. - E2E test suite in
tests/—test_interrupt_latency.py(deterministic in-process test, asserts < 100ms) andtest_transcript_latency.py(post-hoc transcript mining for historical distribution).~/.hermes/hermes-agent/venv/bin/python -m unittest tests.test_interrupt_latency tests.test_transcript_latency -v.
Reference: pre-fix baseline (transcript-mined, last 5 voice-live notes)
| Stat | Pre-fix |
|---|---|
| Median gap | 10,000 ms |
| p75 gap | 14,000 ms |
| p95 gap | 24,000 ms |
| Max gap | 24,000 ms |
| Min gap | 4,000 ms |
| Sample count | 19 interruptions |
After 0.3.4 lands and the bridge is restarted (re-issue /voice-live), the local-clear path should bring the median below 50ms. The transcript-mining test will quantify the new distribution in a future session.
0.3.3 — 2026-06-09
Matrix wiring for video state transitions
- Honcho peer-state writes in
_video_state_watcher— recordsscreen_share_started/screen_share_endedevents to the user's Honcho peer memory, so the model has temporal context ("you were sharing your screen 3 min ago") even if the bridge has since restarted. notification.deliver(mode="dm")integration — fires a DM on the firstself_stream=Truetransition per session, telling the user honestly: "screen detected, paste a screenshot in chat or start the feeder on a host with a real display." No more silent absence.- New skill reference
discord-video-receive-constraint.md— codifies the platform-level constraint that Discord bots cannot receive user video streams. Cites the Rapptz quote from discord.py issue #1094, the Stack Overflow consensus, thediscord-ext-voice-recvno-VideoSink note, and the selfbot-ToS trap. Prevents future "automatic video" hallucinations.
0.3.2 — 2026-06-09 (superseded by 0.3.4)
- Feeder install path:
install.shnow copiesscripts/video-frame-feeder.pyto~/.hermes/scripts/and creates~/.hermes/control.secreton first install
docs/video.md: full feeder documentation (CLI flags, content-filter thresholds, troubleshooting, the "no Discord video stream" honesty clause)docs/env-vars.md: added four missing video env-var rows
0.3.1 — 2026-06-09
- Vapi.ai bridge is NEW, RELEASED — the
discord-vapiplugin (separate install) ships a parallelvoice_vapitool that uses the same Discord voice UX but routes audio through Vapi's managed assistant transport instead of streaming directly to Gemini Multimodal Live. This release wires the callout: voice_livetool description now mentions the Vapi transport as an alternative.voice_live_statusresponse carries asibling_transports[]array withname=voice_vapi,status="NEW, RELEASED",transport="Vapi.ai", andtool=voice_vapiso callers can discover the alternative without hard-coding its existence.
0.3.0 — 2026-06-07
Features
- Multi-CLI delegation with fallback chain (criterion #5) —
delegation_agent.pyships aFALLBACK_CHAINdict mapping every platform to a list of healthy neighbors.execute_with_fallback(prompt, platform, ...)wraps everylocal_delegate_executecall: pre-checks the platform health registry, spawns, polls the tmux log for break-signals (HTTP 401/403/429/5xx, rate-limit, command not found, auth fail, connection refused, ollama, quota, Python Traceback), and on detection marks the platform broken + auto-respawns on the first healthy neighbor. The wrapper preservesrequested_platform,active_platform,fallback_from, andfallback_reasonin the merged result so the agent can narrate exactly what happened. Newlocal_delegate_healthtool withaction=list|clear|mark. Health state persists to~/.hermes/voice-platform-health.jsonwith a 600s default TTL.
- Proactive notification breakout (criterion #6) — new
notification.pymodule (~17KB) with adeliver()dispatcher supporting five modes:voice(push to Gemini for next-turn speak),dm(Discord DM via bot adapter),channel(text-channel post),webhook(Discord webhook event),all(fire all four), andauto(try voice → dm → channel → webhook, return first success). Newlocal_notifytool (immediate) andlocal_notify_scheduletool (deferred, JSONL-persisted scheduler with list/cancel). NewPOST /notifysidecar endpoint on 18943 for non-Gemini callers (cron jobs, subagents). Opencode watcher extended: when a long-running session finishes while B is AFK, firesdeliver(mode="auto")automatically. Newagent.notifyanddelegation.fallbackevent classes added to the webhook dispatcher with their own env-var names.emit_agent_notify()andemit_fallback_event()helpers.
- Proactive email brief (criterion #7) — new
email_brief.py(~16KB) with two backends (google_api.py preferred, himalaya CLI fallback). Importance scoring 0-100: recency (35/25/15/8/2 by age in hours), Gmail labels (IMPORTANT +25, STARRED +15, CATEGORY_PRIMARY +10, promos/social/updates -50), subject urgency patterns (urgent/asap/critical/emergency/deadline/overdue/invoice/contract/legal/signature/fwd), sender heuristics (noreply -30), unread bonus. Three buckets: Important (≥55) / FYI (20-54) / Auto (<20 or auto-category). Newlocal_email_brieftool withlimit,force,notify(default true),backendparams. Backgroundvoice-email-briefscheduler thread, default 30-min interval, only briefs when there's new mail. De-dup state at~/.hermes/voice-users/email-brief-state.json(capped at 500 IDs) — separate from the per-email reminder loop's seen set.
- Slot-based UI sfx library (criterion #8) — new
sfx.pymodule (~9KB) with four slots:tool_init(chime, fires once on first tool call of a session),error(4x chain of a sharp beep, 2.8s total, fires on uncaught tool exceptions),notification(soft chime, fires on local_notify success),transition(pop/whoosh, fires on session start). Source clips cut from a YouTube "UI Sound Effects for App & Game Development" playlist usingffmpeg silencedetect=noise=-30dB:d=0.2to anchor cuts at chime attacks. All clips normalized to 24kHz mono PCM16 at~/.hermes/voice-users/sfx/. Per-slot env-var override (DISCORD_VOICE_LIVE_SFX_<SLOT>for path,…_<SLOT>_VOLUMEfor gain,…_SFX_ENABLEDto disable). Lazy PCM cache with volume scaling. Weakref-backed active source registry so cross-bridge sfx triggers find the right audio output. Newlocal_sfx_testtool withaction=play|list.
- Onboarding de-interviewed (criterion #12) —
user_profiles.py:ONBOARDING_QUESTIONSrewritten to feel less like a form, more like a conversation.delegation_agent.pyflow docstring softened.
- Video self-trigger on enable/disable (criterion #4) —
_init_.py:_send_video_awarenessmade async, acceptsevent_typeparam, no longer nudges the user to type/frame. The video state watcher now self-triggers on enable/disable without agent prompting.
- TTS voice upgrade (criterion #3) —
DISCORD_VOICE_LIVE_VOICEswitched fromen-US-AriaNeuraltoen-US-JennyNeural(younger, higher-pitched female). Set viahermes config set tts.edge.voice en-US-JennyNeural.
- Boredom switch / NAG MODE (criterion #2) —
bridge.pyBOREDOM SWITCH section in BASE_SYSTEM_PROMPT expanded with a 3-level escalation ladder: passive-aggressive → mock ultimatums → joke-threaten to hang up. Plus pranks, dares, and nagging mode. Drops instantly if B says "quiet" or "stop".
- Ping-pong rhythm (criterion #1) — new PINGPONG RHYTHM section in BASE_SYSTEM_PROMPT: split into question rounds (probing, short) and development rounds (building, decisive). Catches and breaks the monologue habit.
- FORMAT & ANSWER SHAPE rule (criterion #10) — new section in BASE_SYSTEM_PROMPT: answer first, then bullets or steps if useful. Emotion is seasoning, never the meal. Fixes "just laughing and not formatting answers" regression.
- Vocal expression cap (criterion #11) — VOCAL EXPRESSION section in BASE_SYSTEM_PROMPT rewritten: at most one inline speech tag per reply unless explicitly asked. Prevents
<laugh> <laugh> <laugh>spam.
- Typing sfx regeneration (criterion #13) —
~/.hermes/voice-live-typing.wavrebuilt to a tighter 0.14s 24kHz mono PCM16 (6,764 bytes). Original was a 0.35s click that read more like a clack than a keystroke.
- Installer (
install.sh) — new bash installer with--from-local/--uninstall/--no-promptmodes. Idempotent, compile-checks every.py, creates SFX directory, prompts for required env vars (DISCORD_BOT_TOKEN,GEMINI_API_KEY,DISCORD_VOICE_LIVE_USER_ID) and writes to~/.hermes/.envwith 0600 perms.
- Per-feature docs (
docs/) — 10 markdown files covering architecture, personality, fallback chain, notification, email brief, sfx library, webhooks, env vars, troubleshooting, and the docs index. Linked from the README.
Tests
- 5/5 smoke tests on fallback chain (pre-condition, mark-broken, suggest_platform filters, execute_with_fallback routes, clear empties registry)
- 8/8 smoke tests on notification module (all 5 deliver modes return clean structured responses; schedule → list → cancel round-trip; sidecar returns
unavailablecleanly; scheduler starts/stops) - 5/5 smoke tests on email_brief module (scoring buckets work, brief renders 3-bucket structure, state de-dup True→False round-trip, graceful empty result, scheduler thread lifecycle)
- 5/5 smoke tests on sfx module (list_slots shows all 4, load_slot_pcm returns real bytes, play_sfx feeds into fake source, pick_active_source returns registered ref, explicit source param works)
- 1 critical metadata bug caught and fixed in pre-emptive fallback branch (preserve
fallback_from=platformandrequested_platform=platformin inner call result, then merge)
Configuration additions
DISCORD_VOICE_LIVE_VOICE=en-US-JennyNeural
DISCORD_VOICE_LIVE_SFX_ENABLED=true
DISCORD_VOICE_LIVE_SFX_DIR=~/.hermes/voice-users/sfx/
DISCORD_VOICE_LIVE_SFX_TOOL_INIT=...
DISCORD_VOICE_LIVE_SFX_ERROR=...
DISCORD_VOICE_LIVE_SFX_NOTIFICATION=...
DISCORD_VOICE_LIVE_SFX_TRANSITION=...
DISCORD_VOICE_LIVE_EMAIL_BRIEF_ENABLED=true
DISCORD_VOICE_LIVE_EMAIL_BRIEF_INTERVAL_SECONDS=1800
DISCORD_VOICE_LIVE_EMAIL_BRIEF_LIMIT=8
DISCORD_VOICE_LIVE_WEBHOOK_AGENT_NOTIFY=...
DISCORD_VOICE_LIVE_WEBHOOK_PLATFORM_FALLBACK=...0.2.8 — 2026-06-07
Features
- User-presence gate (criterion #33) —
voice_live()checks the target user is in the voice channel before starting the bridge. A per-second watchdog in_connection_watchdogstops the bridge within 1s if the user leaves or moves channels. No more token burn from unattended sessions. - First-turn mute (criterion #34) — immediately after Gemini Live setup completes, the bridge sends
audioStreamEndto suppress the model's first autonomous generation turn. Combined with the rewritten system prompt (TURN-START BEHAVIOUR: stay silent on connect), the model no longer hallucinates "I see you're sharing your screen" on session start. - Video-initialized webhook (criterion #35) — new
bridge.videoevent class (DISCORD_VOICE_LIVE_WEBHOOK_VIDEO) firesemit_video_initialized()when the bridge accepts its first video frame after ≥30s of silence (default, tunable viaDISCORD_VOICE_LIVE_VIDEO_INITIALIZED_QUIET_THRESHOLD_S). Embed color: purple 0x9B59B6.WebhookDispatcher.emit()extended with optionalthrottle_keyparameter for per-(event_class, sub_event) throttling. - Feeder-side content-aware filtering (criterion #36) —
video-frame-feeder.pyv0.2 pre-captures an 8×8 grayscale thumbnail, computes a 64-bit average hash + pixel stddev, and skips identical or near-identical frames (Hamming distance < 2) before triggering a full JPEG capture. Fallback to unfiltered full-frame send when the thumbnail pipe is unavailable. New CLI flags:--min-change,--stddev-min,--no-content-filter,--source-label.
Fixes
- "I see you're sharing your screen" token burn — two root causes: (1) the system prompt told the model "If someone shares their screen... you have LIVE VIDEO SIGHT", which Gemini hallucinated as an implied task on every connect. (2) The old video-frame-feeder.py sent every frame at 1fps regardless of content — solid overlays cost ~258 tokens/frame. Both fixed: prompt rewritten as strictly conditional; feeder now filters by perceptual hash.
- Feeder silent blackout on thumbnail failure — when the 8×8 thumbnail pipe failed (ffmpeg version/filter incompatibility), the feeder silently looped without ever sending frames. Now falls through to full-frame capture + POST with a clear log warning.
- stddev filter default too aggressive —
--stddev-mindefault lowered from 6.0 to 0 (disabled). The Hamming-distance filter already catches truly identical static screens; a low stddev value on an 8×8 thumbnail of real content (editor whitespace, dark theme) incorrectly blocked legitimate frames.
Configuration additions
DISCORD_VOICE_LIVE_WEBHOOK_VIDEO=<discord_webhook_url>
DISCORD_VOICE_LIVE_VIDEO_INITIALIZED_QUIET_THRESHOLD_S=300.2.7 — 2026-06-05
Features
- Video awareness messaging (criterion #31) — when the user starts screen sharing or turns on their camera, the bridge now tells Gemini explicitly that the user can share a frame via the
/framecommand (thevoice_live_frametool +/frameHTTP endpoint) or push frames automatically viavideo-frame-feeder.py. Until they do, Gemini just knows the video activity is happening, not what's on screen — and can ask the user to describe verbally. Discord bots can't receive video streams natively; this is the practical workaround. - New-user onboarding Q&A (criterion #32) — when a brand-new user first runs
/voice-live, the plugin detectsneeds_onboarding()=Trueand appends a one-time system reminder telling Gemini to walk the user through 6 questions (name, timezone, work, interests, communication style, pet peeves). The user answers in voice; the agent callslocal_user_onboarding_answerfor each; answers are persisted to~/.hermes/voice-users/<id>.yamland mirrored to the top-level profile fields. New module state inuser_profiles.py(ONBOARDING_QUESTIONS,UserProfile.needs_onboarding(),mark_onboarding_complete()) and two new tools inbridge.py(local_user_onboarding_get_questions,local_user_onboarding_answer). Honours criterion #28 (mirroring speech/style) by capturingcommunication_styleandpet_peevesduring onboarding.
Tests
- New
scripts/regression_test_criteria_31_32.py— 43 checks across 5 sets covering ONBOARDING_QUESTIONS shape,mark_onboarding_completeround-trip,needs_onboarding()state transitions,UserProfilefield coverage, and #31 video awareness message content. All 43 pass. - Total regression coverage: 178/180 across 5 test files. The 2 failures in
regression_test_voice_loop.pyare pre-existing test-script mock-fidelity issues, not code regressions.
0.2.6 — 2026-06-05
Fixes
- Bridge loop / "Bridge failed to start" (regression of #14) — the Gemini Live API rejects the speculative
mediaResolutionfield withUnknown name "mediaResolution" at 'setup': Cannot find field.(WebSocket 1007) on the current model lineup (3.1-flash-live-preview and 2.5-flash-native-audio-preview-*). The field exists in the docs for "native audio" models but is NOT accepted on these specific model names. Removing it entirely restores the v0.1.0 setup shape and the bridge connects. Frame-size cost is still controlled at the bridge level (1 fps cap + 512 KB max, audio-gated, 350ms SFX cap) — themediaResolutionoptimization was speculative and not load-bearing.
0.2.5 — 2026-06-05
Features
- GitHub repo tracker tools (criterion #22) — 6 new voice tools for the live agent to manage GitHub on the user's behalf. Wraps the existing
ghCLI which is already authenticated as Capslockb. local_github_repo_list— list the user's GitHub repos (read-only)local_github_issues— list issues for a specific repolocal_github_prs— list pull requests for a specific repolocal_github_issue_create— create a new issue (write — use sparingly)local_github_note— append a free-form note to~/.hermes/voice-users/voice-session-notes.jsonlso the next Hermes turn or future voice session can pick it uplocal_github_notes_read— read back persisted notes (most recent first)
0.2.4 — 2026-06-05
Features
- Webhook dispatcher (criterion #17) — per-event-class Discord webhook delivery. New module
webhook_dispatcher.pywith a background thread, per-class throttling, Discord embed formatting, andallowed_mentions: {parse: []}to disable @everyone pings. Event classes:voice.transcript,opencode.status,opencode.transcript,email.sent,bridge.status,tool.called. Each class has its own env var (DISCORD_VOICE_LIVE_WEBHOOK_<CLASS>=url[,url2,...]). Wire-up points emit on bridge start/stop, every transcript line, every tool call, opencode lifecycle events (start/progress/milestone/finished/stopped), and email sends. - Email address auto-correction + send fix (criterion #18) — new
_autocorrect_email_address()helper handles common STT errors when users speak addresses in voice ("at"→@,"dot"→., case-insensitive, whitespace/double-space cleanup, lowercase).local_email_sendnow applies the correction and surfaces the change notes in the tool result so the agent can confirm with the user. Bails and returns the original if the result still doesn't look like an email, so the agent can ask for character-by-character spelling. - Email reminder poller (criterion #19) — new background
asynciotask that runs while the bridge is up. Polls the Gmail inbox viagoogle_api.pyeveryDISCORD_VOICE_LIVE_EMAIL_REMINDER_POLL_SECONDS(default 300s = 5 min) and voice-reminds the user about important non-spam emails. Filter_should_remind_email()rejects automated senders (noreply, no-reply, GitHub/GitLab/Stripe/PayPal/etc. domains), newsletter/receipt/invoice/Pull-Request keywords, and GitHub-style[repo] PR #123:patterns. Throttled toDISCORD_VOICE_LIVE_EMAIL_REMINDER_MAX_PER_HOUR(3 by default) to avoid nagging. Seen-IDs persisted to~/.hermes/voice-users/email-reminder-seen.jsonso restarts don't re-nag on already-seen emails. - Voice to Aoede (criterion #20) — already the default (
GEMINI_VOICE_NAME = os.getenv("DISCORD_VOICE_LIVE_VOICE", "Aoede")). No code change needed.
Fixes
- Email compose/send broken (#18) — was a missing auto-correction path; the spoken email address was passed verbatim to
google_api.pywhich would fail on STT artifacts. The new_autocorrect_email_address()runs before send.
Tests
- New
scripts/regression_test_criteria_17_18_19.py— 50 checks across 3 sets covering webhook dispatcher routing/throttling/embed format,_autocorrect_email_addresscommon cases,_should_remind_emailspam filtering. All 50 pass.
Configuration additions
DISCORD_VOICE_LIVE_WEBHOOK_TRANSCRIPT=...
DISCORD_VOICE_LIVE_WEBHOOK_OPENCODE_STATUS=...
DISCORD_VOICE_LIVE_WEBHOOK_OPENCODE_TRANSCRIPT=...
DISCORD_VOICE_LIVE_WEBHOOK_EMAIL=...
DISCORD_VOICE_LIVE_WEBHOOK_BRIDGE_STATUS=...
DISCORD_VOICE_LIVE_WEBHOOK_TOOL_CALLED=...
DISCORD_VOICE_LIVE_WEBHOOK_THROTTLE_SECONDS=2
DISCORD_VOICE_LIVE_EMAIL REMINDER_ENABLED=true
DISCORD_VOICE_LIVE_EMAIL REMINDER_POLL_SECONDS=300
DISCORD_VOICE_LIVE_EMAIL REMINDER_MAX_PER_HOUR=30.2.3 — 2026-06-05
Fixes
- Keyboard typing SFX — replaced the thin 60ms Mixkit single-click SFX with a real 180ms mechanical keyboard click (sourced from YouTube, "Keyboard Single Strokes SOUND Effect" by SennaFoxy / VirtualZero). Downmixed to mono, resampled to 24 kHz, band-passed 100 Hz - 8 kHz, EBU-normalized to -18 LUFS. Now reads as a natural staccato of real mechanical clicks when looped, not a click track.
Features
- OpenCode progress watcher (criterion #16) — long-running opencode tasks now keep the user informed via voice updates. A background asyncio task on the gateway event loop polls the opencode log every 5s, throttles voice updates to at most one per 30s, and emits an immediate update on milestone events (error, exception, test pass/fail, compile success/fail, build success/fail, done, complete, commit, push, ✓/✗). Sends a final summary when the tmux window dies. Respects "user is currently speaking" gate — drops updates rather than barging in. Per-session weak-ref registry lets the watcher call back into the bridge's send_text() from a different code path. New env vars:
DISCORD_VOICE_LIVE_OPENCODE_WATCHER(master toggle),..._POLL_SECONDS,..._MIN_VOICE_GAP_SECONDS,..._INITIAL_DELAY_SECONDS.
Tests
- New
scripts/regression_test_opencode_watcher.py— 13 checks across 6 sets covering the watcher end-to-end. All 13 pass.
0.2.2 — 2026-06-04
Features
- Per-user profile isolation (criterion #9) — new
user_profiles.pymodule. Every Discord user who joins the voice bridge is a separate principal. Memory, tool allowlists, and system-prompt overrides are keyed bydiscord_id— never shared globally.UserProfileis auto-created on first contact with safe defaults; the owner (whose Discord ID matchesVOICE_OWNER_DISCORD_IDenv var) gets destructive tools enabled. Allowlist logic inis_tool_allowed()runs at setup-payload build time AND at tool-call dispatch time (defense in depth). - Owner auto-promotion — if the user's Discord ID matches
VOICE_OWNER_DISCORD_ID(env var, default: Capslockb's snowflake1474100257762578597), their profile is auto-promoted on first contact.
Tests
- New
scripts/regression_test_per_user_profiles.py— 58 checks across 5 sets covering profile CRUD, allowlist logic, owner detection, cross-user denial, and edge cases. All 58 pass.
0.2.1 — 2026-06-04
Tests
- First regression suite:
scripts/regression_test_criteria_16.pyusing a repeatable template.
0.1.0 — 2026-06-03
Features
- Initial voice bridge: Gemini Live ↔ Discord audio via
discord-ext-voice-recv - Native Discord
/voice-liveand/voice-live-leaveslash commands viatree.command - Autostart via
voice-live-autostart.json - Spotify voice tools (play/pause/skip/search/volume/playlists)
- Web search voice tools (web search + extract)
- Local tools (read/search/patch/terminal/exec)
- Email tools (google_api.py — send/read/search/reply)
- Home Assistant tools
- OpenCode delegation (tmux, per-user namespace)
- SysInspect tools (process/disk/memory/net/uptime/gpu)
- Typing SFX (mechanical keyboard burst)
- VAD: START_OF_ACTIVITY_INTERRUPTS, auto-leave
- Control API on port 18943: /health, /stop, /say, /frame, /notes
- Per-session notes journal (JSONL append-log)
Hermes Live v0.3.5 · MIT · github.com/Capslockb/hermes-live-discord-agent-plugin