Five follow-ups to #57659 from post-merge review:
1. install.ps1: gateway scheduled-task re-enable now runs in a finally
(a thrown Remove-Item/uv venv failure previously stranded the user's
gateway autostart disabled), and tasks that were already disabled
before the install are no longer blindly re-enabled.
2. The venv-python holder guard is no longer bypassed by plain --force
(which the desktop bootstrap passes on every update while its lock
probe only checks hermes.exe/app.asar). New explicit --force-venv is
the escape hatch; --force keeps bypassing only the hermes.exe shim
guard.
3. _detect_venv_python_processes now also catches uv/base-interpreter
trampolines whose exe is outside the venv, via cmdline (venv path or
'-m hermes_cli.main' tied to this install root) and cwd.
4. Missing venv python is now UNHEALTHY on managed installs
(.hermes-bootstrap-complete / .update-incomplete markers) so the
repair lane runs instead of 'Already up to date!'; the repair branch
recreates the venv first when it's gone entirely. Dev checkouts keep
reporting healthy.
5. install.ps1 comment no longer claims a Startup-folder disarm the
code doesn't perform (logon-only, not a mid-install respawner).
A single-model Hermes agent never sends temperature; the provider default
applies. MoA hardcoded reference_temperature=0.6 / aggregator_temperature=0.4,
and the coercion float(preset.get(key, 0.6) or 0.6) made unset IMPOSSIBLE to
express: absent, null, empty, and even an explicit 0 all collapsed to the
baked-in default. Every MoA advisor and aggregator therefore ran at 0.6/0.4
while the same model running solo used the provider default — silently
skewing solo-vs-MoA comparisons and overriding provider-tuned defaults.
- moa_config normalization: temperatures coerce to None when absent/blank/
invalid (new _coerce_float_or_none); explicit values incl. 0 honored.
- moa_loop: _preset_temperature() resolves preset values; None flows to
call_llm, which already omits the parameter when None (same contract as
max_tokens). Aggregator still inherits the acting agent's own configured
temperature when the preset doesn't pin one.
- conversation_loop (context-mode MoA): same resolution, no more hardcoded
0.6/0.4 at the call site.
- DEFAULT_CONFIG preset + web_server payload models + docs updated: unset
is the default, pinning stays available.
Group DMs (MPIMs) were classified as DMs and thereby exempted from every
operator control that shared surfaces are supposed to honor: allowed_channels,
require_mention, strict_mention, free_response_channels, and the reaction
guard. Symptom: the bot added 👀/✅ to unmentioned MPIM
messages and still invoked the agent (which then returned NO_REPLY) instead of
the gateway dropping the event before model execution. Removing an MPIM from
allowed_channels did not disable it.
Root cause is the DM classification at adapter.py:
is_dm = channel_type in {"im", "mpim"}
used for BOTH routing exemptions and reaction gating. An MPIM is a shared
surface (multiple humans can see and trigger the bot), not a private 1:1 DM,
so it must be gated like a channel.
This behavior was introduced/reinforced by a trail of Slack group-DM PRs:
- #4633 fix(slack): treat group DMs (mpim) like DMs + reaction guard
- #54632 fix(slack): subscribe to message.mpim + mpim scopes so group DMs work
- #54663 fix(slack): group DMs work OOTB + reinstall nudge
#54632/#54663 correctly made MPIM messages *reachable*; #4633 over-reached by
giving them the DM mention/reaction *exemptions*. This corrects only that
over-reach.
Fix (minimal): introduce `is_one_to_one_dm = channel_type == "im"` and key the
two EXEMPTION sites off it instead of `is_dm`:
- mention/allowlist gating block (`if not is_one_to_one_dm and bot_uid:`)
- reaction guard (`(is_one_to_one_dm or is_mentioned)`)
`is_dm` is intentionally retained for session/thread scoping and chat_type
labeling, where treating an MPIM as a persistent multi-party conversation is
correct — only the mention/reaction exemptions were wrong.
Docs: slack.md now distinguishes 1:1 DMs (mention-exempt) from group DMs
(shared surface; obey require_mention/strict_mention/allowed_channels/
free_response_channels; reactions only when @mentioned).
Tests: +7 in test_slack_mention.py (MPIM unmentioned dropped under
require_mention and strict_mention; MPIM mentioned processed; MPIM off
allowed_channels dropped; MPIM in free_response opted in; 1:1 IM still exempt;
reaction guard drops unmentioned MPIM). Updated _would_process to model the
is_one_to_one_dm gating + strict_mention. 72 passed.
Replace the loopback/PKCE-callback server and manual-paste fallback with
the RFC 8628 device-code flow as the only xAI Grok OAuth login path. The
flow works in headless/SSH/container sessions with no 127.0.0.1 listener,
shrinking the local attack surface.
- Poll the token endpoint with server-provided interval, honoring
slow_down and expires_in; store tokens with auth_mode
oauth_device_code.
- Adaptive proactive refresh skew for short-lived device-code JWTs;
rotated tokens sync back to auth.json, the global root store, and the
credential pool (no refresh-token replay).
- Clear source suppression on successful re-login (CLI + dashboard) and
drop the duplicate dashboard pool entry so exactly one seeded
device_code entry exists.
- Use the shared device_code source name for consistency with the
nous/codex device-code providers.
- Desktop: remove the loopback OAuth flow states and dead type variants;
pkce providers' sign-in URL selection is unchanged.
- Docs (EN + zh-Hans) rewritten for device-code login; drop the deleted
--manual-paste flag from documented commands.
MoA per-turn latency is dominated by advisor GENERATION: turn wall time
correlates ~0.88 with output tokens and ~-0.03 with input tokens (measured over
52 turns). Each turn waits for the slowest advisor to finish writing, and
advisors were uncapped — writing multi-thousand-token essays the aggregator
only needs the gist of.
Add an opt-in per-preset reference_max_tokens knob (mirrors reference_temperature)
that caps ADVISOR output only; the acting aggregator is never capped. Default
None = uncapped, so existing presets are byte-for-byte unchanged (no regression).
Wired through both MoA execution paths (MoAChatCompletions.create and
aggregate_moa_context).
E2E: same task, closed preset uncapped vs reference_max_tokens=600 -> 59s to 33s
(~44% faster), final answer identical/correct.
- hermes_cli/moa_config.py: _coerce_int_or_none helper + reference_max_tokens
in _normalize_preset/_default_preset/flattened view
- agent/moa_loop.py: read preset.reference_max_tokens, pass to reference fan-out
- agent/conversation_loop.py: pass reference_max_tokens on the per-turn path
- tests + docs
- claude-fable-5 placed above claude-opus-4.8 in both curated lists
- claude-sonnet-5 replaces claude-sonnet-4.6
- sakana/fugu-ultra added near the bottom (before routers/free tier)
- regenerated website/static/api/model-catalog.json via scripts/build_model_catalog.py (live-pulled by CLI, published on merge — no release needed)
The model could pass `toolsets` (top-level and per-task) to delegate_task,
letting it choose which toolsets a subagent got. Toolset selection is a
capability-scoping decision the model should not control; subagents inherit
the parent's enabled toolsets, period.
- Remove `toolsets` from the delegate_task() signature, the registry handler,
the top-level + per-task JSON schema, and the live dispatch path
(run_agent._dispatch_delegate_task — this forwarded it on every model call).
- Single-task and per-task child builds now pass toolsets=None so
_build_child_agent resolves to pure parent inheritance.
- Drop the now-dead _SUBAGENT_TOOLSETS / _TOOLSET_LIST_STR schema-hint block.
- _build_child_agent keeps its internal toolsets param + intersection helpers
(internal API; fed the inherited value only).
- Tests: schema assertions flipped to assertNotIn; added a regression test
proving the dispatch path never forwards a smuggled model `toolsets`.
- Docs: update delegate_task signature refs in the autonomous-ai-agents skill.
Adds Vertex AI as a first-class provider for Gemini models via Vertex's
OpenAI-compatible endpoint. Vertex authenticates with short-lived OAuth2
access tokens (service-account JSON or ADC), not a static API key — the
missing piece behind the recurring requests (#13484, #12639, #56259).
- agent/vertex_adapter.py: OAuth2 token minting + refresh-on-expiry
(5-min margin), ADC->service-account fallback, global vs regional
endpoint URLs. Config precedence: env var > config.yaml > default.
- plugins/model-providers/vertex/: provider profile (auth_type=vertex),
reuses Gemini's extra_body.google.thinking_config translation.
- runtime_provider: vertex short-circuit BEFORE the credential pool so a
credentials-file path is never mistaken for a static API key; mints a
fresh token + computes base_url per resolve.
- run_agent + conversation_loop: _try_refresh_vertex_client_credentials()
re-mints the token and rebuilds the client on a mid-session 401, so a
long-lived gateway agent survives token expiry (~1h).
- auxiliary_client: vertex auth_type branch for side-LLM tasks.
- config.yaml: vertex.project_id / vertex.region (non-secret, bridged to
env); credential path stays in .env (VERTEX_CREDENTIALS_PATH).
- setup wizard + model picker: dedicated _model_flow_vertex; curated
google/gemini-* model list; --provider choices.
- pricing/metadata: Vertex prices off the gemini docs snapshot; endpoint
host auto-maps to the vertex provider (no probe spam).
- lazy_deps + pyproject [vertex] extra: google-auth, opt-in only.
- docs: guides/google-vertex.md + providers page; tests for adapter +
runtime resolution.
Salvages and modernizes #8427 by @slawt onto current main: rewired from
the legacy PROVIDER_REGISTRY path to the provider-profile architecture,
moved non-secret config out of .env into config.yaml, and added the
per-turn 401 token-refresh the original lacked.
Wrap each Telegram initialize() attempt in asyncio.wait_for(HERMES_TELEGRAM_INIT_TIMEOUT,
default 30s). When api.telegram.org and all fallback IPs are unreachable, the connect
chain has no outer bound, so a single initialize() blocks for minutes and the
retry-on-exception loop never fires — the gateway appears to hang after the banner.
The timeout guarantees each attempt is bounded, then retries with backoff, then fails
with an actionable error. Also adds WARNING-level progress logs before DoH discovery
and each connect attempt (visible at default log level).
Salvaged onto plugins/platforms/telegram/adapter.py (Telegram moved from
gateway/platforms/ since the PR was opened). Adds env var to docs + AUTHOR_MAP.
Co-authored-by: Hermes Agent <127238744+teknium1@users.noreply.github.com>
Live DM testing showed a reply to a DM cron brief did NOT continue the job.
Root cause: for a 1:1 DM the governing knob is dm_top_level_threads_as_sessions
(default True), NOT reply_in_thread / cron_continuable_surface. Under the
default, each top-level DM keys to a per-message session (…:dm:<chat>:<ts>),
so a reply mints a new ts and can never converge with the flat …:dm:<chat>
session the cron seed creates.
A 1:1 DM has no thread-vs-timeline split, so "in_channel" has no coherent
meaning for a DM — cron_continuable_surface is a channel concept and is a
no-op for DMs. DM continuation is governed entirely by
dm_top_level_threads_as_sessions:
- false → all top-level DMs share …:dm:<chat> → seed + reply converge → works
- true (default) → per-message sessions → no continuation (cron or interactive)
Option A (chosen): document the requirement; no code change (the flat-DM seed
from the prior commit already lands correctly when the knob is false). Adds a
":::note 1:1 DMs" admonition to cron.md + the zh-Hans mirror.
Verification (real inbound handler, not a hard-coded assumption — the mistake
that made the earlier DM E2E falsely pass): tests/manual/cron_inchannel_dm_e2e.py
drives the REAL _handle_slack_message for a top-level DM under both knob values
and asserts false→converges (…:dm:D_TESTDM == seed), true→diverges
(…:dm:D_TESTDM:<ts>). See decisions.md D9.
Add a per-platform `cron_continuable_surface` extra key
(`thread` default | `in_channel`) so a continuable cron job can deliver
FLAT into a Slack channel — no dedicated thread — and still be
replied-to. In `in_channel` mode the scheduler skips the thread-open
branch (leaves `thread_id=None`); the shipped origin-mirror then seeds
the `(slack, chat_id, None)` shared-channel session — the same bucket
`reply_in_thread: false` routes inbound channel replies to — so a plain
channel reply continues the job in context.
Design: specs/cron-inchannel-continuable (D1–D7, F5). Model B
(shared-channel session), NOT anchoring to the delivery `ts` — on Slack
replying to a specific message IS threading, so a `ts` anchor would only
relocate the thread, never deliver true threadless continuable.
- gateway/platforms/base.py: `supports_inchannel_continuable` capability
flag (default False → unsupported platforms fail SAFE to `thread`).
- plugins/platforms/slack/adapter.py: flag=True; `_cron_continuable_surface()`
resolver (coerces to the two-value enum); `_warn_if_inchannel_without_flat_reply`
connect-time warning (D5: warn, not hard-require — the misconfig fails safe).
- gateway/config.py: shared-key bridge line (top-level OR nested config).
- cron/scheduler.py: read the key generically from platform config, gate
the `in_channel` branch on the adapter capability flag, skip thread-open.
No new seed function (reuses the existing mirror — G6).
Pairing (docs): `in_channel` + `reply_in_thread: false` +
`require_mention: false` (or a free-response channel). Missing
`reply_in_thread: false` fails safe to a threaded continuation.
Gateway-side config flag — `/restart` to apply; NO Slack app reinstall.
Tests (from inside the worktree, PYTHONPATH=$PWD):
- +6 cron scheduler tests (in_channel skips thread-open; seeds flat
channel session with thread_id=None; thread-mode regression;
fail-safe on unsupported platform; value coercion). Prove-fail:
removing the `and not in_channel_surface` guard turns the two
load-bearing tests RED; restore → GREEN.
- +10 slack resolver/capability/warning tests; +2 config-bridge tests.
- tests/manual/cron_inchannel_e2e.py: offline E2E driving BOTH real
legs (delivery seed + inbound reply keying) → both converge on
(slack, C, None).
- No regressions: test_slack.py 216 passed alone; broader sweep green
(4 pre-existing cross-file-ordering failures reproduce identically on
pristine origin/main).
Docs: cron.md + slack.md + zh-Hans mirrors of both.
The messaging gateway table still listed /new ("Start a new
conversation") and /reset ("Reset conversation history") as two
separate commands with divergent descriptions. /reset is an alias
of /new (see COMMAND_REGISTRY in hermes_cli/commands.py) — same
handler, fresh session ID + history. Collapse them into one row
matching the registry wording and the CLI table already on line 39.
Closes#42829.
Remove scripts/setup_open_webui.sh and its 'one-command local bootstrap'
doc sections (EN + zh-Hans). The script pip-installed the third-party Open
WebUI frontend into ~/.local and managed a launchd/systemd user service —
a maintenance liability for downstream software we don't own, and the source
of the LAN first-admin signup footgun in #36121.
The Open WebUI *integration* via the OpenAI-compatible API server is
unaffected: the Docker/Docker-Compose setup, multi-user profile guide, and
troubleshooting in open-webui.md stay, and Open WebUI remains a listed
supported frontend. Only the install-and-service bootstrapper is gone.
@janrenz's PR #35862 added prompt_caching.enabled=false at init only. But
_anthropic_prompt_cache_policy re-derives _use_prompt_caching on every /model
switch (agent_runtime_helpers) and fallback-model swap (chat_completion_helpers),
which re-enabled markers and re-broke the strict proxy the toggle was meant to fix.
Move the kill switch into anthropic_prompt_cache_policy so it returns (False, False)
on every path. Drop the now-redundant init-time override (kept @janrenz's isinstance
hardening on the cache_ttl read). Add policy-level tests + docs for the toggle.
Follow-up to salvaged PR #35862.
Replace the interim monospace table fallback with Slack's native `table`
block (rows of rich_text cells). Addresses the core ask in #18918.
- _table_block(): builds type:"table" with rich_text cells, so inline
formatting (bold, links, code) renders inside cells.
- Column alignment parsed from the markdown separator row (:---, :-:, --:)
into column_settings (left = default/null-skip, center/right emitted).
- Escaped pipes (\\|) are not treated as column separators.
- Respects Slack's table limits (100 rows / 20 cols / 10k aggregate chars);
oversized or unparseable tables gracefully fall back to aligned monospace
(rich_text_preformatted), so a big table never breaks the message.
Docs (EN + zh-Hans) updated to describe native tables + the fallback.
Tests: native table shape, alignment->column_settings, inline-formatted
cells, oversized/too-wide monospace fallback, escaped-pipe cell. Prove-
failed against a stubbed _table_block (native-table tests fail, fallback
tests stay green). All existing Slack tests still pass.
Add platforms.slack.extra.rich_blocks (default off). When enabled, the
final agent message is sent as Slack Block Kit blocks — section headers,
dividers, and true nested lists via rich_text — instead of flat mrkdwn.
- New plugins/platforms/slack/block_kit.py: pure markdown->blocks renderer
(headers, dividers, nested ordered/bullet lists, blockquotes, fenced code;
pipe-tables as aligned monospace since Block Kit has no robust table block).
Enforces Slack's 50-block / 3000-char section limits and returns None to
fall back to plain text on empty/oversized/unexpected input. Never raises.
- adapter.send(): render blocks on the single-chunk primary message; a
text= fallback is ALWAYS sent alongside (notifications/accessibility).
- adapter.edit_message(): blocks only on finalize=True, so intermediate
streaming edits stay plain mrkdwn (no per-flush block re-derivation).
- Docs (EN + zh-Hans) + config example. Send-side only: no app reinstall.
Tests: pure-renderer unit suite + adapter integration suite (blocks present
when on, plain text when off, text fallback always set, finalize gating,
multi-chunk fallback). Prove-failed against a stubbed renderer.
Registers PowerShell (.ps1/.psm1/.psd1) in the LSP server registry,
spawning PowerShellEditorServices over stdio via a pwsh/powershell
host. PSES ships as a GitHub release zip (no npm/go/pip recipe), so it
sits in the manual install tier alongside rust-analyzer and clangd.
The spawn builder resolves the module bundle from (in order) the
lsp.servers.powershell.command override, init bundlePath, the
PSES_BUNDLE_PATH env var, or <HERMES_HOME>/lsp/PowerShellEditorServices,
then launches Start-EditorServices.ps1 -Stdio with a non-interactive,
no-profile host. hermes lsp status/list report it as manual-only until
pwsh is present.
Docs and tests included.
Adds gateway.platform_connect_timeout (default 30s) to DEFAULT_CONFIG and
bridges it to the internal HERMES_GATEWAY_PLATFORM_CONNECT_TIMEOUT env var
at gateway startup, following the existing gateway_timeout config->env
pattern. The env var remains the manual-override escape hatch and wins if
set explicitly; otherwise config.yaml supplies the value. This closes the
issue's documentation/config-surface request (#19776 suggestion 2) on top
of the adapter ready-wait fix, so users no longer need an undocumented env
var to raise the Discord connect timeout.
Refs #19776
Cron pre-run scripts were capped at 120s by default, which surprised
users running long data-collection scripts on crons (the whole point of
crons being to offload long work). Raise _DEFAULT_SCRIPT_TIMEOUT to 3600s
(1 hour).
This bounds the script only — skill/agent jobs already run on a separate
inactivity budget (HERMES_CRON_TIMEOUT, default 600s idle, 0=unlimited),
not a wall-clock cap. Scripts dispatch to a persistent thread pool and do
not hold the tick lock, so a long script doesn't starve other due jobs.
Docs clarified to make the script-vs-agent timeout distinction explicit.
env/config overrides (HERMES_CRON_SCRIPT_TIMEOUT,
cron.script_timeout_seconds) unchanged and still take precedence.
Add a `pre_verify` user/plugin/shell hook fired once per turn when the agent
edited code and is about to finish, after the existing verify-on-stop guard. A
hook can keep the agent going one more turn (run a check, defer it, tidy the
diff) by returning {"action":"continue","message":...} (the Claude-Code Stop
shape {"decision":"block","reason":...} is accepted too). Hooks receive coding,
attempt, final_response, and sorted changed_paths so they can self-scope and
self-throttle; the path is bounded by agent.max_verify_nudges and preserves
message-role alternation.
Hermes still ships its default coding guidance (agent.verify_guidance, on by
default), but it now rides the evidence-based verify-on-stop missing-evidence
nudge instead of a separate default pre_verify continuation, so it costs no
extra model turn of its own. Guidance reuses the shared utils.is_truthy_value
parser rather than a local copy.
Add a generic per-platform PlatformConfig.typing_indicator flag (default
True) that gates the _keep_typing refresh loop in
_process_message_background. When false, the loop is never spawned, so no
typing/"is thinking…" status is shown on that platform — message delivery
is otherwise unchanged.
Mirrors the gateway_restart_notification contract exactly: dataclass field
+ to_dict/from_dict (with extra-fallback resolution) + shared-key bridge in
load_gateway_config, so 'slack: typing_indicator: false' under platforms
works without a separate block. Generic by design — the same key works for
every platform (Slack 'is thinking…', Telegram/Discord/Signal typing).
Motivated by users who find Slack's assistant 'is thinking…' status noisy
(it also briefly disables the compose box, via the Assistant API).
Port the two genuinely-novel ideas from Command Code's /design skill into
our existing claude-design skill (skill-only, zero model-tool footprint):
- Surface-First: commit to one of 7 surface archetypes (Monitor/Operate/
Compare/Configure/Decide/Explore/Command) before any visual tokens. Most
AI design slop is compositional, not cosmetic — conditioning generation on
a surface choice collapses entropy the way a CoT step does. Workflow step 3.
- Slop Diagnostic: the ~10 tells that account for ~90% of the 'this is AI'
signal, as a score-out-of-10 self-audit. Diagnose-then-treat: the report is
context not a to-do list; repair only what fired, matched to the tell
(re-layout vs recolor vs de-decorate). Workflow step 7 (Verify).
Did NOT clone /design's 16-mode CLI, proprietary reference corpus, or make it
a core tool. Docs page regenerated via generate-skill-docs.py.
A manually-installed venv inside the cloned repo can be destroyed by the
agent running a relative-path command against its own checkout (rm -rf venv,
uv venv venv, etc.), silently wiping the running runtime mid-session. Moving
the canonical manual-install venv to ~/.hermes/venvs/hermes-dev means no
relative path from the agent's workspace resolves to its own runtime, making
the bug class impossible without any command-detection code.
Closes the root cause of #7779. The managed install.sh layout is unchanged.
The original cap held a process-global slot across the WHOLE vision
analysis (image load + encode + LLM call) with a default of min(CPUs, 4).
That serialized legitimate multi-image workflows — "compare these 6
screenshots", "read this 10-page scan", "analyze every frame" — behind a
4-wide gate, and on the native fast path it even throttled calls that make
no LLM request at all. Excess calls queued (blocking acquire, nothing
dropped), but the latency hit on real fan-out was the wrong tradeoff.
The incident was CPU exhaustion, not call count: concurrent base64/resize
bursts saturated every core and left none to service the shared event loop
serving /api/status. So cap ONLY that:
- A dedicated, bounded ThreadPoolExecutor (_vision_cpu_executor) runs the
encode/resize/dimension-check off the caller's loop, sized to the host's
usable core count with NO fixed ceiling — the cap tracks the actual
exhausted resource (cores), not a magic number. Excess encodes queue on
the executor; cores stay free for the loop.
- The LLM call is deliberately OUTSIDE the executor, so multi-image
workflows keep full request concurrency.
- Override via auxiliary.vision.max_concurrency / HERMES_VISION_MAX_CONCURRENCY
(honored verbatim, including above core count); sub-1 ignored.
- _vision_concurrency_slot() is now a no-op shim for back-compat.
Tests assert: resolver defaults to host cores with no ceiling; env/config
override (incl. above cores); sub-1 rejection; the executor is dedicated and
core-sized; encode runs on a vision-encode thread; and crucially that encode
bursts are bounded to the cap while the analyses themselves stay fully
concurrent (calls_peak > cap).
A single agent turn can fan out N vision_analyze calls at once — the
classic trigger is "analyze every frame of this video", where ffmpeg
explodes a clip into dozens of frames and the model calls vision_analyze
on each. Every call does a CPU-heavy base64-encode/resize burst AND holds
a long-lived LLM stream open. The tool executor runs concurrent tool calls
on a per-session ThreadPoolExecutor (_MAX_TOOL_WORKERS=8), and multiple
agent sessions share one process (the dashboard runs the agent in-process),
so there was no global ceiling. In prod (June 2026) a video-frame fan-out
pinned a worker thread at ~100% CPU and starved the shared asyncio event
loop that also serves the dashboard's /api/status liveness probe, flapping
the instance to UNHEALTHY even though nothing had crashed.
Add a process-global threading.BoundedSemaphore that bounds how many vision
analyses run concurrently across the whole process, held across the entire
analysis (image load + encode + LLM call) in the single _handle_vision_analyze
chokepoint (covers both the native fast path and the legacy aux-LLM path).
It is a threading semaphore, NOT asyncio: each vision call is dispatched
through model_tools._run_async on a per-thread event loop, so an asyncio
primitive bound to one loop cannot coordinate across them. The acquire is
offloaded via run_in_executor so waiting for a slot never blocks the calling
loop.
Default: min(host CPUs, 4), floored at 1 — respect the host's concurrency,
or lower. Override via auxiliary.vision.max_concurrency (config.yaml) or
HERMES_VISION_MAX_CONCURRENCY (env). Values < 1 are ignored so the cap can
never be disabled into an unbounded fan-out.
Tests: bounded-fan-out regression guard + a control proving it would fail
without the cap; resolver tests for host-cpu default, ceiling clamp, low-cpu
host, env override, and sub-1 rejection. Pre-existing handler tests updated
for the now-async _handle_vision_analyze. Verified via the real
registry.dispatch -> _run_async per-thread-loop path (16 concurrent calls,
peak bounded to cap).
Follow-up to the group-DM manifest fix. The manifest change only helps
NEW installs; existing apps keep their old (mpim-less) scopes until the
admin reinstalls. Since a missing message.mpim event delivers nothing
(no runtime API error to catch), detect stale installs at connect time
from the auth.test x-oauth-scopes header and log an actionable reinstall
nudge when im:history is granted but mpim:history is not. Also promote
message.mpim from Recommended to Required in the docs event tables so the
default setup path can't drop it.
Group DMs (multi-person DMs, channel_type=mpim) were never delivered to
the Slack bot. The adapter already classifies mpim as a DM and replies
ambiently (adapter.py:2526, is_dm = channel_type in {im, mpim}), but the
generated app manifest only subscribed to message.im / im:history — the
1:1 DM pair. Without the message.mpim event subscription Slack drops
group-DM messages before the adapter ever sees them, so 1:1 DMs worked
while group-DM ambient mode was dead.
Add message.mpim to bot_events and mpim:history (the scope that event
requires per Slack docs) + mpim:read (mirrors im:read for the
conversations.info classification call) to bot_scopes. Update the
SLACK_BOT_TOKEN / SLACK_APP_TOKEN setup-help strings and the Slack docs
(EN + zh-Hans: scope table, event table, troubleshooting) so existing
installs are told to add the new scopes and reinstall.
Reported by an enterprise customer. Note: this is a manifest/scope
change, so it only takes effect after the app is reinstalled and the
new scopes are accepted.
Tests: assert message.mpim + mpim:history + mpim:read are in the
manifest (with and without assistant mode); both fail on current main
and pass with this change.
The cron delivery table only showed Discord/Telegram with explicit
target syntax and described Slack and every other platform as
home-channel-only. In fact the generic platform:<target> routing in
_resolve_single_delivery_target resolves explicit targets for every
platform: Slack (#channel / channel ID / channel:thread_ts), Matrix
(room/user IDs), Feishu (chat:thread), WhatsApp (JID / E.164), Signal
(group / E.164), SMS, Email, and Weixin all have dedicated explicit-
target branches in _parse_target_ref; the remaining platforms accept a
generic platform:<chat_id> passthrough.
Update the Delivery Model table (en + zh-Hans) to show the real
per-platform syntax, document #channel name resolution via the channel
directory, and note the Slack thread_ts nuance. Docs-only.
`hermes serve` is newer than the desktop binary's release cadence, so a new
app launched against an un-upgraded managed install / PATH `hermes` would
crash on an unknown subcommand and brick the user mid-upgrade. Detect whether
the resolved runtime registers `serve` (fast source read of its dashboard.py,
with a one-time CLI probe fallback) and rewrite the backend argv to the legacy
`dashboard --no-open` only when it does not. Happy path (current runtimes)
pays nothing and still spawns `serve`.
- electron/backend-command.cjs: pure serve/dashboard argv helpers + serve-
source detection (unit-tested in backend-command.test.cjs)
- main.cjs: backendSupportsServe() cache + getBackendArgsForRuntime() guard at
both backend spawn sites; expose `root` from the Windows venv unwrap so the
fast source check covers Windows too
- docs: note the backward-compat fallback in README, desktop.md, AGENTS.md
The desktop app spawned `hermes dashboard --no-open` as its backend, which
made the dashboard look like a desktop prerequisite. Add a dedicated headless
`hermes serve` command that boots the same gateway (shared cmd_dashboard /
start_server) but never opens a browser, and point the desktop backend spawn
exclusively at it. dashboard and serve are now independent surfaces — neither
launches the other.
- subcommands/dashboard.py: factor shared server args; add `serve` parser
(always headless; accepts legacy --no-open as a no-op)
- main.py: register serve in _BUILTIN_SUBCOMMANDS + coalesce set + gui-log
detection; extend stale-backend reaper patterns to match `serve`
- desktop electron: spawn `serve`, rename dashboardArgs -> backendArgs,
update comments + windows-child-process test assertions
- docs: desktop README, desktop.md (incl. remote-backend), AGENTS.md, and
cli-commands.md now describe `hermes serve` as the desktop/headless backend
The desktop app spawns a headless `hermes dashboard --no-open` backend and
talks to it through the shared @hermes/shared WebSocket client — it never
runs or requires the browser dashboard UI. Spell this out in the desktop
README, the desktop docs page, and AGENTS.md so "dashboard" stops reading
as a desktop prerequisite.
* docs(discord): document bot-to-bot comms as unsupported (#32791)
Multi-profile bot-to-bot conversation is not a supported topology.
DISCORD_ALLOW_BOTS=none (the default) blocks all bot-originated
messages; setting mentions/all across multiple Hermes profiles to make
them reply to each other ack-loops because Discord's reply auto-mention
satisfies the mention gate every turn. Document the safe default and
the loop hazard so operators don't wire it up.
* docs(discord): infographic for bot-to-bot unsupported stance (#32791)
* docs: third-party-product plugins ship standalone, not into core tree
Generalizes the closed-set memory-provider policy to any plugin that
integrates someone else's product/project (observability backends,
vendor SaaS, analytics dashboards, paid-service tie-ins). These create
an open-ended maintenance burden on us for backends we don't own, so
they ship as standalone plugin repos installed into ~/.hermes/plugins/
and are promoted in #plugins-skills-and-skins — not merged into core.
- AGENTS.md: new 'what we don't want' bullet + generalized policy note
beside the memory-provider closed-set rule
- CONTRIBUTING.md: new 'Third-Party Product Integrations' section
- build-a-hermes-plugin.md: caution callout at the top of the guide
It's a coupling decision, not a quality bar — a plugin can clear review
and still be a close.
* docs: add infographic for standalone-plugin policy
Follow-up to the cherry-picked #29212 (#29177):
- Promote the 24h stale-process threshold to config.yaml
(session_reset.bg_process_max_age_hours) instead of a hardcoded
constant. 0 disables the cutoff (legacy: any live process blocks reset).
Wired through GatewayConfig.default_reset_policy in gateway/run.py.
- Bug 2: process(action=list) now resolves the gateway session_key from
the contextvar and surfaces session-scoped background processes (a
forgotten preview server under a different task), flagged
session_scoped — so the agent/user can discover and kill the blocker.
Previously the task-scoped list returned [] and the blocker was invisible.
- Tests: config round-trip for the new field, cross-task list visibility.
- Docs: messaging session-reset section.
The Discord adapter could enter a silent zombie state after a network
outage / proxy stall: the process is alive, _client looks open, but the
underlying socket is dead. discord.py's WebSocket reconnect never sees a
RST through a wedged proxy/NAT, so client.start() spins forever without
exiting — which means the bot-task done callback (which only fires on
task completion) never trips either. The bot stays "offline" in Discord
until a manual `hermes gateway restart`. Reported offline for 13-17h.
Adds an out-of-band REST liveness probe in DiscordAdapter. Every
`discord.liveness_interval_seconds` (default 60s) the adapter issues a
cheap fetch_user(bot_id) — the same REST path as message delivery, so it
fails when the proxy/NAT is wedged. After
`discord.liveness_failure_threshold` consecutive failures (default 3) the
probe closes the wedged client and surfaces a retryable fatal error,
which trips the gateway's existing _platform_reconnect_watcher and
rebuilds the adapter. Operators disable it by setting either knob to 0.
Config lives in config.yaml (discord.liveness_*) per the .env-is-secrets
policy; _apply_yaml_config bridges it to internal env vars the adapter
reads, matching the existing HERMES_DISCORD_TEXT_BATCH_* pattern.
Co-authored-by: Hermes Agent <agent@nousresearch.com>
A user-added systemd drop-in like ExecStopPost=/bin/kill -9 $MAINPID fires
on every stop, including clean restarts — it SIGKILLs the freshly spawned
gateway before it stabilizes and Restart=always respawns it, producing an
infinite restart loop (issue #23272). The unit Hermes installs already shuts
down cleanly via KillMode=mixed + KillSignal=SIGTERM with Restart=always +
RestartForceExitStatus, so no extra kill is needed. Document this as a danger
callout in the gateway service-management section.
HMAC validation authenticates the webhook sender, not the business
fields inside the payload (PR titles, commit messages, issue bodies),
which are authored by untrusted third parties. Expand the prompt-
injection section to make the trust boundary explicit: the agent's
capability surface, not the input channel. Document the hardening
levers (sandbox the runtime, scope the toolset, keep approvals on,
template narrowly) instead of pretending to sanitize untrusted text.
Refs #8820.
/moa no longer does a sticky model switch. It now always runs a single
prompt through the default MoA preset and restores the prior model
afterward; the whole argument is the prompt (no preset-name matching).
To switch to a MoA preset for the session, select it from the model
picker, where presets already surface under a virtual Mixture of Agents
provider on every model-selection surface.
Also fixes#53444: the TUI one-shot only set session[model_override],
which the already-built cached agent ignored, so MoA silently never ran
and the turn used the original model. The TUI now does a real in-place
agent.switch_model() via _apply_model_switch() when a live agent exists
(with a proper restore after the turn), and falls back to a model_override
for lazy/unbuilt sessions.
Removes the redundant sticky-switch branch from the CLI, gateway, and TUI
/moa handlers; updates the command description, usage string, and docs.
Add post_setup() and get_status_config() to the Supermemory memory
provider so `hermes memory setup` and `hermes memory status` print a
one-line connection summary (container, profile fact count,
auto_recall/auto_capture). Point API-key onboarding at the Hermes
connect URL (app.supermemory.ai/integrations?connect=hermes).
Salvage of #52988. Two fixes folded in:
- Test isolation: the new probe/status tests mocked _SupermemoryClient
but not the __import__("supermemory") guard inside
_probe_supermemory_connection, so they passed only where the optional
supermemory package was installed and failed on a clean checkout / CI
(the PR shipped with red CI). Added _stub_supermemory_importable()
mirroring the existing test_is_available_false_when_import_missing
pattern; the suite now passes with supermemory absent.
- post_setup: `if api_key and api_key not in os.environ` checked whether
the key's *value* named an env var (always false in practice). Fixed to
compare the value: `os.environ.get("SUPERMEMORY_API_KEY") != api_key`.
Verified: 38/38 in test_supermemory_provider.py and the full
tests/plugins/memory/ suite green with supermemory not installed.
Closes#52988
- Use os.pathsep instead of literal ':' so Windows paths (C:\dir) and
the Windows separator ';' work correctly.
- Add 9 tests covering multi-root behavior: writes inside first/second
root, writes outside all roots, trailing/leading/double separators,
all-separators edge case, static deny priority, duplicate dedup.
- Update hermes_cli/tips.py tip string to mention multiple paths.
- Update docs to mention os.pathsep / ; on Windows.
Follow-up for salvaged PR #49557.