Five follow-ups to #57659 from post-merge review:
1. install.ps1: gateway scheduled-task re-enable now runs in a finally
(a thrown Remove-Item/uv venv failure previously stranded the user's
gateway autostart disabled), and tasks that were already disabled
before the install are no longer blindly re-enabled.
2. The venv-python holder guard is no longer bypassed by plain --force
(which the desktop bootstrap passes on every update while its lock
probe only checks hermes.exe/app.asar). New explicit --force-venv is
the escape hatch; --force keeps bypassing only the hermes.exe shim
guard.
3. _detect_venv_python_processes now also catches uv/base-interpreter
trampolines whose exe is outside the venv, via cmdline (venv path or
'-m hermes_cli.main' tied to this install root) and cwd.
4. Missing venv python is now UNHEALTHY on managed installs
(.hermes-bootstrap-complete / .update-incomplete markers) so the
repair lane runs instead of 'Already up to date!'; the repair branch
recreates the venv first when it's gone entirely. Dev checkouts keep
reporting healthy.
5. install.ps1 comment no longer claims a Startup-folder disarm the
code doesn't perform (logon-only, not a mid-install respawner).
Follow-up to #57507: .ENV / .Env.local on case-insensitive filesystem
mounts slipped past the guard. Lowercase the name before matching and
add a regression test. Addresses egilewski's open review note.
Replace the exact-filename frozenset with _is_sensitive_filename()
that matches .env plus any .env.<suffix> variant. This covers
shorthand suffixes like .env.prod that the previous enumeration
missed.
Add test_sensitive_env_suffix_variants_blocked regression test
covering .env.prod, .env.dev, .env.staging.local, and .env.ci.
Addresses review feedback from egilewski on PR #57507.
The dashboard Files tab could list, read, and download .env files
containing API keys when running with a bind-mounted Hermes home
directory (e.g. docker run -v ~/.hermes:/opt/data).
Add _SENSITIVE_FILENAMES frozenset and filter these from
list_managed_files(), read_managed_file(), and download_managed_file().
Return 403 for direct read/download attempts on sensitive files.
Fixes#57505
Root-causes the July 2026 Windows incident chain (locked _brotlicffi.pyd /
_sodium.pyd during install, then 'No module named annotated_doc' with
'hermes update' insisting 'Already up to date!'):
- hermes update: probe venv core imports even when the checkout is current;
a half-updated venv (dep sync killed mid-flight by a locked .pyd) is now
detected and repaired instead of being reported as up to date
- hermes update (Windows): after pausing gateways, refuse to mutate the venv
while other processes run from the venv interpreter (the Desktop backend
runs as python.exe so the hermes.exe shim guard never saw it); --force
keeps the old behavior
- install.ps1 venv stage: disarm gateway autostart Scheduled Tasks before
the kill sweep (they respawn the gateway inside the kill->delete window),
make the sweep a bounded loop requiring 3 clean passes, and rename-then-
delete the old venv (a rename succeeds even with mapped DLLs) with stale-
dir cleanup on the next run
- desktop updater: 'venv shim still locked after 15s' now ABORTS the update
hand-off (restarting our backend, surfacing the holder to the user)
instead of 'proceeding anyway (force)' into guaranteed venv corruption;
the unlock wait also re-kills respawned backends each poll tick
Follow-up to the salvaged early-exit retry fix (#35617): the debug-browser
launch path was fire-and-forget (stderr to DEVNULL, no logging), so every
platform failure — Windows singleton forward to an existing instance, bad
profile dir, missing shared libraries, policy blocks — collapsed into the
same unactionable 'port 9222 isn't responding yet' message and debug
reports contained nothing.
- launch_chrome_debug() returns a structured ChromeDebugLaunch with
per-candidate attempts (state, exit code, stderr tail)
- browser stderr is captured to <hermes_home>/chrome-debug/launch-stderr.log
- clean exit (code 0) without the port opening is detected as Chromium's
single-instance forward and produces a targeted user hint to close all
running instances of that browser
- crash exits surface the stderr tail (e.g. missing libnspr4.so)
- every spawn/exit is logged to agent.log so hermes debug share captures it
- CLI (/browser connect) and TUI/desktop (browser.manage) both print the hint
* feat(desktop): CLI/dashboard parity — skills hub browser, MCP test/toggle/catalog, maintenance ops, log filters
Brings desktop GUI to parity with hermes skills/mcp/doctor/backup/debug-share/
curator/memory CLI commands and the dashboard's System + Skills-hub pages:
- Skills page: new Browse Hub tab (search official/GitHub/community sources,
preview SKILL.md, security scan verdicts, install/update with live action log)
- MCP settings: connection test (tool listing), per-server enable/disable
toggle, and a Catalog tab installing Nous-approved MCP servers with env prompts
- Command Center: new Maintenance section (doctor, security audit, backup,
debug share links, curator status/pause/run, memory file status + reset)
- Command Center system logs: file (agent/errors/gateway/desktop), level, and
substring filters instead of a fixed agent.log tail
- hermes.ts API client + types for all the above; en/zh locale strings (ja and
zh-hant inherit via defineLocale)
* feat(desktop): backend model catalogs in toolset config — hermes tools parity
Completes the `hermes tools` parity gap: after picking an image/video
generation backend the CLI runs a model picker (e.g. FAL's multi-model
catalog with speed/strengths/price); the desktop toolset drawer now has the
same flow as a radio-card list.
- web_server: GET /api/tools/toolsets/{name}/models (catalog + current +
default for the active or named provider row) and PUT .../model
(validated write to image_gen.model / video_gen.model), reusing the CLI's
plugin catalog helpers so GUI and `hermes tools` stay in lockstep
- desktop: ModelCatalogPicker in ToolsetConfigPanel — per-model cards with
speed/strengths/price, in-use + default badges, disabled until the
backend is the active one; provider selection now mirrors is_active
locally so the catalog unlocks without a refetch
- tests: 3 backend endpoint tests (catalog shape invariants, persist +
validation), 2 component tests, 2 API-contract tests; en/zh strings
New preset key 'fanout': 'per_iteration' (default, unchanged behavior)
re-runs the reference fan-out whenever the advisory view changes — every
tool iteration. 'user_turn' runs the advisors ONCE per user turn and lets
the aggregator act alone for the rest of the tool loop — the original MoA
shape (upfront multi-model synthesis, then a single acting model), and the
obvious lever on MoA's wall/cost multiplier (advisor generation dominates
per-turn latency).
Implementation reuses the existing turn-scoped reference cache: in
user_turn mode the cache signature hashes only the prefix up to the LAST
user message, so mid-turn advisory-view growth doesn't change the key and
iteration 2+ is a cache HIT (advice reused, zero advisor spend, no
re-trace). A new user message changes the prefix and re-triggers the
fan-out. Unknown fanout values normalize to per_iteration.
OpenCode Go serves minimax/qwen via Anthropic Messages (base URL without
/v1 — the SDK appends /v1/messages) and glm/kimi/deepseek/mimo via OpenAI
chat completions (base URL WITH /v1). The runtime stripped /v1 for
anthropic-routed models, and the TUI/desktop + gateway persisted that
stripped URL to model.base_url. Every later chat_completions model then
POSTed to https://opencode.ai/zen/go/chat/completions — a 404 (the
marketing site). Result: only minimax worked; glm/deepseek/kimi all 404ed.
- New normalize_opencode_base_url(): symmetric /v1 normalization —
strip for anthropic_messages, re-append for chat_completions /
codex_responses on opencode.ai hosts (heals persisted stripped URLs;
custom proxy overrides untouched)
- Applied at all three former one-way strip sites (resolve_runtime_provider
x2, switch_model)
- opencode_model_api_mode: all Qwen models on Go AND Zen now route via
/v1/messages per current published endpoint tables (previously only
qwen3.7-max on Go — qwen3.6-plus etc. would 404 the same way)
- Catalog refresh: Go gains deepseek-v4-pro/flash, glm-5.2,
kimi-k2.7-code, minimax-m3, qwen3.7-plus; Zen gains glm-5.2,
kimi-k2.7-code, minimax-m3, qwen3.7-plus
Reported by IndieSuperhuman on X: opencode-go 404s for any model other
than minimax.
A single-model Hermes agent never sends temperature; the provider default
applies. MoA hardcoded reference_temperature=0.6 / aggregator_temperature=0.4,
and the coercion float(preset.get(key, 0.6) or 0.6) made unset IMPOSSIBLE to
express: absent, null, empty, and even an explicit 0 all collapsed to the
baked-in default. Every MoA advisor and aggregator therefore ran at 0.6/0.4
while the same model running solo used the provider default — silently
skewing solo-vs-MoA comparisons and overriding provider-tuned defaults.
- moa_config normalization: temperatures coerce to None when absent/blank/
invalid (new _coerce_float_or_none); explicit values incl. 0 honored.
- moa_loop: _preset_temperature() resolves preset values; None flows to
call_llm, which already omits the parameter when None (same contract as
max_tokens). Aggregator still inherits the acting agent's own configured
temperature when the preset doesn't pin one.
- conversation_loop (context-mode MoA): same resolution, no more hardcoded
0.6/0.4 at the call site.
- DEFAULT_CONFIG preset + web_server payload models + docs updated: unset
is the default, pinning stays available.
hermes debug share reads os.getenv — the invoking terminal's environment — but
launchd/systemd and the desktop-spawned `serve` backend load credentials from
~/.hermes/.env, not the login shell. A key exported in the shell but absent
from .env is invisible to the backend, yet the dump printed a bare "set",
sending support down a phantom "the key is configured" path.
This was the actual trap behind a "Desktop has no web_search / no tools"
report: FIRECRAWL_API_KEY was a shell export (so `debug share` in a terminal
read "firecrawl set") but not in .env, so the launchd backend's
check_web_api_key returned False and web_search was gated off — which a
contributor then misdiagnosed as a missing `desktop` platform registration.
The dump now annotates any key set in-process but missing from ~/.hermes/.env
with "(shell only — not in .env; managed/desktop backend may not see it)" so
the mismatch is obvious instead of hidden behind "set".
Follow-up to @helix4u's #57336 salvage. Two review findings:
- W1: model-picker grouped custom-provider rows by
(api_url, credential, api_mode) but NOT extra_headers. Entries sharing a
URL+credential+api_mode yet declaring different headers (e.g. per-tenant
routing behind one proxy) collapsed into one row and probed /models with
whichever header set was seen first (order-dependent). Fold a canonical
header identity into group_key so distinct header-authed endpoints stay
separate; drops the now-dead first-non-empty merge branch.
- W2: the extra_headers stringify+None-filter comprehension existed in 5
copies (config.py x2, runtime_provider.py, model_switch.py, models.py).
Extract one shared hermes_cli.config.normalize_extra_headers primitive;
all sites now call it.
Tests: +normalize_extra_headers unit tests, +regression test proving two
same-endpoint entries with different headers stay distinct and each probes
with its own headers. 223 targeted tests pass; ruff clean.
Desktop/dashboard WebSocket connections drop during long agent operations
(delegate_task subagents, large model outputs) when the uvicorn event loop is
GIL-starved for minutes. Root cause: uvicorn's ws keepalive ping runs on the
SAME event loop as agent turns. A single synchronous GIL-holding call on a
worker thread (a regex/scrub over a large output, or a long subagent turn)
freezes the loop, so it cannot process the incoming pong within ws_ping_timeout
and uvicorn closes an otherwise-healthy connection (#53773: 'event loop stalled
226.3s'; #48445/#50005). Loosening the timeout only raises the threshold — a
multi-minute stall sails past any finite window.
The keepalive ping exists to detect half-open connections (reverse-proxy 524,
dropped tunnels), which cannot happen on loopback: there is no network or proxy
in the path, and a dead local client tears the socket down with a real FIN/RST
that starlette surfaces as WebSocketDisconnect regardless of the ping. So on
loopback the ping provides ~no liveness value while actively killing
recoverable stalls — disable it entirely (ws_ping_interval/timeout=None).
Non-loopback (public) binds sit behind a Cloudflare Tunnel where half-open IS a
real failure mode, so the ping stays at 20/20 to detect it.
Empirically verified (real uvicorn + websockets peer): with ws_ping=None the
server never closes a silent peer during an 8s window; with the pre-fix 2s/2s
window uvicorn closes it. A genuinely-dead client still fires the
WebSocketDisconnect reap path regardless of the ping.
Note: this fixes the local Desktop case (the OP's scenario). A remote Desktop
over an authenticated public dashboard route (McCalebTheSecond's comment) keeps
the ping and needs the deeper GIL-hotspot fix — tracked separately.
Closes#53773
delete_profile stopped only the process named in gateway.pid, but a Desktop
app spawns a headless `serve`/`dashboard` backend per profile that holds the
profile's SQLite connection open and keeps writing sessions/WAL/sandbox files.
That backend is never in gateway.pid, so a CLI `hermes profile delete` run
while the Desktop app is up left it writing into the tree — rmtree's final
rmdir then failed with ENOTEMPTY (#47368 "Bug 2"), and pre-guard it also
resurrected the directory.
- _profile_bound_backend_pids(): find running Hermes backends bound to this
profile via a `--profile <name>` selector or a HERMES_HOME env resolving to
the profile dir. Tightly scoped — current-user only, backend subcommands
(serve/dashboard/gateway) only so an interactive chat is never killed, and
never this process or its ancestors.
- _stop_profile_backends(): terminate them (graceful, then force), best-effort
so it can never make delete worse.
- _rmtree_with_retry(): a few spaced retries absorb the ENOTEMPTY / Windows
file-lock race from a just-terminated writer's in-flight -wal/-shm/sandbox
writes instead of failing the whole delete on a race the next attempt wins.
Complements the recreation guard (deleted profiles no longer reappear) and the
Desktop teardown-before-delete flow; this is the CLI-side convergence fix for a
delete run while a Desktop-managed backend is live.
Part of #47368.
Replace the loopback/PKCE-callback server and manual-paste fallback with
the RFC 8628 device-code flow as the only xAI Grok OAuth login path. The
flow works in headless/SSH/container sessions with no 127.0.0.1 listener,
shrinking the local attack surface.
- Poll the token endpoint with server-provided interval, honoring
slow_down and expires_in; store tokens with auth_mode
oauth_device_code.
- Adaptive proactive refresh skew for short-lived device-code JWTs;
rotated tokens sync back to auth.json, the global root store, and the
credential pool (no refresh-token replay).
- Clear source suppression on successful re-login (CLI + dashboard) and
drop the duplicate dashboard pool entry so exactly one seeded
device_code entry exists.
- Use the shared device_code source name for consistency with the
nous/codex device-code providers.
- Desktop: remove the loopback OAuth flow states and dead type variants;
pkce providers' sign-in URL selection is unchanged.
- Docs (EN + zh-Hans) rewritten for device-code login; drop the deleted
--manual-paste flag from documented commands.
Named providers / custom_providers entries in config.yaml now accept an
extra_headers dict scoped to that endpoint — for reverse proxies, API
gateways, and custom auth schemes (e.g. Cloudflare Access service tokens).
- hermes_cli/config.py: normalize extra_headers on provider entries
(_normalize_custom_provider_entry + providers-dict translation), add
get_custom_provider_extra_headers /
apply_custom_provider_extra_headers_to_client_kwargs helpers keyed on
base_url (case/trailing-slash insensitive, no substring bypass —
mirrors the TLS helpers)
- hermes_cli/runtime_provider.py: surface extra_headers in the resolved
runtime for named custom providers (providers dict, legacy
custom_providers list, and the credential-pool path)
- run_agent.py / agent/agent_init.py: merge per-provider extra_headers
onto the OpenAI client default_headers at construction and on every
_apply_client_headers_for_base_url re-application (credential swaps,
rebuilds), most-specific level wins; OpenAI-wire only (native
Anthropic/Bedrock scoped out)
- agent/auxiliary_client.py: accept model.extra_headers as an alias of
model.default_headers for the global variant
- cli-config.yaml.example: documented commented example
- Header values are treated as secrets and never logged
Salvaged from PR #3526 by @jneeee, reimplemented against current main.
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Salvaged from PR #3243 by @Mibayy, reimplemented against current main
(the original diff targeted a removed gateway/run.py handler).
- /compact is now a first-class alias of /compress (CLI, gateway,
Telegram/Slack/Discord command lists, autocomplete) — also fixes the
dangling '/compact' references in gateway error messages
(gateway/run.py context-exhausted banners).
- --preview / --dry-run: report what WOULD be compressed (message
counts, token estimate, 'here [N]' boundary) without touching the
transcript. Flags coexist with the existing 'here [N]' / focus-topic
args on both the CLI and gateway surfaces via shared pure helpers in
hermes_cli/partial_compress.py.
- --aggressive (LLM-free hard truncation) is intentionally NOT
implemented: it would need its own transcript-persistence branch
outside the guarded _compress_context rotation machinery (#44794
data-loss class). The flag is recognized and returns an explanatory
message pointing at '/compress here [N]' and /undo instead of being
mis-parsed as a focus topic.
- locales: gateway.compress.aggressive_unsupported added to all 16
catalogs (parity test enforced).
- release.py: AUTHOR_MAP entry for contributor credit.
Salvage of #3459 by @keslerm, reimplemented against the restructured
progress-callback block in gateway/run.py (resolve_display_setting,
needs_progress_queue, thinking-relay). Duplicate PR #3458 by @dlkakbs was
submitted 4 minutes earlier with the same feature — both credited.
Co-authored-by: Dilee <uzmpsk.dilekakbas@gmail.com>
tool_progress: log keeps the chat silent and appends timestamped tool-call
lines to ~/.hermes/logs/tool_calls.log via a dedicated queue drained by an
async writer (RotatingFileHandler 5MB x 3, RedactingFormatter so secrets
never land on disk). Gateway-only by design; thinking_progress relaying and
the webhook gate are unaffected. /verbose now cycles
off -> new -> all -> verbose -> log.
Salvage of the surviving hunk of #3296 by @Mibayy. The PR's gateway
_handle_provider_command hunk targets code removed on main (/provider was
absorbed into /model + /status, which already read model.base_url); the
hermes status mislabel was the remaining live symptom:
_effective_provider_label() only checked the legacy OPENAI_BASE_URL env var,
so a custom endpoint configured canonically in config.yaml still displayed
as OpenRouter.
delegation.max_concurrent_children is now the single cap for both a
batch's parallelism and concurrent background delegation units.
- _get_max_async_children() delegates to _get_max_concurrent_children();
a leftover max_async_children key logs a one-time deprecation warning
- config v32→33 migration removes the stale key, folding a raised
max_async_children into max_concurrent_children (max wins, no lost
headroom)
- capacity error messages now point at max_concurrent_children
- pool-at-capacity sync fallback now attaches an explanatory note so
the model/user know why the call blocked instead of dispatching async
Previously users who raised max_concurrent_children (e.g. to 15) still
hit the invisible default-3 async cap: the 4th background delegate_task
silently ran inline, blocking the turn with no signal.
Two related hardening fixes for auxiliary calls (which include MoA reference
advisors — a pinned-model path where provider fallback is not a meaningful
recovery):
1. Transient-transport retries: the same-provider retry on a connection reset /
timeout / 5xx / 408 was a single attempt, then fallback. For a pinned aux
call a second blip silently loses the call (root of the run2 double-advisor
'Connection error' collapse — a genuine upstream blip). Now retries N times
with exponential backoff, N = auxiliary.transient_retries (default 2 -> 3
total attempts, clamped [0,6]). Compression-on-timeout fast-fail carve-out
preserved.
2. Per-model client-cache isolation: _client_cache_key excluded the model, so
two concurrent auxiliary calls to the same provider/base_url/key but
different models (e.g. an opus + gpt-5.5 MoA fan-out) shared one cache entry
and could race each other's client lifecycle. Model now participates in the
key -> distinct clients, no cross-call races. Same-model reuse unchanged.
- agent/auxiliary_client.py: _transient_retry_count() + backoff loop; model in
_client_cache_key and both call sites.
- hermes_cli/config.py: auxiliary.transient_retries default (2).
- tests: new retry/isolation tests; updated 2 stale-expectation tests to the
corrected behavior (per-model resolve; N-retry escalation).
Backoff base is overridable (_TRANSIENT_RETRY_BACKOFF_BASE) so tests don't sleep.
_get_platform_tools reverse-maps a platform composite to configurable
toolsets with an all-tools subset test. Because get_toolset() merges
registry-registered tools into a toolset, a tool added to a toolset
(delegate_cli -> delegation; desktop-only read_terminal -> terminal) that the
static composite never listed made the subset test fail, silently dropping the
entire toolset on api_server and other inference-based platforms. Compare the
toolset's static membership at all three reverse-map sites.
Fixes#49622.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
MoA per-turn latency is dominated by advisor GENERATION: turn wall time
correlates ~0.88 with output tokens and ~-0.03 with input tokens (measured over
52 turns). Each turn waits for the slowest advisor to finish writing, and
advisors were uncapped — writing multi-thousand-token essays the aggregator
only needs the gist of.
Add an opt-in per-preset reference_max_tokens knob (mirrors reference_temperature)
that caps ADVISOR output only; the acting aggregator is never capped. Default
None = uncapped, so existing presets are byte-for-byte unchanged (no regression).
Wired through both MoA execution paths (MoAChatCompletions.create and
aggregate_moa_context).
E2E: same task, closed preset uncapped vs reference_max_tokens=600 -> 59s to 33s
(~44% faster), final answer identical/correct.
- hermes_cli/moa_config.py: _coerce_int_or_none helper + reference_max_tokens
in _normalize_preset/_default_preset/flattened view
- agent/moa_loop.py: read preset.reference_max_tokens, pass to reference fan-out
- agent/conversation_loop.py: pass reference_max_tokens on the per-turn path
- tests + docs
Ben caught that the initial approach (widening _NOUS_PORTAL_ALLOWED_HOSTS to
include the staging host) was the wrong fix -- env vars are supposed to
override the allowlist, mirroring how NOUS_INFERENCE_BASE_URL already
bypasses _ALLOWED_NOUS_INFERENCE_HOSTS via _nous_inference_env_override().
The actual bug: both resolve_nous_access_token and
resolve_nous_runtime_credentials read
`_optional_base_url(state.get("portal_base_url")) or os.getenv(...) or ...`
-- a plain `or` chain where the STORED state value wins first (short-circuits
before the env vars are even read), and then whichever value won gets run
through the same _NOUS_PORTAL_ALLOWED_HOSTS gate regardless of its source.
So a hosted agent stamped with HERMES_PORTAL_BASE_URL=<staging> in its env
AND a staging portal_base_url already persisted to auth.json would still
get silently rewritten to prod on every refresh, because the env var never
even got a chance to be consulted.
Revert the previous _NOUS_PORTAL_ALLOWED_HOSTS widening entirely --
staying prod-only preserves the allowlist's actual job (rejecting an
untrusted network-provided portal_base_url persisted to auth.json by a
compromised Portal response).
Add _nous_portal_env_override() (mirrors _nous_inference_env_override())
and restructure both call sites so the env override is checked FIRST and,
when set, wins outright and skips the allowlist gate entirely -- the
allowlist only ever runs against the fallback (stored-state-or-default)
path now.
Rewrote tests/hermes_cli/test_nous_portal_staging_allowlist.py to test the
actual fix: the helper function, and an end-to-end
resolve_nous_access_token proof that the env override wins even when state
ALSO has the staging host stored (the exact incident shape), that it wins
over a stored PROD host too, and that the allowlist's heal-to-prod
behaviour for an untrusted stored value is preserved when no override is
set.
The salvaged fix wired per-provider ssl_ca_cert / ssl_verify (and
HERMES_CA_BUNDLE) into the MAIN OpenAI client. This follow-up:
- Auxiliary client parity: process_bootstrap.build_keepalive_http_client
accepts and forwards verify; auxiliary_client._resolve_aux_verify mirrors
the main-client TLS resolution (via load_config_readonly, the read-only
fast path) so compression/vision/web_extract/title-gen/session_search
honor the same per-provider CA. Without this, chat worked against a
private-CA endpoint but every auxiliary call still failed APIConnectionError.
- switch_model now reads custom_providers from live config (load_config_readonly)
instead of the init-time agent._custom_providers snapshot, so ssl_ca_cert /
ssl_verify edits are honored on mid-session model switch — matching the
context-length reload (#15779).
- Drop the dead client-level verify= where a custom httpx transport is used
(httpx ignores it there); verify lives on the transport. Fix docstrings.
Applies to both run_agent._build_keepalive_http_client and process_bootstrap.
- resolve_httpx_verify: add CURL_CA_BUNDLE to the env chain (consistency with
agent/ssl_guard._CA_BUNDLE_ENV_VARS) and emit a loud logger.warning naming
the endpoint whenever ssl_verify:false disables verification.
- get_custom_provider_tls_settings: case-insensitive base_url match (config
dedup already lowercases; scheme/host are case-insensitive) so a mixed-case
entry doesn't silently drop its CA. Exact match preserved — no prefix bypass.
- Demote best-effort except Exception: pass in agent_init/switch_model to
logger.debug(exc_info=True).
- Tests for aux verify forwarding, _resolve_aux_verify, case-insensitive
match, and prefix-bypass rejection.
Wire ssl_ca_cert and ssl_verify through custom_providers config and env
vars into the keepalive httpx client, fixing APIConnectionError against
mkcert/self-signed Ollama proxies behind HTTPS.
In the interactive CLI, /journey dispatched straight to `args.func(args)`,
letting Rich write ANSI to stdout — which patch_stdout's StdoutProxy passes
through as literal `?[38;2;…m` garbage. Route the read-only views (default +
`list`) through a captured, force-color Console and re-emit via `_cprint`
(prompt_toolkit's ANSI parser), matching the `ChatConsole` idiom.
`delete`/`edit` stay on real stdio since they prompt / open `$EDITOR`.
- claude-fable-5 placed above claude-opus-4.8 in both curated lists
- claude-sonnet-5 replaces claude-sonnet-4.6
- sakana/fugu-ultra added near the bottom (before routers/free tier)
- regenerated website/static/api/model-catalog.json via scripts/build_model_catalog.py (live-pulled by CLI, published on merge — no release needed)
Two Hermes bots sharing a channel could volley replies at each other
indefinitely. Root cause: Discord reply-pings (allowed_mentions
replied_user=true) add the replied-to bot to message.mentions without a
literal <@bot> token in the body, so the existing bot-admission gate
treated a reply chip as an explicit @mention and re-triggered the peer.
Adds opt-in discord.bots_require_inline_mention (default false; env
DISCORD_BOTS_REQUIRE_INLINE_MENTION). When enabled, bot-authored
messages must carry a raw inline <@id>/<@!id> mention in the content;
reply-ping-only mentions no longer admit the message. Human messages and
all existing defaults are unchanged.
The new _self_is_raw_mentioned helper deliberately ignores the resolved
message.mentions list (which reply-ping populates) and checks only the
raw content token via the shared _raw_mentioned_user_ids primitive.
Self-review follow-up (hermes-pr-review Phase 2, non-blocking clarity findings).
- Collapse the reapplying_enable predicate to a single chained comparison
(new_value == current == "codex_app_server") instead of a two-clause AND
that re-tested new_value == current.
- Dedent the msg_lines list literals (drop trailing single-element commas).
No behavior change: reapply still falls through to the idempotent migrate()
while skipping set_runtime/persist (prompt cache preserved), and the auto-disable
early-return is unchanged. 31/31 tests green.
The /codex-runtime slash command short-circuits with "openai_runtime
already set" when invoked with the same value as the current config,
and crucially skips the entire migration block below. The check
conflates two things: (a) "the config value is correct" and (b) "the
world state (managed block in ~/.codex/config.toml, hermes-tools MCP
callback, plugin discovery) is converged".
Common footgun this exposes: a user who pre-sets
`model.openai_runtime: codex_app_server` directly in config.yaml
(reasonable thing to do) and then runs /codex-runtime codex_app_server
to trigger migration sees "already set" and silently gets no migration.
~/.codex/config.toml never receives the managed block, the hermes-tools
MCP callback never registers, and codex falls through to its default
runtime instead of the app-server one — visibly successful but
functionally partial setup.
The migration is idempotent by design (it replaces its own managed
block in place between MIGRATION_MARKER and MIGRATION_END_MARKER), so
re-running it is safe and cheap. Fix the short-circuit to fall through
to migration when re-applying codex_app_server while skipping the
config persist (no value-level change needed). The disable case
(re-applying "auto") still short-circuits because disabling doesn't
touch ~/.codex/config.toml at all.
The user-visible message changes to "openai_runtime already set to
codex_app_server — re-applying migration" so re-runs surface what
happened.
Regression test (test_reapply_codex_app_server_runs_migration) asserts:
- migrate() was called when re-applying
- persist_callback was NOT called (no config write on no-op transitions)
- migration output (MCP servers, sandbox default) surfaces in the
user-visible message
- requires_new_session is True so callers know to /reset
Verified RED→GREEN: the test fails on origin/main with
"migration must run on reapply, not just first enable" and passes with
this fix. Full test_codex_runtime_switch.py suite: 31 passed.
Two independent MoA auxiliary-call fixes:
#53866 — auxiliary.moa_reference.timeout and auxiliary.moa_aggregator.timeout
were 600s while moa_agent was 120s. Raise both to 900s so a genuinely long
reference/aggregator turn (mixed providers, deep reasoning, long tool chains)
has headroom instead of being cut mid-generation.
#53735 — _CodexCompletionsAdapter (the Codex/Responses auxiliary path used by
the MoA acting-aggregator, compression, web_extract, session_search, etc.)
never set prompt_cache_key, so it stayed cache-cold while the MAIN Responses
transport (agent/transports/codex.py) was warm. Derive the same
content-addressed key via the shared _content_cache_key(instructions, tools)
helper and set it on the aux Responses request, with the same host guards the
main transport uses (xAI carries the key in extra_body; GitHub/Copilot opts out
of cache-key routing).
Tests: 5 new prompt_cache_key cases (set+prefixed, stable across identical
prefix, differs on different instructions, skipped for xai/github hosts).
tests/agent/test_auxiliary_client.py 279 pass; tests/hermes_cli/test_config.py
130 pass.
The provider-parity contract (tests/hermes_cli/test_provider_parity.py)
requires every hermes model provider to be configurable in the desktop
Providers tabs. Vertex authenticates via OAuth2 (service-account JSON /
ADC) and has no api_key_env_vars, so — like bedrock's aws_sdk — it needs
its credential env var tagged to the provider card explicitly. Tag
VERTEX_CREDENTIALS_PATH to the vertex card in _catalog_provider_env_metadata().
Adds Vertex AI as a first-class provider for Gemini models via Vertex's
OpenAI-compatible endpoint. Vertex authenticates with short-lived OAuth2
access tokens (service-account JSON or ADC), not a static API key — the
missing piece behind the recurring requests (#13484, #12639, #56259).
- agent/vertex_adapter.py: OAuth2 token minting + refresh-on-expiry
(5-min margin), ADC->service-account fallback, global vs regional
endpoint URLs. Config precedence: env var > config.yaml > default.
- plugins/model-providers/vertex/: provider profile (auth_type=vertex),
reuses Gemini's extra_body.google.thinking_config translation.
- runtime_provider: vertex short-circuit BEFORE the credential pool so a
credentials-file path is never mistaken for a static API key; mints a
fresh token + computes base_url per resolve.
- run_agent + conversation_loop: _try_refresh_vertex_client_credentials()
re-mints the token and rebuilds the client on a mid-session 401, so a
long-lived gateway agent survives token expiry (~1h).
- auxiliary_client: vertex auth_type branch for side-LLM tasks.
- config.yaml: vertex.project_id / vertex.region (non-secret, bridged to
env); credential path stays in .env (VERTEX_CREDENTIALS_PATH).
- setup wizard + model picker: dedicated _model_flow_vertex; curated
google/gemini-* model list; --provider choices.
- pricing/metadata: Vertex prices off the gemini docs snapshot; endpoint
host auto-maps to the vertex provider (no probe spam).
- lazy_deps + pyproject [vertex] extra: google-auth, opt-in only.
- docs: guides/google-vertex.md + providers page; tests for adapter +
runtime resolution.
Salvages and modernizes #8427 by @slawt onto current main: rewired from
the legacy PROVIDER_REGISTRY path to the provider-profile architecture,
moved non-secret config out of .env into config.yaml, and added the
per-turn 401 token-refresh the original lacked.
- Track auth store source path on Nous state reads and write rotated
OAuth refresh tokens back to the same store, preventing stale-token
replays when Hermes falls back to a global/root auth.json.
- Skip Nous fallback entries locally when no access/refresh token is
present, suppressing repeated failed resolution attempts within a
session.
- Sync session model metadata after fallback switches so the gateway
DB reflects the backend that actually served the latest turn.
Add policy gates and output redaction for browser/CDP surfaces, strengthen session ownership tracking, and block credential-like query parameters before third-party browser/web backends receive URLs.
Inspired by the agbrowse review: keep local browser magic-link flows possible while preventing cloud reader/browser escalation from receiving opaque token, code, signature, or key query parameters.
The salvaged PR guarded only resolve_nous_access_token; the primary
resolve_nous_runtime_credentials path also POSTs the refresh token to
portal_base_url on refresh with no allowlist check. Mirror the guard
there so a poisoned host can't receive the bearer, and drop the stray
duplicated allowlist comment. Adds a sibling-site regression test.
generate_systemd_unit() and generate_launchd_plist() used
Path(shutil.which('node')).resolve().parent to find the node bin dir.
When ~/.local/bin/node is a symlink into a specific profile's node
install (e.g. ~/.hermes/profiles/<p>/node/bin/node), .resolve() chases
it and bakes that one profile's path into EVERY profile's service
definition.
This breaks profile isolation and makes systemd_unit_is_current()
perpetually False: each gateway rewrites its unit + daemon-reload on
every boot, destabilizing multi-profile setups into a ~5-minute restart
loop (observed NRestarts ~1600 across two gateways).
Fix: use Path(resolved_node).parent — the directory where node is found
on PATH — instead of chasing the symlink to its resolved target. This
keeps generated service definitions profile-agnostic.
Affects both the systemd (Linux) and launchd (macOS) unit generators.
compress_context() and /new already flush un-persisted messages before
calling end_session() (fixed in #47202), but /resume and /branch still
call end_session() directly. When a turn is interrupted mid-flight and
the user immediately runs /resume or /branch, messages generated during
that turn have not yet been written to state.db and are silently lost on
session rotation.
Add the same best-effort _flush_messages_to_session_db() call before
end_session() in both _handle_resume_command and _handle_branch_command,
mirroring the pattern established in cli.py:new_session().
Regression tests verify the flush is called when an agent is present.
Reworks @valenteff's #53277 fix per review (Teknium's 3 findings):
- Route refresh_launchd_plist_if_needed's bootstrap through the existing
_launchctl_bootstrap() EIO-recovery helper (canonical since #56256),
wrapped in a wall-clock retry loop, instead of an ad-hoc 5x2s loop.
- Window sized to agent.restart_drain_timeout (default 180s), not a fixed
~10s: the failure happens while the old gateway is still draining (finding 1).
- Retry on subprocess.TimeoutExpired too, not just CalledProcessError — a
bootstrap timeout after bootout otherwise escapes and leaves the service
unloaded (finding 2).
- Confirm success with launchctl list, not a bare bootstrap exit 0 (finding 3);
mirror verify+drain-window in the detached-helper bash path.
- Shared helpers _launchd_reload_log_path / _append_launchd_reload_log /
_launchctl_label_registered / _retry_launchctl_bootstrap_until_registered.
3 new tests cover retry-until-listed, TimeoutExpired-retried, deadline-exhaust.
E2E: real reload log + mocked launchctl — retries CalledProcessError+TimeoutExpired,
verifies via launchctl list, logs failures.
refresh_launchd_plist_if_needed ran `launchctl bootout` then
`launchctl bootstrap` with errors silenced (`2>/dev/null` in the
detached helper, `check=False` in the direct subprocess path).
Under high load or a launchd race, the bootout succeeds — removing
the service from launchd — but the follow-up bootstrap fails
silently. The service stays unregistered; KeepAlive can't revive
a service launchd no longer knows about, so the gateway stays dark
until a manual `launchctl bootstrap`.
Observed incident (2026-06-26): `/restart` in chat triggered a
planned drain; during the drain a separate call re-triggered the
plist refresh, which bootout'd the live service. Under loadavg
9.48 the bootstrap failed silently — 2h35min offline until manual
recovery.
Fix: retry the bootstrap up to 5 times with 2s back-off, verify
with `launchctl list <label>` afterwards, and log failures to
~/.hermes/logs/launchd-reload.log so the health watchdog can
detect a persistent orphan. Mirrors the contract across both
the detached helper (refresh inside gateway tree) and the direct
subprocess path (refresh from external CLI).
Existing tests pass:
- test_refresh_defers_reload_when_running_inside_gateway_tree
- test_refresh_uses_direct_reload_when_not_inside_gateway_tree
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>