Five follow-ups to #57659 from post-merge review:
1. install.ps1: gateway scheduled-task re-enable now runs in a finally
(a thrown Remove-Item/uv venv failure previously stranded the user's
gateway autostart disabled), and tasks that were already disabled
before the install are no longer blindly re-enabled.
2. The venv-python holder guard is no longer bypassed by plain --force
(which the desktop bootstrap passes on every update while its lock
probe only checks hermes.exe/app.asar). New explicit --force-venv is
the escape hatch; --force keeps bypassing only the hermes.exe shim
guard.
3. _detect_venv_python_processes now also catches uv/base-interpreter
trampolines whose exe is outside the venv, via cmdline (venv path or
'-m hermes_cli.main' tied to this install root) and cwd.
4. Missing venv python is now UNHEALTHY on managed installs
(.hermes-bootstrap-complete / .update-incomplete markers) so the
repair lane runs instead of 'Already up to date!'; the repair branch
recreates the venv first when it's gone entirely. Dev checkouts keep
reporting healthy.
5. install.ps1 comment no longer claims a Startup-folder disarm the
code doesn't perform (logon-only, not a mid-install respawner).
Root-causes the July 2026 Windows incident chain (locked _brotlicffi.pyd /
_sodium.pyd during install, then 'No module named annotated_doc' with
'hermes update' insisting 'Already up to date!'):
- hermes update: probe venv core imports even when the checkout is current;
a half-updated venv (dep sync killed mid-flight by a locked .pyd) is now
detected and repaired instead of being reported as up to date
- hermes update (Windows): after pausing gateways, refuse to mutate the venv
while other processes run from the venv interpreter (the Desktop backend
runs as python.exe so the hermes.exe shim guard never saw it); --force
keeps the old behavior
- install.ps1 venv stage: disarm gateway autostart Scheduled Tasks before
the kill sweep (they respawn the gateway inside the kill->delete window),
make the sweep a bounded loop requiring 3 clean passes, and rename-then-
delete the old venv (a rename succeeds even with mapped DLLs) with stale-
dir cleanup on the next run
- desktop updater: 'venv shim still locked after 15s' now ABORTS the update
hand-off (restarting our backend, surfacing the holder to the user)
instead of 'proceeding anyway (force)' into guaranteed venv corruption;
the unlock wait also re-kills respawned backends each poll tick
Per-session /model overrides (_session_model_overrides) were in-memory only,
so a gateway restart silently reverted every session to the global default
model. Persist the non-secret parts (model/provider/base_url ONLY — never
api_key) into the session entry in sessions.json and lazily rehydrate them
on first use after a restart, re-resolving credentials through the normal
runtime provider resolution.
- gateway/session.py: SessionEntry.model_override field with
sanitize_model_override() (allowlist: model/provider/base_url) applied on
both serialization and deserialization; SessionStore.set_model_override /
get_model_override accessors. reset_session() already creates a fresh entry,
so /new keeps its clear-on-reset semantics — a restart cannot resurrect an
override the user reset away.
- gateway/slash_commands.py: write-through at both /model set sites (text
command + picker) after storing the in-memory override.
- gateway/run.py: _rehydrate_session_model_override() called from
_resolve_session_agent_runtime(); in-memory state always wins, credentials
are re-resolved per provider (credential-less fallback on failure). Session
expiry finalization also drops the persisted override.
- tests/gateway/test_session_model_override_persistence.py: restart
round-trip, /new clearing, api_key-never-serialized (including tampered
sessions.json), rehydration + live-state precedence + credential-failure
degradation.
Salvaged from #3659 by @Git-on-my-level, narrowed to the restart-persistence
gap confirmed in triage.
Named providers / custom_providers entries in config.yaml now accept an
extra_headers dict scoped to that endpoint — for reverse proxies, API
gateways, and custom auth schemes (e.g. Cloudflare Access service tokens).
- hermes_cli/config.py: normalize extra_headers on provider entries
(_normalize_custom_provider_entry + providers-dict translation), add
get_custom_provider_extra_headers /
apply_custom_provider_extra_headers_to_client_kwargs helpers keyed on
base_url (case/trailing-slash insensitive, no substring bypass —
mirrors the TLS helpers)
- hermes_cli/runtime_provider.py: surface extra_headers in the resolved
runtime for named custom providers (providers dict, legacy
custom_providers list, and the credential-pool path)
- run_agent.py / agent/agent_init.py: merge per-provider extra_headers
onto the OpenAI client default_headers at construction and on every
_apply_client_headers_for_base_url re-application (credential swaps,
rebuilds), most-specific level wins; OpenAI-wire only (native
Anthropic/Bedrock scoped out)
- agent/auxiliary_client.py: accept model.extra_headers as an alias of
model.default_headers for the global variant
- cli-config.yaml.example: documented commented example
- Header values are treated as secrets and never logged
Salvaged from PR #3526 by @jneeee, reimplemented against current main.
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Adds a no-code routing layer to the OpenAI-compatible API server so one
Hermes deployment can map different API clients to different
model/provider backends. Clients pick a backend by sending a configured
alias as the OpenAI 'model' field; unmatched values fall back to the
global model. Configured aliases are listed by GET /v1/models.
Precedence (highest first): session /model override > model_routes
route > global config. Route provider credentials resolve through
_resolve_runtime_agent_kwargs_for_provider (same seam as
channel_overrides); per-route api_key/base_url are upstream provider
credential overrides — never caller auth, never logged.
Salvaged and rebased from PR #3176 by @Mibayy onto current main.
Salvaged from PR #3243 by @Mibayy, reimplemented against current main
(the original diff targeted a removed gateway/run.py handler).
- /compact is now a first-class alias of /compress (CLI, gateway,
Telegram/Slack/Discord command lists, autocomplete) — also fixes the
dangling '/compact' references in gateway error messages
(gateway/run.py context-exhausted banners).
- --preview / --dry-run: report what WOULD be compressed (message
counts, token estimate, 'here [N]' boundary) without touching the
transcript. Flags coexist with the existing 'here [N]' / focus-topic
args on both the CLI and gateway surfaces via shared pure helpers in
hermes_cli/partial_compress.py.
- --aggressive (LLM-free hard truncation) is intentionally NOT
implemented: it would need its own transcript-persistence branch
outside the guarded _compress_context rotation machinery (#44794
data-loss class). The flag is recognized and returns an explanatory
message pointing at '/compress here [N]' and /undo instead of being
mis-parsed as a focus topic.
- locales: gateway.compress.aggressive_unsupported added to all 16
catalogs (parity test enforced).
- release.py: AUTHOR_MAP entry for contributor credit.
Salvage of #3459 by @keslerm, reimplemented against the restructured
progress-callback block in gateway/run.py (resolve_display_setting,
needs_progress_queue, thinking-relay). Duplicate PR #3458 by @dlkakbs was
submitted 4 minutes earlier with the same feature — both credited.
Co-authored-by: Dilee <uzmpsk.dilekakbas@gmail.com>
tool_progress: log keeps the chat silent and appends timestamped tool-call
lines to ~/.hermes/logs/tool_calls.log via a dedicated queue drained by an
async writer (RotatingFileHandler 5MB x 3, RedactingFormatter so secrets
never land on disk). Gateway-only by design; thinking_progress relaying and
the webhook gate are unaffected. /verbose now cycles
off -> new -> all -> verbose -> log.
Salvage of the surviving hunk of #3296 by @Mibayy. The PR's gateway
_handle_provider_command hunk targets code removed on main (/provider was
absorbed into /model + /status, which already read model.base_url); the
hermes status mislabel was the remaining live symptom:
_effective_provider_label() only checked the legacy OPENAI_BASE_URL env var,
so a custom endpoint configured canonically in config.yaml still displayed
as OpenRouter.
WhatsApp has migrated to Linked Identity Device (LID) format for user
IDs (e.g. 244645917392975@lid instead of 18505551234@s.whatsapp.net).
The bridge already resolves LIDs to phone numbers for its own allowlist
check via buildLidMap(), but the senderId field in the message payload
sent to the gateway still contained the raw LID. This caused the
gateway's WHATSAPP_ALLOWED_USERS check to reject all messages as
unauthorized, since the LID numbers don't match the phone numbers in
the allowlist.
Fix: resolve LID → phone in the senderId, senderName, and chatName
fields of the event payload before sending to the gateway, using the
existing lidToPhone mapping.
Adds the AUTHOR_MAP entry for CrazyBoyM (ai-lab@foxmail.com) so the
contributor-attribution CI check passes when PR #55828's commits are
rebase-merged with authorship preserved.
Salvage of the surviving piece of #2696 by @tarunravi. The PR's other two
changes (tool progress streaming, SSE None-sentinel fix) were independently
superseded on main by the structured hermes.tool.progress SSE events and the
rewritten queue-drain loop.
Remote OpenAI-compatible frontends can't read server-local file paths, so
MEDIA:<path> tags (browser screenshots, generated images) were dead text.
_resolve_media_to_data_urls() now inlines small (<=5MB) local images as
markdown data URLs across all four response surfaces: chat completions
(non-streaming), session chat, session chat stream final event, and the
Responses API. Non-image, missing, or oversized paths pass through
untouched.
Salvage of #2794 by @CharmingGroot, ported to the relocated
plugins/platforms/email/adapter.py:
- Guard raw_email = msg_data[0][1] against IndexError/TypeError and
non-bytes payloads. UIDs are added to _seen_uids before fetch, so an
exception mid-batch permanently skipped every remaining message in
the batch — now the bad message is logged and skipped instead.
- Message-ID domain generation falls back to 'localhost' when
EMAIL_ADDRESS lacks '@' (now via a shared _message_id_domain() helper
covering all 3 send paths; the PR fixed 2 of 3).
Attribution audit gate: the salvaged contributor commits carry
kiljadn@gmail.com (Nick Mason / @designnotdrum). Add the mapping so
contributor_audit.py resolves the author on this PR.
Map the two contributor emails whose commits are cherry-picked into the
compression-routing-integrity salvage so scripts/contributor_audit.py
attributes them at release time:
- jvsantos.cunha@gmail.com -> plcunha (PR #55300)
- jakepresent1@gmail.com -> jakepresent (PR #55721)
r266-tech (PR #50517) is already mapped.
With the default busy_input_mode=interrupt, a burst of rapid gateway
messages arriving while context compression is in flight could interrupt
the current turn and start a fresh turn against the pre-rotation parent
session. Because compression is interrupt-immune (#23975), the still-
running compression later rotates the id out from under that new turn,
and if the new turn also grew past the compression threshold it started
its own uncancellable compression on the same stale parent — forking
multiple orphaned one-shot sibling continuations (#56391).
While a state.db compression lock is held for the session, demote
'interrupt' busy-input mode to 'queue' semantics (mirroring the subagent
protection in #30170), so the follow-up message waits for the in-flight
compression + its id rotation to land instead of racing a new turn
against the stale parent. Ack copy explains the compression demotion.
Fixes#56391.
Adds Vertex AI as a first-class provider for Gemini models via Vertex's
OpenAI-compatible endpoint. Vertex authenticates with short-lived OAuth2
access tokens (service-account JSON or ADC), not a static API key — the
missing piece behind the recurring requests (#13484, #12639, #56259).
- agent/vertex_adapter.py: OAuth2 token minting + refresh-on-expiry
(5-min margin), ADC->service-account fallback, global vs regional
endpoint URLs. Config precedence: env var > config.yaml > default.
- plugins/model-providers/vertex/: provider profile (auth_type=vertex),
reuses Gemini's extra_body.google.thinking_config translation.
- runtime_provider: vertex short-circuit BEFORE the credential pool so a
credentials-file path is never mistaken for a static API key; mints a
fresh token + computes base_url per resolve.
- run_agent + conversation_loop: _try_refresh_vertex_client_credentials()
re-mints the token and rebuilds the client on a mid-session 401, so a
long-lived gateway agent survives token expiry (~1h).
- auxiliary_client: vertex auth_type branch for side-LLM tasks.
- config.yaml: vertex.project_id / vertex.region (non-secret, bridged to
env); credential path stays in .env (VERTEX_CREDENTIALS_PATH).
- setup wizard + model picker: dedicated _model_flow_vertex; curated
google/gemini-* model list; --provider choices.
- pricing/metadata: Vertex prices off the gemini docs snapshot; endpoint
host auto-maps to the vertex provider (no probe spam).
- lazy_deps + pyproject [vertex] extra: google-auth, opt-in only.
- docs: guides/google-vertex.md + providers page; tests for adapter +
runtime resolution.
Salvages and modernizes #8427 by @slawt onto current main: rewired from
the legacy PROVIDER_REGISTRY path to the provider-profile architecture,
moved non-secret config out of .env into config.yaml, and added the
per-turn 401 token-refresh the original lacked.
- Correct the exit-75 comment: Hermes-generated units set
StartLimitIntervalSec=0 (rate limiting disabled), so StartLimitBurst
does not bound loops. The real bound is that genuine crashes exit
non-zero-but-not-75, and RestartForceExitStatus=75 only whitelists
the planned code.
- Add randomuser2026x AUTHOR_MAP entry (CI blocks unmapped emails).
Wrap each Telegram initialize() attempt in asyncio.wait_for(HERMES_TELEGRAM_INIT_TIMEOUT,
default 30s). When api.telegram.org and all fallback IPs are unreachable, the connect
chain has no outer bound, so a single initialize() blocks for minutes and the
retry-on-exception loop never fires — the gateway appears to hang after the banner.
The timeout guarantees each attempt is bounded, then retries with backoff, then fails
with an actionable error. Also adds WARNING-level progress logs before DoH discovery
and each connect attempt (visible at default log level).
Salvaged onto plugins/platforms/telegram/adapter.py (Telegram moved from
gateway/platforms/ since the PR was opened). Adds env var to docs + AUTHOR_MAP.
Co-authored-by: Hermes Agent <127238744+teknium1@users.noreply.github.com>
Follow-up on the salvaged #49830 hardening. The contributor's sensitive
query-param set included bare English words (code, key, auth, session,
sig) that double as ordinary page facets — ?code= on promo/challenge
pages, ?key= as a search facet, ?session= on blogs — so web_extract and
cloud browser_navigate would refuse a large slice of normal browsing.
Narrow the set to unambiguously credential-named params (access_token,
authorization, client_secret, password, token, x-amz-signature, ...).
Prefix-based vendor-key redaction (is_safe_url) still catches recognizable
key shapes; this set is the belt-and-suspenders for opaque secrets carried
under an explicit credential-named parameter.
Also fixes two intra-PR-staleness test breakages surfaced by salvaging onto
current main:
- web_extract_tool() no longer accepts use_llm_processing= (signature
changed since the PR was authored) — dropped the invalid kwarg.
- agent.redact now fully masks keyed 'token=<secret>' to 'token=***'
instead of partial 'sk-...'; the console-redaction test now asserts the
real invariant (secret body gone) rather than the exact mask format.
Added a regression test that generic English-word query params are NOT
blocked by the credential guard.
Add tmp_path symlink regression tests for both generate_systemd_unit and
generate_launchd_plist (~/.local/bin/node -> profile node install must not
leak the profile target into the generated unit PATH). Register
jearnest11's AUTHOR_MAP entry for the salvage cherry-pick.
Register the Matrix room-message, reaction, and invite handlers with
mautrix's wait_sync=True. mautrix's handle_sync() only returns the tasks
for handlers registered as sync-awaited; non-waited handlers are
fire-and-forget via background_task.create() and are NOT returned. Since
_dispatch_sync() awaits only the returned tasks (await asyncio.gather),
the inbound handlers previously had no completion point, so Tuwunel/
mautrix homeservers connected and completed initial sync but dispatched
zero inbound messages.
Fixes#46142.
Co-authored-by: Zeheng Huang <153708448+hunjaiboy@users.noreply.github.com>
Think-enabled models (MiniMax M2.7, DeepSeek, etc.) emit inline
<think>...</think> reasoning even for simple prompts like title
generation, and the raw XML was leaking into session titles. Route the
title-model response through the canonical strip_think_blocks scrubber
before cleanup so every tag variant — closed pairs, unterminated blocks,
orphan closes, mixed case — is handled, not just a single literal
<think> pair.
- 2 regression tests: closed <think> pair stripped, unterminated block
at start yields no title.
Salvaged from PR #44126 by @shawchanshek.
Maps the two plain-email contributors whose PRs are being salvaged so
contributor_audit.py passes:
- info@djimit.nl -> djimit (PR #48034)
- lubos@komfi.health -> lubosxyz (PR #49225)
The other two PRs in the batch (#50405 sasquatch9818, #48764 srojk34)
use users.noreply.github.com emails, which check-attribution auto-skips.
Slack Workflow Builder posts (and other app/bot messages) arrive as
subtype=bot_message with user=None. _is_user_authorized rejected them at
the `if not user_id: return False` guard, which runs *before* the #4466
{PLATFORM}_ALLOW_BOTS bypass — so @mentioning the bot from a Slack
workflow silently did nothing, even with SLACK_ALLOW_BOTS (or
SLACK_ALLOW_ALL_USERS) set. The chat-scoped allowlist for Telegram/QQ
already runs before that guard for the same reason (channel broadcasts
with no from_user); Slack was both missing from the bot-bypass map and
had the bypass running too late.
- gateway/authz_mixin: move the {PLATFORM}_ALLOW_BOTS bypass ahead of the
no-user-id guard and add Platform.SLACK -> SLACK_ALLOW_BOTS.
- plugins/platforms/slack/adapter: set is_bot=True on inbound
bot_message events so the gateway can identify workflow/app senders
(they carry no user_id to match against the allowlist).
Tested: new tests/gateway/test_slack_bot_auth_bypass.py plus the existing
Discord/Feishu bot-auth and gateway authz/gating suites all pass.
Follow-up on the salvaged resume_pending fix: the empty-turn safety net
now emits the same reason-aware recovery note as the _is_resume_pending
branch (reason phrase + 'session restored' guidance + no-re-execute
instruction) instead of a second, differently-worded note. Also adds the
AUTHOR_MAP entry for the salvaged commit.
The standalone thread-pool fallback in _deliver_result() runs inside the
`except RuntimeError:` block (taken when asyncio.run() sees a running loop).
When future.result() raised there (SMTP ConnectionError, timeout, etc.), the
exception was NOT caught by the sibling `except Exception:` — it escaped
_deliver_result() and crashed the whole delivery loop, silently skipping every
remaining target. Multi-target delivery (e.g. deliver: 'email:a,email:b') is a
documented feature, so this broke a promised contract.
Wrap the fallback in its own try/except so a per-target failure is logged with
exc_info and the loop continues to the next target.
Fixes#47163