The /codex-runtime slash command short-circuits with "openai_runtime
already set" when invoked with the same value as the current config,
and crucially skips the entire migration block below. The check
conflates two things: (a) "the config value is correct" and (b) "the
world state (managed block in ~/.codex/config.toml, hermes-tools MCP
callback, plugin discovery) is converged".
Common footgun this exposes: a user who pre-sets
`model.openai_runtime: codex_app_server` directly in config.yaml
(reasonable thing to do) and then runs /codex-runtime codex_app_server
to trigger migration sees "already set" and silently gets no migration.
~/.codex/config.toml never receives the managed block, the hermes-tools
MCP callback never registers, and codex falls through to its default
runtime instead of the app-server one: a setup that looks successful but
is only partially functional.
The migration is idempotent by design (it replaces its own managed
block in place between MIGRATION_MARKER and MIGRATION_END_MARKER), so
re-running it is safe and cheap. Fix the short-circuit to fall through
to migration when re-applying codex_app_server while skipping the
config persist (no value-level change needed). The disable case
(re-applying "auto") still short-circuits because disabling doesn't
touch ~/.codex/config.toml at all.
The user-visible message changes to "openai_runtime already set to
codex_app_server — re-applying migration" so re-runs surface what
happened.
Regression test (test_reapply_codex_app_server_runs_migration) asserts:
- migrate() was called when re-applying
- persist_callback was NOT called (no config write on no-op transitions)
- migration output (MCP servers, sandbox default) surfaces in the
user-visible message
- requires_new_session is True so callers know to /reset
Verified RED→GREEN: the test fails on origin/main with
"migration must run on reapply, not just first enable" and passes with
this fix. Full test_codex_runtime_switch.py suite: 31 passed.
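A minimal sketch of why re-running the migration is safe, assuming the managed block is delimited by two marker lines. The marker strings and the `upsert_managed_block` name are placeholders for illustration, not the values or helpers Hermes uses:

```python
# Placeholder marker text; the real MIGRATION_MARKER values are not shown here.
MIGRATION_MARKER = "# >>> hermes managed block >>>"
MIGRATION_END_MARKER = "# <<< hermes managed block <<<"


def upsert_managed_block(config_text: str, body: str) -> str:
    """Replace the managed block in place, or append it if it is absent."""
    block = f"{MIGRATION_MARKER}\n{body.rstrip()}\n{MIGRATION_END_MARKER}\n"
    start = config_text.find(MIGRATION_MARKER)
    end = config_text.find(MIGRATION_END_MARKER, start + 1) if start != -1 else -1
    if start == -1 or end == -1:
        # No complete block yet: append and leave user content untouched.
        sep = "" if not config_text or config_text.endswith("\n") else "\n"
        return config_text + sep + block
    # Keep everything before and after the old block; swap only the block.
    tail = config_text[end + len(MIGRATION_END_MARKER):].lstrip("\n")
    return config_text[:start] + block + tail
```

Applying the function twice with the same body yields the same text, which is the property the reapply path relies on.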
Self-review follow-up on the salvaged approval-routing fix.
The initial adaptation re-read os.getenv("HERMES_YOLO_MODE") at session-build
time. That diverges from the repo's security invariant: HERMES_YOLO_MODE is
frozen into tools.approval._YOLO_MODE_FROZEN at import time precisely so a skill
running mid-process cannot set the env var and instantly flip the approval
bypass (a prompt-injection escalation path). A live re-read re-opened that hole
for the codex routing path.
- Add tools.approval.is_approval_bypass_active() — the canonical three-source
bypass check (frozen --yolo/HERMES_YOLO_MODE + session /yolo + approvals.mode
off) in one place. This is the 4th inline copy of that OR-chain (the three
sites in approval.py and tui_gateway/server.py:3121 all use the same idiom);
the helper is the shared chokepoint they can collapse onto.
- codex_runtime.py now calls is_approval_bypass_active() instead of the
hand-rolled mode-or-session check plus a runtime env re-read.
- Update the env-yolo test to patch _YOLO_MODE_FROZEN (the canonical test
pattern, e.g. tests/tools/test_yolo_mode.py) rather than setenv, which is
dead-on-arrival against the frozen constant.
Fail-closed default preserved on every branch; 28 integration + 77 session/yolo
tests pass; E2E confirms the real exec decision flips decline->accept only when
bypass is active.
On gateway/cron/non-CLI contexts the codex app-server runtime has no UI to
surface codex's exec/apply_patch approval requests, so they fail closed
(silently decline) — the bot appears responsive but cannot write files, with
no approval prompt anywhere ("patch rejected by user").
When the user has explicitly opted out of Hermes approvals (approvals.mode: off,
the /yolo session toggle, or HERMES_YOLO_MODE=1), collapse to codex's own
sandbox permission profile (~/.codex/config.toml) as the policy gate by passing
_ServerRequestRouting(auto_approve_exec=True, auto_approve_apply_patch=True) to
the session. Defaults (manual/smart/unset) preserve the current fail-closed
behavior — a no-op for users who have not opted out.
Reads the mode via the canonical tools.approval._get_approval_mode() (which
already normalizes the YAML-1.1 bare-'off'->False case) at session-build time,
so a mid-session /yolo toggle is honored too.
5 integration tests: each opt-out mechanism (config off, YAML False, env var,
session yolo) plus the default fail-closed regression guard.
Closes #26530
Co-authored-by: snav <jake@nousresearch.com>
The MoA preset section in the composer model dropdown presented presets like
persistent model selections, but selecting one dispatched the one-shot `/moa`
command (command.dispatch name=moa) — it ran a single turn through MoA and then
silently reverted to the prior model. The user saw MoA context for one message,
then it vanished with no indication.
Route MoA preset selection through the same persistent path real provider
selections use: onSelectModel({ model: preset, provider: 'moa' }) →
config.set model="<preset> --provider moa" → the gateway's switch_model. The
check mark now reflects the real current selection (currentProvider === 'moa'
&& currentModel === preset) instead of transient local state, and the
now-unused activeMoaPreset state is removed.
Tests: new model-menu-panel.test.tsx (2) — selecting a preset calls
onSelectModel with provider 'moa' (persistent), and the check renders on the
active preset. tsc -b clean.
With the default busy_input_mode=interrupt, a burst of rapid gateway
messages arriving while context compression is in flight could interrupt
the current turn and start a fresh turn against the pre-rotation parent
session. Because compression is interrupt-immune (#23975), the still-
running compression later rotates the id out from under that new turn,
and if the new turn also grew past the compression threshold it started
its own uncancellable compression on the same stale parent — forking
multiple orphaned one-shot sibling continuations (#56391).
While a state.db compression lock is held for the session, demote
'interrupt' busy-input mode to 'queue' semantics (mirroring the subagent
protection in #30170), so the follow-up message waits for the in-flight
compression + its id rotation to land instead of racing a new turn
against the stale parent. Ack copy explains the compression demotion.
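The demotion rule can be sketched as a pure function. The function name and the boolean lock argument are hypothetical; the actual lock lookup against state.db is not shown:

```python
def effective_busy_input_mode(configured_mode: str, compression_lock_held: bool) -> str:
    """Demote 'interrupt' to 'queue' while a compression lock is held."""
    if configured_mode == "interrupt" and compression_lock_held:
        return "queue"
    # Every other combination keeps the configured behavior.
    return configured_mode
```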
Fixes #56391.
prompt.submit is fire-and-forget — turn completion is signaled by stream /
message.complete events, not the RPC return — but it inherited the generic 30s
default RPC timeout. A turn that legitimately takes >30s to ACK (MoA presets
running references + aggregator in series, deep reasoning, large tool chains)
popped a false 'request timed out: prompt.submit' toast at 30s while the turn
was still running, and its real answer only streamed in 60-120s later (#55024).
Add PROMPT_SUBMIT_REQUEST_TIMEOUT_MS (1_800_000 = the backend's
agent.gateway_timeout ceiling) and pass it on all four prompt.submit call sites
(submit, resume-recovery retry, regenerate, rewind), mirroring the existing
SESSION_LIST_REQUEST_TIMEOUT_MS opt-out precedent. Widen the GatewayRequest
type (+ the inline requestGateway prop type) to carry the optional timeoutMs the
runtime impl already accepts.
Tests: use-prompt-actions/index.test.tsx 34/34 pass; tsc -b clean.
Two independent MoA auxiliary-call fixes:
#53866 — auxiliary.moa_reference.timeout and auxiliary.moa_aggregator.timeout
were 600s while moa_agent was 120s. Raise both to 900s so a genuinely long
reference/aggregator turn (mixed providers, deep reasoning, long tool chains)
has headroom instead of being cut mid-generation.
#53735 — _CodexCompletionsAdapter (the Codex/Responses auxiliary path used by
the MoA acting-aggregator, compression, web_extract, session_search, etc.)
never set prompt_cache_key, so it stayed cache-cold while the MAIN Responses
transport (agent/transports/codex.py) was warm. Derive the same
content-addressed key via the shared _content_cache_key(instructions, tools)
helper and set it on the aux Responses request, with the same host guards the
main transport uses (xAI carries the key in extra_body; GitHub/Copilot opts out
of cache-key routing).
Tests: 5 new prompt_cache_key cases (set+prefixed, stable across identical
prefix, differs on different instructions, skipped for xai/github hosts).
tests/agent/test_auxiliary_client.py 279 pass; tests/hermes_cli/test_config.py
130 pass.
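A sketch of what a content-addressed cache key could look like. The hash function, the `hermes-` prefix, and the 32-character truncation are assumptions for illustration; the real _content_cache_key may differ in all three:

```python
import hashlib
import json


def content_cache_key(instructions: str, tools: list[dict], prefix: str = "hermes-") -> str:
    """Derive a stable key from the request prefix (instructions + tools)."""
    payload = json.dumps(
        {"instructions": instructions, "tools": tools},
        sort_keys=True,            # key order must not change the hash
        separators=(",", ":"),
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return prefix + digest[:32]
```

Identical instructions and tools always map to the same key, so the auxiliary path and the main transport can share a warm cache entry when their prefixes match.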
On the MoA path agent.model/provider are the virtual preset name (e.g.
"closed") and "moa", which have no pricing entry. estimate_usage_cost()
returned None for the aggregator turn, so the `if amount_usd is not None`
guard skipped it and the session's estimated_cost_usd reflected only the
advisor fan-out — a ~50% undercount when the aggregator does the full acting
loop (verified: $0.91 advisor-only vs $1.96 true, aggregator = 54%).
MoAChatCompletions.create() now stashes the resolved aggregator slot as
last_aggregator_slot (exposed via MoAClient); conversation_loop reads it to
price the aggregator turn at its real model/provider. cost_source flips from
'none' to 'provider_models_api'.
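The undercount follows directly from the two verified figures above; this snippet only redoes that arithmetic:

```python
advisor_only_usd = 0.91   # what the session reported before the fix
true_total_usd = 1.96     # advisors plus the aggregator turn

aggregator_usd = true_total_usd - advisor_only_usd
aggregator_share = aggregator_usd / true_total_usd

print(f"aggregator: ${aggregator_usd:.2f} ({aggregator_share:.0%} of the true cost)")
```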
Follow-up to the END-MARKER reorder: moving the summary prefix after the
[PRIOR CONTEXT] wrapper meant _is_context_summary_content (prefix-at-start)
no longer recognized a merged-tail summary. That silently broke three
consumers — the last-real-user anchor (would pick the merged summary as a
real user turn, causing active-task loss), the carry-forward summary find,
and the auto-focus skip. _strip_summary_prefix would also carry the wrapper
+ stale tail content forward as the next summary body.
Extract the two delimiter strings into _MERGED_PRIOR_CONTEXT_HEADER /
_MERGED_SUMMARY_DELIMITER constants (writer + detector stay in sync), teach
_is_context_summary_content and _strip_summary_prefix to look past the
delimiter, and add a regression test. Standalone summaries unchanged.
When the compression summary is merged into the first tail message
(the alternation corner case where a standalone summary role would
collide with both head and tail), the old format was
SUMMARY + END_MARKER + OLD_TAIL_CONTENT — so the preserved tail content
appeared AFTER the end marker and the model could read it as a fresh
message to respond to.
Reorder so the END MARKER is always last: old tail content is wrapped in
[PRIOR CONTEXT ...][END OF PRIOR CONTEXT — COMPACTION SUMMARY BELOW]
delimiters, then the summary, then the END MARKER. _append_text_to_content
handles both string and multimodal-list content.
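The new ordering, sketched for string content only. The three delimiter strings are shortened placeholders, and `merge_summary_into_tail` is a hypothetical name; multimodal-list content is not handled here:

```python
# Placeholder delimiters; the real marker text is longer.
PRIOR_CONTEXT_HEADER = "[PRIOR CONTEXT]"
PRIOR_CONTEXT_DELIMITER = "[END OF PRIOR CONTEXT]"
END_MARKER = "[END OF SUMMARY]"


def merge_summary_into_tail(old_tail: str, summary: str) -> str:
    """Wrap the old tail first, then the summary, with END_MARKER always last."""
    return "\n".join(
        [PRIOR_CONTEXT_HEADER, old_tail, PRIOR_CONTEXT_DELIMITER, summary, END_MARKER]
    )
```

With this layout nothing follows the end marker, so the preserved tail can no longer be read as a fresh message.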
Salvaged from #56372 by @Gromykoss. Only the END-MARKER reorder half is
carried over. The PR's second change (a post-compaction pass that strips
user-role messages before the first summary marker on compression_count>=2)
was dropped: on 2nd+ compactions the protected head decays to system-only
(_effective_protect_first_n -> 0, #11996) so the targeted 'ghost head user'
does not occur, and where the strip does fire it deletes legitimate recent
tail user turns (data loss) and can leave consecutive assistant messages
(role-alternation violation).
The salvaged PR added the new key to locales/en.yaml only, so the i18n
catalog-parity test (tests/agent/test_i18n.py::test_catalog_keys_match_english)
failed for all 15 non-English locales. Add the key to every locale with the
English string (matching the existing convention for the untranslated
matrix_cross_room_success key), preserving the {name} placeholder so the
placeholder-parity test also passes.
The persisted (DB-fallback) branch of _resume_target_allowed() compared only
sessions.user_id against source.user_id, but build_session_key() keys the
participant on `user_id_alt or user_id` (Signal/Feishu carry the canonical
participant in user_id_alt). The sessions table has no user_id_alt column, so a
per-user row whose user_id the caller shares, but whose user_id_alt it does not,
maps to a DIFFERENT live session key even though the row's user_id matches both
participants: a co-member could resume/enumerate another member's persisted
per-user group or no-chat_id DM session (IDOR, CWE-639).
The live-origin guard (_same_origin_chat) already compares user_id_alt; the
persisted fallback couldn't. Fail closed on both identity-bearing per-user
branches (non-DM per-user group, no-chat_id DM) whenever the caller carries a
user_id_alt. Shared group/thread sessions (no participant scoping) and DMs keyed
on a present chat_id are unaffected; callers keyed on user_id (e.g. Telegram)
still resume their own rows; admin --all override still applies.
Regression: tests/gateway/test_resume_command.py::
test_resume_persisted_fallback_fails_closed_on_user_id_alt.
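A reduced sketch of the fail-closed rule for the per-user branches. Both function names are hypothetical and the real guard checks more fields (platform, chat, thread):

```python
from typing import Optional


def effective_participant(user_id: Optional[str], user_id_alt: Optional[str]) -> Optional[str]:
    """Mirror of the session-key rule: user_id_alt wins when present."""
    return user_id_alt or user_id


def persisted_per_user_row_allowed(
    row_user_id: Optional[str],
    caller_user_id: Optional[str],
    caller_user_id_alt: Optional[str],
) -> bool:
    # The row stores no user_id_alt, so it cannot prove the caller's
    # effective participant when one is carried: fail closed.
    if caller_user_id_alt:
        return False
    return bool(row_user_id) and row_user_id == caller_user_id
```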
Addresses egilewski follow-up on PR #52355: the persisted-row fallback required
row_uid == caller_uid for every identity-bearing caller, which wrongly blocked a
legitimately SHARED non-DM group session. With group_sessions_per_user=False,
build_session_key resolves every participant of a chat to one session key, so a
co-member (different user_id) in the same chat shares Bob's session — but the
guard returned "/resume blocked".
Mirror is_shared_multi_user_session() in the fallback, exactly as the live-origin
branch (_same_origin_chat) already does: for a non-DM caller, first require the
same platform + chat + thread provenance (unchanged — blank/mismatching chat
still fails closed), then allow without user-id equality when the session is
shared, and keep requiring the same owner for per-user group/thread sessions.
DM scoping is unchanged (always per-user).
Adds a regression: shared group → co-member allowed; per-user group → blocked;
different chat → blocked even when shared.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Addresses egilewski/CodeRabbit follow-up on PR #52355: the identity-bearing
persisted fallback compared row_chat == caller_chat, which SUCCEEDS when both
normalize to "" — so a legacy row with no stored chat provenance could still be
resumed by a caller that also has no chat_id (probe: a group caller with
chat_id=None resuming a NULL-chat telegram row on matching user_id).
A non-DM session (group/channel/forum/thread) is keyed by chat_id in
build_session_key, so a blank chat on either side is NOT proof of same-chat.
Require both row and caller chat_id to be non-blank and equal for non-DM
callers; a legacy NULL-chat row (or a caller missing its chat_id) now fails
closed. DMs are unchanged: they are keyed on user_id, so a no-chat_id DM row
stays resumable by the same user (and a mismatching chat_id, when present, is
still rejected).
Adds the blank-caller-chat group probe and a DM no-chat_id same-user/other-user
regression.
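The blank-equals-blank pitfall and its fix, as a small sketch. `same_non_dm_chat` is a hypothetical name; the real check also compares platform and thread provenance:

```python
from typing import Optional


def same_non_dm_chat(row_chat_id: Optional[str], caller_chat_id: Optional[str]) -> bool:
    """Both sides must carry a non-blank chat_id, and the two must match."""
    row = (row_chat_id or "").strip()
    caller = (caller_chat_id or "").strip()
    # A plain `row == caller` would return True for "" == "".
    return bool(row) and bool(caller) and row == caller
```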
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Addresses the egilewski/CodeRabbit and teknium1 reviews on PR #52355.
1) Persisted-row chat scope (egilewski/CodeRabbit). The sessions table stored
only source + user_id, so an identity-bearing caller could resume/list an
INACTIVE persisted row that matched source+user_id but belonged to a
DIFFERENT chat (probe: same user moves `same_user_chat_b` into chat-a).
Persist the messaging origin and compare it:
- schema: sessions gains origin_chat_id / origin_thread_id (declarative
auto-migration via the existing column reconciler).
- SessionDB._insert_session_row accepts + writes the two columns.
- the gateway records them at every origin-bearing creation: both
SessionStore create paths (get_or_create_session + reset/switch) and the
/title path that materializes a store-only session into the DB.
- _resume_target_allowed's identity branch now also requires
origin_chat_id AND origin_thread_id to match the caller. Legacy rows with
NULL origin (created before this change) cannot prove chat origin and
fail closed — resume them via a live session or an admin --all override.
The /sessions listing inherits the fix (non-Matrix rows route through the
same helper).
2) DM key-contract mirror (teknium1). _same_origin_chat's DM branch only
compared user_id and allowed when either side was missing, diverging from
build_session_key (no-chat_id DM keys are built from user_id_alt or
user_id). It now: treats an equal non-blank chat_id as sufficient (the DM
key IS the chat_id when present), and otherwise compares the effective
participant id (user_id_alt or user_id), failing closed on a
missing/different participant so two no-chat_id DM origins are never
conflated.
Tests: add same-user/different-chat (e2e + unit) and chat-scope unit cases;
add DM no-chat_id / user_id_alt / no-identity / same-chat_id cases; update
existing fixtures to record origin_chat_id like the gateway does; make the
cross-room `/resume --all` listing test run as admin (cross-room listing is
admin-gated) and give the boundary-state resume runner a live same-origin so
its post-resume clearing assertions exercise an authorized resume.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Addresses egilewski (Codex/CodeRabbit) follow-up on PR #52355: the no-identity
branch of _resume_target_allowed() returned True after only checking that the
row's source didn't mismatch the caller platform. The sessions table has no
chat_id, so same-platform alone is not ownership proof — a Telegram group
caller in chat-a with user_id=None could resume (and /sessions could list) a
persisted row owned by another chat/user (e.g. victim_chat_b_uid,
source=telegram, user_id=victim).
Fail closed: an identity-less caller can no longer bind to or enumerate a
persisted session by id/title. A legitimate same-chat resume of an ACTIVE
session still works via the live-origin branch (which compares chat_id), and an
operator can use the admin --all override. The listing path inherits the fix
because _resume_row_visible() routes non-Matrix rows through the same helper.
Adds an end-to-end no-identity probe (resume blocked) and a unit-level
persisted-fallback assertion.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Addresses egilewski (Codex) CR on PR #52355: the Matrix direct /resume <id>
guard (and the Matrix listing guard) used _same_matrix_room(), which compared
only platform + chat_id. But build_session_key() appends thread_id for every
chat type when present, and Matrix scopes the model's turn to the current
room/thread — so a live session in another thread of the SAME room is a
DIFFERENT session. A caller in thread A could resume a target whose live origin
was in thread B (switch_session fired on the victim session).
Add a thread_id equality check to _same_matrix_room so room scoping also
enforces the thread boundary. Non-threaded rooms have empty thread_id on both
sides ("" == ""), so existing room-level sharing is preserved unchanged; only
cross-thread access is newly blocked. This mirrors the thread handling already
in _same_origin_chat for the non-Matrix adapters.
Adds regressions replaying the reviewer's thread-a -> thread-b probe (direct
guard + listing path), plus same-thread-shared and thread-vs-no-thread cases.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Addresses egilewski (Codex) CR on PR #52355: the persisted-row fallback in
_resume_target_allowed() skipped the platform/source check when sessions.source
was blank (the row_src guard only rejects a *mismatching* non-blank source),
then accepted the row on user_id equality alone. A legacy/malformed row with a
blank source but a matching user_id was therefore resumable — an identified
caller could bind to a transcript whose origin it can't prove.
Now an identity-bearing caller is allowed only when the row proves BOTH the
same owner (non-blank user_id match) AND the same platform/origin (non-blank
source match). A blank/legacy source fails closed, exactly like a missing
user_id. No-identity (single-user) callers are unaffected.
Adds a regression replaying the reviewer's blank-source same-uid probe.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
/resume resolved a persisted session id/title with no ownership check on any
adapter except Matrix, so an authorized caller could bind their gateway session
to another user's/room's transcript and read it. The titled-session listing and
numeric index were also globally enumerable on non-Matrix platforms, exposing
the ids and previews needed to target the IDOR.
Generalize the Matrix-only room guard to an adapter-agnostic ownership check
(live origin when active; DB row source + user_id for persisted-only sessions,
the only fields available), applied to the direct-id/title path and the
listing/numeric paths on every platform. An explicit admin --all override is
honored. The Matrix path is preserved unchanged.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Windows _quote_cwd_for_cd override only reached _wrap_command; the
snapshot bootstrap cd in init_session still used a bare shlex.quote(),
so on Windows the bootstrap cd failed and pwd -P captured the login
shell's dir instead of terminal.cwd. Route it through _quote_cwd_for_cd
too, and add -- for hyphen-safety to match _wrap_command.
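For the hyphen-safety part only, a sketch using the standard library quoting. `cd_prefix` is a hypothetical name, and the Windows-specific quoting that _quote_cwd_for_cd adds is not reproduced here:

```python
import shlex


def cd_prefix(cwd: str) -> str:
    """Build a cd command that cannot mistake the path for an option."""
    # `--` ends option parsing, so a directory named "-x" is still a path.
    return f"cd -- {shlex.quote(cwd)}"
```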
On Windows machines with both Linux and Git for Windows installed,
_find_bash() called shutil.which('bash') before checking known
Git-for-Windows install paths. shutil.which() may return a
non-MSYS bash which does not understand Windows-style paths.
This caused all terminal commands to fail with exit code 126
because the cwd prefix (a Windows path) was rejected.
Reorder the search: check Git for Windows install locations
(ProgramFiles/Git/bin/bash.exe etc.) before falling back to
PATH lookup. This matches the intent of the surrounding code
(portable Git preferred, system Git preferred, then PATH as
last resort).
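The reordered search can be sketched as follows. The function signature is an assumption for illustration; the list of known install paths is supplied by the caller rather than hardcoded:

```python
import os
import shutil
from typing import Iterable, Optional


def find_bash(known_paths: Iterable[str]) -> Optional[str]:
    """Prefer known Git for Windows locations, then fall back to PATH."""
    for candidate in known_paths:
        if os.path.isfile(candidate):
            return candidate
    # Last resort: whatever `bash` resolves to on PATH (may be non-MSYS).
    return shutil.which("bash")
```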
Related: #23846 (same file, same class of Windows path issues)
The model could pass `toolsets` (top-level and per-task) to delegate_task,
letting it choose which toolsets a subagent got. Toolset selection is a
capability-scoping decision the model should not control; subagents inherit
the parent's enabled toolsets, period.
- Remove `toolsets` from the delegate_task() signature, the registry handler,
the top-level + per-task JSON schema, and the live dispatch path
(run_agent._dispatch_delegate_task — this forwarded it on every model call).
- Single-task and per-task child builds now pass toolsets=None so
_build_child_agent resolves to pure parent inheritance.
- Drop the now-dead _SUBAGENT_TOOLSETS / _TOOLSET_LIST_STR schema-hint block.
- _build_child_agent keeps its internal toolsets param + intersection helpers
(internal API; fed the inherited value only).
- Tests: schema assertions flipped to assertNotIn; added a regression test
proving the dispatch path never forwards a smuggled model `toolsets`.
- Docs: update delegate_task signature refs in the autonomous-ai-agents skill.
Consolidates the pairing/allowlist authorization model. Reverses the
read-side AND-ing from #56346 (which made a paired user require ALSO
being in the allowlist) and restores pairing as a first-class grant:
- authz_mixin: a pairing-store entry authorizes regardless of the
allowlist (union). approve_code is reachable only by the trusted
operator (CLI / authenticated dashboard), never by an inbound sender,
so it is not an attacker-controlled path — the #23778 bypass was the
inbound message/approval-button gate, fixed separately.
- pairing: when an allowlist IS already configured for the platform,
operator approval also appends the user to that allowlist env var
(option i) and revoke removes them, keeping a single operator-visible,
editable source of truth instead of an opaque approved.json. On an
open gateway (no allowlist) approval is a no-op on the env var so we
never silently lock an open gateway; the pairing store remains the
grant record, honored by the union.
- auto-resume authz (0de67ad60) now honors paired users automatically
via the same union — a legitimately-paired session survives restart.
Replaces the now-incorrect AND-ing tests with union + mirror + revoke
coverage. E2E verified: locked-gateway approve/revoke round-trips
through the allowlist; open-gateway approval stays open.
test_references_run_in_parallel asserted elapsed < 0.9 for two 0.5s
sleeps that run concurrently. On a loaded CI runner, thread-pool
startup pushed the wall time to 0.9001s — a 0.14ms miss — flaking the
shard. Loosen to < 0.95, which still sits well below the 1.0s serial
floor, so a genuine serialization regression (>=1.0s) still fails hard.
Delegate to hermes_bootstrap.harden_import_path() instead of the inline
'', '.' sys.path filter, matching entry.py/acp_adapter/entry.py after #51693.
The shared helper also relocates the Hermes source root ahead of an absolute
cwd path on sys.path (venv/PYTHONPATH case), which the inline filter missed.
Test static check rewritten to assert the shared guard runs before import cli.
The slash-command worker is spawned as `-m tui_gateway.slash_worker` and
inherits the user's CWD. A local package in that CWD (e.g. a project shipping
its own `utils/`, `proxy/`, or `ui/`) shadows the installed hermes module, so
`import cli` crashes the worker with:
ImportError: cannot import name 'atomic_replace' from 'utils'
The child then exits 1 in a crash loop. #15989 added this sys.path guard to the
sibling entrypoint tui_gateway/entry.py but not to this worker, which is spawned
as a separate process and so starts with CWD back on sys.path.
Apply the same guard (insert HERMES_PYTHON_SRC_ROOT, strip ''/'.') before the
first non-stdlib import. Add a regression test that imports the worker from a
CWD containing colliding packages.
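A pure-function sketch of the guard, assuming it only needs to drop the CWD entries and put the source root first. `harden_path` is a hypothetical name; the real guard mutates sys.path in place:

```python
def harden_path(path: list[str], src_root: str) -> list[str]:
    """Strip CWD entries ('' and '.') and move the Hermes source root to the front."""
    cleaned = [p for p in path if p not in ("", ".") and p != src_root]
    return [src_root] + cleaned
```

In the worker this would run before the first non-stdlib import, for example `sys.path[:] = harden_path(sys.path, src_root)`.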
Fixes #51286
The provider-parity contract (tests/hermes_cli/test_provider_parity.py)
requires every hermes model provider to be configurable in the desktop
Providers tabs. Vertex authenticates via OAuth2 (service-account JSON /
ADC) and has no api_key_env_vars, so — like bedrock's aws_sdk — it needs
its credential env var tagged to the provider card explicitly. Tag
VERTEX_CREDENTIALS_PATH to the vertex card in _catalog_provider_env_metadata().
Adds Vertex AI as a first-class provider for Gemini models via Vertex's
OpenAI-compatible endpoint. Vertex authenticates with short-lived OAuth2
access tokens (service-account JSON or ADC), not a static API key — the
missing piece behind the recurring requests (#13484, #12639, #56259).
- agent/vertex_adapter.py: OAuth2 token minting + refresh-on-expiry
(5-min margin), ADC->service-account fallback, global vs regional
endpoint URLs. Config precedence: env var > config.yaml > default.
- plugins/model-providers/vertex/: provider profile (auth_type=vertex),
reuses Gemini's extra_body.google.thinking_config translation.
- runtime_provider: vertex short-circuit BEFORE the credential pool so a
credentials-file path is never mistaken for a static API key; mints a
fresh token + computes base_url per resolve.
- run_agent + conversation_loop: _try_refresh_vertex_client_credentials()
re-mints the token and rebuilds the client on a mid-session 401, so a
long-lived gateway agent survives token expiry (~1h).
- auxiliary_client: vertex auth_type branch for side-LLM tasks.
- config.yaml: vertex.project_id / vertex.region (non-secret, bridged to
env); credential path stays in .env (VERTEX_CREDENTIALS_PATH).
- setup wizard + model picker: dedicated _model_flow_vertex; curated
google/gemini-* model list; --provider choices.
- pricing/metadata: Vertex prices off the gemini docs snapshot; endpoint
host auto-maps to the vertex provider (no probe spam).
- lazy_deps + pyproject [vertex] extra: google-auth, opt-in only.
- docs: guides/google-vertex.md + providers page; tests for adapter +
runtime resolution.
Salvages and modernizes #8427 by @slawt onto current main: rewired from
the legacy PROVIDER_REGISTRY path to the provider-profile architecture,
moved non-secret config out of .env into config.yaml, and added the
per-turn 401 token-refresh the original lacked.
The patch tool's strategy 7 (unicode_normalized) matches ASCII old_string
against a file containing real Unicode (em-dashes, smart quotes, ellipsis,
non-breaking spaces). Writing new_string verbatim silently replaced the
file's Unicode with the LLM's ASCII equivalents.
_preserve_unicode_in_replacement() diffs old_string->new_string and applies
only the actual edits to the file's original Unicode text, preserving
unchanged characters.
Salvaged from #50540 by @aj-nt. Only the Unicode-preservation half is
carried over; the write_file line-number-strip half was dropped (the
existing _looks_like_read_file_line_numbered_content reject guard already
covers its target case, and the strip's looser threshold risks silently
mutating legitimate pipe-delimited content).
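A sketch of the diff-and-apply idea using difflib. It assumes the matched file text and old_string align character for character (one Unicode character per ASCII character), which does not hold for cases like an ellipsis matching three dots; the function name is hypothetical and the real helper may handle those cases differently:

```python
from difflib import SequenceMatcher


def preserve_unicode(original: str, old: str, new: str) -> str:
    """Apply the old->new edits to `original`, keeping its untouched characters."""
    if len(original) != len(old):
        return new  # alignment assumption broken: fall back to verbatim
    out = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append(original[i1:i2])   # unchanged span: keep the file's Unicode
        elif tag in ("replace", "insert"):
            out.append(new[j1:j2])        # real edit: take the new text
        # "delete": emit nothing
    return "".join(out)
```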
- Correct the exit-75 comment: Hermes-generated units set
StartLimitIntervalSec=0 (rate limiting disabled), so StartLimitBurst
does not bound loops. The real bound is that genuine crashes exit
non-zero-but-not-75, and RestartForceExitStatus=75 only whitelists
the planned code.
- Add randomuser2026x AUTHOR_MAP entry (CI blocks unmapped emails).
The in-chat /restart command was leaving the gateway dead on systemd
deployments using Restart=on-failure (the default for many
operator-managed and tutorial-style unit files). The gateway drained,
exited cleanly (code 0), and was never revived — the only recovery was
a host reboot.
Root cause was a multi-layer assumption mismatch:
1. gateway/run.py:_stop_impl assumed all systemd units use
Restart=always, so the Linux/systemd branch returned exit code 0
and relied on a `systemd-run` transient helper to restart the unit
immediately. Units with Restart=on-failure never see a clean exit
as a trigger, so nothing revived the process.
2. gateway/run.py:_launch_systemd_restart_shortcut hardcoded
`--user` scope, so it could not even locate the unit PID on
system-level deployments (the common case for
/etc/systemd/system/hermes-gateway.service). It silently returned
without launching the helper.
3. Even after the scope detection was fixed, the helper could not
actually start: non-root gateway units (User=ubuntu) hit a Polkit
denial on `systemd-run --system` ("Interactive authentication
required"), and `--user` requires a D-Bus user session that is
typically absent on headless servers.
The fix is two-fold:
* `_stop_impl` now always exits with GATEWAY_SERVICE_RESTART_EXIT_CODE
(75 / EX_TEMPFAIL) on service-managed restarts, regardless of
platform. Combined with RestartForceExitStatus=75 in the unit file,
systemd treats the planned restart as a controlled failure and
revives the gateway via Restart=on-failure, with RestartSec as the
only delay. The planned-restart helper is still attempted (for
RestartSec=0 setups that want sub-second restarts) but is no longer
load-bearing.
* `_launch_systemd_restart_shortcut` now probes both system and user
scopes via MainPID equality and uses whichever scope actually owns
the gateway process. It bails out safely if neither matches.
StartLimitBurst in the unit file still bounds accidental restart
loops, and the macOS launchd path is unchanged.
Verified end-to-end on Ubuntu 24.04 with hermes-gateway as a
/etc/systemd/system/... service running under User=ubuntu. The
unit uses Restart=on-failure, RestartSec=30, RestartForceExitStatus=75,
StartLimitIntervalSec=600, StartLimitBurst=5. /restart from Feishu now
drains cleanly, exits 75, and the gateway is back online ~30s later
without manual intervention.
Tests: tests/gateway/test_gateway_shutdown.py renamed the affected
case to test_gateway_stop_systemd_service_restart_uses_tempfail and
now asserts exit_code == GATEWAY_SERVICE_RESTART_EXIT_CODE.
14/14 tests in this module pass.
browser_navigate's always-blocked cloud-metadata floor (169.254.169.254,
metadata.google.internal, ECS/Azure/GCP IMDS) was gated on
`not _is_local_backend()`, contradicting both the adjacent comment and the
is_always_blocked_url docstring ("denied regardless of backend"). A default
local headless Chromium on a cloud VM — or an off-host CDP browser — could
navigate to IMDS and read instance credentials into the model context. Make the
floor unconditional on the initial-nav and post-redirect paths.
Also: _is_local_backend() ignored a CDP override while _is_local_mode() honors
it, so an off-host CDP browser was treated as "local" and skipped the broader
private/internal SSRF check too. Treat a CDP override as non-local.
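A rough sketch of the intended ordering, under stated assumptions: the host
set below contains only the two hosts named above (the real floor covers
more), `navigation_allowed` and `_looks_private` are hypothetical helpers,
and no DNS resolution is attempted.

```python
import ipaddress
from urllib.parse import urlparse

# Illustrative subset only; the real floor also covers other IMDS endpoints.
ALWAYS_BLOCKED_HOSTS = frozenset({"169.254.169.254", "metadata.google.internal"})


def _host(url: str) -> str:
    return (urlparse(url).hostname or "").rstrip(".").lower()


def is_always_blocked_url(url: str) -> bool:
    """Denied regardless of backend."""
    return _host(url) in ALWAYS_BLOCKED_HOSTS


def _looks_private(url: str) -> bool:
    """IP-literal check only; hostnames are not resolved in this sketch."""
    try:
        ip = ipaddress.ip_address(_host(url))
    except ValueError:
        return False
    return ip.is_private or ip.is_loopback or ip.is_link_local


def navigation_allowed(url: str, local_backend: bool) -> bool:
    # The floor runs first and is never conditioned on the backend type.
    if is_always_blocked_url(url):
        return False
    # The broader SSRF check applies only to non-local backends.
    if not local_backend and _looks_private(url):
        return False
    return True
```

The same function would be called for the initial URL and again for the
final URL after redirects.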
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Wrap each Telegram initialize() attempt in asyncio.wait_for(HERMES_TELEGRAM_INIT_TIMEOUT,
default 30s). When api.telegram.org and all fallback IPs are unreachable, the connect
chain has no outer bound, so a single initialize() blocks for minutes and the
retry-on-exception loop never fires — the gateway appears to hang after the banner.
With the timeout in place, each attempt is bounded; the loop then retries with
backoff and finally fails with an actionable error. Also adds WARNING-level
progress logs before DoH discovery and each connect attempt (visible at default
log level).
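The bounded-attempt pattern looks roughly like this. It is a sketch, not the
adapter code: `initialize_with_timeout`, its parameters, and the error text
are hypothetical, and `initialize` stands in for the Telegram client's
initialize() coroutine.

```python
import asyncio
from typing import Awaitable, Callable


async def initialize_with_timeout(
    initialize: Callable[[], Awaitable[None]],
    timeout: float = 30.0,
    attempts: int = 3,
    base_delay: float = 1.0,
) -> int:
    """Run initialize() with a per-attempt bound; return the attempt that succeeded."""
    last_error: BaseException = RuntimeError("no attempts made")
    for attempt in range(1, attempts + 1):
        try:
            # wait_for cancels the inner call once the timeout elapses, so a
            # hung connect chain surfaces as an exception the loop can handle.
            await asyncio.wait_for(initialize(), timeout=timeout)
            return attempt
        except (asyncio.TimeoutError, OSError) as exc:
            last_error = exc
            if attempt < attempts:
                await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    raise ConnectionError(
        f"Telegram initialize failed after {attempts} attempts; "
        "check network access to api.telegram.org"
    ) from last_error
```

Without the wait_for wrapper, the except branch is never reached while the
connect chain is blocked, which is why the retry loop appeared dead.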
Salvaged onto plugins/platforms/telegram/adapter.py (Telegram moved from
gateway/platforms/ since the PR was opened). Adds env var to docs + AUTHOR_MAP.
Co-authored-by: Hermes Agent <127238744+teknium1@users.noreply.github.com>
resolve_nous_runtime_credentials / resolve_nous_access_token now read via
_load_provider_state_with_source (and write via _save_provider_state_to_source).
TestEnvOverrideWins mocked only the old _load_provider_state, so the real
(empty) state was read → AuthError. Mock the new boundary too, returning
(state, None) so the write-through helper treats it as the active store.
The salvaged _sync_session_model_from_agent reached into
self._session_db._execute_write with a duplicate inline read-modify-write
and a comment claiming SessionDB had no metadata updater — but
update_session_meta already exists for exactly this. It also called the
AsyncSessionDB forwarder synchronously (via _execute_write), which returns
an un-awaited coroutine, so the write silently never ran.
Route through the synchronous SessionDB (self._session_db._db) — the same
pattern the surrounding run_sync closure already uses (it runs off the
event loop in the executor) — and use the existing update_session_meta /
get_session helpers instead of raw SQL.
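The silent no-op is easy to reproduce in isolation. The classes below are
hypothetical stand-ins (their method signature is invented for the demo and
is not SessionDB's real one); they only show why calling an async forwarder
from synchronous code does nothing.

```python
import inspect


class SyncDB:
    """Stand-in for the synchronous store."""

    def __init__(self) -> None:
        self.meta = {}

    def update_session_meta(self, session_id: str, model: str) -> None:
        self.meta[session_id] = model


class AsyncDB:
    """Stand-in for an async forwarder wrapping the sync store."""

    def __init__(self, db: SyncDB) -> None:
        self._db = db

    async def update_session_meta(self, session_id: str, model: str) -> None:
        self._db.update_session_meta(session_id, model)


db = SyncDB()
adb = AsyncDB(db)

# Called without await: this only builds a coroutine object, the body never runs.
pending = adb.update_session_meta("s1", "model-a")
is_coroutine = inspect.iscoroutine(pending)
ran_without_await = "s1" in db.meta
pending.close()  # closed only to silence the "never awaited" warning

# Going through the synchronous object performs the write immediately.
adb._db.update_session_meta("s1", "model-a")
```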
- Track auth store source path on Nous state reads and write rotated
OAuth refresh tokens back to the same store, preventing stale-token
replays when Hermes falls back to a global/root auth.json.
- Skip Nous fallback entries locally when no access/refresh token is
present, suppressing repeated failed resolution attempts within a
session.
- Sync session model metadata after fallback switches so the gateway
DB reflects the backend that actually served the latest turn.
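The source-tracking idea in the first item can be sketched as below. The
function names, the JSON layout, and the candidate-path ordering are
assumptions for illustration, not the real _load_provider_state_with_source
or _save_provider_state_to_source.

```python
import json
from pathlib import Path
from typing import Optional, Sequence, Tuple


def load_state_with_source(candidates: Sequence[Path]) -> Tuple[dict, Optional[Path]]:
    """Return the first readable state and the path it came from."""
    for path in candidates:
        if path.is_file():
            return json.loads(path.read_text()), path
    return {}, None


def save_state_to_source(state: dict, source: Optional[Path], default: Path) -> Path:
    """Write back to the store the state was read from (default if unknown)."""
    target = source if source is not None else default
    target.write_text(json.dumps(state))
    return target
```

If the rotated refresh token were written to the default store instead, the
next read would fall back to the global file again and replay the old token.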
Follow-up on the salvaged #49830 hardening. The contributor's sensitive
query-param set included bare English words (code, key, auth, session,
sig) that double as ordinary page facets — ?code= on promo/challenge
pages, ?key= as a search facet, ?session= on blogs — so web_extract and
cloud browser_navigate would refuse a large slice of normal browsing.
Narrow the set to unambiguously credential-named params (access_token,
authorization, client_secret, password, token, x-amz-signature, ...).
Prefix-based vendor-key redaction (is_safe_url) still catches recognizable
key shapes; this set is the belt-and-suspenders for opaque secrets carried
under an explicit credential-named parameter.
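A minimal sketch of the narrowed check, assuming a case-insensitive match on
parameter names. The set lists only the names quoted above (the real set is
longer), and `has_credential_query_param` is a hypothetical name.

```python
from urllib.parse import parse_qsl, urlsplit

# Only the names quoted in this message; the real set continues past these.
SENSITIVE_QUERY_PARAMS = frozenset({
    "access_token",
    "authorization",
    "client_secret",
    "password",
    "token",
    "x-amz-signature",
})


def has_credential_query_param(url: str) -> bool:
    """True if any query parameter name is explicitly credential-named."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return any(name.lower() in SENSITIVE_QUERY_PARAMS for name, _ in pairs)
```

Generic facets such as ?code=, ?key=, and ?session= pass through, while a
URL carrying ?access_token= is refused.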
Also fixes two intra-PR-staleness test breakages surfaced by salvaging onto
current main:
- web_extract_tool() no longer accepts use_llm_processing= (signature
changed since the PR was authored) — dropped the invalid kwarg.
- agent.redact now fully masks keyed 'token=<secret>' to 'token=***'
instead of partial 'sk-...'; the console-redaction test now asserts the
real invariant (secret body gone) rather than the exact mask format.
Added a regression test that generic English-word query params are NOT
blocked by the credential guard.
Add policy gates and output redaction for browser/CDP surfaces, strengthen session ownership tracking, and block credential-like query parameters before third-party browser/web backends receive URLs.
Inspired by the agbrowse review: keep local browser magic-link flows possible while preventing cloud reader/browser escalation from receiving opaque token, code, signature, or key query parameters.
The persist user-message override was applied in place to the live messages
list. On the early crash-resilience persist (which runs BEFORE api_messages is
built), that stripped observed group-chat context off the live user message and
silently dropped it when observe_unmentioned_group_messages was enabled.
Fix at the single chokepoint: _flush_messages_to_session_db resolves the
override (idx/content/timestamp) locally and applies it ONLY to the row written
to the DB — the live dict is never mutated, so EVERY persist caller (early
persist, mid tool-loop flush, /resume, /branch) is protected uniformly. This
supersedes the earlier shallow-copy approach, which broke the intrinsic
_DB_PERSISTED_MARKER idempotency (copies never propagated the marker back to
the live dicts → duplicate rows). It also closes the sibling class tracked in
#56303.
Trailing empty-response scaffolding is still dropped from the live list in
_persist_session (unchanged behavior).
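The chokepoint behaviour can be illustrated with a small sketch. `flush`,
the `_db_persisted` key, and the override dict shape are hypothetical
stand-ins for _flush_messages_to_session_db, _DB_PERSISTED_MARKER, and the
real override; the DB is modelled as a plain list.

```python
from typing import Dict, List, Optional

PERSISTED = "_db_persisted"  # stand-in for _DB_PERSISTED_MARKER


def flush(
    messages: List[Dict],
    db_rows: List[Dict],
    override: Optional[Dict] = None,
) -> None:
    """Write unpersisted messages; apply the override only to the DB row."""
    for idx, msg in enumerate(messages):
        if msg.get(PERSISTED):
            continue  # already written: keeps repeated flushes idempotent
        content = msg["content"]
        if override is not None and override["idx"] == idx:
            content = override["content"]  # affects the row, not the live dict
        db_rows.append({"role": msg["role"], "content": content})
        # The marker goes on the live dict itself, so later flushes see it.
        msg[PERSISTED] = True
```

Because the marker is set on the live dict rather than on a copy, a second
flush writes nothing, and the live content keeps its group-chat context.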
Salvaged from #48817; chokepoint reworked to coexist with the marker-based
dedup (#50372).
Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
The salvaged PR guarded only resolve_nous_access_token; the primary
resolve_nous_runtime_credentials path also POSTs the refresh token to
portal_base_url on refresh with no allowlist check. Mirror the guard
there so a poisoned host can't receive the bearer, and drop the stray
duplicated allowlist comment. Adds a sibling-site regression test.
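The guard amounts to checking the host before the token leaves the process.
A sketch under assumptions: the allowlist entry, the /oauth/token path, and
the `post` callable are all hypothetical placeholders, not the real portal
values or HTTP client.

```python
from typing import Callable
from urllib.parse import urlsplit

ALLOWED_PORTAL_HOSTS = frozenset({"portal.example.com"})  # placeholder host


def assert_portal_allowed(portal_base_url: str) -> None:
    """Raise if the portal host is not on the allowlist."""
    host = (urlsplit(portal_base_url).hostname or "").lower()
    if host not in ALLOWED_PORTAL_HOSTS:
        raise ValueError(f"refusing to send refresh token to {host!r}")


def refresh(
    portal_base_url: str,
    refresh_token: str,
    post: Callable[[str, dict], str],
) -> str:
    # Guard first: nothing is sent unless the host is allowlisted.
    assert_portal_allowed(portal_base_url)
    url = portal_base_url.rstrip("/") + "/oauth/token"  # hypothetical path
    return post(url, {"refresh_token": refresh_token})
```

Every code path that POSTs the refresh token needs the same call, which is
the gap the sibling-site regression test covers.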