rm -rf //, /., /./, /.. and //* all resolve to / in the shell but slipped
past the root-filesystem hardline pattern, whose target group only matched
the literal / and /* tokens. They fell to the softer DANGEROUS_PATTERNS
'delete in root path' rule, which --yolo / approvals.mode=off / cron
approve-mode are designed to bypass — leaving the one unconditional floor
open to a full root wipe under yolo.
Broaden the root token from '/|/\\*|/ \\*' to '/[/.]*\\**' inside
_hardline_rm_path so any root-anchored path whose components collapse back
to / (repeated slashes plus ./.. segments) with an optional trailing glob
is caught. A trailing real segment (/tmp, /home, /.ssh) still fails to
match and stays with the softer rules.
Co-authored-by: kernel-t1 <214165399+kernel-t1@users.noreply.github.com>
The xapp-<num>-<hash> format used by Slack App-Level / Socket Mode
tokens was missing from both agent/redact.py prefix patterns and
gateway/run.py gateway secret patterns, so SLACK_APP_TOKEN values could
leak through to chat users even with security.redact_secrets enabled.
Adds an anchored xapp-\d+- pattern to both redaction paths.
Add an interactive Raft setup flow for hermes gateway setup. The wizard follows the existing platform adapter setup pattern, persists RAFT_PROFILE to the Hermes env file, preserves an existing profile when the user declines reconfiguration, and registers the flow via setup_fn.
Add focused Raft adapter coverage for saving RAFT_PROFILE, keeping an existing profile, and registering setup_fn.
Signed-off-by: skyzh <skyzh@mail.build>
Signed-off-by: HaoHao <HaoHao@mail.build>
The credential-pool Codex refresh path synced tokens from auth.json and
then POSTed the refresh_token to OpenAI's token endpoint without holding
the cross-process auth-store lock across the whole read->POST->write-back
sequence. Because Codex refresh tokens are single-use, two concurrent
Hermes processes could both adopt the same on-disk token and both POST
it; the loser got refresh_token_reused / invalid_grant.
Wrap the Codex OAuth branch of _refresh_entry in the existing shared
_auth_store_lock (reentrant, cross-process flock) using the same
extended-timeout pattern resolve_codex_runtime_credentials() already
uses. A waiting process now blocks on the lock and, once inside, the
in-lock re-sync picks up the rotated token the winner persisted and
skips its own POST. Also send User-Agent: hermes-cli/<version> on the
refresh request.
Credit @cooper-oai (#34820) for identifying the concurrent-refresh
reuse race; this ships the narrow lock-serialization fix without the
separate Codex auth-store partition.
Matrix outbound image downloads validated only the final URL after
following redirects, so a public URL that 302-redirects to loopback /
private-network / cloud-metadata endpoints had already connected to the
unsafe hop before the check ran.
Re-validate every redirect hop before following it:
- aiohttp path resolves redirects manually with allow_redirects=False,
validating each Location via is_safe_url (aiohttp can't use the httpx
response event hook).
- httpx fallback installs the shared _ssrf_redirect_guard event hook.
Regression tests cover per-hop blocking of an unsafe redirect, following
a safe redirect chain, and httpx guard wiring.
auxv leaks AT_RANDOM (stack canary seed) + AT_BASE/AT_PHDR load
addresses — an ASLR oracle on par with maps. pagemap exposes
virtual->physical translation. Both slipped through the endswith
tuple alongside the maps family covered by the salvaged commit.
Adds regression coverage for auxv/pagemap and for the per-thread
/proc/<pid>/task/<tid>/<file> alias form (endswith catches both).
Follow-up on #32238, closes#34430.
PR #4609 blocked /proc/*/maps to prevent ASLR layout leakage, but the
endswith("/maps") check does not match /proc/*/smaps or
/proc/*/smaps_rollup — both expose the same virtual-address layout and
bypass the guard. /proc/*/numa_maps carries the same data with NUMA
annotations and is equally bypassed. /proc/*/mem (raw process memory)
is added as defence-in-depth; it requires address knowledge to exploit
but is blocked for consistency.
Extends the endswith tuple in _is_blocked_device_path() to cover all
four variants and adds regression assertions for all new paths to
test_proc_sensitive_pseudo_files_blocked.
Partially addresses #4427.
An invite to a room with no remaining members surfaces as "no servers
in the room have been provided" or "room not found" on join. The pending
invite was never cleared, so every gateway startup re-attempted the join
and re-emitted the warning indefinitely.
Detect that specific failure mode by narrow error-message match and call
leave_room to decline the invite; transient/network errors leave the
invite untouched for the next sync. Adds 5 tests.
Reimplements the matrix portion of #33953 onto the current plugin adapter
(gateway/platforms/matrix.py was relocated to
plugins/platforms/matrix/adapter.py since the PR was opened). The two
gateway/status.py fixes from that PR (wrapper-subcommand rejection,
psutil start-time fallback) already landed on main independently.
Reported by @Bougey; original patch authored by @KiraKatana.
_maybe_wrap_untrusted() only wrapped str-typed tool outputs. When a
high-risk tool (web_extract, browser_*) returns a multimodal content
list ([{type:text},{type:image_url}]) — which _tool_result_content_for
_active_model() produces by unwrapping the _multimodal envelope for
vision-capable providers — the text part reached the model completely
unguarded. An attacker page that ships one image bypassed the entire
untrusted-data wrapper.
Extend the wrapper to handle list content: each {type:text} part is run
through the same string-wrapping path (min-char threshold, delimiter
neutralization, one well-formed block), image/video parts pass through
untouched so the list stays valid for vision adapters. Recursing into
the existing string branch means the list path inherits the delimiter
defang and the no-forgeable-fast-path hardening from #56172 for free.
The outer list is rebuilt (not returned by identity), so callers compare
by value.
## What does this PR do?
Closes a critical bypass of the dangerous-command approval system. The
normalizer that every command passes through before pattern matching
(`_normalize_command_for_detection`) already strips ANSI, null bytes,
fullwidth Unicode, backslash escapes and empty-quote token splits — but
it did nothing about the shell `IFS` variable. In any POSIX shell `$IFS`
and `${IFS}` expand to whitespace, so a command written as
`rm${IFS}-rf${IFS}/` is executed by the live shell as `rm -rf /` while
the detection regexes — which anchor on literal `\s` between a command and
its arguments — never fire.
The impact is severe: this evades BOTH layers at once. It slips past every
entry in `DANGEROUS_PATTERNS` (so `curl${IFS}...|sh`, `sed${IFS}-i`
against `~/.hermes/config.yaml`, sudo privilege flags, etc. auto-run with
no approval prompt) AND the unconditional hardline floor that is
documented as un-bypassable "not even with --yolo" (`rm -rf /`, `mkfs`,
`dd` to a raw block device, `shutdown`/`reboot`, fork bomb). A
prompt-injected or malicious instruction could wipe the host filesystem or
power the box off while the approval system reports nothing. Confirmed at
runtime before the fix: `detect_hardline_command('rm${IFS}-rf /')` returned
`(False, None)`.
The fix mirrors the shell's own expansion: it collapses `$IFS` / `${IFS}`
(including the bash substring form `${IFS:0:1}`) to a single space inside
the existing de-obfuscation block, so the whitespace-anchored patterns
match exactly as they do for the un-obfuscated command. It is deliberately
narrow and safe — a `\b` word boundary keeps it from touching unrelated
variables like `$IFSACONFIG`, so it cannot introduce false positives on
legitimate commands.
## Related Issue
N/A
## Type of Change
- [x] 🔒 Security fix
## Changes Made
- `tools/approval.py`: in `_normalize_command_for_detection`, substitute
`$IFS` / `${IFS}` (and `${IFS:...}`) expansions with a literal space
before dangerous/hardline pattern matching, alongside the existing
backslash and empty-quote de-obfuscation.
- `tests/tools/test_approval.py`: add `TestIFSWhitespaceBypass` covering
the brace, bare and substring IFS forms against both
`detect_hardline_command` and `detect_dangerous_command`, plus
regression guards that a look-alike variable (`$IFSACONFIG`) and plain
safe commands are not flagged. Import `detect_hardline_command`.
## How to Test
1. Reproduce the hole (pre-fix): `detect_hardline_command('rm${IFS}-rf /')`
returns `(False, None)` and `detect_dangerous_command(...)` returns
`(False, ...)`, i.e. a host-destroying command is auto-approved.
2. With the fix applied, both now flag the command: hardline match
"recursive delete of root filesystem" and dangerous match "delete in
root path".
3. Run the suite: `pytest tests/tools/test_approval.py
tests/tools/test_hardline_blocklist.py -q` — the new
`TestIFSWhitespaceBypass` cases pass and nothing else regresses.
## Checklist
### Code
- [x] I've read the Contributing Guide
- [x] My commit messages follow Conventional Commits (`fix(scope):`, etc.)
- [x] I searched for existing PRs to make sure this isn't a duplicate
- [x] My PR contains **only** changes related to this fix (no unrelated commits)
- [x] I've run the relevant tests and they pass (two pre-existing failures
are environmental: missing optional deps in the minimal venv, not
caused by this change)
- [x] I've added tests for my changes
- [x] I've tested on my platform: macOS 15 (Darwin 25.5)
### Documentation & Housekeeping
- [x] I've updated relevant documentation (README, `docs/`, docstrings) — or N/A
- [x] I've updated `cli-config.yaml.example` if I added/changed config keys — or N/A
- [x] I've updated `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows — or N/A
- [x] I've considered cross-platform impact (Windows, macOS) — the change is a
pure string transform with no platform-specific behavior; footgun gate passes
- [x] I've updated tool descriptions/schemas if I changed tool behavior — or N/A
`@file` / `@folder` context-reference expansion enforced its own narrow
deny-list (`_ensure_reference_path_allowed` in `agent/context_references.py`)
that only covered `~/.ssh` keys, a handful of shell dotfiles, `~/.hermes/.env`,
and `skills/.hub`. It never blocked the credential stores that the canonical
read guard (`agent/file_safety.get_read_block_error`) protects: provider API
keys (`~/.hermes/auth.json`), Anthropic OAuth tokens
(`~/.hermes/.anthropic_oauth.json`), MCP OAuth material (`~/.hermes/mcp-tokens/`),
webhook HMAC secrets, and project-local `.env` files.
This matters because the messaging gateway feeds **untrusted** remote text
straight into reference expansion: `gateway/run.py` calls
`preprocess_context_references_async(..., allowed_root=_msg_cwd)` where
`_msg_cwd` defaults to the operator's HOME when `TERMINAL_CWD` is unset. A chat
peer (Telegram/Discord/Slack/...) could send `@file:~/.hermes/auth.json`, pass
the `allowed_root` check (it resolves under HOME), slip past the narrow list,
and have the operator's live keys read into the agent's context — where the
model would typically echo or act on them.
Rather than duplicate and re-sync a second secret list, this routes the guard
through the existing single source of truth. A reviewer might ask "why not just
add `auth.json` to the local list?" — because the local list has already drifted
once (a prior commit had to add `.config/gh`); anchoring to
`get_read_block_error` means every future addition there protects this path too.
The narrow checks are kept as a fallback since they also cover dirs that guard
does not (`.aws`, `.gnupg`, `.kube`, etc.), and the canonical lookup is wrapped
so it can never crash reference expansion.
N/A
- [x] 🔒 Security fix
- `agent/context_references.py`: `_ensure_reference_path_allowed` now also
consults `agent.file_safety.get_read_block_error` after its existing checks
and refuses the reference when that canonical guard flags the resolved path.
The lookup is wrapped so guard-resolution failures fall back to the explicit
checks instead of breaking expansion.
- `tests/agent/test_context_references.py`: added
`test_blocks_canonical_read_denylist_credential_stores`, asserting that
`@file` attaches for `auth.json`, `.anthropic_oauth.json`, `mcp-tokens/*`, and
a project-local `.env` are all refused and their secret bodies never reach the
expanded message.
- `scripts/release.py`: added the contributor email to `AUTHOR_MAP` (release
gate).
1. `scripts/run_tests.sh tests/agent/test_context_references.py` — all 15 tests
pass, including the new credential-store case.
2. Regression proof: stash `agent/context_references.py`, run the suite with
`-- -k canonical`, and confirm the new test fails (secrets leak into the
message) without the fix; restore and confirm it passes.
3. `ruff check agent/context_references.py tests/agent/test_context_references.py`
and `python scripts/check-windows-footguns.py agent/context_references.py
tests/agent/test_context_references.py` both pass.
- [x] I've read the Contributing Guide
- [x] My commit messages follow Conventional Commits (`fix(scope):`, etc.)
- [x] I searched for existing PRs to make sure this isn't a duplicate
- [x] My PR contains **only** changes related to this fix (plus the AUTHOR_MAP release gate)
- [x] I've run the test suite for the touched area and all tests pass
- [x] I've added tests for my changes (required for bug fixes)
- [x] I've tested on my platform: macOS 15 (Darwin 25.5)
- [x] I've updated relevant documentation (README, `docs/`, docstrings) — or N/A
- [x] I've updated `cli-config.yaml.example` if I added/changed config keys — or N/A
- [x] I've updated `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows — or N/A
- [x] I've considered cross-platform impact (Windows, macOS) — or N/A
- [x] I've updated tool descriptions/schemas if I changed tool behavior — or N/A
Follow-up to the #55780 dead-target not_found blast-radius fix (merged in
#56225). classify_send_error and is_chat_level_not_found each built their own
lowercased error blob, but divergently: classify_send_error appended the
exception CLASS NAME while is_chat_level_not_found did not. A caller passing
exc= to both could get inconsistent answers on the same failure.
- Extract _error_blob(exc, error_text) as the single source of truth both
classifiers use (str(exc) when non-empty + class name; no stray leading
space).
- Align is_chat_level_not_found's signature to (exc, error_text), matching
classify_send_error, removing the swapped-positional footgun; update the
sole caller and the three tests to keyword form.
- Add a regression guard asserting _error_blob keeps the class name.
Surfaced by the hermes-pr-review Phase 2c structured review of #56225.
The messaging gateway table still listed /new ("Start a new
conversation") and /reset ("Reset conversation history") as two
separate commands with divergent descriptions. /reset is an alias
of /new (see COMMAND_REGISTRY in hermes_cli/commands.py) — same
handler, fresh session ID + history. Collapse them into one row
matching the registry wording and the CLI table already on line 39.
Closes#42829.
#55115 added the dead-target registry so confirmed-dead delivery targets are
short-circuited. Its documented scope (gateway/dead_targets.py) is deliberately
narrow: only *whole-chat* deaths -- the `forbidden` and chat-level `not_found`
(`chat not found`) kinds -- should be recorded; "Thread/topic-level not_found is
NOT recorded here ... a deleted topic does not mean the parent chat is dead."
But the implementation doesn't honor that scope. classify_send_error collapses
chat-level "chat not found" AND thread/message-level not_found ("thread not
found", "topic_deleted", "message_id_invalid", "message to edit/reply not
found") into one "not_found" kind, _DEAD_ERROR_KINDS contains "not_found"
wholesale, and deliver()'s except marks the PARENT chat_id dead. So a single
deleted Telegram topic or edited-away message permanently marks the entire chat
(and every future scheduled / cron / agent delivery to it) dead -- silently. The
adapter self-heal the docstring relies on only covers the non-private-group
thread retry; named-DM-topic and message-level failures propagate to deliver()'s
except and wrongly kill the whole chat.
Add is_chat_level_not_found() (factoring the not_found substrings into chat-level
vs sub-chat-level constants) and gate the delivery dead-path: a "not_found" only
marks the target dead when it is chat-level. classify_send_error's public
contract is unchanged (still returns "not_found" for every shape); only the
mark_dead decision is refined, restoring the registry's documented scope.
Cross-platform: telegram/slack/discord delivery all flow through
classify_send_error -> mark_dead. Adds regression tests through the real
deliver() path plus helper/classifier units.
The three salvaged PRs (#46647, #54583, #55013) were authored against a
tree where _refresh_agent_cache_message_count was sync and _session_db was
the raw SessionDB. On current main the helper is async and awaits the
AsyncSessionDB facade, and _run_agent was split into _run_agent_inner.
- Wrap test _session_db in AsyncSessionDB so the awaited get_session works
- Make refresh-calling tests async + await the helper
- Point the placement-guard test at _run_agent_inner (recursion lives there
post-mixin-extraction)
- Relocated production call sites now correctly await the async helper
The cross-process coherence guard (#45966) compares the session's
on-disk message_count against the snapshot stored next to the cached
agent, and rebuilds the agent on a mismatch. The guard is correct
when the cache snapshot and the live count both refer to the same
DB row. But the agent cache is keyed by session_key, which can
group multiple conversation threads (different session_ids) under
the same key — and the message_count values belong to DIFFERENT
DB rows.
When the user switches from session A to session B under the same
session_key, the cache hit returns A's cached agent. The guard then
compares A's snapshot count (A.message_count) against B's live count
(B.message_count) — they are NEVER equal because they track
different conversations — and invalidates the cache. Every session
switch busts the prompt cache and forces a fresh agent build. The
post-turn re-baseline (#46237) made it worse: it reads the live
count from the CURRENT session_entry.session_id, so each switch
overwrites the original snapshot with the new session's count,
causing the very next switch BACK to the original session to fire
the guard again.
This is the bug from #54947 (P0, sweeper:risk-session-state,
sweeper:risk-caching).
Fix:
* Record the snapshot's session_id alongside the message_count in
the cache tuple: (agent, sig, mc, session_id) — a 4-tuple. The
cache build at the AIAgent construction site stores the active
session_id.
* The cache-hit guard skips the cross-process count comparison
when the active session_id differs from the snapshot's
session_id — the comparison is meaningless across different DB
rows, so the agent is REUSED without invalidation. The cross-
process guard still fires when the session_id matches and the
live count differs (genuine cross-process write on the SAME
session).
* _refresh_agent_cache_message_count checks the snapshot's
session_id: when it differs from the current session_id, the
snapshot is intentionally left untouched (overwriting it would
corrupt the original conversation's baseline and cause the
switch-back to fire the guard). The legacy 3-tuple shape (no
session_id) is still re-baselined as before.
* Backward-compat:
- 2-tuple (agent, sig) — unchanged, opts out of the guard.
- 3-tuple (agent, sig, mc) — unchanged behavior, standard
cross-process check.
- pending sentinel — unchanged, untouched by re-baseline.
- new 4-tuple (agent, sig, mc, session_id) — full session_id-
aware guard with skip on mismatch.
Tests:
* tests/gateway/test_session_id_cache_coherence.py — 7 tests
covering L1-L5 from LAYERS.md:
- L1 session_id switch must REUSE
- L2 cache tuple records snapshot's session_id
- L3 re-baseline skips when session_id differs
- L4 same-session_id turns still re-baseline (#46237 holds)
- L5 legacy 2-tuples and pending sentinels untouched
- legacy 3-tuple (no session_id) still guarded (#45966 holds)
- 3-tuple transitions to 3-tuple (not 4-tuple) on re-baseline
No regressions in 70 existing tests in test_agent_cache.py or 137
related session tests. Co-authored with #52197 (deferred cleanup
of evicted agents); both fixes compose cleanly.
The cross-process cache-coherence guard (#45966) compares a session's
on-disk message_count against a snapshot stored next to the cached agent,
rebuilding the agent on a mismatch so a foreign writer (e.g. the dashboard
backend) can't leave the in-memory transcript stale.
On a fresh gateway conversation the post-turn re-baseline
(_refresh_agent_cache_message_count) ran BEFORE the first-turn `session_meta`
marker row was appended to the transcript. That append goes through
append_to_transcript -> append_message, which increments message_count
unconditionally. So the snapshot was left exactly one short of the live
count, and on turn 2 of every fresh conversation the guard mistook this
process's own session_meta write for a foreign write, evicting and rebuilding
the cached agent — silently busting the per-conversation prompt cache the
cache exists to protect.
Move the re-baseline to after the turn's full transcript persistence block
(including the session_meta append and the compression session_id swap). The
snapshot now matches the live count, so the guard fires only on genuinely
foreign writes. This also makes the call honor its own documented contract of
using the compaction-updated session_id.
Adds a regression test that drives the real _handle_message_with_agent
against a real SessionDB and asserts the invariant: after a fresh first turn,
snapshot == live message_count, so the next turn's guard reuses the cached
agent. Fails before this change, passes after.
The cross-process cache-coherence guard (#45966) re-baselines the cached
agent's message_count only on the external-turn boundary (#46237, at
_handle_message_with_agent). The in-band queued (/queue) follow-up recurses
into _run_agent mid-chain with the stale build-time snapshot, so the
follow-up's guard sees the first turn's own writes as a mismatch and rebuilds
the agent -- re-introducing the every-turn rebuild / prompt-cache destruction
#46237 set out to prevent, on the in-band path. Re-baseline before the
recursion, symmetric with the accepted external-path fix.
Review follow-up on the concurrent-tool deadline salvage. timed_out_indices is
snapshotted from not_done at the deadline; a worker can still finish and write
results[i] in the window before the post-execution result loop reads it. The
loop unconditionally replaced results[i] with a fabricated 'timed out' message
for any snapshotted index, discarding a genuinely-successful (just-late) result.
Gate the timeout message on 'and r is None' so a real result always wins. Add a
regression test that forces the snapshot-vs-result-loop race deterministically
(mutation-checked: reverting the guard fails it). Also document the intentional
detached-worker leak at the executor abandon site.
A tool with no internal interrupt check (read_file, web_search, or a wedged
terminal backend) that never returns keeps the concurrent-tool poll loop alive
forever: the loop only breaks when all futures finish or an interrupt is
requested, and the 30s heartbeat resets the gateway idle monitor so idle-kill
never fires. The ThreadPoolExecutor was also used as a context manager, so its
__exit__ joined the hung worker with wait=True.
Add a wall-clock batch deadline (HERMES_CONCURRENT_TOOL_TIMEOUT_S, default 420s
— above the 360s web_extract timeout; 0/negative disables). When it fires:
cancel pending futures, signal an interrupt to the worker threads, abandon the
executor (shutdown wait=False, cancel_futures=True) so hung threads aren't
joined, and return a per-tool 'timed out' result for the unfinished calls while
still surfacing the finished ones. Also fixes the latent futures.index(f)
lookup (ambiguous with duplicate futures) by tracking a future->index map.
Salvaged from #54562.
Co-authored-by: Gustavo Mendes <87918773+gustavosmendes@users.noreply.github.com>
verify_on_stop / pre_verify append a synthetic assistant "done" plus a
synthetic user nudge to keep the agent going one more turn before it can
claim completion. Both were flagged (_verification_stop_synthetic on the
nudge only), but the flags were never registered in
_EPHEMERAL_SCAFFOLDING_FLAGS, so the central _is_ephemeral_scaffolding()
filter that guards both persistence sinks (SQLite flush + JSON snapshot)
let them through. The resumed transcript then inherited loop-only
scaffolding, invalidating the prompt-prefix cache on later turns.
- add _verification_stop_synthetic and _pre_verify_synthetic to
_EPHEMERAL_SCAFFOLDING_FLAGS (the single chokepoint both sinks use)
- flag the blocked attempt assistant message too, not just the nudge, so
the whole synthetic pair drops together and persistence does not keep a
premature done with the nudge stripped (assistant to assistant adjacency)
The API-payload leak claimed in the report is already handled: the
chat_completions transport strips every underscore-prefixed message key
before the wire, so the marker never reaches strict providers.
Reported by patppham.
Widen the salvaged #32243 fix to the try_activate_fallback path: a custom
provider pointed at the native api.anthropic.com host (no /anthropic path
suffix, name != anthropic) fell through to chat_completions -> POST
/v1/chat/completions -> 404. Match the host the same way determine_api_mode()
and _detect_api_mode_for_url() now do. Absorbs #49247.
End-to-end regression coverage for #32243 that asserts every runtime
branch resolving an Anthropic endpoint returns
`api_mode == "anthropic_messages"`:
* `_resolve_explicit_runtime` — the path used when a Hermes
subcommand passes an explicit `--api-key` / `--base-url`. Pins
that a stale persisted `model.api_mode: chat_completions` from a
prior provider migration cannot override the anthropic pin.
* `_resolve_runtime_from_pool_entry` — the path triggered by
`hermes auth add anthropic --type oauth` (the exact flow from the
issue). Same stale-api_mode regression pinned here.
* `_try_resolve_from_custom_pool` — the user-defined
`providers:` / `custom_providers:` path that depends on the
URL detector fix landed in the prior commit. Asserts both the
detector fallback fires for `api.anthropic.com` and that an
explicit `api_mode_override` still wins (so users who DELIBERATELY
pointed a chat_completions transport at api.anthropic.com for
OpenAI-compat experiments aren't hijacked).
Co-locates the three contracts so a future refactor of one branch
cannot silently diverge from the others and re-introduce the
"out of extra usage" 400 on fresh OAuth Pro/Max credentials.
Add a dedicated `TestDirectAnthropicHost` class to
`test_detect_api_mode_for_url.py` covering the native Anthropic host
shape (bare, trailing slash, /v1 suffix, uppercase host) plus the
two negative-space regressions that matter for security: lookalike
subdomains (`api.anthropic.com.attacker.test`) and path-segment
spoofing (`https://proxy.example.test/api.anthropic.com/v1`) must
NOT be classified as native — leaking an Anthropic OAuth token to
either would be the worst case.
Refs #32243.
`_detect_api_mode_for_url` previously returned `None` for the bare
`api.anthropic.com` host, causing every URL-fallback path
(custom_providers, direct-alias, the api-key fallback inside
`resolve_runtime_provider`) to default to `chat_completions` for
native Anthropic — which routes requests to the OpenAI-compat
`/chat/completions` shim instead of the native `/v1/messages`
endpoint.
Pro/Max OAuth subscriptions are only billed against the native
Messages API; the shim bills against a separate "extra usage" pool
that is empty by default, so a freshly authorized Pro/Max credential
400s with "You're out of extra usage" the moment it's used — even
on an account that has consumed nothing for the current cycle.
Brings the helper in line with `hermes_cli.providers.determine_api_mode`
which already mapped `api.anthropic.com` to `anthropic_messages`.
models_dev.py's fetch uses a synchronous requests.get(timeout=15). Called
from the async gateway message handlers, it blocked the event loop for up
to 15s, starving Discord heartbeats and causing ClientConnectionResetError
disconnects.
Adds get_model_context_length_async() which offloads the entire sync
resolution chain to a worker thread via asyncio.to_thread(), and switches
the two async gateway call sites (_prepare_inbound_message_text,
_handle_message_with_agent) to await it. The loop stays responsive; the
sync path remains the single source of truth for the cache.
Salvaged from PR #22753 by @itenev. Follow-up: dropped the unused
fetch_models_dev_async/lookup_models_dev_context_async aiohttp variants
from the original PR (dead code with zero callers that had drifted from
the sync cache logic) — the to_thread wrapper already runs the sync path
off-loop, so they were redundant.
_strategy_exact advanced its scan cursor by pos+1 instead of
pos+len(pattern), so self-overlapping patterns (e.g. "aa" in "aaaa")
matched at overlapping offsets. _apply_replacements works in reverse
order, so the second replacement operated on already-modified content
using stale offsets — corrupting the file and reporting the wrong count
under replace_all=True. Advancing by len(pattern) matches str.replace()
semantics.
When two features register a post-delivery callback for the same session
(e.g. background-review release + /goal continuation), the second
registration is composed with the first via a `_chained` wrapper. That
wrapper was `def _chained()` — a sync function calling each callback
via `_prev()` / `_new()` and discarding the return value.
For sync callbacks that's fine. For async callbacks (such as the
`_deliver()` coroutine the /goal feature registers to inject the
continuation prompt) the returned coroutine was silently dropped:
RuntimeWarning: coroutine '_deliver' was never awaited.
Outer invoker in `_handle_message` already checks
`inspect.isawaitable(_post_result)` and awaits — but only sees the
wrapper's return value, which was `None`.
Fix: make `_chained` async, iterate over chained callbacks, await any
that return an awaitable. Outer invoker already handles awaitable
wrappers, so no other change is needed.
Tested:
* Added two regression tests in test_post_delivery_callback_chaining.py
covering an async callback chained behind sync (and vice versa).
* Updated existing chaining tests + test_run_cleanup_progress.py to
await the popped callback when it's awaitable.
* 62 tests pass across the touched suites.
Live-validated on Discord: /goal continuations now arrive after the
first turn's response is delivered (previously silent).
Refs: NousResearch/hermes-agent#31922
Vitest regression that builds `dist/entry.js` and checks two
structural invariants required for startup to not hang:
1. Zero `async "<path>"() { … }` keys inside any `__esm` definition.
esbuild only emits the `async` form when a module body contains
top-level await; the `__esm` helper at the top of the bundle
does not await nested inits, so any async wrapper participating
in a circular module graph would deadlock the boot
`await Promise.all([…])` in `src/entry.tsx`.
2. No `node_modules/ink/build/index.js` or
`node_modules/ink-text-input/build/index.js` modules. Their
absence is what makes invariant 1 hold today; if a future commit
re-introduces the `ink-text-input` re-export, this test catches
it before the bundle ships.
The test rebuilds the bundle on demand when the source is newer than
`dist/entry.js`, runs in <100ms with no TTY needed, and is hermetic
on a clean checkout.
Update the comment on the `alias` entry to mention the second reason
the source-inline is needed: keeping the upstream `ink` /
`ink-text-input` graph out of the bundle (which fixed the startup
deadlock in #31227). Code path is unchanged.
The dashboard TUI bundle hung at startup with only 141 bytes of ANSI
reset sequences and a blank screen forever. Root cause: esbuild's
lightweight `__esm` helper at the top of `dist/entry.js` does not
await nested async init, so a circular async cycle in the module
graph never resolves. The cycle came from re-exporting
``TextInput`/`UncontrolledTextInput`` from `'ink-text-input'` here —
that npm package depends on the upstream `ink` package, whose graph
loops back through React + our in-tree `@hermes/ink` ink fork. The
result: `init_entry_exports` was emitted as `async … await
init_build4()` (where `build4` is `node_modules/ink-text-input/build`),
and the top-level `await Promise.all([init_entry_exports().then(...)])`
in `src/entry.tsx` deadlocked waiting on the dangling Promise.
Nobody in `ui-tui/` actually imports `TextInput` from `@hermes/ink` —
the composer uses the in-tree `src/components/textInput.tsx` widget
instead. Drop the re-export from the source so the bundle no longer
inlines the upstream ink graph at all. Callers that legitimately want
the upstream widget can still import it from the dedicated
`@hermes/ink/text-input` subpath, which sits outside `entry-exports`
and so does not get inlined into consumers' bundles.
After the fix:
* `dist/entry.js` shrinks from 2.9MB → 2.4MB (~11.5k fewer bundled
lines) with zero `async __esm` wrappers remaining.
* `init_entry_exports` is now a synchronous `__esm` module.
* The bundle's top-level await chain resolves in ~30ms instead of
hanging.
Subprocesses spawned by the terminal tool, execute_code, Docker backend, and
the codex app-server could inherit Hermes-internal secrets that the name-based
`_HERMES_PROVIDER_ENV_BLOCKLIST` can't enumerate, because they're injected into
`os.environ` at runtime under dynamic names:
- `AUXILIARY_<TASK>_API_KEY` / `AUXILIARY_<TASK>_BASE_URL` — per-task side-LLM
credentials bridged from `config.yaml[auxiliary]` by gateway/run.py and cli.py
(vision, web_extract, approval, compression, plugin-registered tasks). Often
separate, higher-spend keys plus base URLs pointing at private endpoints.
- `GATEWAY_RELAY_*_SECRET` / `_KEY` / `_TOKEN` — relay-auth material provisioned
by gateway/relay.
Additionally, agent/transports/codex_app_server.py built its spawn env from a
raw `os.environ.copy()`, bypassing the centralized `hermes_subprocess_env()`
helper entirely — handing every codex subprocess the full Tier-1 secret set
(GH_TOKEN, gateway bot tokens, Modal/Daytona infra tokens, dashboard session
token) unfiltered. This is the #29157 sibling spawn-site gap; copilot_acp_client
already routes through the helper.
Fix — single chokepoint:
- Add `_is_hermes_internal_secret(key)` in tools/environments/local.py as the
single source of truth for the dynamic secret patterns. Matches
AUXILIARY_*_API_KEY / _BASE_URL and GATEWAY_RELAY_*_SECRET/_KEY/_TOKEN; leaves
non-secret AUXILIARY_*_PROVIDER/_MODEL and GATEWAY_RELAY routing hints visible.
- Wire the predicate into every spawn path unconditionally (ignores skill
env_passthrough opt-in AND inherit_credentials — a model-driving CLI never
needs these): `_sanitize_subprocess_env` (both loops), `_make_run_env`
(foreground), `hermes_subprocess_env` (Tier-1), and the Docker forward filter.
- Add the static GATEWAY_RELAY_* names to `_HERMES_PROVIDER_ENV_BLOCKLIST` so the
exact-match path catches them independently of the predicate.
- Add the GATEWAY_RELAY_ID/_SECRET/_DELIVERY_KEY triplet to `_ALWAYS_STRIP_KEYS`
(Tier-1) so it is stripped unconditionally on EVERY spawn surface — including
the codex/copilot `inherit_credentials=True` path that skips the Tier-2
blocklist. `_SECRET`/`_DELIVERY_KEY` are already predicate-matched; `_ID` has
no secret suffix, so enumerating it here is what closes its leak on the
inherit path (self-review W1).
- Defense in depth: env_passthrough.py `_is_hermes_provider_credential()` now
consults the same predicate, so a skill can't register these names as
passthrough and tunnel them into an execute_code / terminal child.
- Route codex_app_server through `hermes_subprocess_env(inherit_credentials=True)`
— strips Tier-1 + dynamic-internal secrets while provider creds (which codex
needs to authenticate) still flow.
Consolidates PRs #53715 (necoweb3 — the _is_hermes_internal_secret backbone +
Docker filter), #53503 (srojk34 — env_passthrough guard), and #55709 (srojk34 —
codex routing). Retires #52348 (claudlos): its copilot half is already on main,
and its codex half used the full-strip `_sanitize_subprocess_env` which would
break codex provider auth — the correct tier is `inherit_credentials=True`.
Tests: TestHermesInternalDynamicSecrets (terminal + predicate + passthrough
override), TestInternalDynamicSecrets (hermes_subprocess_env both tiers),
TestSpawnEnvSecretStripping (codex spawn env), plus env_passthrough
defense-in-depth cases.
Co-authored-by: necoweb3 <sswdarius@gmail.com>
Co-authored-by: srojk34 <286497132+srojk34@users.noreply.github.com>
Co-authored-by: claudlos <claudlos@agentmail.to>
AIAgent.run_conversation() promises a dict with final_response, but 16
terminal-failure branches returned dicts that either omitted the key or
set it to None. Callers that index result['final_response'] directly
(run_agent.py chat() + the __main__ printer) turn a real provider/context
failure into an opaque KeyError instead of surfacing the actionable error.
Every offending branch already carried usable 'error' text, so this
mirrors that text into final_response for all 16 sites (8 that omitted the
key, 8 that returned None). Adds an AST regression test that fails if any
run_conversation() dict return omits final_response or sets it to a literal
None, and tightens the invalid-response test to assert final_response == error.
The network-error reconnect ladder (#55992) captured a stable self._app
local across its awaits and failed fast when the adapter was torn down
mid-sleep. The 409-conflict retry path had the identical unguarded
self._app.updater.start_polling() deref — a concurrent disconnect()
during its RETRY_DELAY sleep would raise the same 'NoneType' object has
no attribute 'updater' and, on a non-final retry, land in limbo. Apply
the same stable-local + fail-fast pattern so the existing except block
reschedules or escalates to fatal.
A delayed fatal-error notification from an adapter instance that has
already been replaced by a successful reconnect (a different adapter
object now owns the platform slot) was still processed: it overwrote
the platform's runtime status back to retrying/fatal and could
re-queue an already-healthy platform for reconnection.
Snapshot the current owner of the platform slot at the top of
_handle_adapter_fatal_error and bail out before any side effect when
it belongs to a different, already-installed adapter.
_handle_polling_network_error's chained retry never updated
self._polling_error_task, so the reentrancy guard shared with the
heartbeat loop and the pending-updates probe went stale mid-recovery,
letting more than one recovery attempt run concurrently against the
same adapter. Combined with a TOCTOU window in
_handle_adapter_fatal_error (the adapter was only removed from
self.adapters in a finally block after awaiting disconnect()), two
concurrent fatal notifications for the same adapter could both pass
the "still installed" check and call disconnect() twice, which is
where the reported "'NoneType' object has no attribute 'updater'"
originates once self._app is cleared by the first call.
- Reassign the chained retry task to self._polling_error_task so the
guard reflects an in-flight recovery.
- Capture self._app in a local variable across the stop/start_polling
sequence instead of re-reading self._app between awaits.
- Claim (pop) the adapter from self.adapters before awaiting
disconnect() in _handle_adapter_fatal_error, not after, closing the
TOCTOU window for a concurrent notification on the same adapter.
Path(raw).name reduces '..'/'.'/'' to themselves, so basename
extraction alone still let a Graph-provided display_name of '..' or
'../' escape the temp recording directory (tmp_dir / '..' resolves to
the parent). Reject the dot-only basenames explicitly and fall back to
the artifact id. Extends @outsourc-e's regression coverage with the
dot-only cases.