Follow-up on the salvaged #49830 hardening. The contributor's sensitive
query-param set included bare English words (code, key, auth, session,
sig) that double as ordinary page facets — ?code= on promo/challenge
pages, ?key= as a search facet, ?session= on blogs — so web_extract and
cloud browser_navigate would refuse a large slice of normal browsing.
Narrow the set to unambiguously credential-named params (access_token,
authorization, client_secret, password, token, x-amz-signature, ...).
Prefix-based vendor-key redaction (is_safe_url) still catches recognizable
key shapes; this set is the belt-and-suspenders for opaque secrets carried
under an explicit credential-named parameter.
Also fixes two intra-PR-staleness test breakages surfaced by salvaging onto
current main:
- web_extract_tool() no longer accepts use_llm_processing= (signature
changed since the PR was authored) — dropped the invalid kwarg.
- agent.redact now fully masks keyed 'token=<secret>' to 'token=***'
instead of partial 'sk-...'; the console-redaction test now asserts the
real invariant (secret body gone) rather than the exact mask format.
Added a regression test that generic English-word query params are NOT
blocked by the credential guard.
Add policy gates and output redaction for browser/CDP surfaces, strengthen session ownership tracking, and block credential-like query parameters before third-party browser/web backends receive URLs.
Inspired by the agbrowse review: keep local browser magic-link flows possible while preventing cloud reader/browser escalation from receiving opaque token, code, signature, or key query parameters.
The persist user-message override was applied in place to the live messages
list. On the early crash-resilience persist (which runs BEFORE api_messages is
built), that stripped observed group-chat context off the live user message and
silently dropped it when observe_unmentioned_group_messages was enabled.
Fix at the single chokepoint: _flush_messages_to_session_db resolves the
override (idx/content/timestamp) locally and applies it ONLY to the row written
to the DB — the live dict is never mutated, so EVERY persist caller (early
persist, mid tool-loop flush, /resume, /branch) is protected uniformly. This
supersedes the earlier shallow-copy approach, which broke the intrinsic
_DB_PERSISTED_MARKER idempotency (copies never propagated the marker back to
the live dicts → duplicate rows) and closes the sibling class tracked in #56303.
Trailing empty-response scaffolding is still dropped from the live list in
_persist_session (unchanged behavior).
Salvaged from #48817; chokepoint reworked to coexist with the marker-based
dedup (#50372).
Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
The salvaged PR guarded only resolve_nous_access_token; the primary
resolve_nous_runtime_credentials path also POSTs the refresh token to
portal_base_url on refresh with no allowlist check. Mirror the guard
there so a poisoned host can't receive the bearer, and drop the stray
duplicated allowlist comment. Adds a sibling-site regression test.
Add tmp_path symlink regression tests for both generate_systemd_unit and
generate_launchd_plist (~/.local/bin/node -> profile node install must not
leak the profile target into the generated unit PATH). Register
jearnest11's AUTHOR_MAP entry for the salvage cherry-pick.
A user who tapped Always on an approval button gets a pairing-store entry.
_is_user_authorized() checked the pairing store BEFORE the allowlist and
returned True unconditionally, so a paired-but-not-allowed user permanently
bypassed TELEGRAM_ALLOWED_USERS (or equivalent) even after being removed from
the allowlist (#23778).
Record pairing membership but only honor it in the no-allowlist branch. When
an allowlist IS configured, the paired user must appear in the canonical
allowed_ids set (the same set that resolves WhatsApp aliases, SimpleX names,
group allowlists, and the '*' wildcard), so pairing grants no extra access.
Cherry-picked/rebased from #47736 (#23805) by ygd58; membership check rewritten
to reuse the existing allowlist logic. Adds regression tests.
The MCP serve event bridge polls two files to decide whether there is new
conversation activity to surface to MCP clients: the gateway sessions.json
index and state.db. Its skip-when-unchanged guard was self-defeating — it
refreshed self._sessions_json_mtime with the current value *before*
comparing against it, so the sessions.json term was always true and the
guard collapsed to a state.db-only check.
The impact is silent message loss on the event stream. The gateway commonly
persists a message to state.db on one tick and registers the owning
conversation in sessions.json a moment later. On that later tick only
sessions.json has changed, so the broken guard takes the early return and
never processes the freshly-registered chat. Its messages are withheld from
every connected MCP client (events_poll / events_wait) until state.db
happens to change again — which, for an otherwise-idle conversation, may be
never. A polling bridge that quietly swallows new conversations is exactly
the failure mode this watcher exists to prevent.
The fix is minimal and low-risk: capture the previously-seen sessions.json
mtime before the cache refresh and compare against that, so the guard skips
only when NEITHER file changed since the last poll. The hot-path mtime
optimization is fully preserved (a genuinely idle tick still short-circuits),
and all existing EventBridge polling tests continue to pass unchanged.
## What does this PR do?
Fixes a logic error in `EventBridge._poll_once` (`mcp_serve.py`) where the
"nothing changed, skip this poll" guard compared `sj_mtime` against
`self._sessions_json_mtime` *after* that attribute had already been
overwritten with `sj_mtime`. The comparison was therefore always true,
reducing the intended "skip only if both files are unchanged" check to a
state.db-only check and discarding any tick in which only sessions.json
changed. The guard now compares against the mtime observed on the previous
poll, restoring the intended behavior.
## Related Issue
N/A
## Type of Change
- [x] 🐛 Bug fix (non-breaking change that fixes an issue)
- [ ] ✨ New feature (non-breaking change that adds functionality)
- [ ] 🔒 Security fix
- [ ] 📝 Documentation update
- [ ] ✅ Tests (adding or improving test coverage)
- [ ] ♻️ Refactor (no behavior change)
- [ ] 🎯 New skill (bundled or hub)
## Changes Made
- `mcp_serve.py`: in `EventBridge._poll_once`, snapshot
`prev_sessions_json_mtime = self._sessions_json_mtime` before refreshing the
cached index, and use it in the skip guard
(`sj_mtime == prev_sessions_json_mtime`) so a sessions.json-only change no
longer triggers the early return. Added a comment explaining the seam.
- `tests/test_mcp_serve.py`: added
`TestEventBridgePollE2E::test_poll_picks_up_new_conversation_when_only_sessions_json_changed`,
a regression test that reproduces the boundary state (state.db unchanged,
sessions.json newly updated) and asserts the new conversation's message is
emitted.
## How to Test
1. Reproduce the failure on the old code: with the guard comparing against
`self._sessions_json_mtime`, the new test fails — the freshly-registered
conversation yields `0` events instead of `1`.
2. Apply the fix and run `pytest tests/test_mcp_serve.py -q` — all 46 tests
pass (40 skipped require the optional `mcp` SDK), including the three
pre-existing `TestEventBridgePollE2E` polling tests and the new regression
guard.
3. `ruff check mcp_serve.py tests/test_mcp_serve.py` and
`python scripts/check-windows-footguns.py mcp_serve.py` both report clean.
## Checklist
### Code
- [x] I've read the [Contributing Guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md)
- [x] My commit messages follow [Conventional Commits](https://www.conventionalcommits.org/) (`fix(scope):`, `feat(scope):`, etc.)
- [x] I searched for [existing PRs](https://github.com/NousResearch/hermes-agent/pulls) to make sure this isn't a duplicate
- [x] My PR contains **only** changes related to this fix/feature (no unrelated commits)
- [x] I've run `pytest tests/test_mcp_serve.py -q` and all tests pass
- [x] I've added tests for my changes (required for bug fixes, strongly encouraged for features)
- [x] I've tested on my platform: macOS 15 (Darwin)
### Documentation & Housekeeping
- [x] I've updated relevant documentation (README, `docs/`, docstrings) — or N/A
- [x] I've updated `cli-config.yaml.example` if I added/changed config keys — or N/A
- [x] I've updated `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows — or N/A
- [x] I've considered cross-platform impact (Windows, macOS) per the [compatibility guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md#cross-platform-compatibility) — or N/A
- [x] I've updated tool descriptions/schemas if I changed tool behavior — or N/A
`pathlib.Path('~user').expanduser()` raises RuntimeError when the
tilde-expansion can't resolve the user (e.g. `~500-700` where the LLM
meant "approximately 500-700" rather than a path). The hint walker's
existing `except (OSError, ValueError):` clauses do not catch
RuntimeError, so it escapes through the tool dispatcher and surfaces
in the conversation loop as a misleading
Error during OpenAI-compatible API call #N:
Could not determine home directory.
Reproduced across three unrelated models (openai/gpt-5-mini,
openai/gpt-5.1-codex, deepseek/deepseek-v4-flash) on terminal-tool
commands containing literal tildes in non-path contexts — common in
LLM output ("~500 agencies", "~45,000 CVEs", "~80/hr blended rate").
Reproduction (one-liner):
>>> from pathlib import Path
>>> Path("~500-700").expanduser()
RuntimeError: Could not determine home directory.
Fix: extend the three `except` clauses in
agent/subdirectory_hints.py to also catch RuntimeError:
line 138 (_add_path_candidate's outer catch around the Path().expanduser() call)
lines 198+202 (_load_hints_for_directory's nested catches around hint_path.relative_to(Path.home()))
Tests: tests/agent/test_subdirectory_hints_tilde.py adds three cases
covering: tilde-as-approximately in heredoc commands, ~unknown_user paths,
and a regression guard that legitimate ~/path expansion still works.
Root cause: gateway spawns LSP servers (jdtls/pyright/yaml-ls) and
slash_worker without start_new_session=True, so they inherit the
gateway process group (= TUI parent PID). When mcp_tool
_snapshot_child_pids() races with these spawns during stdio MCP
server startup, non-MCP children leak into _stdio_pgids with the
TUI parent PGID. shutdown_mcp_servers() then killpg(tui_parent_pid,
SIGTERM), killing the TUI itself.
Evidence: tui_gateway_crash.log shows recurring SIGTERM stacks:
shutdown_mcp_servers -> _kill_orphaned_mcp_children ->
_send_signal -> killpg(pgid, sig) -> SIGTERM received
Fix (3 layers):
1. agent/lsp/client.py: add start_new_session=True to LSP server
spawn so each LSP server gets its own process group/session.
2. tui_gateway/server.py: same fix for slash_worker spawn, the
symmetric root-cause patch so no gateway direct child shares
the TUI parent pgid.
3. tools/mcp_tool.py: add _filter_mcp_children() defense-in-depth
that drops non-MCP children (slash_worker, jdtls/eclipse LSP)
from the PID delta before they can poison _stdio_pgids.
Two tests for the auto-resume authorization gate: an unauthorized session
owner is skipped without claiming a _running_agents slot or persisting one,
and a raising auth check fails closed (session skipped, not resumed).
git's and sudo's option parsers resolve unambiguous long-flag prefixes, so
`git reset --har`, `git branch --delete --force`, and `sudo --stdi`/`--ask`
execute identically to their full-flag forms while evading the exact-string
DANGEROUS_PATTERNS regexes that gate them. Verified live against real git
and sudo binaries. Widen the patterns to accept unambiguous abbreviations,
scoped narrowly enough to avoid colliding with sibling flags (--help,
--soft/--mixed/--merge/--keep, --shell/--set-home).
Follow-up widening the archived-history fix to the sibling save paths the
original PR did not cover. Model switches (_cmd_model, set_session_model) and
_restore mint a fresh AIAgent with _session_db_created=False, so the
agent-owns-persistence guard evaluates False and the blind full-history
replace_messages() fired — DELETEing the durable active=0/compacted=1 rows on
any compressed ACP session (same data-loss class the PR fixes, different
trigger).
- hermes_state.replace_messages: add active_only=True to delete/reinsert only
the live (active=1) rows, leaving soft-archived rows untouched (idea adopted
from the competing PR #50306 by @mrparker0980, credited).
- hermes_state.has_archived_messages: cheap existence probe for active=0 rows.
- acp_adapter._persist: when the agent doesn't own persistence but the session
already has archived rows on disk, replace active-only; otherwise the
destructive full replace stays (fresh create/fork has nothing to lose).
- Regression test: model-switch save on a compacted session keeps the archived
turn discoverable via get_messages(include_inactive=True) + search_messages.
ACP's SessionManager._persist() called db.replace_messages() on every
save. That delete-then-reinsert is destructive by design. The agent
backing each ACP session already persists to the same SessionDB itself:
it flushes turns incrementally via append_message and, on context
compression, preserves pre-compaction turns non-destructively through
archive_and_compact() as searchable active=0/compacted=1 rows.
So the per-save replace_messages() was a redundant double-write that
deleted exactly those archived rows (and their FTS entries). Worse,
after a compression-driven id rotation the agent's live head no longer
equals the ACP session id, so the replace overwrote the ended parent
transcript while new turns flowed to the new id — split-brain corruption
of one conversation. Any ACP conversation (VS Code / Zed / JetBrains)
long enough to compress lost history.
Now _persist skips the destructive replace when the agent owns
persistence to this DB (its _session_db is this db and its row exists),
relying on the agent's own incremental + archival flush. It still falls
back to the atomic replace when the agent is not self-persisting — test
agent factories, and fresh create/fork sessions whose copied history the
agent has not flushed yet — so the #13675 rollback guarantee holds.
## What does this PR do?
Fixes silent history loss in ACP editor sessions. ACP _persist no longer
destroys the compression-archived transcript the agent already wrote.
Long enough conversations compress; that compression archives old turns
non-destructively; ACP then hard-deleted them on the next save. After an
id rotation it also clobbered the ended parent and split the
conversation across two ids. This change defers to the agent's own
persistence when it owns the DB and only uses the destructive replace
when nothing else is writing the transcript.
## Related Issue
N/A
## Type of Change
- [x] 🐛 Bug fix (non-breaking change that fixes an issue)
- [ ] ✨ New feature (non-breaking change that adds functionality)
- [ ] 🔒 Security fix
- [ ] 📝 Documentation update
- [ ] ✅ Tests (adding or improving test coverage)
- [ ] ♻️ Refactor (no behavior change)
- [ ] 🎯 New skill (bundled or hub)
## Changes Made
- `acp_adapter/session.py`: in `SessionManager._persist`, guard the
`db.replace_messages()` call. Skip it when the agent owns persistence
to this DB (`agent._session_db is db` and `agent._session_db_created`);
otherwise keep the destructive atomic replace as the fallback.
- `tests/acp/test_session.py`: add a regression test proving archived
(active=0/compacted=1) rows survive a save when the agent self-persists
and stay FTS-searchable; add a test confirming the replace path still
runs for agents that do not own DB persistence.
## How to Test
1. Run `pytest tests/acp/test_session.py -q` — 43 pass.
2. `test_save_session_preserves_agent_archived_history`: archive a turn
via `archive_and_compact`, save, and confirm it survives and is found
by `search_messages` (fails before this fix — replace_messages deleted
it).
3. `test_save_session_still_replaces_when_agent_not_self_persisting`:
confirm history still overwrites cleanly for non-self-persisting
agents.
## Checklist
### Code
- [x] I've read the Contributing Guide
- [x] My commit messages follow Conventional Commits (`fix(scope):`, `feat(scope):`, etc.)
- [x] I searched for existing PRs to make sure this isn't a duplicate
- [x] My PR contains only changes related to this fix/feature (no unrelated commits)
- [x] I've run `pytest tests/ -q` and all tests pass
- [x] I've added tests for my changes (required for bug fixes, strongly encouraged for features)
- [x] I've tested on my platform: macOS 15 (Darwin 25.5)
### Documentation & Housekeeping
- [x] I've updated relevant documentation (README, `docs/`, docstrings) — or N/A
- [x] I've updated `cli-config.yaml.example` if I added/changed config keys — or N/A
- [x] I've updated `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows — or N/A
- [x] I've considered cross-platform impact (Windows, macOS) — or N/A
- [x] I've updated tool descriptions/schemas if I changed tool behavior — or N/A
Rework follow-up on the Windows destructive-shell detection. The PowerShell
pattern required an explicit -Command/-c before the verb, but PowerShell runs
the verb as the DEFAULT POSITIONAL arg — so `powershell Remove-Item -Recurse
-Force C:\x` (no -Command) slipped through, the exact case the PR body claims
to close. Also missing the canonical `ri` alias.
Anchor the verb to the command position (after the shell name + any leading
-Flag switches + optional -Command/-c) so bare invocations are caught while a
benign path arg containing 'del'/'rm' (e.g. -File c:\del-logs\run.ps1) is not.
Add ri to the verb list. Mutation-verified regression tests for the bare
invocation, ri alias, and the benign-path negative.
Register the Matrix room-message, reaction, and invite handlers with
mautrix's wait_sync=True. mautrix's handle_sync() only returns the tasks
for handlers registered as sync-awaited; non-waited handlers are
fire-and-forget via background_task.create() and are NOT returned. Since
_dispatch_sync() awaits only the returned tasks (await asyncio.gather),
the inbound handlers previously had no completion point, so Tuwunel/
mautrix homeservers connected and completed initial sync but dispatched
zero inbound messages.
Fixes#46142.
Co-authored-by: Zeheng Huang <153708448+hunjaiboy@users.noreply.github.com>
Follow-up correcting the salvaged fix's persistence approach to avoid a
duplicate user-message write (verified via E2E — the #860/#42039 bug class
the original diff aimed to avoid).
Root cause: in gateway mode the AIAgent is built WITH a session_db, so the
inbound user turn is already flushed at turn start (turn_context.
_persist_session). The original fix returned agent_persisted=False, making the
gateway re-write the whole new-message slice via append_to_transcript ->
append_message (a raw INSERT with no dedup), duplicating the already-flushed
user turn.
Corrected approach (single writer): run_codex_app_server_turn now flushes its
OWN projected assistant/tool messages via _flush_messages_to_session_db (which
dedups the already-persisted user turn through _DB_PERSISTED_MARKER) and
returns agent_persisted=True so the gateway skips its write. Net result:
session_search/distill see the full codex conversation, each message persisted
exactly once.
Adds regression coverage asserting exactly-once persistence on a real
SessionDB, agent_persisted=True, FTS visibility, and standard-runtime skip-db
behaviour preserved.
Co-authored-by: Lubos Buracinsky <lubos@komfi.health>
compress_context() and /new already flush un-persisted messages before
calling end_session() (fixed in #47202), but /resume and /branch still
call end_session() directly. When a turn is interrupted mid-flight and
the user immediately runs /resume or /branch, messages generated during
that turn have not yet been written to state.db and are silently lost on
session rotation.
Add the same best-effort _flush_messages_to_session_db() call before
end_session() in both _handle_resume_command and _handle_branch_command,
mirroring the pattern established in cli.py:new_session().
Regression tests verify the flush is called when an agent is present.
Phase 2c review flagged that only 2 of the 4 structurally-identical
resolve_provider_client routing dead-ends were demoted. Complete the bug-class:
also demote+dedup the external-process ('not directly supported') and OAuth
('not directly supported, try auto') fall-throughs, keyed by provider name, so
none of the four dead-ends spam WARNING on a retry loop.
Add direct tests for the unhandled-auth_type and OAuth dedup paths via a
monkeypatched PROVIDER_REGISTRY (the review noted these were unverified).
Mutation-checked: reverting either sibling demotion fails its test.
The two fall-through branches in resolve_provider_client (unknown provider,
unhandled auth_type) logged at WARNING on every retry of a misconfigured
provider, spamming logs during retry loops. Demote both to logger.debug with
per-process dedup: the first occurrence still surfaces (a provider-name typo or
PROVIDER_REGISTRY/auth_type-drift bug is worth seeing once), while identical
repeats are suppressed for the process lifetime.
Salvaged from #56283 (extracting only the stated auxiliary_client fix; the
original PR also bundled ~2800 lines of unrelated changes across 10 other
files, which are dropped).
Reworks @valenteff's #53277 fix per review (Teknium's 3 findings):
- Route refresh_launchd_plist_if_needed's bootstrap through the existing
_launchctl_bootstrap() EIO-recovery helper (canonical since #56256),
wrapped in a wall-clock retry loop, instead of an ad-hoc 5x2s loop.
- Window sized to agent.restart_drain_timeout (default 180s), not a fixed
~10s: the failure happens while the old gateway is still draining (finding 1).
- Retry on subprocess.TimeoutExpired too, not just CalledProcessError — a
bootstrap timeout after bootout otherwise escapes and leaves the service
unloaded (finding 2).
- Confirm success with launchctl list, not a bare bootstrap exit 0 (finding 3);
mirror verify+drain-window in the detached-helper bash path.
- Shared helpers _launchd_reload_log_path / _append_launchd_reload_log /
_launchctl_label_registered / _retry_launchctl_bootstrap_until_registered.
3 new tests cover retry-until-listed, TimeoutExpired-retried, deadline-exhaust.
E2E: real reload log + mocked launchctl — retries CalledProcessError+TimeoutExpired,
verifies via launchctl list, logs failures.
Think-enabled models (MiniMax M2.7, DeepSeek, etc.) emit inline
<think>...</think> reasoning even for simple prompts like title
generation, and the raw XML was leaking into session titles. Route the
title-model response through the canonical strip_think_blocks scrubber
before cleanup so every tag variant — closed pairs, unterminated blocks,
orphan closes, mixed case — is handled, not just a single literal
<think> pair.
- 2 regression tests: closed <think> pair stripped, unterminated block
at start yields no title.
Salvaged from PR #44126 by @shawchanshek.
Hardens the salvaged #53997 tests per review: the positive-resolution and
reconnect-recovery tests now assert query_keys is awaited with the REAL
resolved device id ({mxid: [<id>]}) and never [None] — the [null] body the
homeserver rejects (the actual bug), plus await_count==2 to prove
verification genuinely re-runs after resolution rather than just the flag
looking right.
Per review feedback on #53997 from @teknium1: the flag was set True
on failed device_id resolution but never reset, so a same-adapter
reconnect that successfully resolves a real device_id would keep
skipping server-side key verification indefinitely.
Reset now happens at the top of connect(), before resolution runs,
so every connect() attempt starts clean. A repeat failure re-sets
the flag (unchanged behavior); a recovery correctly clears it.
Adds TestDeviceIdRecoveryOnReconnect to cover the transition.
- Resolve device_id via query_keys({mxid: []}) when whoami() returns None
- Guard _verify_device_keys_on_server and _reverify_keys_after_upload
against None/unverified device_id to prevent 'device_keys values must
be a list of strings' serialization failure
- Disconnect existing client before reconnect to prevent dual OlmMachine
instances on the same crypto store
Re-targeted from #39779 (legacy gateway/platforms/matrix.py) onto the
migrated plugins/platforms/matrix/adapter.py path following the
2026-06-20 adapter migration. Logic unchanged from original fix.
242 tests passing (233 upstream + 9 new).
/health/detailed leaked runtime state (gateway state, connected
platforms, active-agent counts, PID, exit reason) with no auth. Gate it
behind the same Bearer auth as other API routes; plain /health stays
open for liveness probes.
Also refuse to start on a placeholder/too-short (<16 char) API_SERVER_KEY
regardless of bind address — a guessable key on a terminal-capable
endpoint is RCE-adjacent even on loopback, since any local process can
reach it. The required-key check was already unconditional; this extends
the strength floor to loopback binds too. Startup guards are hoisted
above app/background-task creation so a rejected start leaves no partial
state.
Salvaged from #44073 (external-surface hardening), split into a focused
PR per maintainer request.
Co-authored-by: Hermes Agent <agent@nousresearch.com>
The execute_code sandbox exposed its tool-call RPC (AF_UNIX socket and
remote file-poll transports) without any caller check, so any local
process that could reach the socket / rpc dir could dispatch
terminal-capable tool calls through the parent. Mint a per-session
HERMES_RPC_TOKEN, pass it to the sandboxed child, and require a
timing-safe match on every request in both _rpc_server_loop and
_rpc_poll_loop. Empty/missing/wrong token fails closed.
Salvaged from #44073 (per-session RPC token). Added timing-safe
secrets.compare_digest comparison and fail-closed regression tests.
Co-authored-by: Hermes Agent <agent@nousresearch.com>
MoA full-turn traces (moa.save_traces) recorded the aggregator's acting
output only on the non-streaming path, where it's captured inline at
call time. On the streaming path — which every hermes chat --query run
and every live gateway/CLI turn takes — the aggregator's raw token
stream is handed to the live consumer, so the trace left output=null and
only pointed at the session-db assistant row. An offline audit of a
benchmark run (HermesBench drives --query) then couldn't see what the
aggregator produced without hand-joining to state.db.
Capture the resolved streamed acting text at trace-flush time (the agent
already holds it in _current_streamed_assistant_text) and fold it into
the trace, so the record is self-contained in both modes. New
output_location value inline_from_stream marks a streamed turn whose text
was captured this way; a genuinely empty acting turn (pure tool call)
still points at the session db, matching state.db exactly.
Touches only the trace side-channel — no change to the acting path,
message history, role alternation, or prompt cache.
- agent/moa_loop.py: consume_and_save_trace(..., aggregator_output_fallback)
on both the facade and the MoAClient wrapper; prefer inline capture,
fall back to the resolved streamed text.
- agent/moa_trace.py: embed the fallback; add inline_from_stream location.
- agent/conversation_loop.py: pass _current_streamed_assistant_text at flush.
- tests: 5 cases across streaming / non-streaming / empty-fallback / no-double-write.
Follow-up on the salvaged #47491 commits:
- Register _plugin_api_runtime_gate BEFORE the auth middlewares so it
executes AFTER them, and add an explicit auth check: unauthenticated
requests to /api/plugins/<name>/ fall through to auth's 401 instead of
this gate's 404. Prevents the gate from becoming a plugin-name oracle
(an unauthenticated caller could otherwise fingerprint installed/enabled
plugins by status code). Keeps test_non_kanban_plugin_route_requires_auth
green.
- Enable the 'example' user plugin in the _install_example_plugin test
fixture so the auth / static-asset-allowlist tests still reach the real
serving paths now that user plugins are gated on plugins.enabled.
- Mark the runtime-gate unit-test scopes as authenticated so they exercise
the enabled/disabled policy under the new auth-first ordering.
Address two residual bypasses identified in review:
1. Add _plugin_api_runtime_gate middleware that checks plugins.enabled/
plugins.disabled on every request to /api/plugins/{name}/... routes.
Previously, disabling a plugin at runtime had no effect on its already-
mounted API routes until a restart.
2. Extend serve_plugin_asset to check plugins.disabled for bundled plugins.
Previously, only user plugins were gated — a bundled plugin in
plugins.disabled would still serve assets from the unauthenticated
/dashboard-plugins/{name}/... endpoint.
Both fixes ensure the enabled/disabled policy is evaluated live at request
time, not just at startup.
Adds regression tests covering:
- Middleware blocks disabled user plugin API routes (404)
- Middleware blocks user plugin removed from enabled set (404)
- Middleware passes enabled user plugin API routes
- Middleware blocks disabled bundled plugin API routes (404)
- Bundled plugin assets return 404 when disabled
- Bundled plugin assets served normally when not disabled
- User plugin asset gating still works correctly
Slack Workflow Builder posts (and other app/bot messages) arrive as
subtype=bot_message with user=None. _is_user_authorized rejected them at
the `if not user_id: return False` guard, which runs *before* the #4466
{PLATFORM}_ALLOW_BOTS bypass — so @mentioning the bot from a Slack
workflow silently did nothing, even with SLACK_ALLOW_BOTS (or
SLACK_ALLOW_ALL_USERS) set. The chat-scoped allowlist for Telegram/QQ
already runs before that guard for the same reason (channel broadcasts
with no from_user); Slack was both missing from the bot-bypass map and
had the bypass running too late.
- gateway/authz_mixin: move the {PLATFORM}_ALLOW_BOTS bypass ahead of the
no-user-id guard and add Platform.SLACK -> SLACK_ALLOW_BOTS.
- plugins/platforms/slack/adapter: set is_bot=True on inbound
bot_message events so the gateway can identify workflow/app senders
(they carry no user_id to match against the allowlist).
Tested: new tests/gateway/test_slack_bot_auth_bypass.py plus the existing
Discord/Feishu bot-auth and gateway authz/gating suites all pass.
Follow-up on the salvaged resume_pending fix: the empty-turn safety net
now emits the same reason-aware recovery note as the _is_resume_pending
branch (reason phrase + 'session restored' guidance + no-re-execute
instruction) instead of a second, differently-worded note. Also adds the
AUTHOR_MAP entry for the salvaged commit.
A session interrupted by a gateway restart is flagged resume_pending and
auto-continued on startup via _schedule_resume_pending_sessions(), which
dispatches an empty-text internal MessageEvent. The recovery system note
that should fill that empty turn is gated, in _run_agent(), on
_interruption_is_fresh — the age of the LAST PERSISTED TRANSCRIPT ROW.
For an active thread returned to after >1h of silence, that transcript
clock is stale even though the interruption (last_resume_marked_at) is
seconds old. The gate evaluates False, the note is not prepended, and the
model receives a genuinely blank user turn — replying with confused
'that message came through blank' noise.
Fix (two parts, both default-on, behavior unchanged for healthy turns):
1. resume_pending freshness now also considers last_resume_marked_at (the
restart watchdog's own stamp). The branch fires when EITHER the
transcript clock OR the resume mark is fresh, so the startup scheduler's
freshness decision and the per-turn injection agree.
2. Empty-turn safety net: if the user turn is still blank after all
injections AND the session is resume_pending, backfill a recovery note
so a blank turn can never reach the model. Scoped to resume_pending so
ordinary empty turns (e.g. uncaptioned image) are untouched.
Adds 3 regression tests; the two core ones fail on the pre-fix logic.
Follow-up to the salvaged fail-closed defaults. The own-policy default flip
(open -> pairing) and the email dispatch-level deny broke sibling tests
across the suite that relied on the old fail-open behavior:
- test_email.py: dispatch-mechanics tests now opt into EMAIL_ALLOW_ALL_USERS
(they test formatting/attachments/threading, not authz); the two auth
contract tests are rewritten to assert the new fail-closed behavior
(no allowlist + no allow-all => sender dropped at the adapter).
- test_whatsapp_cloud.py / test_whatsapp_formatting.py / test_whatsapp_from_owner.py:
autouse fixture opts into WHATSAPP_ALLOW_ALL_USERS so dm_policy: open
dispatch-mechanics tests still flow (open now requires an explicit
allow-all opt-in, SECURITY.md 2.6).
- _adapter_for_source: use getattr for source.platform/profile so bare
SimpleNamespace test fixtures without .profile don't crash the busy/queue
ingress path (AGENTS.md pitfall #17).
Full tests/gateway/ + yuanbao pipeline: 8555 passed, 0 failed.
Aligns runtime behaviour with SECURITY.md 2.6: externally reachable
messaging adapters must fail closed unless access is explicitly
configured. Closes the confirmed multiplex authorization bypass a
secondary profile's open dm/group policy no longer inherits the default
profile's allowlist trust.
- Own-policy adapters (WhatsApp, WeCom, Weixin, QQBot, Yuanbao) default
dm_policy/group_policy to pairing/allowlist instead of open; open now
requires an explicit GATEWAY_ALLOW_ALL_USERS or per-platform allow-all.
- Startup guard (_own_policy_open_startup_violation) refuses to boot when
an enabled adapter is open without the allow-all opt-in; the guard now
runs for every secondary profile in multiplex mode too.
- Profile-aware own-policy authorization: _authorization_adapter /
_adapter_for_source resolve the live adapter via SessionSource.profile,
so _is_user_authorized and the ingress/pairing/busy/queue paths read the
originating profile's adapter policy, not the default profile's.
- Fail-closed intake for Email, Feishu P2P, and Discord (blank-principal
denial, empty-allowlist deny, missing-interaction.user deny).
Salvaged from #44073 (external-surface hardening), split into a focused
gateway-authz PR per maintainer request. Follow-up fix by Hermes Agent:
the Discord slash-auth channel bypass now matches DISCORD_ALLOWED_CHANNELS
by the same name-inclusive keys (id + name + #name + parent) the on_message
scope gate uses, so a name-form channel allowlist authorizes slash
interactions consistently (was id-only, breaking #name matching).
Co-authored-by: Hermes Agent <agent@nousresearch.com>
Discord's _fetch_channel_context backfills recent channel/thread activity
(from any member who can post there, not just the allowlisted user) into
the agent's context with no sender-trust distinction. Slack's equivalent
_fetch_thread_context was fixed to prefix non-allowlisted senders with
[unverified] and add LLM guidance not to act on their content, mitigating
indirect prompt injection from third parties in shared channels/threads.
Port the same mechanism to Discord using the already-wired
_is_sender_authorized/set_authorization_check plumbing.
Phase 2c mutation-check found the salvaged tests covered only the pure helpers
(_is_background_review_harness_message / _strip_background_review_harness) — the
two integration WIRINGS had zero coverage: removing the _persist_disabled guard
in _flush_messages_to_session_db, or the _strip call in
get_messages_as_conversation, left all 13 tests green.
Add:
- TestPersistDisabledHardStop: a _persist_disabled agent's flush writes nothing
to a live SessionDB (guards the run_agent hard-stop).
- TestGetMessagesAsConversationStripsHarness: a session with stray harness rows
resumes clean end-to-end through get_messages_as_conversation (guards the
hermes_state load-time wiring).
Mutation-checked: each new test fails when its wiring is reverted.
The forked skill/memory review agent shares the parent's session_id for
prompt-cache warmth. Without isolation it wrote its harness turn ('Review the
conversation above and update the skill library…') plus its curator-mode reply
straight into the user's REAL session in state.db; the next live turn re-read
that injected user message as a standing instruction and the agent 'became' the
curator, refusing the actual task.
Root fix: a _persist_disabled flag on the fork that hard-stops every DB write
and lazy-open path (_flush_messages_to_session_db, _ensure_db_session,
_get_session_db_for_recall) — the review writes only to the skill/memory stores
via its tools. Defense-in-depth: _strip_background_review_harness drops any
stray harness message (and the assistant reply that followed) at load time in
get_messages_as_conversation, so an already-polluted session resumes clean.
Salvaged from #50296.
Co-authored-by: arminanton <29869547+arminanton@users.noreply.github.com>
The standalone thread-pool fallback in _deliver_result() runs inside the
`except RuntimeError:` block (taken when asyncio.run() sees a running loop).
When future.result() raised there (SMTP ConnectionError, timeout, etc.), the
exception was NOT caught by the sibling `except Exception:` — it escaped
_deliver_result() and crashed the whole delivery loop, silently skipping every
remaining target. Multi-target delivery (e.g. deliver: 'email:a,email:b') is a
documented feature, so this broke a promised contract.
Wrap the fallback in its own try/except so a per-target failure is logged with
exc_info and the loop continues to the next target.
Fixes#47163
Phase 2c review follow-up on the id()-reuse persistence fix:
- test_recycled_id_in_dedup_set_still_persists_new_message seeded an EMPTY
dedup set, so it never injected a collision and passed under id-based dedup
too (couldn't distinguish the designs). Replace with
test_stale_seed_id_from_prior_flush_cannot_suppress_new_message, which asserts
the durable invariant: the seed is empty after every flush (mutation-checked:
removing the post-flush reset now fails BOTH id-reuse tests).
- Refresh the _flush_messages_to_session_db docstring: it still described the
old per-session identity tracking; document the intrinsic-marker mechanism,
that _flushed_db_message_ids is now a one-shot seed, and the shared-dict
mutation safety note.
_flush_messages_to_session_db deduped persisted messages with a retained
{id(msg)} set (_flushed_db_message_ids) kept across turns. Once a flushed dict
is dropped from the live list (scaffolding rewind / in-place compaction) and
GC'd, CPython recycles its address onto a new assistant/tool dict whose id()
collides with the stale entry — so the real turn is silently never written to
state.db.
Replace the retained id-set with an intrinsic _DB_PERSISTED_MARKER stamped on
each dict. The id-set is demoted to a one-shot seed (valid only while the
caller's objects are alive) that is translated to markers and cleared after
every flush, so no id() outlives a flush to alias a future message. The marker
is _-prefixed so the wire sanitizers strip it before any request leaves.
Preserves the existing _is_ephemeral_scaffolding skip. Salvaged from #50372.
Co-authored-by: rrevenanttt <290873280+rrevenanttt@users.noreply.github.com>
Review nit (helix4u): the fix covers 500/502/503/529 but the positive tests
only asserted 500 and 503. Parametrize over all four so 502/529 are covered
too; keep the plain-5xx negatives.
Local inference servers (llama.cpp/llama-server, vLLM/Ollama behind a
Cloudflare/Tailscale hop) report context overflow with HTTP 500/502/503/529
instead of 400/413. _classify_by_status returned server_error/overloaded and
retried blindly, then dropped the turn with no compaction. Route explicit
_CONTEXT_OVERFLOW_PATTERNS matches on those 5xx codes to context_overflow
(should_compress=True); plain 500 stays server_error, plain 503 overloaded.
hermes -w locks each worktree (reason 'hermes pid=<pid>'). git worktree
remove --force (single -f) refuses a locked tree, so a crashed session's
lock was never released and its worktree accumulated forever — a real
contributor to .worktrees/ bloat.
_prune_stale_worktrees now classifies each lock via _worktree_lock_is_live:
a live-owner pid is skipped at any age; a dead-owner (or foreign) lock is
unlocked first so the aggressive age-based cleanup can actually reap it.
The >72h reap tier is kept (that cleanup is intentional) but now guarded so
dirty/unpushed work is preserved, and branch deletion is gated on
git worktree remove succeeding. New fail-safe helpers _worktree_is_dirty
and _worktree_lock_is_live (pid liveness via gateway.status._pid_exists,
Windows-safe).
A transient Bot API network error during gateway bootstrap (deleteWebhook
or the initial start_polling) currently raises out of connect() and marks
the Telegram adapter fatal, restart-looping the whole gateway even though
the right behavior is to degrade the Telegram channel and let the existing
reconnect ladder recover in the background.
- _delete_webhook_best_effort(): swallow only transient network errors and
continue to polling; non-network errors (e.g. auth failures) still raise.
- _start_polling_resilient(): on a transient conflict/network error at
bootstrap, schedule background recovery and return degraded instead of
raising; non-transient errors still propagate.
- Track the polling error-callback recovery tasks in _background_tasks so
they can't be garbage-collected mid-flight.
- Add a second Telegram Bot API seed fallback IP (149.154.166.110).
Reconnect keeps its existing 10-retry -> supervisor-restart semantics; this
change only fixes the bootstrap raise, it does not alter the retry ladder.