Commit graph

1973 commits

Author SHA1 Message Date
Teknium
d57a4c197c
fix(tools): stop _strategy_exact emitting overlapping matches (#56211)
_strategy_exact advanced its scan cursor by pos+1 instead of
pos+len(pattern), so self-overlapping patterns (e.g. "aa" in "aaaa")
matched at overlapping offsets. _apply_replacements works in reverse
order, so the second replacement operated on already-modified content
using stale offsets — corrupting the file and reporting the wrong count
under replace_all=True. Advancing by len(pattern) matches str.replace()
semantics.
2026-07-01 02:13:13 -07:00
kshitijk4poor
a658f3b28b fix(security): strip dynamic Hermes secrets from all subprocess spawn env
Subprocesses spawned by the terminal tool, execute_code, Docker backend, and
the codex app-server could inherit Hermes-internal secrets that the name-based
`_HERMES_PROVIDER_ENV_BLOCKLIST` can't enumerate, because they're injected into
`os.environ` at runtime under dynamic names:

- `AUXILIARY_<TASK>_API_KEY` / `AUXILIARY_<TASK>_BASE_URL` — per-task side-LLM
  credentials bridged from `config.yaml[auxiliary]` by gateway/run.py and cli.py
  (vision, web_extract, approval, compression, plugin-registered tasks). Often
  separate, higher-spend keys plus base URLs pointing at private endpoints.
- `GATEWAY_RELAY_*_SECRET` / `_KEY` / `_TOKEN` — relay-auth material provisioned
  by gateway/relay.

Additionally, agent/transports/codex_app_server.py built its spawn env from a
raw `os.environ.copy()`, bypassing the centralized `hermes_subprocess_env()`
helper entirely — handing every codex subprocess the full Tier-1 secret set
(GH_TOKEN, gateway bot tokens, Modal/Daytona infra tokens, dashboard session
token) unfiltered. This is the #29157 sibling spawn-site gap; copilot_acp_client
already routes through the helper.

Fix — single chokepoint:
- Add `_is_hermes_internal_secret(key)` in tools/environments/local.py as the
  single source of truth for the dynamic secret patterns. Matches
  AUXILIARY_*_API_KEY / _BASE_URL and GATEWAY_RELAY_*_SECRET/_KEY/_TOKEN; leaves
  non-secret AUXILIARY_*_PROVIDER/_MODEL and GATEWAY_RELAY routing hints visible.
- Wire the predicate into every spawn path unconditionally (ignores skill
  env_passthrough opt-in AND inherit_credentials — a model-driving CLI never
  needs these): `_sanitize_subprocess_env` (both loops), `_make_run_env`
  (foreground), `hermes_subprocess_env` (Tier-1), and the Docker forward filter.
- Add the static GATEWAY_RELAY_* names to `_HERMES_PROVIDER_ENV_BLOCKLIST` so the
  exact-match path catches them independently of the predicate.
- Add the GATEWAY_RELAY_ID/_SECRET/_DELIVERY_KEY triplet to `_ALWAYS_STRIP_KEYS`
  (Tier-1) so it is stripped unconditionally on EVERY spawn surface — including
  the codex/copilot `inherit_credentials=True` path that skips the Tier-2
  blocklist. `_SECRET`/`_DELIVERY_KEY` are already predicate-matched; `_ID` has
  no secret suffix, so enumerating it here is what closes its leak on the
  inherit path (self-review W1).
- Defense in depth: env_passthrough.py `_is_hermes_provider_credential()` now
  consults the same predicate, so a skill can't register these names as
  passthrough and tunnel them into an execute_code / terminal child.
- Route codex_app_server through `hermes_subprocess_env(inherit_credentials=True)`
  — strips Tier-1 + dynamic-internal secrets while provider creds (which codex
  needs to authenticate) still flow.

Consolidates PRs #53715 (necoweb3 — the _is_hermes_internal_secret backbone +
Docker filter), #53503 (srojk34 — env_passthrough guard), and #55709 (srojk34 —
codex routing). Retires #52348 (claudlos): its copilot half is already on main,
and its codex half used the full-strip `_sanitize_subprocess_env` which would
break codex provider auth — the correct tier is `inherit_credentials=True`.

Tests: TestHermesInternalDynamicSecrets (terminal + predicate + passthrough
override), TestInternalDynamicSecrets (hermes_subprocess_env both tiers),
TestSpawnEnvSecretStripping (codex spawn env), plus env_passthrough
defense-in-depth cases.

Co-authored-by: necoweb3 <sswdarius@gmail.com>
Co-authored-by: srojk34 <286497132+srojk34@users.noreply.github.com>
Co-authored-by: claudlos <claudlos@agentmail.to>
2026-07-01 14:37:22 +05:30
Teknium
7534b5be2c
fix(security): anchor rm hardline rules to command position (#56193)
A literal "rm -rf /" carried as DATA inside another command's quoted
argument — a PR title, a git commit -m message, an echo/printf arg —
tripped the unconditional root-filesystem hardline and could not run at
all. `gh pr create --title "block rm -rf / spellings"` was blocked
outright, because the bare rm path branch matched the mid-string "rm"
(via \brm) with the space after "/" satisfying its (\s|$) terminator.

Anchor the shared _RM_FLAG_PREFIX to _CMDPOS so the rm hardline rules
fire only when rm is an actual command word (start of line, after a
separator ; && || |, after a subshell opener $()/backtick, or after
sudo/env/exec wrappers) — not when the string appears as an argument
value. Broaden the bare-path terminator to also accept shell
metacharacters ) ` ; | & so a real wipe inside a command substitution
is still caught.

The quoted-path branch is unchanged, so quoted root/HOME paths stay
blocked. Adds regression tests for both directions: data-arg false
positives must NOT block, real wipes at every command position must block.
2026-07-01 01:54:43 -07:00
claudlos
b24708eda0 security(cron): block base_url overrides that exfiltrate provider credentials
The model-facing cronjob tool accepts free-form provider + base_url. On fire,
the scheduler pairs the named provider's stored credential with the job's
base_url, so a prompt-injected job (e.g. provider=anthropic,
base_url=https://attacker/v1) sends the real API key to an attacker endpoint. A
base_url with no provider inherits the default provider's key for the same
effect.

Add a fail-closed guard at the tool boundary: a base_url override is allowed
only for the custom/BYOK sentinel, a configured custom_providers entry, or when
the override host matches the named provider's own endpoint; an override without
an explicit provider is rejected. The trust boundary is the caller, so
operator-configured base_urls for named providers are unaffected.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-01 14:23:01 +05:30
necoweb3
dc8b5b4f47 fix(approval): detect encoding-based dangerous command bypass (#30100)
echo <base64> | base64 -d | bash (and base32/base16, xxd -r, tr
transforms, openssl base64/enc -d) decode a dangerous command at
runtime — the raw text carries no dangerous keyword, so the denylist
never fired. Adds DANGEROUS_PATTERNS entries for decode-and-execute
pipes into a shell.
2026-07-01 01:39:10 -07:00
YLChen-007
4b5fce66f5 fix(approval): flag remote content via command substitution (#26964)
eval $(curl ...), source $(wget ...), and . $(curl ...) executed
remote content but were not covered by the existing pipe-to-shell /
process-substitution patterns. Adds a DANGEROUS_PATTERNS entry so these
command-substitution forms consistently request approval.

Original authorship preserved from PR #26965 (bot-authored commit
re-attributed to the human contributor).
2026-07-01 01:39:10 -07:00
xy200303
1ebc56ca39 fix(approval): detect shell-expanded command names (#36846)
Command-name obfuscation bypassed the dangerous-command denylist: the
executable name could be spelled with shell tricks that survive regex
matching but still resolve to a blocked command at runtime —
$(echo rm), ${0/x/r}m, backticks, and printf substitutions.

Adds a non-executing shell-word scanner that deobfuscates only at
command positions (start, after ;|&&||, inside $(...), after
sudo/env/exec/... wrappers) and feeds the resulting variants through
the existing HARDLINE_PATTERNS / DANGEROUS_PATTERNS — no second
blocklist. Scoping to command words keeps ordinary arguments
(echo $(echo rm) -rf /) from being promoted into command names.

Co-authored-by: egilewski <1078345+egilewski@users.noreply.github.com>
2026-07-01 01:39:10 -07:00
teknium1
17f07aebdc fix(security): close shell line-continuation bypass in command detection
`_normalize_command_for_detection` strips backslash-escapes before matching
DANGEROUS_PATTERNS and HARDLINE_PATTERNS, but the strip rule was
`re.sub(r'\\([^\n])', r'\1', ...)` — its `[^\n]` class deliberately skips
newlines. A backslash immediately followed by a newline is a POSIX line
continuation: the shell removes BOTH characters and joins the tokens, so
`rm -rf \<newline>/` executes as `rm -rf /`. With the dangling backslash left
in place, the structured rm/dd/mkfs patterns no longer match because a literal
`\` sits wedged between the tokens they expect to be adjacent.

The worst consequence is on the HARDLINE floor. The dangerous-command layer
still fired here only by accident (the generic `\brm\s+-[^\s]*r` "recursive
delete" rule needs no path), and that layer is bypassed by `--yolo` /
`approvals.mode=off`. The hardline blocklist — the unconditional floor reserved
for catastrophic, unrecoverable commands and meant to hold even under yolo —
anchors the root path directly after the flags, so `rm -rf \<newline>/`,
`rm -r\<newline>f /`, and `rm -rf \<newline>~` all slipped past it entirely.
A yolo session could therefore wipe the root filesystem.

The fix collapses line continuations (`\` + `\n` or `\r\n`) to nothing,
mirroring the shell, before the existing escape strip runs. This was the gap
left by 621bf3a87, which added the escape strip but only for non-newline chars.

## What does this PR do?

Closes a shell line-continuation bypass in the dangerous-command detector.
Before: `rm -rf \<newline>/` normalized to `rm -rf \<newline>/`, so the
hardline root-delete patterns did not match and the command could run under
`--yolo`. After: line continuations are collapsed first, the command
normalizes to `rm -rf /`, and the hardline floor blocks it unconditionally.

## Related Issue

N/A

## Type of Change

- [x] 🔒 Security fix

## Changes Made

- `tools/approval.py`: in `_normalize_command_for_detection`, add
  `command = re.sub(r'\\\r?\n', '', command)` ahead of the existing
  backslash-escape strip so shell line continuations (`\`+newline, LF or CRLF)
  are removed exactly as the shell would, instead of leaving a stray backslash
  that breaks the structured patterns.
- `tests/tools/test_hardline_blocklist.py`: add a parametrized
  `test_hardline_blocks_line_continuation` covering the root, in-flag, home,
  CRLF, and mkfs continuation forms, plus
  `test_line_continuation_root_wipe_cannot_bypass_hardline` asserting the
  continuation root wipe stays blocked even with `HERMES_YOLO_MODE=1`.

## How to Test

1. Reproduce: stash the `tools/approval.py` change and run
   `scripts/run_tests.sh tests/tools/test_hardline_blocklist.py` — the new
   line-continuation cases fail (`rm -rf \<newline>/` is not flagged hardline,
   and leaks past the floor under yolo).
2. Restore the change and rerun the file — all 106 tests pass.
3. Regression: `scripts/run_tests.sh tests/tools/test_approval.py` (the
   existing fullwidth/ANSI/null-byte normalization and multiline cases still
   pass).

## Checklist

### Code

- [x] I've read the Contributing Guide
- [x] My commit messages follow Conventional Commits (`fix(scope):`, `feat(scope):`, etc.)
- [x] I searched for existing PRs to make sure this isn't a duplicate
- [x] My PR contains **only** changes related to this fix/feature (no unrelated commits)
- [x] I've run `pytest tests/ -q` and all tests pass
- [x] I've added tests for my changes (required for bug fixes, strongly encouraged for features)
- [x] I've tested on my platform: macOS 15 (Darwin 25.5.0)

### Documentation & Housekeeping

- [x] I've updated relevant documentation (README, `docs/`, docstrings) — or N/A
- [x] I've updated `cli-config.yaml.example` if I added/changed config keys — or N/A
- [x] I've updated `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows — or N/A
- [x] I've considered cross-platform impact (Windows, macOS) — handles both LF and CRLF line endings
- [x] I've updated tool descriptions/schemas if I changed tool behavior — or N/A

# Conflicts:
#	tools/approval.py
2026-07-01 01:38:59 -07:00
teknium1
1d8bd73414 fix(approval): treat # as comment boundary only when whitespace-preceded
The salvaged write-target boundary included `#` in its char class, so a
`#` glued to the redirect/tee path (`echo x > .env#backup`) matched as a
comment boundary and flagged the write as dangerous. But the shell writes
to the distinct file `.env#backup`, not `.env` — a false positive, same
class as the config.yaml.bak case the PR already excluded. Drop `#` from
the boundary; a real trailing comment is always whitespace-preceded (\\s).

Adds regression tests for .env#backup, config.yaml#backup, and
tee .env#backup staying out of the deny.
2026-07-01 01:27:26 -07:00
friendshipisover
7bfdc0bca6 fix(security): close env/config write-deny bypass via trailing arg or comment
The dangerous-command approval gate has rules that flag a shell command
when it overwrites a project `.env` or `config.yaml` — these files hold
API keys, DB passwords, and (for `config.yaml`) the approval policy
itself, so a write to them should require user approval. The matching
`write_file`/`patch` deny on the file-tools side was paired with these
terminal-side rules so neither path is an open door.

The redirection and `tee` rules anchored the sensitive path with
`_COMMAND_TAIL` (`(?:\s*(?:&&|\|\||;).*)?$`), which only tolerates the
rest of the line being empty or a command separator. The problem: in
POSIX shell the redirection target is fixed regardless of what trails it.
`echo secret > .env extra` still truncates `.env` (the `extra` is just
another argument to `echo`), and `echo secret > .env # note` does too
(the `#` starts a comment). Because neither tail is a separator, the old
anchor failed to match and the command sailed through approval — a
prompt-injected step could overwrite a project `.env`/`config.yaml`
unprompted. The system-path redirection rule one line above never had
this restriction and already caught these forms.

The fix introduces `_WRITE_TARGET_BOUNDARY`, a lookahead that only
requires the path token to END at a shell word boundary (whitespace,
quote, separator, redirection operator, `#`, or EOL) rather than
demanding the rest of the line be empty. It is applied to the two
stream-write rules (redirection and `tee`) where the sensitive path is
always a write target. The `cp`/`mv`/`install` rule deliberately keeps
`_COMMAND_TAIL`: there the sensitive file is only a target when it is the
LAST argument (the destination), so requiring end-of-line is correct and
keeps `cp config.yaml backup.yaml` (config.yaml as the source) out of the
deny.

## What does this PR do?

Closes a bypass in the dangerous-command approval gate where a trailing
argument or `#` comment after a `>`/`>>`/`tee` write target let a command
overwrite a project `.env` or `config.yaml` without triggering approval,
even though the shell still overwrites the file.

## Related Issue

N/A

## Type of Change

- [x] 🔒 Security fix

## Changes Made

- `tools/approval.py`: add `_WRITE_TARGET_BOUNDARY` (a word-boundary
  lookahead) and use it instead of `_COMMAND_TAIL` in the two
  project-env/config stream-write patterns ("overwrite project env/config
  via tee" and "via redirection"). `_COMMAND_TAIL` is kept and still used
  by the `cp`/`mv`/`install` rule, where end-of-line anchoring is the
  correct semantics.
- `tests/tools/test_approval.py`: add regression tests for
  `> .env extra`, `> .env # note`, `>> config.yaml foo`, and
  `tee .env backup` (now flagged), plus `> config.yaml.bak` (must stay
  safe — different file).

## How to Test

1. Reproduce: before the fix,
   `detect_dangerous_command("echo secret > .env extra")` returns
   `(False, None, None)` — the overwrite is not flagged.
2. Apply the fix; the same call now returns the "overwrite project
   env/config via redirection" detection.
3. Run `pytest tests/tools/test_approval.py -q` — the new cases pass and
   the existing `cp config.yaml backup.yaml` / `config.yaml.bak`
   false-positive guards still hold.

## Checklist

### Code

- [x] I've read the Contributing Guide
- [x] My commit messages follow Conventional Commits
- [x] I searched for existing PRs to make sure this isn't a duplicate
- [x] My PR contains only changes related to this fix
- [x] I've run the relevant tests and they pass
- [x] I've added tests for my changes
- [x] I've tested on my platform: macOS 15 (Darwin 25.5)

### Documentation & Housekeeping

- [x] I've updated relevant documentation (README, docs/, docstrings) — or N/A
- [x] I've updated cli-config.yaml.example if I added/changed config keys — or N/A
- [x] I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
- [x] I've considered cross-platform impact (Windows, macOS) — or N/A
- [x] I've updated tool descriptions/schemas if I changed tool behavior — or N/A
2026-07-01 01:27:26 -07:00
kshitijk4poor
83ae65487e test(browser): cover guard-inactive + camofox short-circuit paths; fix blank lines
Review follow-up on the private-page action guard:
- Add test_guard_inactive_does_not_block_or_probe: when the SSRF guard is
  inactive (local backend / allow_private_urls), click/type/press must proceed
  WITHOUT probing the page URL. This is the branch most likely to silently
  regress if the guard condition is inverted; a mutation check (flipping the
  condition) confirms the test fails as designed.
- Add test_camofox_short_circuits_before_guard: camofox mode returns from the
  dedicated camofox_* path before the guard runs; guards never consulted.
- Fix PEP8: 3 -> 2 blank lines before _blocked_private_page_action.
2026-07-01 13:56:49 +05:30
dsad
3e4c138251 fix(browser): block private-page interactions after eval navigation 2026-07-01 13:56:49 +05:30
rrevenanttt
a81b519d41 fix(security): close hardline rm bypass via quoted paths and ${HOME}
## What does this PR do?

Closes a critical hole in the hardline command floor. HARDLINE_PATTERNS is
the unconditional last line of defense: detect_hardline_command runs BEFORE
every yolo / approvals.mode=off / cron approve-mode bypass, so it is the only
gate standing between the agent (or a prompt-injected instruction) and an
irrecoverable disk wipe. The three rm rules anchored on a bare path token,
and _normalize_command_for_detection never strips shell quotes — so the
ordinary, recommended shell idioms slipped straight through:

  rm -rf "/"        rm -rf '/'        rm -rf "/etc"
  rm -rf "$HOME"    rm -rf ${HOME}    rm -rf "${HOME}"

All of these returned NO hardline match. A leading quote pushes the path out
of reach of the flag group, a trailing quote breaks the `(\s|$)` terminator,
and the `${HOME}` brace form was never listed at all. Under --yolo,
approvals.mode=off, or cron approve-mode the dangerous-command layer is also
skipped, so these commands reached execution with zero gate — exactly the
unrecoverable data loss the floor is documented to make impossible. Because
quoting paths and `${HOME}` are normal shell usage, not exotic obfuscation,
this is a high-severity, easily-triggered bypass.

The fix makes the rm path matcher quote- and brace-tolerant while staying
conservative: a path is matched when it is either fully wrapped in its own
matching quote pair (`"/"`) or bare with a whitespace/end terminator. The
matching-quote requirement is deliberate so the change adds no new false
positives — a dangerous-looking string that is merely an argument to another
command (e.g. `git commit -m "rm -rf /"`) has a closing quote but no opening
quote of its own around the path, so neither branch fires.

## Related Issue

N/A

## Type of Change

- [x] 🔒 Security fix

## Changes Made

- `tools/approval.py`: added `_hardline_rm_path()` (matches a destructive
  path either fully quoted or bare-with-terminator), factored the protected
  system-dir list into `_HARDLINE_SYSTEM_DIRS` and the rm flag prefix into
  `_RM_FLAG_PREFIX`, and rebuilt the three rm `HARDLINE_PATTERNS` on top of
  them, adding the `${HOME}` brace form. Kept as plain concatenation so regex
  backslashes never land inside an f-string field (Python 3.11 floor).
- `tests/tools/test_hardline_blocklist.py`: added quoted (`"/"`, `'/'`,
  `"/etc"`, `"$HOME"`, ...) and brace (`${HOME}`, `"${HOME}"`) cases to the
  must-block set, a dedicated `_QUOTED_BRACE_BYPASS` regression parametrization,
  no-false-positive guards (`git commit -m "rm -rf /"`), and extended the
  yolo-cannot-bypass integration test to cover the quoted/brace forms.

## How to Test

1. Reproduce the bypass on `main`: `detect_hardline_command('rm -rf "/"')`
   returns `(False, None)` — the floor lets it through.
2. With this change it returns `(True, "recursive delete of root filesystem")`;
   the same holds for `'/'`, `"/etc"`, `"$HOME"`, `${HOME}`, `"${HOME}"`.
3. Run the suite: `scripts/run_tests.sh tests/tools/test_hardline_blocklist.py`
   — 125 passed, including the new bypass and no-false-positive cases.

## Checklist

### Code

- [x] I've read the Contributing Guide
- [x] My commit messages follow Conventional Commits (`fix(scope):`, etc.)
- [x] I searched for existing PRs to make sure this isn't a duplicate
- [x] My PR contains **only** changes related to this fix (no unrelated commits)
- [x] I've run the relevant tests and they pass
- [x] I've added tests for my changes (required for bug fixes)
- [x] I've tested on my platform: macOS 15 (Darwin 25.5)

### Documentation & Housekeeping

- [x] I've updated relevant documentation (README, `docs/`, docstrings) — or N/A
- [x] I've updated `cli-config.yaml.example` if I added/changed config keys — or N/A
- [x] I've updated `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows — or N/A
- [x] I've considered cross-platform impact (Windows, macOS) — pattern-only change, ruff + footgun gate pass
- [x] I've updated tool descriptions/schemas if I changed tool behavior — or N/A
2026-07-01 01:25:24 -07:00
zapabob
500c2b1e46 fix(security): close SSRF redirect-guard bypass across all httpx download hooks
Inside httpx AsyncClient response event hooks, response.next_request is
often None even for a genuine redirect, so guards keyed on
`if response.is_redirect and response.next_request` silently never fire.
A public URL that 302s to http://169.254.169.254/ was followed anyway,
defeating the pre-flight is_safe_url() check.

Resolve the redirect target from the Location header (via urljoin, so
relative Locations work too), falling back to next_request only when no
Location is present. Extracted as tools.url_safety.redirect_target_from_response
and wired into every SSRF redirect guard:

  - gateway/platforms/base.py  (shared image + audio download for all platforms)
  - tools/vision_tools.py       (two download hooks)
  - plugins/platforms/slack/adapter.py

Original fix by @zapabob (PR #35940), which targeted the since-refactored
gateway/platforms/slack.py; reconstructed onto the current shared sites and
widened to the whole bug class.
2026-07-01 01:18:53 -07:00
kshitijk4poor
e09ff88d02 fix(browser): close remaining CDP-URL leak paths in supervisor (review)
Review of the salvage found the timeout-message redaction left the more
common failure mode unguarded: when the first websockets.connect(cdp_url)
fails (bad URI / refused / TLS), the raw websockets exception -- which
embeds the full cdp_url incl. ?token= and user:pass@ -- is stashed as
_start_error and re-raised verbatim by start(), and two reconnect
logger.warning sites log the same raw exception.

Add a module-level _redact_cdp_error_text() chokepoint (delegating to
agent.redact.redact_cdp_url) and route all four supervisor egress points
through it:
- start() TimeoutError message (already covered; kept)
- start() _start_error re-raise -> now raises a redacted RuntimeError with
  'from None' so no secret leaks via message OR traceback cause chain
- connect-failed and session-dropped reconnect warnings

Guard tests assert the re-raised message is redacted for both token and
userinfo, the raw cause is suppressed, and the helper preserves non-secret
context (host/reason). Verified with a mutation check: reverting to the raw
'raise err' fails the new tests. Correct the redact_cdp_url docstring to
scope its guarantee to direct-URL redaction and point exception callers at
the supervisor helper.
2026-07-01 13:43:58 +05:30
kshitijk4poor
c626dded13 refactor(redact): consolidate CDP-URL log redaction into one chokepoint
The session-log fix (browser_tool._sanitize_url_for_logs) and the
supervisor attach-timeout fix (CDPSupervisor.start) both composed the
same three redactors (redact_sensitive_text -> _redact_url_query_params
-> _redact_url_userinfo) to mask CDP endpoint credentials. Two copies of
one policy drift: tune one site (e.g. add fragment masking) and the other
silently re-leaks.

Promote that composition to a single public helper redact_cdp_url() in
agent/redact.py -- the one place the CDP-URL redaction policy lives -- and
route both call sites through it (_sanitize_url_for_logs becomes a thin
wrapper; the supervisor imports the helper instead of re-composing the
private redactors). Add direct unit tests for the seam covering query
tokens, multiple credentials, userinfo passwords, plain-URL passthrough,
non-string/exception coercion, and None.

No behavior change at the call sites; both leak paths remain closed.
2026-07-01 13:43:58 +05:30
srojk34
265da9cadb fix(browser): redact CDP URL token in _create_cdp_session log and supervisor timeout
PR #54851 added _sanitize_url_for_logs() and wired it into the three log
sites inside _resolve_cdp_override(). A fourth site was missed:
_create_cdp_session() logs the already-resolved cdp_url unconditionally,
and CDPSupervisor.start() interpolates the raw cdp_url[:80] into the
attach-timeout TimeoutError (which _ensure_cdp_supervisor() logs with %s).
Both leak query-string credentials (e.g. ?token=secret from hosted CDP
providers) into Hermes logs.

Sanitize the URL at both remaining sites. The raw URL is preserved
unmodified in the returned session dict and used for the real connection;
only the logged/error representation is redacted.

Salvaged from #55883.

Co-authored-by: srojk34 <286497132+srojk34@users.noreply.github.com>
2026-07-01 13:43:58 +05:30
Jace Nibarger
060779bb76 fix: bound threat-pattern/FTS5 regex input and cover V4A Move-File edits
Salvaged from PR #35130 (the safe subset of jnibarger01's security pass):

- threat_patterns.py: replace unbounded (?:\w+\s+)* filler with bounded
  {0,8} + cap scan input at MAX_SCAN_CHARS (64KiB), and bound the .*
  runs in the exfil/config-mod patterns. Kills catastrophic backtracking
  on adversarial near-misses.
- hermes_state.py: cap FTS5 query length (MAX_FTS5_QUERY_CHARS) and
  extract quoted phrases with a linear scan instead of a regex so
  pathological quote runs can't induce backtracking.
- acp_adapter/edit_approval.py + agent/tool_dispatch_helpers.py: recognize
  '*** Move File: src -> dst' V4A headers so patch-mode edits are
  permissioned/traversal-checked (previously only Update/Add/Delete), and
  surface a proposal for mode=patch V4A calls (previously replace-only).

Tests: +ReDoS-bound + FTS5-cap + Move-File-target + V4A-approval cases.
2026-07-01 01:05:28 -07:00
zapabob
8e492b5567 fix(file): block credential paths from search results 2026-07-01 01:02:35 -07:00
Matt Kotsenas
dd22c2f533 fix(mcp): preserve 'definitions' as a property name in tool schemas
The MCP input-schema normalizer in _normalize_mcp_input_schema promotes the
legacy JSON Schema 'definitions' meta-keyword to '$defs' (draft 2019-09+)
so local '$ref' resolution works downstream. The previous walk renamed
*any* key named 'definitions' anywhere in the tree, including inside
'properties' dicts. That turned user-facing parameter names into '$defs',
producing property keys that contain '$', which Anthropic and OpenAI
both reject with HTTP 400 (pattern '^[a-zA-Z0-9_.-]{1,64}$').

Real-world repro: an MCP server that exposes a CI/pipelines tool whose
'definitions' parameter is an array of pipeline-definition IDs. Such a tool
is enough on its own to break every conversation, because the full tools
array is sent on every request.

Fix: when descending into a 'properties' or 'patternProperties' mapping,
iterate property-name -> schema pairs directly, leaving the property names
verbatim. Ordinary JSON Schema semantics resume inside each property's
schema, so a legitimately nested 'definitions' meta-keyword inside a
property's schema is still promoted.

Adds two regression tests:
- test_definitions_as_property_name_is_preserved (the property-name case)
- test_definitions_property_and_meta_keyword_coexist (both forms in one
  schema; the property name stays, the meta-keyword promotes)
2026-07-01 01:02:23 -07:00
峯岸 亮
bc6cd46925 fix(agent): restrict todo hydration to paired assistant todo calls
The gateway/API server rebuilds the in-memory TodoStore by replaying
caller-supplied conversation_history. _hydrate_todo_store previously
accepted any role:tool message containing a "todos" array, so a forged
bare tool result could seed arbitrary todo state and re-inflate context
every turn (GHSA-5g4g-6jrg-mw3g).

Restrict hydration to tool results paired with an earlier assistant
todo tool call (matching tool_call_id, function name == todo, no
user/system boundary between). Reuse the existing _get_tool_call_id/
name_static helpers so dict- and object-shaped tool calls both work.
Add a generous MAX_TODO_RESULT_CHARS payload guard to drop absurd
forged results before parsing; item/content caps already exist on main.

Co-authored-by: Hermes Agent <agent@nousresearch.com>
2026-07-01 01:02:17 -07:00
binhnt92
bcfc7458fa fix remote sync-back credential overwrite 2026-07-01 01:00:31 -07:00
Justin Ohms
8f21311906 fix(delegation): route native-SDK providers through runtime resolver; fail on '(empty)' sentinel
Two related bugs caused subagent delegation to silently return empty summaries
with 0 tokens when the user configured delegation.provider=bedrock alongside
delegation.base_url=https://bedrock-runtime.<region>.amazonaws.com.

Root cause #1 — misrouting in _resolve_delegation_credentials():
  The configured_base_url branch unconditionally forced provider='custom' and
  api_mode='chat_completions', only specializing for chatgpt.com, anthropic,
  and kimi hosts. Bedrock (and other native-SDK providers) fell through as
  'custom' + chat_completions, which then POSTed OpenAI-shaped JSON at
  Bedrock's native API. Bedrock rejected the payload and returned nothing,
  which looked like an empty LLM response to the child agent.

  Fix: when provider is one of {bedrock, vertex, google, google-genai}, skip
  the base_url short-circuit and fall through to resolve_runtime_provider(),
  which knows how to construct the proper SDK client. base_url can still be
  forwarded through that path for regional overrides.

Root cause #2 — '(empty)' sentinel accepted as success:
  After N retries of empty LLM responses, run_agent.py emits the literal
  string '(empty)' as final_response. _run_single_child then hit
  `elif summary:` — '(empty)' is truthy, so status became 'completed' and
  the parent surfaced a blank result with no error. Users saw api_calls=4,
  tokens=0, duration~0.4s, status=completed.

  Fix: treat final_response.strip() == '(empty)' as a failure so the parent
  surfaces it instead of silently accepting zero-content 'success'.

Both paths were reproduced in a live Hermes TUI session on us-west-2 Bedrock
(provider=bedrock, model=us.anthropic.claude-sonnet-4-6) and are covered by
new tests in tests/tools/test_delegate.py.
2026-07-01 00:45:31 -07:00
LeonSGP43
55d92516c8 fix(skills): publish fetchable metadata for official skills 2026-07-01 00:40:56 -07:00
teknium1
56d4bfe4ba fix(approval): honour tirith_fail_open in cron-deny tirith path + tests
Follow-up to the salvaged #22070. The cron-deny tirith ImportError branch
was unconditionally fail-open; now it honours security.tirith_fail_open:
false by blocking (a cron session has no user to approve), mirroring the
main flow's fail-closed synthesis (#20733).

Adds regression tests: tirith-only content threat blocked in cron-deny,
plus fail-closed/fail-open ImportError behavior.
2026-07-01 00:13:36 -07:00
Rodrigo
c50f517bff fix(approval): run tirith check in cron-deny mode to catch content-level threats
In check_all_command_guards, the cron-deny path only ran
detect_dangerous_command (regex patterns). The tirith check starts at
line 1017, after the early return at line 1002, so content-level threats
caught only by tirith (homograph URLs, pipe-to-interpreter, terminal
injection) were silently approved in cron sessions even with
approvals.cron_mode: deny.

Add a tirith call inside the cron-deny block, mirroring the same
ImportError guard used in the main flow.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-07-01 00:13:36 -07:00
DanAsBjorn
a537baa81d fix(matrix): route text-only send_message through adapter for E2EE support
Text-only Matrix messages sent via the send_message engine (hermes send,
cron deliver: matrix) arrived unencrypted (red padlock) in E2EE rooms.
Media sends already routed through the mautrix adapter and encrypted fine,
but text-only sends took the raw-HTTP standalone_sender_fn path, which
never encrypts.

Route ALL Matrix sends through _send_matrix_via_adapter so text is
encrypted too. The adapter reuses the live gateway's E2EE session when
available (#46310) and falls back to an encryption-aware ephemeral adapter
for standalone/cron contexts. The registry standalone_sender_fn stays
registered for the contract; it is simply no longer reached for Matrix.

Salvaged from PR #20259 onto current main (the original patched the
pre-#41112 _send_matrix branch, which had since moved to the plugin's
standalone path).

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-07-01 00:12:11 -07:00
teknium1
0f66995e2a fix(approval): catch GNU long-flag abbreviations for chown --recursive and git push --force
GNU tools accept unique long-option prefix abbreviations at runtime, so
`chown --recurs root` and `git push --forc` evaded the approval gate's
exact-match `--recursive`/`--force` patterns. Switch those two entries
to prefix matches (--recur[a-z]*, --forc[a-z]*).

The rm/chmod/sed long-flag patterns were left unchanged: every abbreviation
of those is already caught by the sibling short-flag and target patterns
(rm -[^s]*r, base chmod 777, sed -[^s]*i), so prefix-matching them is a
no-op. Only chown (beyond the coincidental case-insensitive r->R catch) and
git push had genuine gaps.

Co-authored-by: Subway2023 <subw3@mail2.sysu.edu.cn>
2026-06-30 17:32:28 -07:00
Scott Gabel
4a7a6fd401 fix(approval): redact secrets in user-facing approval prompts
The dangerous-command approval prompt renders the flagged command so the
user can decide whether to approve. If the agent constructed it with a
credential (curl -H 'Authorization: Bearer sk-...', psql postgres://user:pw@host,
an execute_code script with api_key = 'sk-...'), that secret hit stdout and,
via the gateway notify payload, Discord/Slack messages — which are
screenshottable and forwardable.

Apply the existing agent.redact.redact_sensitive_text() to every user-facing
approval surface. Redaction is display-only: the raw command still executes
after approval, and approval persistence keys off pattern_key (not the command
text), so the allowlist is unaffected. Decision context (URL, flags, command
structure) is preserved; only the secret value masks.

Covers all surfaces, including the execute_code path the original PR missed:
- prompt_dangerous_approval(): callback + stdout fallback
- check_all_command_guards(): gateway approval_data + cron/batch pending fallback
- check_execute_code_guard(): gateway approval_data + no-notifier pending fallback
  (script body can embed credentials)

Adds TestApprovalPromptRedaction covering callback redaction, no-over-redaction
of clean commands, and the execute_code pending fallback.

Salvaged from PR #13139 by @sgabel; extended to the execute_code surface.
2026-06-30 17:29:11 -07:00
haileymarshall
9f22f36625 fix(mcp-oauth): anchor 401 handler task to prevent GC mid-flight
`handle_401` spawned a dedup'd recovery coroutine via
`asyncio.create_task(_do_handle())` and discarded the returned task
reference. Python's event loop only keeps weak references to tasks, so
the coroutine could be garbage-collected before it called
`pending.set_result(...)`. Every concurrent caller awaiting that future
then hangs forever, and the `finally: entry.pending_401.pop(...)`
cleanup never runs — so subsequent 401s for the same key latch onto the
dead future too. Same pattern the adapter-side fixes address (#11997,
#11998, #12000, #12001, #12006).

Hold the task in a process-wide set on the manager and discard it via
`add_done_callback` once it completes. Regression test covers both the
structural invariant (task tracked, then removed on completion) and a
concurrent dedup path with a forced `gc.collect()` between the handler's
await points.
2026-06-30 16:56:15 -07:00
WuKongAI-CMU
0ea3861b33 fix: keep persisted tool results inside their storage directory
Tool call ids are used to name persisted large-result files. Treating that id as a raw path segment allowed traversal-like ids to resolve outside hermes-results even though the shell command quoted metacharacters.

Convert ids to single filename stems, preserve normal ids, and add a short hash when normalization is needed so unsafe ids do not collide silently.

Constraint: Avoid new dependencies and preserve existing tool-result paths for normal tool call ids
Rejected: Quote only the path | shell quoting does not prevent ../ path traversal
Confidence: high
Scope-risk: narrow
Reversibility: clean
Tested: source /Users/peter/hermes-agent/venv/bin/activate && pytest tests/tools/test_tool_result_storage.py -q
Tested: source /Users/peter/hermes-agent/venv/bin/activate && python -m compileall tools/tool_result_storage.py tests/tools/test_tool_result_storage.py
Tested: git diff --check
2026-06-30 16:39:41 -07:00
etherman-os
2a3dbcaf46 fix(terminal): prevent corrupted session snapshots during init
The init snapshot dumped functions with a line-based filter:

    declare -f | grep -vE '^_[^_]'

That strips a function's *header* line (e.g. `_foo () `) but leaves the
orphaned `{ ... }` body behind, corrupting the snapshot that is sourced
before every command. Sourcing the torn snapshot runs leftover body code
and breaks subsequent commands (intermittent exit 127).

- Filter private (`_`-prefixed) functions by NAME via `declare -F` and
  dump only the wanted whole definitions, so a body is never torn. Guard
  against an empty name list (bare `declare -f` dumps everything).
- Treat a non-zero bootstrap exit code as snapshot-init failure, so
  execution safely falls back to login-shell-per-command mode.
- Add a regression test asserting snapshot_ready stays false when
  bootstrap exits non-zero.

Preserves the atomic-write ($BASHPID temp + mv -f) machinery from #38249.
2026-06-30 15:51:17 -07:00
kyssta-exe
20871c1d94 fix(skills): require review forks to read before writing skills 2026-06-30 15:49:36 -07:00
Erosika
a6175d1f93 style(profile): trim verbose comments to one or two lines 2026-06-30 15:30:06 -07:00
Erosika
bc396dafda test(profile): two-profile regression suite + preserve skills_hub monkeypatch seam
- tools/skills_hub.py: the per-call resolvers now honor a test-injected real
  module attribute (patch.object(hub, 'SKILLS_DIR', ...) / monkeypatch.setattr)
  before falling back to dynamic profile resolution. PEP 562 __getattr__ only
  fires when no real attribute exists, so an unpatched module resolves the
  active profile and a patched one respects the test's value — keeping the
  existing skills_hub test seam intact (5 tests had broken).
- tests/test_profile_isolation_runtime.py: real two-profile (no-mock) suite
  driving each previously-leaking site under override A then B and asserting
  the active profile's path/identity is used: skills_hub paths + derived
  constants + default-arg resolution, gateway cache getters (incl. the
  monkeypatch-still-wins seam), rich_sent_store path, and thread/executor
  context propagation (raw-thread hazard documented; primitive + _run_async
  worker proven to preserve the override).
2026-06-30 15:30:06 -07:00
Erosika
09af0a8c1d fix(profile): propagate profile context across thread/executor boundaries
A bare threading.Thread / ThreadPoolExecutor worker starts with an empty
contextvars.Context, so the context-local profile override
(_HERMES_HOME_OVERRIDE) does not cross the spawn boundary. In single-process
multi-profile runtimes (desktop tui_gateway) the worker then resolves
get_hermes_home() to the launch/default profile, leaking one profile's
reads/writes into another. The fix primitive (tools.thread_context.
propagate_context_to_thread, which copies the parent context) already exists;
the leaking spawns simply did not use it.

- model_tools.py _run_async: wrap the worker-thread loop runner. This is the
  generic sync->async bridge for every async tool, so wrapping it here fixes
  the leak for all async tools at once (verified: an async tool reading
  get_hermes_home() under an override now resolves the active profile).
- run_agent.py bg-review thread: wrap so MEMORY.md / skill review writes land
  in the spawning turn's profile (#54937 path).
- tools/async_delegation.py: wrap both single + batch executor.submit calls so
  detached children resolve the dispatching profile's paths.

Scope: the vision CPU executor is intentionally left unwrapped — it runs pure
in-memory encode/resize and never resolves profile-scoped paths.
2026-06-30 15:30:06 -07:00
Erosika
10e60060d9 fix(profile): resolve import-time path globals per-call to honor profile override
In single-process multi-profile runtimes (desktop tui_gateway), profile
scoping is a context-local ContextVar override, not a process env var. Three
subsystems froze their HERMES_HOME-derived paths at import time (or read
os.environ directly), pinning every later profile to whichever profile first
imported the module — a cross-profile data leak.

- tools/skills_hub.py: SKILLS_DIR/HUB_DIR/LOCK_FILE/etc. were module constants
  frozen at import. Replace with per-call resolver functions; add a PEP 562
  module __getattr__ so external 'from tools.skills_hub import SKILLS_DIR'
  callers (all function-local) resolve dynamically with no call-site changes.
  Convert default-arg bindings (HubLockFile/TapsManager) and the derived
  HERMES_INDEX_CACHE_FILE constant too.
- gateway/platforms/base.py: image/audio/video/document cache-dir getters now
  re-resolve via get_hermes_dir() per call, falling back to the module
  constant when a test has monkeypatched it (preserves the existing test seam).
  Media-delivery safe-roots already enumerate all profiles' cache dirs
  (#31733), so per-profile resolution does not break delivery.
- gateway/rich_sent_store.py: _store_path() read os.environ['HERMES_HOME']
  directly, bypassing the override entirely; route through get_hermes_home().
2026-06-30 15:30:06 -07:00
srojk34
795913d3b0 fix(kanban): restrict goal_mode kanban_block to genuine external blockers
The judge gate added for kanban_complete (Issue #38367, PR #38388) only
covers one of the two exit paths out of run_kanban_goal_loop(). The loop
treats status == "blocked" as terminal identically to "done" (and any
other status outside running/ready/done/blocked also stops the loop —
see goals.py's status dispatch). A goal_mode worker that has learned
kanban_complete is gated can simply call kanban_block(reason="anything")
to escape the loop with zero judge involvement, fully defeating the
intent of #38367's fix.

This is Issue #38696, filed as the explicit follow-up by a reviewer on
PR #38388: "kanban_complete is one way out; kanban_block is another...
A worker that learns the complete path is gated can shift to calling
block to escape the loop with the same effect."

Implements the issue's "Option B" (deterministic allowlist, no extra
judge LLM call) using the kind taxonomy that already exists in
kb.VALID_BLOCK_KINDS, rather than inventing a new judge_goal() outcome
type (judge_goal only returns done/continue/wait/skipped — there's no
"is this block legitimate" verdict to hook the issue's "Option A"
pseudocode onto without expanding the judge's contract).

goal_mode tasks may only block with kind in {dependency, needs_input} —
the two kinds that represent a genuine external blocker the worker
cannot resolve itself. `capability`, `transient`, and an unset kind are
rejected with a message directing the worker to kanban_complete instead,
which the judge now gates. Non-goal_mode tasks are completely unaffected.
2026-06-30 14:29:42 -07:00
kshitijk4poor
a5e8cd4d40 fix(memory): degrade gracefully after repeated at-capacity consolidation failures (#42405)
Builds on the zero-match feedback fix (previous commit) to close the silent-hang
symptom: when memory is at capacity, a failed `add`/`replace`/`remove`
consolidation could loop the whole turn to iteration-budget exhaustion and
deliver no user-facing reply.

#41755 turned the at-capacity overflow error into a *commanded* in-turn retry
("...then retry this add — all in this turn"); combined with the fragile
substring-only `replace`/`remove` matching (LLMs can't reliably re-quote a long
entry verbatim), the model loops add↔replace on inexact guesses until the turn
dies. The existing tool_guardrails halt would catch this, but hard_stop_enabled
is opt-in (off by default), so a default install still hangs.

This fixes it at the memory layer without changing global guardrail behavior:
- MemoryStore tracks per-turn consolidation failures; after a cap (3) it drops
  the "retry in this turn" instruction and returns a terminal "leave memory
  unchanged, continue your reply" result, so a failed memory side effect can
  never block the turn's reply.
- The counter resets on any successful write (progress) and at each turn
  boundary (turn_context.reset_consolidation_failures, guarded via getattr so
  plugin memory stores without the method are a no-op).

Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
2026-06-30 20:01:16 +05:30
kyssta-exe
62a1bf4c55 fix(tools): return previews on zero-match in replace/remove to prevent memory retry loops (#42405)
- replace() and remove() now return entry previews and current_entries
  when no entry matches old_text, matching the multi-match and add-limit
  error behavior
- add() limit error also now returns previews for consistency
- Agent can self-correct after a failed replace/remove instead of looping
  blindly until turn budget is exhausted with no user response
2026-06-30 20:01:16 +05:30
kshitijk4poor
824f2279da refactor(registry): drop dead toolset-check helpers after per-tool availability
Follow-up to the per-tool availability derivation: `_snapshot_toolset_checks`
and `_evaluate_toolset_check` had no remaining callers once the four
availability surfaces switched to `_toolset_has_exposable_tools`. Remove both,
drop the no-op `quiet` param from the new helper, and document why
`_toolset_checks` is still written (banner.py reads it via TOOLSET_REQUIREMENTS
to classify unavailable toolsets as lazy-init vs disabled).
2026-06-30 17:47:37 +05:30
xxxigm
6e84257717 fix(registry): derive toolset availability from per-tool checks
Doctor and banner used the first check_fn registered for a toolset, so
desktop-only read_terminal gated the whole terminal toolset even though
terminal and process still expose at runtime.

Fixes #54820
2026-06-30 17:47:37 +05:30
memosr
12f5624a76 fix(security): bind tool_override authorization to handler's defining plugin module
egilewski found the prior sink gate was transient: it only applied while
PluginManager executed register(ctx). A plugin could defer a direct
registry.register(..., override=True) to a post-load callback/thread, after
the scope was cleared, and still replace a built-in.

Make authorization durable by binding it to where the handler is DEFINED
(handler.__globals__['__name__']) rather than to call timing. At load, each
plugin's module namespace is mapped to its allow_tool_override opt-in in a
table that is never cleared. The sink resolves the handler's owning plugin
module and rejects an override from any plugin namespace without opt-in,
regardless of when or on which thread the call happens. Plugin namespaces
with no recorded policy are treated as not-opted-in (fail-closed). Built-in
and MCP handlers live outside the plugin namespace and are unaffected.

Adds a regression test for the delayed/post-load direct-registry override.
2026-06-30 04:00:42 -07:00
memosr
3101222312 fix(security): enforce tool_override opt-in at registry sink to close direct-import bypass
The opt-in gate lived only in PluginContext.register_tool, so a plugin
could bypass it by importing tools.registry and calling
registry.register(..., override=True) directly. Enforce the same gate at
the sink: during plugin load, the registry rejects an override from a
plugin without operator opt-in regardless of the path taken. Built-in and
MCP registrations (no active plugin scope) are unaffected.

Adds a regression test covering the direct-registry bypass.
2026-06-30 04:00:42 -07:00
Jeffgithub0029
b7c4369ca0 fix(telegram): chunk formatted messages with UTF-16 length accounting
The standalone send path (_send_telegram, used by the send_message tool,
cron delivery, and out-of-process callers) chunked the *raw* message on
UTF-16 length, then formatted and sent the result un-rechunked. MarkdownV2
escaping inflates the text (`!`/`.`/`-` -> `\!`/`\.`/`\-`), so a
4096 UTF-16-unit raw message can become ~8192 units once formatted and gets
rejected by Telegram as 'Message is too long'.

Move all text chunking into _send_telegram, after formatting: split the
formatted MarkdownV2/HTML text on UTF-16 length so every send is <=4096,
with per-chunk plain-text fallback and thread-not-found retry preserved.
Media attaches after all text chunks. (#28557)
2026-06-30 03:51:08 -07:00
nikshepsvn
d82a69b624 fix(tools): prune acp_command from delegate_task schema when no ACP CLI is on PATH
Defense-in-depth follow-up to the runtime guard added in the previous commit.
Models on headless hosts (Railway / Fly / Docker / fresh VPS) without any ACP
CLI installed occasionally hallucinate ``acp_command="copilot"`` from the
schema description, despite the explicit "Do NOT set" instruction. The runtime
guard prevented the crash but the model still wasted a tool turn and got an
opaque silent fallback.

This commit removes the temptation at its source: ``_build_dynamic_schema_overrides``
now strips ``acp_command`` and ``acp_args`` from both the top-level and per-task
schemas when none of the known ACP CLIs (``copilot``, ``claude``, ``codex``) are
detectable on PATH. The model literally never sees the fields, so it cannot
pass them.

The runtime guard from the previous commit stays in place as defense-in-depth
for internal callers, tests, and any future code path that bypasses the schema.

``_acp_binary_available`` is intentionally NOT cached: ``shutil.which`` is
cheap, and avoiding the cache means the schema reacts to mid-session installs
without requiring a process restart.

Tests:
- ``test_schema_prunes_acp_command_when_no_acp_binary``
- ``test_schema_keeps_acp_command_when_binary_available``
- ``test_acp_binary_available_checks_known_clis``

Full ``test_delegate.py`` suite: 136/136 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-30 03:41:46 -07:00
nikshepsvn
2e0b591076 fix(tools): validate acp_command binary exists before forcing copilot-acp transport
When a model passes `acp_command="copilot"` (or any other binary name) in a
`delegate_task` tool call, `_build_child_agent` unconditionally sets
`effective_provider = "copilot-acp"`, which routes the subagent through
`CopilotACPClient`. That client spawns the named binary via subprocess; if it
isn't on PATH, every retry raises RuntimeError and an asyncio cleanup race
during error delivery can take the entire gateway down.

This is a real failure mode on headless deploys (Railway / Fly / VPS / Docker)
where `copilot` / `claude` / etc. aren't installed. The schema does say
"Do NOT set unless the user explicitly told you an ACP CLI is installed,"
but models occasionally pass it anyway — particularly for X (Twitter) search
prompts where Grok seems to associate ACP with "search assistance."

Reproduction:
- Headless install (no `copilot` binary on PATH)
- Set provider to xai-oauth + model grok-4.3
- Telegram prompt: "Search X for crypto twitter trends"
- Grok decides to delegate and passes `acp_command="copilot"`
- Subagent crashes 3x, gateway crashes on the 3rd retry teardown

Fix: validate the binary exists on PATH via `shutil.which` before honoring
the override. If missing, log a warning and fall through to the parent's
default transport. No behavior change when the binary IS present (covered
by `test_build_child_agent_honors_acp_command_when_binary_present`).

Tests:
- `test_build_child_agent_ignores_acp_command_when_binary_missing`
- `test_build_child_agent_honors_acp_command_when_binary_present`

Verified on Python 3.11 (macOS) and 3.12 (Debian 13 container).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-30 03:41:46 -07:00
georgex8001
62b9fb6623 fix(acp): thread-safe interactive approval via contextvars
Concurrent ACP sessions run on a shared ThreadPoolExecutor (max_workers=4).
Each _run_agent mutated the process-global os.environ["HERMES_INTERACTIVE"]
and restored it in finally, so one session's restore could clobber another's
set mid-run — dropping the second session onto the non-interactive
auto-approve path, executing a dangerous command without the approval
callback firing (GHSA-96vc-wcxf-jjff).

Replace the env-var flag with a thread/task-local contextvar in
tools.approval. The two HERMES_INTERACTIVE read sites in approval.py now go
through _is_interactive_cli() (contextvar-first, env fallback for legacy
single-threaded CLI callers). The ACP executor sets the contextvar instead
of os.environ; the existing contextvars.copy_context() wrapper isolates each
session's write.

Co-authored-by: Hermes Agent <127238744+teknium1@users.noreply.github.com>
2026-06-30 03:24:58 -07:00
Markus Phan
cd9f5cc671 fix(delegate): route subagent progress lines through _safe_print for ACP stdio
delegate_task's per-task completion display emitted lines like
"✓ [1/3] Research done (17.92s)" via a bare print(). Under ACP (and any
headless JSON-RPC stdio host where AIAgent routes human output to stderr
via a custom _print_fn), these landed on stdout and corrupted the
protocol frame stream, surfacing as "Failed to parse JSON message: ✓
[3/3] …" in the ACP adapter.

Add _emit_parent_console() which prefers parent_agent._safe_print (the
same hook AIAgent uses for every other user-facing print) and falls back
to print() only when no router is wired up or it raises. CLI behavior is
unchanged.

The PR's other fix (preset toolset expansion) is already covered on main
by _expand_parent_toolsets(), so only the stdio-safe printing change is
salvaged here.
2026-06-30 03:16:22 -07:00
teknium1
35a0803a3b fix(delegation): budget subagent summaries against parent context headroom
Batch delegation returned each subagent's full final_response verbatim
into the parent's context. A fan-out of N children could dump 60k+ tokens
at once, blowing the parent's context window and — on rate-limited
providers — triggering a compression/429 death spiral (429 misread as
context-too-large -> window step-down -> retry loop -> conversation dies).

Cap each summary against the parent's *remaining* context headroom split
across the batch (not a magic char count). When trimming, mirror the
web_extract convention: spill the full text to cache/delegation (mounted
into remote backends via credential_files._CACHE_DIRS) and return a
head+tail window (75/25, line-snapped) plus a footer with the exact
read_file offset to page the omitted middle. Both the subagent's opening
AND its closing (outcomes / files-changed / issues, which live at the end)
survive in-context, and nothing is lost — the parent can read_file the
full version on any backend.

delegation.max_summary_chars (default 24000) is a static ceiling layered
on top as belt-and-suspenders for models that ignore 'be concise'; 0
disables it. Child prompt tightened to lead with outcomes / bullets.

Co-authored-by: rc-int <rcint@klaith.com>
2026-06-30 03:07:40 -07:00