The init snapshot dumped functions with a line-based filter:
declare -f | grep -vE '^_[^_]'
That strips a function's *header* line (e.g. `_foo () `) but leaves the
orphaned `{ ... }` body behind, corrupting the snapshot that is sourced
before every command. Sourcing the torn snapshot runs leftover body code
and breaks subsequent commands (intermittent exit 127).
- Filter private (`_`-prefixed) functions by NAME via `declare -F` and
dump only the wanted whole definitions, so a body is never torn. Guard
against an empty name list (bare `declare -f` dumps everything).
- Treat a non-zero bootstrap exit code as snapshot-init failure, so
execution safely falls back to login-shell-per-command mode.
- Add a regression test asserting snapshot_ready stays false when
bootstrap exits non-zero.
Preserves the atomic-write ($BASHPID temp + mv -f) machinery from #38249.
- tools/skills_hub.py: the per-call resolvers now honor a test-injected real
module attribute (patch.object(hub, 'SKILLS_DIR', ...) / monkeypatch.setattr)
before falling back to dynamic profile resolution. PEP 562 __getattr__ only
fires when no real attribute exists, so an unpatched module resolves the
active profile and a patched one respects the test's value — keeping the
existing skills_hub test seam intact (5 tests had broken).
- tests/test_profile_isolation_runtime.py: real two-profile (no-mock) suite
driving each previously-leaking site under override A then B and asserting
the active profile's path/identity is used: skills_hub paths + derived
constants + default-arg resolution, gateway cache getters (incl. the
monkeypatch-still-wins seam), rich_sent_store path, and thread/executor
context propagation (raw-thread hazard documented; primitive + _run_async
worker proven to preserve the override).
A bare threading.Thread / ThreadPoolExecutor worker starts with an empty
contextvars.Context, so the context-local profile override
(_HERMES_HOME_OVERRIDE) does not cross the spawn boundary. In single-process
multi-profile runtimes (desktop tui_gateway) the worker then resolves
get_hermes_home() to the launch/default profile, leaking one profile's
reads/writes into another. The fix primitive (tools.thread_context.
propagate_context_to_thread, which copies the parent context) already exists;
the leaking spawns simply did not use it.
- model_tools.py _run_async: wrap the worker-thread loop runner. This is the
generic sync->async bridge for every async tool, so wrapping it here fixes
the leak for all async tools at once (verified: an async tool reading
get_hermes_home() under an override now resolves the active profile).
- run_agent.py bg-review thread: wrap so MEMORY.md / skill review writes land
in the spawning turn's profile (#54937 path).
- tools/async_delegation.py: wrap both single + batch executor.submit calls so
detached children resolve the dispatching profile's paths.
Scope: the vision CPU executor is intentionally left unwrapped — it runs pure
in-memory encode/resize and never resolves profile-scoped paths.
In single-process multi-profile runtimes (desktop tui_gateway), profile
scoping is a context-local ContextVar override, not a process env var. Three
subsystems froze their HERMES_HOME-derived paths at import time (or read
os.environ directly), pinning every later profile to whichever profile first
imported the module — a cross-profile data leak.
- tools/skills_hub.py: SKILLS_DIR/HUB_DIR/LOCK_FILE/etc. were module constants
frozen at import. Replace with per-call resolver functions; add a PEP 562
module __getattr__ so external 'from tools.skills_hub import SKILLS_DIR'
callers (all function-local) resolve dynamically with no call-site changes.
Convert default-arg bindings (HubLockFile/TapsManager) and the derived
HERMES_INDEX_CACHE_FILE constant too.
- gateway/platforms/base.py: image/audio/video/document cache-dir getters now
re-resolve via get_hermes_dir() per call, falling back to the module
constant when a test has monkeypatched it (preserves the existing test seam).
Media-delivery safe-roots already enumerate all profiles' cache dirs
(#31733), so per-profile resolution does not break delivery.
- gateway/rich_sent_store.py: _store_path() read os.environ['HERMES_HOME']
directly, bypassing the override entirely; route through get_hermes_home().
The judge gate added for kanban_complete (Issue #38367, PR #38388) only
covers one of the two exit paths out of run_kanban_goal_loop(). The loop
treats status == "blocked" as terminal identically to "done" (and any
other status outside running/ready/done/blocked also stops the loop —
see goals.py's status dispatch). A goal_mode worker that has learned
kanban_complete is gated can simply call kanban_block(reason="anything")
to escape the loop with zero judge involvement, fully defeating the
intent of #38367's fix.
This is Issue #38696, filed as the explicit follow-up by a reviewer on
PR #38388: "kanban_complete is one way out; kanban_block is another...
A worker that learns the complete path is gated can shift to calling
block to escape the loop with the same effect."
Implements the issue's "Option B" (deterministic allowlist, no extra
judge LLM call) using the kind taxonomy that already exists in
kb.VALID_BLOCK_KINDS, rather than inventing a new judge_goal() outcome
type (judge_goal only returns done/continue/wait/skipped — there's no
"is this block legitimate" verdict to hook the issue's "Option A"
pseudocode onto without expanding the judge's contract).
goal_mode tasks may only block with kind in {dependency, needs_input} —
the two kinds that represent a genuine external blocker the worker
cannot resolve itself. `capability`, `transient`, and an unset kind are
rejected with a message directing the worker to kanban_complete instead,
which the judge now gates. Non-goal_mode tasks are completely unaffected.
Builds on the zero-match feedback fix (previous commit) to close the silent-hang
symptom: when memory is at capacity, a failed `add`/`replace`/`remove`
consolidation could loop the whole turn to iteration-budget exhaustion and
deliver no user-facing reply.
#41755 turned the at-capacity overflow error into a *commanded* in-turn retry
("...then retry this add — all in this turn"); combined with the fragile
substring-only `replace`/`remove` matching (LLMs can't reliably re-quote a long
entry verbatim), the model loops add↔replace on inexact guesses until the turn
dies. The existing tool_guardrails halt would catch this, but hard_stop_enabled
is opt-in (off by default), so a default install still hangs.
This fixes it at the memory layer without changing global guardrail behavior:
- MemoryStore tracks per-turn consolidation failures; after a cap (3) it drops
the "retry in this turn" instruction and returns a terminal "leave memory
unchanged, continue your reply" result, so a failed memory side effect can
never block the turn's reply.
- The counter resets on any successful write (progress) and at each turn
boundary (turn_context.reset_consolidation_failures, guarded via getattr so
plugin memory stores without the method are a no-op).
Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
- replace() and remove() now return entry previews and current_entries
when no entry matches old_text, matching the multi-match and add-limit
error behavior
- add() limit error also now returns previews for consistency
- Agent can self-correct after a failed replace/remove instead of looping
blindly until turn budget is exhausted with no user response
Follow-up to the per-tool availability derivation: `_snapshot_toolset_checks`
and `_evaluate_toolset_check` had no remaining callers once the four
availability surfaces switched to `_toolset_has_exposable_tools`. Remove both,
drop the no-op `quiet` param from the new helper, and document why
`_toolset_checks` is still written (banner.py reads it via TOOLSET_REQUIREMENTS
to classify unavailable toolsets as lazy-init vs disabled).
Doctor and banner used the first check_fn registered for a toolset, so
desktop-only read_terminal gated the whole terminal toolset even though
terminal and process still expose at runtime.
Fixes#54820
egilewski found the prior sink gate was transient: it only applied while
PluginManager executed register(ctx). A plugin could defer a direct
registry.register(..., override=True) to a post-load callback/thread, after
the scope was cleared, and still replace a built-in.
Make authorization durable by binding it to where the handler is DEFINED
(handler.__globals__['__name__']) rather than to call timing. At load, each
plugin's module namespace is mapped to its allow_tool_override opt-in in a
table that is never cleared. The sink resolves the handler's owning plugin
module and rejects an override from any plugin namespace without opt-in,
regardless of when or on which thread the call happens. Plugin namespaces
with no recorded policy are treated as not-opted-in (fail-closed). Built-in
and MCP handlers live outside the plugin namespace and are unaffected.
Adds a regression test for the delayed/post-load direct-registry override.
The opt-in gate lived only in PluginContext.register_tool, so a plugin
could bypass it by importing tools.registry and calling
registry.register(..., override=True) directly. Enforce the same gate at
the sink: during plugin load, the registry rejects an override from a
plugin without operator opt-in regardless of the path taken. Built-in and
MCP registrations (no active plugin scope) are unaffected.
Adds a regression test covering the direct-registry bypass.
The standalone send path (_send_telegram, used by the send_message tool,
cron delivery, and out-of-process callers) chunked the *raw* message on
UTF-16 length, then formatted and sent the result un-rechunked. MarkdownV2
escaping inflates the text (`!`/`.`/`-` -> `\!`/`\.`/`\-`), so a
4096 UTF-16-unit raw message can become ~8192 units once formatted and gets
rejected by Telegram as 'Message is too long'.
Move all text chunking into _send_telegram, after formatting: split the
formatted MarkdownV2/HTML text on UTF-16 length so every send is <=4096,
with per-chunk plain-text fallback and thread-not-found retry preserved.
Media attaches after all text chunks. (#28557)
Defense-in-depth follow-up to the runtime guard added in the previous commit.
Models on headless hosts (Railway / Fly / Docker / fresh VPS) without any ACP
CLI installed occasionally hallucinate ``acp_command="copilot"`` from the
schema description, despite the explicit "Do NOT set" instruction. The runtime
guard prevented the crash but the model still wasted a tool turn and got an
opaque silent fallback.
This commit removes the temptation at its source: ``_build_dynamic_schema_overrides``
now strips ``acp_command`` and ``acp_args`` from both the top-level and per-task
schemas when none of the known ACP CLIs (``copilot``, ``claude``, ``codex``) are
detectable on PATH. The model literally never sees the fields, so it cannot
pass them.
The runtime guard from the previous commit stays in place as defense-in-depth
for internal callers, tests, and any future code path that bypasses the schema.
``_acp_binary_available`` is intentionally NOT cached: ``shutil.which`` is
cheap, and avoiding the cache means the schema reacts to mid-session installs
without requiring a process restart.
Tests:
- ``test_schema_prunes_acp_command_when_no_acp_binary``
- ``test_schema_keeps_acp_command_when_binary_available``
- ``test_acp_binary_available_checks_known_clis``
Full ``test_delegate.py`` suite: 136/136 pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a model passes `acp_command="copilot"` (or any other binary name) in a
`delegate_task` tool call, `_build_child_agent` unconditionally sets
`effective_provider = "copilot-acp"`, which routes the subagent through
`CopilotACPClient`. That client spawns the named binary via subprocess; if it
isn't on PATH, every retry raises RuntimeError and an asyncio cleanup race
during error delivery can take the entire gateway down.
This is a real failure mode on headless deploys (Railway / Fly / VPS / Docker)
where `copilot` / `claude` / etc. aren't installed. The schema does say
"Do NOT set unless the user explicitly told you an ACP CLI is installed,"
but models occasionally pass it anyway — particularly for X (Twitter) search
prompts where Grok seems to associate ACP with "search assistance."
Reproduction:
- Headless install (no `copilot` binary on PATH)
- Set provider to xai-oauth + model grok-4.3
- Telegram prompt: "Search X for crypto twitter trends"
- Grok decides to delegate and passes `acp_command="copilot"`
- Subagent crashes 3x, gateway crashes on the 3rd retry teardown
Fix: validate the binary exists on PATH via `shutil.which` before honoring
the override. If missing, log a warning and fall through to the parent's
default transport. No behavior change when the binary IS present (covered
by `test_build_child_agent_honors_acp_command_when_binary_present`).
Tests:
- `test_build_child_agent_ignores_acp_command_when_binary_missing`
- `test_build_child_agent_honors_acp_command_when_binary_present`
Verified on Python 3.11 (macOS) and 3.12 (Debian 13 container).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Concurrent ACP sessions run on a shared ThreadPoolExecutor (max_workers=4).
Each _run_agent mutated the process-global os.environ["HERMES_INTERACTIVE"]
and restored it in finally, so one session's restore could clobber another's
set mid-run — dropping the second session onto the non-interactive
auto-approve path, executing a dangerous command without the approval
callback firing (GHSA-96vc-wcxf-jjff).
Replace the env-var flag with a thread/task-local contextvar in
tools.approval. The two HERMES_INTERACTIVE read sites in approval.py now go
through _is_interactive_cli() (contextvar-first, env fallback for legacy
single-threaded CLI callers). The ACP executor sets the contextvar instead
of os.environ; the existing contextvars.copy_context() wrapper isolates each
session's write.
Co-authored-by: Hermes Agent <127238744+teknium1@users.noreply.github.com>
delegate_task's per-task completion display emitted lines like
"✓ [1/3] Research done (17.92s)" via a bare print(). Under ACP (and any
headless JSON-RPC stdio host where AIAgent routes human output to stderr
via a custom _print_fn), these landed on stdout and corrupted the
protocol frame stream, surfacing as "Failed to parse JSON message: ✓
[3/3] …" in the ACP adapter.
Add _emit_parent_console() which prefers parent_agent._safe_print (the
same hook AIAgent uses for every other user-facing print) and falls back
to print() only when no router is wired up or it raises. CLI behavior is
unchanged.
The PR's other fix (preset toolset expansion) is already covered on main
by _expand_parent_toolsets(), so only the stdio-safe printing change is
salvaged here.
Batch delegation returned each subagent's full final_response verbatim
into the parent's context. A fan-out of N children could dump 60k+ tokens
at once, blowing the parent's context window and — on rate-limited
providers — triggering a compression/429 death spiral (429 misread as
context-too-large -> window step-down -> retry loop -> conversation dies).
Cap each summary against the parent's *remaining* context headroom split
across the batch (not a magic char count). When trimming, mirror the
web_extract convention: spill the full text to cache/delegation (mounted
into remote backends via credential_files._CACHE_DIRS) and return a
head+tail window (75/25, line-snapped) plus a footer with the exact
read_file offset to page the omitted middle. Both the subagent's opening
AND its closing (outcomes / files-changed / issues, which live at the end)
survive in-context, and nothing is lost — the parent can read_file the
full version on any backend.
delegation.max_summary_chars (default 24000) is a static ceiling layered
on top as belt-and-suspenders for models that ignore 'be concise'; 0
disables it. Child prompt tightened to lead with outcomes / bullets.
Co-authored-by: rc-int <rcint@klaith.com>
Three independent security-scanner hardenings, re-homed onto the current
shared threat-pattern architecture (tools/threat_patterns.py):
- approval.py: add bash/sh/zsh/ksh heredoc to DANGEROUS_PATTERNS. The
existing heredoc pattern only covered python/perl/ruby/node, so
`bash <<'EOF' ... EOF` ran arbitrary shell — including exfil pipelines
whose inner commands don't individually match a pattern — with no prompt.
- threat_patterns.py: apply unicodedata.normalize("NFKC", ...) before
pattern matching so full-width / compatibility homographs (e.g.
`cat ~/.hermes/.env`) are folded to ASCII and no longer bypass the
keyword scanners. Invisible-char detection still runs on the raw content
first (NFKC can strip those codepoints).
- code_execution_tool.py: add CREDS/BEARER/APIKEY to _SECRET_SUBSTRINGS so
vars like HERMES_LLM_CREDS, API_BEARER, MY_APIKEY are scrubbed from the
sandbox env. PASS was intentionally dropped from the original proposal —
it false-positives on BYPASS_CACHE / COMPASS_DIR / PASSENGER_HOST while
PASSWORD/PASSWD already cover the credential cases.
The original PR also proposed a 'synonym' injection pattern block
(overlook/forget/set aside/bypass/discard + developer-mode); dropped here
because it false-positives on ordinary AGENTS.md/SOUL.md prose ("don't
forget to follow the rules", "run in developer mode"), exactly the
bossy-English class threat_patterns.py is documented to avoid.
Salvaged from #9028.
Co-authored-by: Hermes Agent <agent@nousresearch.com>
patch_tool extracts V4A patch paths so _check_sensitive_path can refuse
writes to /etc/*, /boot/*, etc. before they reach the low-level file ops.
The extraction regex had two gaps:
1. `*** Move File: src -> dst` was never extracted (regex only matched
Update/Add/Delete), so a Move targeting /etc/crontab skipped the
pre-check and fell back on the narrower file_operations deny list.
2. The regex required `\\s+` after `***` but patch_parser uses `\\s*`, so
`***Update File: /etc/hosts` (no space) parsed + applied while
skipping the check.
Loosen the leading whitespace to \\s* and add a Move regex that checks
both endpoints. Move endpoints also run through the same '..' traversal
rejection as the other V4A headers (closes the sibling gap on current
main, which gained that traversal guard after this PR was opened).
The gateway-lifecycle guard's hermes-CLI pattern required `hermes`
and `gateway` to be adjacent, so a profile flag slipped the agent
past it: `hermes -p ade gateway restart` was not flagged. That is the
exact form from the 2026-04-11 ade-profile self-kill loop. Allow an
optional run of global flags (`-p ade`, `--profile ade`, multiple
flags) between `hermes` and the gateway subcommand.
launchctl self-termination is already covered on main by #33071; this
narrows the only remaining real gap.
Ensure Windows desktop and local terminal teardown kill full process trees so Git Bash descendants cannot survive wrapper exits and accumulate across retries.
Two robustness gaps from the #54843 truncate-store path:
- _store_full_text wrote the full clean page to cache/web with no upper
bound (path.write_text(content)); a multi-MB page → unbounded per-extract
disk write. Cap at MAX_STORED_TEXT_CHARS (2MB, the pre-truncate-store
refusal ceiling) with a marker when capped.
- The truncation footer told the model 'read_file ... offset=<line>' — a
literal placeholder it had to guess. Compute the real starting line of the
omitted middle (head line count + 1) so the first read_file lands in the gap.
Follow-up to the judge gate. judge_goal() is fail-open at the source:
when no auxiliary model is reachable it returns a "continue" verdict
that is indistinguishable from a real "not done yet" judgment. The gate
treated any non-"done" verdict as a rejection, so an unconfigured or
degraded auxiliary model would wedge every goal_mode worker — it could
never close its own task. That contradicted the gate's own "fail-open"
comment.
Probe judge availability before enforcing (the same auxiliary client
lookup judge_goal performs) and only gate when a judge is actually
reachable. When none is, completion proceeds.
Also fix the rejection guidance: kanban_create takes parents=[...], not
parent=.
Add test_complete_goal_mode_allows_when_judge_unavailable covering the
fail-open path; update the rejection test to force the availability probe.
Apply naqerl's review comments on PR #38388:
- Hoist `from hermes_cli.goals import judge_goal` to module-level
imports so an import failure surfaces at module init, not lazily
on the first goal-mode completion (no circular import: hermes_cli
package init is trivial and does not load tools.kanban_tools).
- Narrow the fail-open `try` to wrap only the judge_goal() call.
The verdict check and its rejection `return tool_error(...)` now
live outside the handler, so a failure there can no longer be
swallowed by the broad except.
- Pass `exc_info=True` to the logger.warning call per CONTRIBUTING.md.
Update the test mock target to tools.kanban_tools.judge_goal, since
the hoisted import rebinds the name into this module's namespace.
Prevents workers in goal_mode from bypassing the auxiliary judge by
calling kanban_complete before acceptance criteria are met. The tool
handler now synchronously invokes the goal judge against the task's
title/body and the completion summary. If the verdict is not "done",
the completion is rejected with actionable guidance for the agent.
This keeps kanban_db.py as a pure SQLite wrapper while intercepting
the bypass exactly at the agent tool-call boundary, aligning with
Hermes separation of concerns.
Fixes#38367
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
Add durable public-URL output and URL-based chaining to xAI Grok Imagine:
- Store generated media on files-cdn with permanent public HTTPS URLs
(public_url: true, no expiry by default).
- Chain by URL: generate -> edit -> extend each take a prior result's
public HTTPS URL (or a data URI / local file for inputs).
- Add provider-specific xai_video_edit and xai_video_extend tools.
- Image generation: public-URL/storage output, multi-reference edits,
and ~/ local-path support for image edits.
Credentials use xAI Grok device-code OAuth (separate PR).
* feat(web_extract): truncate-and-store instead of LLM summarization
web_extract no longer runs an auxiliary LLM over scraped pages. The extract
backends (Firecrawl/Tavily/Exa/Parallel) already return clean, boilerplate-
stripped markdown, so we return it directly: pages within a char budget
(default 15000, web.extract_char_limit) come back whole; larger pages get a
head+tail window plus an explicit footer giving the stored full-text path and
the read_file call to page through the omitted middle. The full clean text is
written to cache/web (mounted read-only into remote backends like the other
cache dirs), so nothing is lost.
Inline base64 images are converted to [IMAGE: alt] placeholders (token bombs
dropped) while real http(s) image URLs are preserved as links so the agent can
still web_extract/vision_analyze them.
Removes process_content_with_llm + the chunked summarizer + check_auxiliary_model
+ _resolve_web_extract_auxiliary. context_references._default_url_fetcher is
updated to the truncate path and its stale data.documents shape read is fixed
to results (it was silently returning empty).
Live before/after eval (firecrawl, 4 URLs): 11.7x faster overall (176.6s ->
15.1s); 10-60x on large pages. Quality identical; findability 4/4 (answer
recoverable from stored full text on every truncated page). web_search is
unchanged.
No own scraper added; no changes to web_search.
* fix(web_extract): add char_limit to execute_code web_extract stub
The new web_extract char_limit param must appear in the code_execution_tool
_TOOL_STUBS signature (and doc line) or test_stubs_cover_all_schema_params
fails — the stub schema must cover every real schema param.
Wire the salvaged _safe_command_timeout() guard into the surviving
open-timeout call site. _get_open_command_timeout() feeds the
browser_navigate 'open' path; this closes the last call site that
could observe a None timeout from a torn cache (#14331), since the
original PR's max(_get_command_timeout(), 60) site no longer exists
on main (now routed through _get_open_command_timeout).
The original cap held a process-global slot across the WHOLE vision
analysis (image load + encode + LLM call) with a default of min(CPUs, 4).
That serialized legitimate multi-image workflows — "compare these 6
screenshots", "read this 10-page scan", "analyze every frame" — behind a
4-wide gate, and on the native fast path it even throttled calls that make
no LLM request at all. Excess calls queued (blocking acquire, nothing
dropped), but the latency hit on real fan-out was the wrong tradeoff.
The incident was CPU exhaustion, not call count: concurrent base64/resize
bursts saturated every core and left none to service the shared event loop
serving /api/status. So cap ONLY that:
- A dedicated, bounded ThreadPoolExecutor (_vision_cpu_executor) runs the
encode/resize/dimension-check off the caller's loop, sized to the host's
usable core count with NO fixed ceiling — the cap tracks the actual
exhausted resource (cores), not a magic number. Excess encodes queue on
the executor; cores stay free for the loop.
- The LLM call is deliberately OUTSIDE the executor, so multi-image
workflows keep full request concurrency.
- Override via auxiliary.vision.max_concurrency / HERMES_VISION_MAX_CONCURRENCY
(honored verbatim, including above core count); sub-1 ignored.
- _vision_concurrency_slot() is now a no-op shim for back-compat.
Tests assert: resolver defaults to host cores with no ceiling; env/config
override (incl. above cores); sub-1 rejection; the executor is dedicated and
core-sized; encode runs on a vision-encode thread; and crucially that encode
bursts are bounded to the cap while the analyses themselves stay fully
concurrent (calls_peak > cap).
A single agent turn can fan out N vision_analyze calls at once — the
classic trigger is "analyze every frame of this video", where ffmpeg
explodes a clip into dozens of frames and the model calls vision_analyze
on each. Every call does a CPU-heavy base64-encode/resize burst AND holds
a long-lived LLM stream open. The tool executor runs concurrent tool calls
on a per-session ThreadPoolExecutor (_MAX_TOOL_WORKERS=8), and multiple
agent sessions share one process (the dashboard runs the agent in-process),
so there was no global ceiling. In prod (June 2026) a video-frame fan-out
pinned a worker thread at ~100% CPU and starved the shared asyncio event
loop that also serves the dashboard's /api/status liveness probe, flapping
the instance to UNHEALTHY even though nothing had crashed.
Add a process-global threading.BoundedSemaphore that bounds how many vision
analyses run concurrently across the whole process, held across the entire
analysis (image load + encode + LLM call) in the single _handle_vision_analyze
chokepoint (covers both the native fast path and the legacy aux-LLM path).
It is a threading semaphore, NOT asyncio: each vision call is dispatched
through model_tools._run_async on a per-thread event loop, so an asyncio
primitive bound to one loop cannot coordinate across them. The acquire is
offloaded via run_in_executor so waiting for a slot never blocks the calling
loop.
Default: min(host CPUs, 4), floored at 1 — respect the host's concurrency,
or lower. Override via auxiliary.vision.max_concurrency (config.yaml) or
HERMES_VISION_MAX_CONCURRENCY (env). Values < 1 are ignored so the cap can
never be disabled into an unbounded fan-out.
Tests: bounded-fan-out regression guard + a control proving it would fail
without the cap; resolver tests for host-cpu default, ceiling clamp, low-cpu
host, env override, and sub-1 rejection. Pre-existing handler tests updated
for the now-async _handle_vision_analyze. Verified via the real
registry.dispatch -> _run_async per-thread-loop path (16 concurrent calls,
peak bounded to cap).
When a Camofox browser tab is garbage collected (idle timeout, browser
recycle), the held tab_id becomes stale. The next browser_navigate call
hits /tabs/{stale_id}/navigate -> HTTP 404 -> unhandled HTTPError.
Catch the 404 in camofox_navigate, clear the stale tab_id, and create a
fresh tab via _ensure_tab. The agent recovers transparently without
requiring a session restart.
Other tab operations (snapshot, click, type, etc.) use the same pattern
but only fail if the tab dies between successful calls — much rarer.
The navigate fix covers 95%+ of cases since navigate is always the entry
point.
The Camofox browser backend hardcoded a 30s HTTP timeout via
_DEFAULT_TIMEOUT, ignoring the user's browser.command_timeout config.
The main browser_tool path already reads this config via
_get_command_timeout().
This commit adds an equivalent _get_command_timeout() to
browser_camofox.py that reads browser.command_timeout from config
with caching, and switches all HTTP helper methods (_post, _get,
_get_raw, _delete) to use it as the default timeout.
Fixes#40843
The five HTTP call sites in browser_camofox.py (_ensure_tab, _post,
_get, _get_raw, _delete) did not include Authorization headers, causing
403 Forbidden when the Camofox server has API key auth enabled.
Added _auth_headers() helper and wired it into all five call sites.
The health check endpoint (/health) is left without auth since it is
a connectivity probe, not a browser operation.
Regression test covers: header present when key set, absent when unset,
blank key produces empty headers.
Fixes#20476
The Camoufox REST API server expects `listItemId` in the `POST /tabs`
body, but `_ensure_tab` was sending `sessionKey`. This caused a 400
Bad Request on every `browser_navigate` call.
The parameter name mismatch is visible in the same file: line 283
already reads `tab.get("listItemId")` when adopting existing tabs,
confirming the server-side field name.
Fixes#37960
The supermemory and mem0 memory providers shipped third-party SDKs
(supermemory / mem0ai) that are not core dependencies, but — unlike the
honcho and hindsight providers — they imported those SDKs directly with
no tools.lazy_deps.ensure() preflight and had no LAZY_DEPS allowlist
entry. On the published Docker image the agent venv is sealed
(HERMES_DISABLE_LAZY_INSTALLS=1) and lazy installs are redirected to a
writable durable target (HERMES_LAZY_INSTALL_TARGET). honcho/hindsight
route through ensure() and install fine there; supermemory/mem0 never
called it, so their SDK was never installed on a hosted instance and the
provider silently reported itself unavailable even with the API key set.
Fixes:
- Add memory.supermemory + memory.mem0 to the LAZY_DEPS allowlist
(tools/lazy_deps.py), pinned to current PyPI releases.
- Call ensure('memory.<x>', prompt=False) at each SDK-import chokepoint
(_SupermemoryClient.__init__; Mem0MemoryProvider._create_backend),
mirroring honcho's wrapped try/except shape.
- Drop the SDK-import gate from supermemory's is_available() — it was a
chicken-and-egg trap (provider never loaded on a sealed venv, so
ensure() never ran). Now key-presence only, like honcho/mem0.
- Add matching pyproject extras [supermemory]/[mem0]; update the
lazy-covered-extras contract test (excluded from [all] by policy).
Tests prove each path fails without the fix and the real sealed-venv
durable-target gate accepts both features.
Make the read-only agent terminal mirrors stream in real time and give
the agent a desktop-only way to dismiss its own tabs.
- Stream background output live: the local reader used a blocking
read(4096) that buffered small periodic output until EOF, so agent
tabs only "filled in" at process exit. Switch to buffer.read1(4096)
(decoded) for incremental chunks.
- Route agent.terminal.output / terminal.close to the window that owns
the process (its gateway session) instead of an empty session id, so
events actually reach the desktop renderer.
- Add close_terminal: a HERMES_DESKTOP-gated tool (sibling of
read_terminal) that drops a process's read-only tab WITHOUT killing it
via process_registry.on_close; output keeps buffering and the user can
reopen from the status stack.
- ⌘W now closes a focused agent tab: mark the agent instance
data-terminal and focus it on activation so isFocusWithin routes there.
- ensureTerminal() no longer spawns an extra user shell when a tab
already exists (e.g. opening a background task from the status stack).
_normalize_approval_mode() previously accepted any string, so an unknown
value like 'auto' fell through every downstream mode check (off/smart) and
silently behaved like manual with no signal. Validate against the known
modes (manual/smart/off), emit a warning for anything else, and default to
manual to match the config default and the rest of the function.
Bug 1 from the original PR (/approve & /deny bypassing the running-agent
guard) already landed on main independently, so only the mode-validation
fix is salvaged here.
Fixes#4261
Co-authored-by: Hermes Agent <agent@nousresearch.com>
All three .env parsers use `line.partition("=")` without stripping the
bash-compatible `export ` prefix first. A line like `export API_KEY=sk-...`
produces key `"export API_KEY"` instead of `"API_KEY"`, silently ignoring
the variable and causing auth failures for users who copy-paste from
bash profiles or follow tutorials that include `export`.
- tools/skills_tool.py: `load_env()` for skill environment
- hermes_cli/config.py: `load_env()` for core config
- hermes_cli/main.py: `_has_any_provider_configured()` inline parser
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix(terminal): require approval for host-bound Docker commands
The Docker terminal backend blanket-skips dangerous-command approval on
the assumption that the container is isolated from the host. That holds
only when nothing is bind-mounted in. Once a host path is exposed (via
TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE or a host-path entry in
TERMINAL_DOCKER_VOLUMES), a command like `rm -rf /workspace` reaches
real host files but is still auto-approved.
Detect host bind mounts and route those sessions through the normal
approval flow. Isolated Docker keeps the fast path. The same gating is
applied to the execute_code guard, which had the identical blanket skip.
Co-authored-by: Hermes Agent <agent@nousresearch.com>
* chore: add AUTHOR_MAP entry for PR #6436 salvage (Kolektori)
* test: accept has_host_access kwarg in _check_all_guards mocks
The host-bound Docker approval fix adds a has_host_access kwarg to the
_check_all_guards wrapper. Six pre-existing tests monkeypatch it with a
fixed (command, env_type) / (cmd, env) lambda signature, which now
raises TypeError when terminal_tool passes the new kwarg. Widen those
mock signatures to accept **kwargs.
---------
Co-authored-by: Kolektori <256073454+Kolektori@users.noreply.github.com>
Co-authored-by: Hermes Agent <agent@nousresearch.com>
On hosts where the cgroup v2 cpu/memory/pids controllers are not delegated
to the docker/podman process (unprivileged Proxmox LXCs, some rootless and
nested setups), --pids-limit/--cpus/--memory cause every container start to
fail with OCI runtime error / exit 126, breaking terminal + execute_code.
- Add _cgroup_limits_available(image): one-shot, host-wide cached probe that
spawns a throwaway container from the sandbox image itself (sleep 0) with
all three flags together, mirroring the existing _storage_opt_supported
probe-and-degrade pattern.
- Remove --pids-limit from static _BASE_SECURITY_ARGS; apply it (default 256
via _DEFAULT_PIDS_LIMIT) in resource_args gated on the probe.
- Gate --cpus and --memory on the same probe.
Behavior unchanged on cgroup-capable hosts; graceful degradation with a
one-time warning where controllers aren't delegated.
Fixes#6568.
(cherry picked from commit c933880b7ee2ce4d1167e0f89caa2d233db5639f)
Co-authored-by: angelos <angelos@oikos.lan.home.malaiwah.com>
Replace the 5s output_tail poll (which often showed nothing) with a real push
stream. The process registry gains an on_output sink called from its reader
threads with each chunk; the tui_gateway wires it to emit agent.terminal.output
{process_id, chunk} (write_json is _stdout_lock-guarded, so emitting from the
reader thread is safe). The desktop routes chunks by process id straight into
the read-only agent xterm via a small writer registry, with a capped backlog so
a tab opened mid-stream (or reopened) replays what it missed.
Drops the fragile poll/tail path: no session-key matching, no truncation, no
lag — full-fidelity ANSI, env-agnostic (local/docker/ssh).
windows_hide_flags() already returns 0 on POSIX (and creationflags=0 is
the no-op default there, exactly how server.py::_list_repo_files does it),
so drop the IS_WINDOWS import + ternary/one-use-dict gating and just pass
creationflags=windows_hide_flags() directly. Tests lose the now-pointless
IS_WINDOWS monkeypatch.
The #54236/#54417 backend git/gh sweep routed git_probe, the repo-file
picker, coding_context, context_references, copilot_auth, and the gateway
process scans through CREATE_NO_WINDOW, but two sibling spawn legs that
also run inside the console-less desktop/gateway backend were missed:
- tools/checkpoint_manager.py `_run_git` (and the one-shot `git init
--bare` in `_init_store`) — when checkpoints are enabled, every
file-mutating turn fires multiple bare `git` calls (status, add,
write-tree/commit-tree, update-ref). Spawned from a parent with no
console (Electron spawns the backend with windowsHide → CREATE_NO_WINDOW),
each one allocates its own conhost window → a flurry of terminal popups.
- tools/skills_hub.py `GitHubAuth._try_gh_cli` — `gh auth token`, the same
bug class as the already-fixed copilot_auth gh probe.
Route both through `windows_hide_flags()` (no-op on POSIX), matching the
established per-site pattern. Tests added to
tests/test_windows_subprocess_no_window_flags.py.