Follow-up to the salvaged #22070. The cron-deny tirith ImportError branch
was unconditionally fail-open; now it honours security.tirith_fail_open:
false by blocking (a cron session has no user to approve), mirroring the
main flow's fail-closed synthesis (#20733).
Adds regression tests: tirith-only content threat blocked in cron-deny,
plus fail-closed/fail-open ImportError behavior.
In check_all_command_guards, the cron-deny path only ran
detect_dangerous_command (regex patterns). The tirith check starts at
line 1017, after the early return at line 1002, so content-level threats
caught only by tirith (homograph URLs, pipe-to-interpreter, terminal
injection) were silently approved in cron sessions even with
approvals.cron_mode: deny.
Add a tirith call inside the cron-deny block, mirroring the same
ImportError guard used in the main flow.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Text-only Matrix messages sent via the send_message engine (hermes send,
cron deliver: matrix) arrived unencrypted (red padlock) in E2EE rooms.
Media sends already routed through the mautrix adapter and encrypted fine,
but text-only sends took the raw-HTTP standalone_sender_fn path, which
never encrypts.
Route ALL Matrix sends through _send_matrix_via_adapter so text is
encrypted too. The adapter reuses the live gateway's E2EE session when
available (#46310) and falls back to an encryption-aware ephemeral adapter
for standalone/cron contexts. The registry standalone_sender_fn stays
registered for the contract; it is simply no longer reached for Matrix.
Salvaged from PR #20259 onto current main (the original patched the
pre-#41112 _send_matrix branch, which had since moved to the plugin's
standalone path).
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
GNU tools accept unique long-option prefix abbreviations at runtime, so
`chown --recurs root` and `git push --forc` evaded the approval gate's
exact-match `--recursive`/`--force` patterns. Switch those two entries
to prefix matches (--recur[a-z]*, --forc[a-z]*).
The rm/chmod/sed long-flag patterns were left unchanged: every abbreviation
of those is already caught by the sibling short-flag and target patterns
(rm -[^s]*r, base chmod 777, sed -[^s]*i), so prefix-matching them is a
no-op. Only chown (beyond the coincidental case-insensitive r->R catch) and
git push had genuine gaps.
Co-authored-by: Subway2023 <subw3@mail2.sysu.edu.cn>
The dangerous-command approval prompt renders the flagged command so the
user can decide whether to approve. If the agent constructed it with a
credential (curl -H 'Authorization: Bearer sk-...', psql postgres://user:pw@host,
an execute_code script with api_key = 'sk-...'), that secret hit stdout and,
via the gateway notify payload, Discord/Slack messages — which are
screenshottable and forwardable.
Apply the existing agent.redact.redact_sensitive_text() to every user-facing
approval surface. Redaction is display-only: the raw command still executes
after approval, and approval persistence keys off pattern_key (not the command
text), so the allowlist is unaffected. Decision context (URL, flags, command
structure) is preserved; only the secret value masks.
Covers all surfaces, including the execute_code path the original PR missed:
- prompt_dangerous_approval(): callback + stdout fallback
- check_all_command_guards(): gateway approval_data + cron/batch pending fallback
- check_execute_code_guard(): gateway approval_data + no-notifier pending fallback
(script body can embed credentials)
Adds TestApprovalPromptRedaction covering callback redaction, no-over-redaction
of clean commands, and the execute_code pending fallback.
Salvaged from PR #13139 by @sgabel; extended to the execute_code surface.
`handle_401` spawned a dedup'd recovery coroutine via
`asyncio.create_task(_do_handle())` and discarded the returned task
reference. Python's event loop only keeps weak references to tasks, so
the coroutine could be garbage-collected before it called
`pending.set_result(...)`. Every concurrent caller awaiting that future
then hangs forever, and the `finally: entry.pending_401.pop(...)`
cleanup never runs — so subsequent 401s for the same key latch onto the
dead future too. Same pattern the adapter-side fixes address (#11997,
#11998, #12000, #12001, #12006).
Hold the task in a process-wide set on the manager and discard it via
`add_done_callback` once it completes. Regression test covers both the
structural invariant (task tracked, then removed on completion) and a
concurrent dedup path with a forced `gc.collect()` between the handler's
await points.
Tool call ids are used to name persisted large-result files. Treating that id as a raw path segment allowed traversal-like ids to resolve outside hermes-results even though the shell command quoted metacharacters.
Convert ids to single filename stems, preserve normal ids, and add a short hash when normalization is needed so unsafe ids do not collide silently.
Constraint: Avoid new dependencies and preserve existing tool-result paths for normal tool call ids
Rejected: Quote only the path | shell quoting does not prevent ../ path traversal
Confidence: high
Scope-risk: narrow
Reversibility: clean
Tested: source /Users/peter/hermes-agent/venv/bin/activate && pytest tests/tools/test_tool_result_storage.py -q
Tested: source /Users/peter/hermes-agent/venv/bin/activate && python -m compileall tools/tool_result_storage.py tests/tools/test_tool_result_storage.py
Tested: git diff --check
The init snapshot dumped functions with a line-based filter:
declare -f | grep -vE '^_[^_]'
That strips a function's *header* line (e.g. `_foo () `) but leaves the
orphaned `{ ... }` body behind, corrupting the snapshot that is sourced
before every command. Sourcing the torn snapshot runs leftover body code
and breaks subsequent commands (intermittent exit 127).
- Filter private (`_`-prefixed) functions by NAME via `declare -F` and
dump only the wanted whole definitions, so a body is never torn. Guard
against an empty name list (bare `declare -f` dumps everything).
- Treat a non-zero bootstrap exit code as snapshot-init failure, so
execution safely falls back to login-shell-per-command mode.
- Add a regression test asserting snapshot_ready stays false when
bootstrap exits non-zero.
Preserves the atomic-write ($BASHPID temp + mv -f) machinery from #38249.
- tools/skills_hub.py: the per-call resolvers now honor a test-injected real
module attribute (patch.object(hub, 'SKILLS_DIR', ...) / monkeypatch.setattr)
before falling back to dynamic profile resolution. PEP 562 __getattr__ only
fires when no real attribute exists, so an unpatched module resolves the
active profile and a patched one respects the test's value — keeping the
existing skills_hub test seam intact (5 tests had broken).
- tests/test_profile_isolation_runtime.py: real two-profile (no-mock) suite
driving each previously-leaking site under override A then B and asserting
the active profile's path/identity is used: skills_hub paths + derived
constants + default-arg resolution, gateway cache getters (incl. the
monkeypatch-still-wins seam), rich_sent_store path, and thread/executor
context propagation (raw-thread hazard documented; primitive + _run_async
worker proven to preserve the override).
A bare threading.Thread / ThreadPoolExecutor worker starts with an empty
contextvars.Context, so the context-local profile override
(_HERMES_HOME_OVERRIDE) does not cross the spawn boundary. In single-process
multi-profile runtimes (desktop tui_gateway) the worker then resolves
get_hermes_home() to the launch/default profile, leaking one profile's
reads/writes into another. The fix primitive (tools.thread_context.
propagate_context_to_thread, which copies the parent context) already exists;
the leaking spawns simply did not use it.
- model_tools.py _run_async: wrap the worker-thread loop runner. This is the
generic sync->async bridge for every async tool, so wrapping it here fixes
the leak for all async tools at once (verified: an async tool reading
get_hermes_home() under an override now resolves the active profile).
- run_agent.py bg-review thread: wrap so MEMORY.md / skill review writes land
in the spawning turn's profile (#54937 path).
- tools/async_delegation.py: wrap both single + batch executor.submit calls so
detached children resolve the dispatching profile's paths.
Scope: the vision CPU executor is intentionally left unwrapped — it runs pure
in-memory encode/resize and never resolves profile-scoped paths.
In single-process multi-profile runtimes (desktop tui_gateway), profile
scoping is a context-local ContextVar override, not a process env var. Three
subsystems froze their HERMES_HOME-derived paths at import time (or read
os.environ directly), pinning every later profile to whichever profile first
imported the module — a cross-profile data leak.
- tools/skills_hub.py: SKILLS_DIR/HUB_DIR/LOCK_FILE/etc. were module constants
frozen at import. Replace with per-call resolver functions; add a PEP 562
module __getattr__ so external 'from tools.skills_hub import SKILLS_DIR'
callers (all function-local) resolve dynamically with no call-site changes.
Convert default-arg bindings (HubLockFile/TapsManager) and the derived
HERMES_INDEX_CACHE_FILE constant too.
- gateway/platforms/base.py: image/audio/video/document cache-dir getters now
re-resolve via get_hermes_dir() per call, falling back to the module
constant when a test has monkeypatched it (preserves the existing test seam).
Media-delivery safe-roots already enumerate all profiles' cache dirs
(#31733), so per-profile resolution does not break delivery.
- gateway/rich_sent_store.py: _store_path() read os.environ['HERMES_HOME']
directly, bypassing the override entirely; route through get_hermes_home().
The judge gate added for kanban_complete (Issue #38367, PR #38388) only
covers one of the two exit paths out of run_kanban_goal_loop(). The loop
treats status == "blocked" as terminal identically to "done" (and any
other status outside running/ready/done/blocked also stops the loop —
see goals.py's status dispatch). A goal_mode worker that has learned
kanban_complete is gated can simply call kanban_block(reason="anything")
to escape the loop with zero judge involvement, fully defeating the
intent of #38367's fix.
This is Issue #38696, filed as the explicit follow-up by a reviewer on
PR #38388: "kanban_complete is one way out; kanban_block is another...
A worker that learns the complete path is gated can shift to calling
block to escape the loop with the same effect."
Implements the issue's "Option B" (deterministic allowlist, no extra
judge LLM call) using the kind taxonomy that already exists in
kb.VALID_BLOCK_KINDS, rather than inventing a new judge_goal() outcome
type (judge_goal only returns done/continue/wait/skipped — there's no
"is this block legitimate" verdict to hook the issue's "Option A"
pseudocode onto without expanding the judge's contract).
goal_mode tasks may only block with kind in {dependency, needs_input} —
the two kinds that represent a genuine external blocker the worker
cannot resolve itself. `capability`, `transient`, and an unset kind are
rejected with a message directing the worker to kanban_complete instead,
which the judge now gates. Non-goal_mode tasks are completely unaffected.
Builds on the zero-match feedback fix (previous commit) to close the silent-hang
symptom: when memory is at capacity, a failed `add`/`replace`/`remove`
consolidation could loop the whole turn to iteration-budget exhaustion and
deliver no user-facing reply.
#41755 turned the at-capacity overflow error into a *commanded* in-turn retry
("...then retry this add — all in this turn"); combined with the fragile
substring-only `replace`/`remove` matching (LLMs can't reliably re-quote a long
entry verbatim), the model loops add↔replace on inexact guesses until the turn
dies. The existing tool_guardrails halt would catch this, but hard_stop_enabled
is opt-in (off by default), so a default install still hangs.
This fixes it at the memory layer without changing global guardrail behavior:
- MemoryStore tracks per-turn consolidation failures; after a cap (3) it drops
the "retry in this turn" instruction and returns a terminal "leave memory
unchanged, continue your reply" result, so a failed memory side effect can
never block the turn's reply.
- The counter resets on any successful write (progress) and at each turn
boundary (turn_context.reset_consolidation_failures, guarded via getattr so
plugin memory stores without the method are a no-op).
Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
- replace() and remove() now return entry previews and current_entries
when no entry matches old_text, matching the multi-match and add-limit
error behavior
- add() limit error also now returns previews for consistency
- Agent can self-correct after a failed replace/remove instead of looping
blindly until turn budget is exhausted with no user response
Follow-up to the per-tool availability derivation: `_snapshot_toolset_checks`
and `_evaluate_toolset_check` had no remaining callers once the four
availability surfaces switched to `_toolset_has_exposable_tools`. Remove both,
drop the no-op `quiet` param from the new helper, and document why
`_toolset_checks` is still written (banner.py reads it via TOOLSET_REQUIREMENTS
to classify unavailable toolsets as lazy-init vs disabled).
Doctor and banner used the first check_fn registered for a toolset, so
desktop-only read_terminal gated the whole terminal toolset even though
terminal and process still expose at runtime.
Fixes#54820
egilewski found the prior sink gate was transient: it only applied while
PluginManager executed register(ctx). A plugin could defer a direct
registry.register(..., override=True) to a post-load callback/thread, after
the scope was cleared, and still replace a built-in.
Make authorization durable by binding it to where the handler is DEFINED
(handler.__globals__['__name__']) rather than to call timing. At load, each
plugin's module namespace is mapped to its allow_tool_override opt-in in a
table that is never cleared. The sink resolves the handler's owning plugin
module and rejects an override from any plugin namespace without opt-in,
regardless of when or on which thread the call happens. Plugin namespaces
with no recorded policy are treated as not-opted-in (fail-closed). Built-in
and MCP handlers live outside the plugin namespace and are unaffected.
Adds a regression test for the delayed/post-load direct-registry override.
The opt-in gate lived only in PluginContext.register_tool, so a plugin
could bypass it by importing tools.registry and calling
registry.register(..., override=True) directly. Enforce the same gate at
the sink: during plugin load, the registry rejects an override from a
plugin without operator opt-in regardless of the path taken. Built-in and
MCP registrations (no active plugin scope) are unaffected.
Adds a regression test covering the direct-registry bypass.
The standalone send path (_send_telegram, used by the send_message tool,
cron delivery, and out-of-process callers) chunked the *raw* message on
UTF-16 length, then formatted and sent the result un-rechunked. MarkdownV2
escaping inflates the text (`!`/`.`/`-` -> `\!`/`\.`/`\-`), so a
4096 UTF-16-unit raw message can become ~8192 units once formatted and gets
rejected by Telegram as 'Message is too long'.
Move all text chunking into _send_telegram, after formatting: split the
formatted MarkdownV2/HTML text on UTF-16 length so every send is <=4096,
with per-chunk plain-text fallback and thread-not-found retry preserved.
Media attaches after all text chunks. (#28557)
Defense-in-depth follow-up to the runtime guard added in the previous commit.
Models on headless hosts (Railway / Fly / Docker / fresh VPS) without any ACP
CLI installed occasionally hallucinate ``acp_command="copilot"`` from the
schema description, despite the explicit "Do NOT set" instruction. The runtime
guard prevented the crash but the model still wasted a tool turn and got an
opaque silent fallback.
This commit removes the temptation at its source: ``_build_dynamic_schema_overrides``
now strips ``acp_command`` and ``acp_args`` from both the top-level and per-task
schemas when none of the known ACP CLIs (``copilot``, ``claude``, ``codex``) are
detectable on PATH. The model literally never sees the fields, so it cannot
pass them.
The runtime guard from the previous commit stays in place as defense-in-depth
for internal callers, tests, and any future code path that bypasses the schema.
``_acp_binary_available`` is intentionally NOT cached: ``shutil.which`` is
cheap, and avoiding the cache means the schema reacts to mid-session installs
without requiring a process restart.
Tests:
- ``test_schema_prunes_acp_command_when_no_acp_binary``
- ``test_schema_keeps_acp_command_when_binary_available``
- ``test_acp_binary_available_checks_known_clis``
Full ``test_delegate.py`` suite: 136/136 pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a model passes `acp_command="copilot"` (or any other binary name) in a
`delegate_task` tool call, `_build_child_agent` unconditionally sets
`effective_provider = "copilot-acp"`, which routes the subagent through
`CopilotACPClient`. That client spawns the named binary via subprocess; if it
isn't on PATH, every retry raises RuntimeError and an asyncio cleanup race
during error delivery can take the entire gateway down.
This is a real failure mode on headless deploys (Railway / Fly / VPS / Docker)
where `copilot` / `claude` / etc. aren't installed. The schema does say
"Do NOT set unless the user explicitly told you an ACP CLI is installed,"
but models occasionally pass it anyway — particularly for X (Twitter) search
prompts where Grok seems to associate ACP with "search assistance."
Reproduction:
- Headless install (no `copilot` binary on PATH)
- Set provider to xai-oauth + model grok-4.3
- Telegram prompt: "Search X for crypto twitter trends"
- Grok decides to delegate and passes `acp_command="copilot"`
- Subagent crashes 3x, gateway crashes on the 3rd retry teardown
Fix: validate the binary exists on PATH via `shutil.which` before honoring
the override. If missing, log a warning and fall through to the parent's
default transport. No behavior change when the binary IS present (covered
by `test_build_child_agent_honors_acp_command_when_binary_present`).
Tests:
- `test_build_child_agent_ignores_acp_command_when_binary_missing`
- `test_build_child_agent_honors_acp_command_when_binary_present`
Verified on Python 3.11 (macOS) and 3.12 (Debian 13 container).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Concurrent ACP sessions run on a shared ThreadPoolExecutor (max_workers=4).
Each _run_agent mutated the process-global os.environ["HERMES_INTERACTIVE"]
and restored it in finally, so one session's restore could clobber another's
set mid-run — dropping the second session onto the non-interactive
auto-approve path, executing a dangerous command without the approval
callback firing (GHSA-96vc-wcxf-jjff).
Replace the env-var flag with a thread/task-local contextvar in
tools.approval. The two HERMES_INTERACTIVE read sites in approval.py now go
through _is_interactive_cli() (contextvar-first, env fallback for legacy
single-threaded CLI callers). The ACP executor sets the contextvar instead
of os.environ; the existing contextvars.copy_context() wrapper isolates each
session's write.
Co-authored-by: Hermes Agent <127238744+teknium1@users.noreply.github.com>
delegate_task's per-task completion display emitted lines like
"✓ [1/3] Research done (17.92s)" via a bare print(). Under ACP (and any
headless JSON-RPC stdio host where AIAgent routes human output to stderr
via a custom _print_fn), these landed on stdout and corrupted the
protocol frame stream, surfacing as "Failed to parse JSON message: ✓
[3/3] …" in the ACP adapter.
Add _emit_parent_console() which prefers parent_agent._safe_print (the
same hook AIAgent uses for every other user-facing print) and falls back
to print() only when no router is wired up or it raises. CLI behavior is
unchanged.
The PR's other fix (preset toolset expansion) is already covered on main
by _expand_parent_toolsets(), so only the stdio-safe printing change is
salvaged here.
Batch delegation returned each subagent's full final_response verbatim
into the parent's context. A fan-out of N children could dump 60k+ tokens
at once, blowing the parent's context window and — on rate-limited
providers — triggering a compression/429 death spiral (429 misread as
context-too-large -> window step-down -> retry loop -> conversation dies).
Cap each summary against the parent's *remaining* context headroom split
across the batch (not a magic char count). When trimming, mirror the
web_extract convention: spill the full text to cache/delegation (mounted
into remote backends via credential_files._CACHE_DIRS) and return a
head+tail window (75/25, line-snapped) plus a footer with the exact
read_file offset to page the omitted middle. Both the subagent's opening
AND its closing (outcomes / files-changed / issues, which live at the end)
survive in-context, and nothing is lost — the parent can read_file the
full version on any backend.
delegation.max_summary_chars (default 24000) is a static ceiling layered
on top as belt-and-suspenders for models that ignore 'be concise'; 0
disables it. Child prompt tightened to lead with outcomes / bullets.
Co-authored-by: rc-int <rcint@klaith.com>
Three independent security-scanner hardenings, re-homed onto the current
shared threat-pattern architecture (tools/threat_patterns.py):
- approval.py: add bash/sh/zsh/ksh heredoc to DANGEROUS_PATTERNS. The
existing heredoc pattern only covered python/perl/ruby/node, so
`bash <<'EOF' ... EOF` ran arbitrary shell — including exfil pipelines
whose inner commands don't individually match a pattern — with no prompt.
- threat_patterns.py: apply unicodedata.normalize("NFKC", ...) before
pattern matching so full-width / compatibility homographs (e.g.
`cat ~/.hermes/.env`) are folded to ASCII and no longer bypass the
keyword scanners. Invisible-char detection still runs on the raw content
first (NFKC can strip those codepoints).
- code_execution_tool.py: add CREDS/BEARER/APIKEY to _SECRET_SUBSTRINGS so
vars like HERMES_LLM_CREDS, API_BEARER, MY_APIKEY are scrubbed from the
sandbox env. PASS was intentionally dropped from the original proposal —
it false-positives on BYPASS_CACHE / COMPASS_DIR / PASSENGER_HOST while
PASSWORD/PASSWD already cover the credential cases.
The original PR also proposed a 'synonym' injection pattern block
(overlook/forget/set aside/bypass/discard + developer-mode); dropped here
because it false-positives on ordinary AGENTS.md/SOUL.md prose ("don't
forget to follow the rules", "run in developer mode"), exactly the
bossy-English class threat_patterns.py is documented to avoid.
Salvaged from #9028.
Co-authored-by: Hermes Agent <agent@nousresearch.com>
patch_tool extracts V4A patch paths so _check_sensitive_path can refuse
writes to /etc/*, /boot/*, etc. before they reach the low-level file ops.
The extraction regex had two gaps:
1. `*** Move File: src -> dst` was never extracted (regex only matched
Update/Add/Delete), so a Move targeting /etc/crontab skipped the
pre-check and fell back on the narrower file_operations deny list.
2. The regex required `\\s+` after `***` but patch_parser uses `\\s*`, so
`***Update File: /etc/hosts` (no space) parsed + applied while
skipping the check.
Loosen the leading whitespace to \\s* and add a Move regex that checks
both endpoints. Move endpoints also run through the same '..' traversal
rejection as the other V4A headers (closes the sibling gap on current
main, which gained that traversal guard after this PR was opened).
The gateway-lifecycle guard's hermes-CLI pattern required `hermes`
and `gateway` to be adjacent, so a profile flag slipped the agent
past it: `hermes -p ade gateway restart` was not flagged. That is the
exact form from the 2026-04-11 ade-profile self-kill loop. Allow an
optional run of global flags (`-p ade`, `--profile ade`, multiple
flags) between `hermes` and the gateway subcommand.
launchctl self-termination is already covered on main by #33071; this
narrows the only remaining real gap.
Ensure Windows desktop and local terminal teardown kill full process trees so Git Bash descendants cannot survive wrapper exits and accumulate across retries.
Two robustness gaps from the #54843 truncate-store path:
- _store_full_text wrote the full clean page to cache/web with no upper
bound (path.write_text(content)); a multi-MB page → unbounded per-extract
disk write. Cap at MAX_STORED_TEXT_CHARS (2MB, the pre-truncate-store
refusal ceiling) with a marker when capped.
- The truncation footer told the model 'read_file ... offset=<line>' — a
literal placeholder it had to guess. Compute the real starting line of the
omitted middle (head line count + 1) so the first read_file lands in the gap.
Follow-up to the judge gate. judge_goal() is fail-open at the source:
when no auxiliary model is reachable it returns a "continue" verdict
that is indistinguishable from a real "not done yet" judgment. The gate
treated any non-"done" verdict as a rejection, so an unconfigured or
degraded auxiliary model would wedge every goal_mode worker — it could
never close its own task. That contradicted the gate's own "fail-open"
comment.
Probe judge availability before enforcing (the same auxiliary client
lookup judge_goal performs) and only gate when a judge is actually
reachable. When none is, completion proceeds.
Also fix the rejection guidance: kanban_create takes parents=[...], not
parent=.
Add test_complete_goal_mode_allows_when_judge_unavailable covering the
fail-open path; update the rejection test to force the availability probe.
Apply naqerl's review comments on PR #38388:
- Hoist `from hermes_cli.goals import judge_goal` to module-level
imports so an import failure surfaces at module init, not lazily
on the first goal-mode completion (no circular import: hermes_cli
package init is trivial and does not load tools.kanban_tools).
- Narrow the fail-open `try` to wrap only the judge_goal() call.
The verdict check and its rejection `return tool_error(...)` now
live outside the handler, so a failure there can no longer be
swallowed by the broad except.
- Pass `exc_info=True` to the logger.warning call per CONTRIBUTING.md.
Update the test mock target to tools.kanban_tools.judge_goal, since
the hoisted import rebinds the name into this module's namespace.
Prevents workers in goal_mode from bypassing the auxiliary judge by
calling kanban_complete before acceptance criteria are met. The tool
handler now synchronously invokes the goal judge against the task's
title/body and the completion summary. If the verdict is not "done",
the completion is rejected with actionable guidance for the agent.
This keeps kanban_db.py as a pure SQLite wrapper while intercepting
the bypass exactly at the agent tool-call boundary, aligning with
Hermes separation of concerns.
Fixes#38367
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
Add durable public-URL output and URL-based chaining to xAI Grok Imagine:
- Store generated media on files-cdn with permanent public HTTPS URLs
(public_url: true, no expiry by default).
- Chain by URL: generate -> edit -> extend each take a prior result's
public HTTPS URL (or a data URI / local file for inputs).
- Add provider-specific xai_video_edit and xai_video_extend tools.
- Image generation: public-URL/storage output, multi-reference edits,
and ~/ local-path support for image edits.
Credentials use xAI Grok device-code OAuth (separate PR).
* feat(web_extract): truncate-and-store instead of LLM summarization
web_extract no longer runs an auxiliary LLM over scraped pages. The extract
backends (Firecrawl/Tavily/Exa/Parallel) already return clean, boilerplate-
stripped markdown, so we return it directly: pages within a char budget
(default 15000, web.extract_char_limit) come back whole; larger pages get a
head+tail window plus an explicit footer giving the stored full-text path and
the read_file call to page through the omitted middle. The full clean text is
written to cache/web (mounted read-only into remote backends like the other
cache dirs), so nothing is lost.
Inline base64 images are converted to [IMAGE: alt] placeholders (token bombs
dropped) while real http(s) image URLs are preserved as links so the agent can
still web_extract/vision_analyze them.
Removes process_content_with_llm + the chunked summarizer + check_auxiliary_model
+ _resolve_web_extract_auxiliary. context_references._default_url_fetcher is
updated to the truncate path and its stale data.documents shape read is fixed
to results (it was silently returning empty).
Live before/after eval (firecrawl, 4 URLs): 11.7x faster overall (176.6s ->
15.1s); 10-60x on large pages. Quality identical; findability 4/4 (answer
recoverable from stored full text on every truncated page). web_search is
unchanged.
No own scraper added; no changes to web_search.
* fix(web_extract): add char_limit to execute_code web_extract stub
The new web_extract char_limit param must appear in the code_execution_tool
_TOOL_STUBS signature (and doc line) or test_stubs_cover_all_schema_params
fails — the stub schema must cover every real schema param.
Wire the salvaged _safe_command_timeout() guard into the surviving
open-timeout call site. _get_open_command_timeout() feeds the
browser_navigate 'open' path; this closes the last call site that
could observe a None timeout from a torn cache (#14331), since the
original PR's max(_get_command_timeout(), 60) site no longer exists
on main (now routed through _get_open_command_timeout).
The original cap held a process-global slot across the WHOLE vision
analysis (image load + encode + LLM call) with a default of min(CPUs, 4).
That serialized legitimate multi-image workflows — "compare these 6
screenshots", "read this 10-page scan", "analyze every frame" — behind a
4-wide gate, and on the native fast path it even throttled calls that make
no LLM request at all. Excess calls queued (blocking acquire, nothing
dropped), but the latency hit on real fan-out was the wrong tradeoff.
The incident was CPU exhaustion, not call count: concurrent base64/resize
bursts saturated every core and left none to service the shared event loop
serving /api/status. So cap ONLY that:
- A dedicated, bounded ThreadPoolExecutor (_vision_cpu_executor) runs the
encode/resize/dimension-check off the caller's loop, sized to the host's
usable core count with NO fixed ceiling — the cap tracks the actual
exhausted resource (cores), not a magic number. Excess encodes queue on
the executor; cores stay free for the loop.
- The LLM call is deliberately OUTSIDE the executor, so multi-image
workflows keep full request concurrency.
- Override via auxiliary.vision.max_concurrency / HERMES_VISION_MAX_CONCURRENCY
(honored verbatim, including above core count); sub-1 ignored.
- _vision_concurrency_slot() is now a no-op shim for back-compat.
Tests assert: resolver defaults to host cores with no ceiling; env/config
override (incl. above cores); sub-1 rejection; the executor is dedicated and
core-sized; encode runs on a vision-encode thread; and crucially that encode
bursts are bounded to the cap while the analyses themselves stay fully
concurrent (calls_peak > cap).
A single agent turn can fan out N vision_analyze calls at once — the
classic trigger is "analyze every frame of this video", where ffmpeg
explodes a clip into dozens of frames and the model calls vision_analyze
on each. Every call does a CPU-heavy base64-encode/resize burst AND holds
a long-lived LLM stream open. The tool executor runs concurrent tool calls
on a per-session ThreadPoolExecutor (_MAX_TOOL_WORKERS=8), and multiple
agent sessions share one process (the dashboard runs the agent in-process),
so there was no global ceiling. In prod (June 2026) a video-frame fan-out
pinned a worker thread at ~100% CPU and starved the shared asyncio event
loop that also serves the dashboard's /api/status liveness probe, flapping
the instance to UNHEALTHY even though nothing had crashed.
Add a process-global threading.BoundedSemaphore that bounds how many vision
analyses run concurrently across the whole process, held across the entire
analysis (image load + encode + LLM call) in the single _handle_vision_analyze
chokepoint (covers both the native fast path and the legacy aux-LLM path).
It is a threading semaphore, NOT asyncio: each vision call is dispatched
through model_tools._run_async on a per-thread event loop, so an asyncio
primitive bound to one loop cannot coordinate across them. The acquire is
offloaded via run_in_executor so waiting for a slot never blocks the calling
loop.
Default: min(host CPUs, 4), floored at 1 — respect the host's concurrency,
or lower. Override via auxiliary.vision.max_concurrency (config.yaml) or
HERMES_VISION_MAX_CONCURRENCY (env). Values < 1 are ignored so the cap can
never be disabled into an unbounded fan-out.
Tests: bounded-fan-out regression guard + a control proving it would fail
without the cap; resolver tests for host-cpu default, ceiling clamp, low-cpu
host, env override, and sub-1 rejection. Pre-existing handler tests updated
for the now-async _handle_vision_analyze. Verified via the real
registry.dispatch -> _run_async per-thread-loop path (16 concurrent calls,
peak bounded to cap).
When a Camofox browser tab is garbage collected (idle timeout, browser
recycle), the held tab_id becomes stale. The next browser_navigate call
hits /tabs/{stale_id}/navigate -> HTTP 404 -> unhandled HTTPError.
Catch the 404 in camofox_navigate, clear the stale tab_id, and create a
fresh tab via _ensure_tab. The agent recovers transparently without
requiring a session restart.
Other tab operations (snapshot, click, type, etc.) use the same pattern
but only fail if the tab dies between successful calls — much rarer.
The navigate fix covers 95%+ of cases since navigate is always the entry
point.
The Camofox browser backend hardcoded a 30s HTTP timeout via
_DEFAULT_TIMEOUT, ignoring the user's browser.command_timeout config.
The main browser_tool path already reads this config via
_get_command_timeout().
This commit adds an equivalent _get_command_timeout() to
browser_camofox.py that reads browser.command_timeout from config
with caching, and switches all HTTP helper methods (_post, _get,
_get_raw, _delete) to use it as the default timeout.
Fixes#40843
The five HTTP call sites in browser_camofox.py (_ensure_tab, _post,
_get, _get_raw, _delete) did not include Authorization headers, causing
403 Forbidden when the Camofox server has API key auth enabled.
Added _auth_headers() helper and wired it into all five call sites.
The health check endpoint (/health) is left without auth since it is
a connectivity probe, not a browser operation.
Regression test covers: header present when key set, absent when unset,
blank key produces empty headers.
Fixes#20476
The Camoufox REST API server expects `listItemId` in the `POST /tabs`
body, but `_ensure_tab` was sending `sessionKey`. This caused a 400
Bad Request on every `browser_navigate` call.
The parameter name mismatch is visible in the same file: line 283
already reads `tab.get("listItemId")` when adopting existing tabs,
confirming the server-side field name.
Fixes#37960
The supermemory and mem0 memory providers shipped third-party SDKs
(supermemory / mem0ai) that are not core dependencies, but — unlike the
honcho and hindsight providers — they imported those SDKs directly with
no tools.lazy_deps.ensure() preflight and had no LAZY_DEPS allowlist
entry. On the published Docker image the agent venv is sealed
(HERMES_DISABLE_LAZY_INSTALLS=1) and lazy installs are redirected to a
writable durable target (HERMES_LAZY_INSTALL_TARGET). honcho/hindsight
route through ensure() and install fine there; supermemory/mem0 never
called it, so their SDK was never installed on a hosted instance and the
provider silently reported itself unavailable even with the API key set.
Fixes:
- Add memory.supermemory + memory.mem0 to the LAZY_DEPS allowlist
(tools/lazy_deps.py), pinned to current PyPI releases.
- Call ensure('memory.<x>', prompt=False) at each SDK-import chokepoint
(_SupermemoryClient.__init__; Mem0MemoryProvider._create_backend),
mirroring honcho's wrapped try/except shape.
- Drop the SDK-import gate from supermemory's is_available() — it was a
chicken-and-egg trap (provider never loaded on a sealed venv, so
ensure() never ran). Now key-presence only, like honcho/mem0.
- Add matching pyproject extras [supermemory]/[mem0]; update the
lazy-covered-extras contract test (excluded from [all] by policy).
Tests prove each path fails without the fix and the real sealed-venv
durable-target gate accepts both features.