A plugin-registered WebSearchProvider with no built-in provider credentials
must light up web_search / web_extract and be discoverable by the backend
selectors. Covers check_web_api_key(), _get_backend(), _is_backend_available()
registry delegation, per-capability extract selection (#32698), and that the
web_search / web_extract tool registry entries are not filtered out.
Tests contributed by @m0n5t3r (PR #28652, issue #28651).
Asserts the behavior contract that run_one_job installs a profile secret
scope around run_job under multiplexing (so resolve_runtime_provider's
get_secret does not fail-close with UnscopedSecretError) and tears it
down afterward. Mutation-verified: fails on unmodified main with the
exact UnscopedSecretError, passes with the fix.
Fold the xAI video credential-read guard into the same shared
agent.file_safety.raise_if_read_blocked chokepoint this PR introduces for
the image providers, so the whole image+video bug class is covered by one
enforced boundary. Consolidates the parallel salvage of #57695 (xAI
image+video) into this PR; #57727 is now redundant and will be closed.
- video_gen/xai: guard _image_ref_to_xai_url and _video_ref_to_xai_url
(the video image + video byte-read chokepoints) via the shared helper.
- Regression tests: symlinked auth.json with .png/.mp4 names are blocked
across both video read paths (mutation-checked).
Follow-up to the per-provider guards. Three improvements from review:
1. Extract agent.file_safety.raise_if_read_blocked() as a single shared
chokepoint and route the OpenAI, OpenRouter, and (newly) xAI image
providers through it, replacing the 3x-duplicated inline try/except.
Fixes the whole bug class: xai/_xai_image_field read a model-supplied
local path via open() with no guard — the same vulnerability the PR
fixed for OpenAI/OpenRouter, in a sibling provider it missed.
2. Strengthen the regression tests from pass-on-any-ValueError to true
security invariants: spy open()/read_bytes() and assert the blocked
credential is NEVER read; add negative controls (legit local image
still loads; remote/data: URIs pass through unguarded) so a
block-everything regression can't pass.
3. Guard is best-effort by design (defense-in-depth, not a security
boundary) — documented on the shared helper.
- agent/file_safety.py: raise_if_read_blocked()
- plugins/image_gen/{openai,openrouter,xai}: route through helper
- tests: no-read spies + negative controls across all three providers
Five follow-ups to #57659 from post-merge review:
1. install.ps1: gateway scheduled-task re-enable now runs in a finally
(a thrown Remove-Item/uv venv failure previously stranded the user's
gateway autostart disabled), and tasks that were already disabled
before the install are no longer blindly re-enabled.
2. The venv-python holder guard is no longer bypassed by plain --force
(which the desktop bootstrap passes on every update while its lock
probe only checks hermes.exe/app.asar). New explicit --force-venv is
the escape hatch; --force keeps bypassing only the hermes.exe shim
guard.
3. _detect_venv_python_processes now also catches uv/base-interpreter
trampolines whose exe is outside the venv, via cmdline (venv path or
'-m hermes_cli.main' tied to this install root) and cwd.
4. Missing venv python is now UNHEALTHY on managed installs
(.hermes-bootstrap-complete / .update-incomplete markers) so the
repair lane runs instead of 'Already up to date!'; the repair branch
recreates the venv first when it's gone entirely. Dev checkouts keep
reporting healthy.
5. install.ps1 comment no longer claims a Startup-folder disarm the
code doesn't perform (logon-only, not a mid-install respawner).
The salvaged tests from #53754 predate _handle_vision_analyze becoming
async and the native fast path; await the handler and force the legacy
aux path so the model-resolution assertion is actually exercised.
_handlers for vision_analyze and video_analyze read model name from
config.yaml (auxiliary.vision.model / auxiliary.video.model) before
falling back to AUXILIARY_VISION_MODEL / AUXILIARY_VIDEO_MODEL env
vars. Matches the existing config-first pattern for timeout and
temperature in the same file.
Fixes#53749
A custom:<name> main provider resolves at runtime to the bare provider id
"custom". In the vision auto-detect chain, the main-provider branch called
resolve_provider_client("custom", ...) WITHOUT explicit_base_url/api_key,
so it returned (None, None) ("no endpoint credentials found") and the whole
chain fell through to OpenRouter/Nous. A user on a custom endpoint with no
aggregator configured then got "No LLM provider configured for task=vision
provider=auto" on every image, even though their main model fully supports
vision.
Recover the live endpoint that set_runtime_main() records each turn
(_RUNTIME_MAIN_BASE_URL/_API_KEY/_API_MODE) and forward it to Step 1, with
a fallback to _resolve_custom_runtime() for non-gateway callers. Mirrors the
existing explicit-base_url branch directly above.
Adds TestResolveVisionCustomProvider covering custom, custom:<name>, and the
no-runtime fallback path.
When model.provider is set to custom:<name>, _supports_vision_override()
previously tried only the runtime provider key ('custom') and the raw
config value ('custom:my-proxy'). It did not try the stripped name
('my-proxy'), which is the actual key under providers: in config.yaml.
This caused native image routing to fall back to text mode even when the
user explicitly declared supports_vision: true on the named provider's
model entry.
Fixes#39963
A user with tts.openai.model set to a direct-OpenAI model (e.g. tts-1-hd)
but no VOICE_TOOLS_OPENAI_KEY/OPENAI_API_KEY (or with tts.use_gateway)
routes TTS through the managed Nous audio gateway, which only proxies
gpt-4o-mini-tts. The request 400s with:
VALIDATION_ERROR: Unsupported managed OpenAI speech model
{'model': 'tts-1-hd', 'supportedModels': ['gpt-4o-mini-tts']}
_resolve_openai_audio_client_config now reports whether it resolved the
managed gateway; _generate_openai_tts coerces the model to a
managed-supported one (logging a warning that points at the direct-key
escape hatch) unless the user redirected base_url to their own endpoint.
Direct-key users keep their tts-1/tts-1-hd preference unchanged.
Follow-up to #57507: .ENV / .Env.local on case-insensitive filesystem
mounts slipped past the guard. Lowercase the name before matching and
add a regression test. Addresses egilewski's open review note.
`_dispatch_sync` gathers the mautrix per-event handler tasks with a bare
`asyncio.gather(*tasks)`. Without `return_exceptions=True`, the first handler
that raises aborts the gather, so the sibling events in the same sync response
are dropped unprocessed — the exception propagates up to the sync loop, which
logs a single "sync error" and moves on. The invite/redaction gathers a few
lines above already use `return_exceptions=True`.
Use `return_exceptions=True` and log each failing handler, so one bad event no
longer takes out the rest of its batch and per-event failures stay visible.
Regression test: a batch with one failing and one succeeding handler no longer
raises, the good handler still runs, and the failure is logged (mutation-
verified — reverting re-raises RuntimeError out of _dispatch_sync).
Source: https://github.com/NousResearch/hermes-agent/pull/52346
Related prior work: https://github.com/NousResearch/hermes-agent/pull/39462
Related prior work: https://github.com/NousResearch/hermes-agent/pull/27426
Maintainer direction: https://github.com/NousResearch/hermes-agent/pull/52346#issuecomment-4854881612
Remove acp_command and acp_args from the model-facing delegate_task schema and
dispatch paths. Child agents can still use ACP subprocess transport when it
comes from trusted delegation config or parent inheritance, but a model tool
call can no longer choose the command or arguments that reach child
construction.
This is salvageable because the risky boundary is model control over child ACP
transport, not ACP itself. The patch follows the maintainer direction from the
source discussion by preserving trusted ACP configuration and prior integration
work while removing the untrusted tool-call fields from both top-level and
per-task delegate inputs.
Reproduced on main by passing acp_command through delegate_task and observing it
reach _build_child_agent. Verified after the fix that model dispatch strips the
hidden top-level fields and per-task hidden fields are ignored before child
construction.
Co-authored-by: Carlosian <claudlos@agentmail.to>
Co-authored-by: ssiweifnag <120658181+ssiweifnag@users.noreply.github.com>
Co-authored-by: nikshepsvn <23241247+nikshepsvn@users.noreply.github.com>
Replace the exact-filename frozenset with _is_sensitive_filename()
that matches .env plus any .env.<suffix> variant. This covers
shorthand suffixes like .env.prod that the previous enumeration
missed.
Add test_sensitive_env_suffix_variants_blocked regression test
covering .env.prod, .env.dev, .env.staging.local, and .env.ci.
Addresses review feedback from egilewski on PR #57507.
The dashboard Files tab could list, read, and download .env files
containing API keys when running with a bind-mounted Hermes home
directory (e.g. docker run -v ~/.hermes:/opt/data).
Add _SENSITIVE_FILENAMES frozenset and filter these from
list_managed_files(), read_managed_file(), and download_managed_file().
Return 403 for direct read/download attempts on sensitive files.
Fixes#57505
_resolve_media_to_data_urls's ad-hoc _MEDIA_TAG_RE matched any bare
token after MEDIA: (no absolute-path anchor) and read the resolved
path directly with no denylist. A relative/traversal path like
MEDIA:../../../../etc/passwd.png slipped through, and any image-
suffixed file the process could read (including under ~/.ssh, ~/.aws,
etc.) was base64-inlined into the API response if its path merely
appeared in the model's own final reply text.
Every other platform adapter's MEDIA: handling already goes through
two shared primitives in gateway/platforms/base.py:
- MEDIA_TAG_CLEANUP_RE, which anchors the path to ~/, /, or a
Windows drive letter plus a known deliverable extension.
- validate_media_delivery_path, which resolves symlinks and rejects
paths under the credential/system-path denylist.
Reuse both here instead of the local unanchored pattern and naive
Path().expanduser() resolution.
browser_cdp's frame_id (OOPIF) path returned early via
_browser_cdp_via_supervisor before _browser_cdp_private_guard ever ran,
unlike the stateless path a few lines below. A model that navigated a
cloud browser to a private/internal URL could still read page content
by passing frame_id, bypassing the same SSRF/private-page boundary
already enforced on Runtime.evaluate, Page.navigate, and other raw CDP
calls.
Apply the same guard call used by the stateless path before dispatching
to the supervisor, so both routing modes share one boundary.
Root-causes the July 2026 Windows incident chain (locked _brotlicffi.pyd /
_sodium.pyd during install, then 'No module named annotated_doc' with
'hermes update' insisting 'Already up to date!'):
- hermes update: probe venv core imports even when the checkout is current;
a half-updated venv (dep sync killed mid-flight by a locked .pyd) is now
detected and repaired instead of being reported as up to date
- hermes update (Windows): after pausing gateways, refuse to mutate the venv
while other processes run from the venv interpreter (the Desktop backend
runs as python.exe so the hermes.exe shim guard never saw it); --force
keeps the old behavior
- install.ps1 venv stage: disarm gateway autostart Scheduled Tasks before
the kill sweep (they respawn the gateway inside the kill->delete window),
make the sweep a bounded loop requiring 3 clean passes, and rename-then-
delete the old venv (a rename succeeds even with mapped DLLs) with stale-
dir cleanup on the next run
- desktop updater: 'venv shim still locked after 15s' now ABORTS the update
hand-off (restarting our backend, surfacing the holder to the user)
instead of 'proceeding anyway (force)' into guaranteed venv corruption;
the unlock wait also re-kills respawned backends each poll tick
The routing-heal added to get_or_create_session calls
SessionDB.get_compression_tip; the stale-guard suite's bare MagicMock db
returned a Mock the heal then assigned as session_id, failing JSON
serialization. Model the real contract (a non-compressed session's tip is
itself) so the heal is a correct no-op.
Inbound Telegram/WeChat/Discord messages are written by the background
gateway, not the desktop websocket that drives local chats. Without
explicit polling the messaging sidebar and the open transcript stay
frozen until the user manually refreshes.
Desktop:
- MESSAGING_POLL_INTERVAL_MS (10 s): interval poll of the messaging
session list so new platform sessions surface automatically.
- ACTIVE_MESSAGING_SESSION_POLL_INTERVAL_MS (5 s): poll the currently-
viewed messaging transcript and re-hydrate the chat state when the
FNV-1a signature changes (hash covers role + timestamp + content).
- sameCronSignature now compares lineage_root_id / source / profile /
preview / message_count / last_active / ended_at so stale previews
and activity times are no longer silently ignored.
- sessionMatchesStoredId helper de-dups the id / _lineage_root_id check.
- refreshMessagingSessions exposed from useSessionListActions so the
controller can use it in the poll effect.
Gateway:
- SessionStore._compression_tip_for_session_id: look up the latest
compression continuation for a session id.
- SessionStore._heal_compression_tip_locked: rewrite a stale entry to
the compression child before returning it, so a restart or failed send
no longer leaves the store pinned to the compressed parent.
Co-authored-by: lawyer112 <lawyer112@users.noreply.github.com>
Follow-up to the salvaged early-exit retry fix (#35617): the debug-browser
launch path was fire-and-forget (stderr to DEVNULL, no logging), so every
platform failure — Windows singleton forward to an existing instance, bad
profile dir, missing shared libraries, policy blocks — collapsed into the
same unactionable 'port 9222 isn't responding yet' message and debug
reports contained nothing.
- launch_chrome_debug() returns a structured ChromeDebugLaunch with
per-candidate attempts (state, exit code, stderr tail)
- browser stderr is captured to <hermes_home>/chrome-debug/launch-stderr.log
- clean exit (code 0) without the port opening is detected as Chromium's
single-instance forward and produces a targeted user hint to close all
running instances of that browser
- crash exits surface the stderr tail (e.g. missing libnspr4.so)
- every spawn/exit is logged to agent.log so hermes debug share captures it
- CLI (/browser connect) and TUI/desktop (browser.manage) both print the hint
* feat(desktop): CLI/dashboard parity — skills hub browser, MCP test/toggle/catalog, maintenance ops, log filters
Brings desktop GUI to parity with hermes skills/mcp/doctor/backup/debug-share/
curator/memory CLI commands and the dashboard's System + Skills-hub pages:
- Skills page: new Browse Hub tab (search official/GitHub/community sources,
preview SKILL.md, security scan verdicts, install/update with live action log)
- MCP settings: connection test (tool listing), per-server enable/disable
toggle, and a Catalog tab installing Nous-approved MCP servers with env prompts
- Command Center: new Maintenance section (doctor, security audit, backup,
debug share links, curator status/pause/run, memory file status + reset)
- Command Center system logs: file (agent/errors/gateway/desktop), level, and
substring filters instead of a fixed agent.log tail
- hermes.ts API client + types for all the above; en/zh locale strings (ja and
zh-hant inherit via defineLocale)
* feat(desktop): backend model catalogs in toolset config — hermes tools parity
Completes the `hermes tools` parity gap: after picking an image/video
generation backend the CLI runs a model picker (e.g. FAL's multi-model
catalog with speed/strengths/price); the desktop toolset drawer now has the
same flow as a radio-card list.
- web_server: GET /api/tools/toolsets/{name}/models (catalog + current +
default for the active or named provider row) and PUT .../model
(validated write to image_gen.model / video_gen.model), reusing the CLI's
plugin catalog helpers so GUI and `hermes tools` stay in lockstep
- desktop: ModelCatalogPicker in ToolsetConfigPanel — per-model cards with
speed/strengths/price, in-use + default badges, disabled until the
backend is the active one; provider selection now mirrors is_active
locally so the catalog unlocks without a refetch
- tests: 3 backend endpoint tests (catalog shape invariants, persist +
validation), 2 component tests, 2 API-contract tests; en/zh strings
OpenCode Go serves minimax/qwen via Anthropic Messages (base URL without
/v1 — the SDK appends /v1/messages) and glm/kimi/deepseek/mimo via OpenAI
chat completions (base URL WITH /v1). The runtime stripped /v1 for
anthropic-routed models, and the TUI/desktop + gateway persisted that
stripped URL to model.base_url. Every later chat_completions model then
POSTed to https://opencode.ai/zen/go/chat/completions — a 404 (the
marketing site). Result: only minimax worked; glm/deepseek/kimi all 404ed.
- New normalize_opencode_base_url(): symmetric /v1 normalization —
strip for anthropic_messages, re-append for chat_completions /
codex_responses on opencode.ai hosts (heals persisted stripped URLs;
custom proxy overrides untouched)
- Applied at all three former one-way strip sites (resolve_runtime_provider
x2, switch_model)
- opencode_model_api_mode: all Qwen models on Go AND Zen now route via
/v1/messages per current published endpoint tables (previously only
qwen3.7-max on Go — qwen3.6-plus etc. would 404 the same way)
- Catalog refresh: Go gains deepseek-v4-pro/flash, glm-5.2,
kimi-k2.7-code, minimax-m3, qwen3.7-plus; Zen gains glm-5.2,
kimi-k2.7-code, minimax-m3, qwen3.7-plus
Reported by IndieSuperhuman on X: opencode-go 404s for any model other
than minimax.
A single-model Hermes agent never sends temperature; the provider default
applies. MoA hardcoded reference_temperature=0.6 / aggregator_temperature=0.4,
and the coercion float(preset.get(key, 0.6) or 0.6) made unset IMPOSSIBLE to
express: absent, null, empty, and even an explicit 0 all collapsed to the
baked-in default. Every MoA advisor and aggregator therefore ran at 0.6/0.4
while the same model running solo used the provider default — silently
skewing solo-vs-MoA comparisons and overriding provider-tuned defaults.
- moa_config normalization: temperatures coerce to None when absent/blank/
invalid (new _coerce_float_or_none); explicit values incl. 0 honored.
- moa_loop: _preset_temperature() resolves preset values; None flows to
call_llm, which already omits the parameter when None (same contract as
max_tokens). Aggregator still inherits the acting agent's own configured
temperature when the preset doesn't pin one.
- conversation_loop (context-mode MoA): same resolution, no more hardcoded
0.6/0.4 at the call site.
- DEFAULT_CONFIG preset + web_server payload models + docs updated: unset
is the default, pinning stays available.
Follow-up to the per-site strips from the review gate. The two copy-site
strips are correct but positional — a copy site added after the assembly
loops would re-leak _db_persisted into the child-session flush. Add a single
terminal sweep (_strip_persistence_markers) run once on the fully-assembled
compressed list so the invariant 'no compacted message leaves compress()
carrying a persistence marker' is structural, not dependent on copy-site order.
- agent/context_compressor.py: _strip_persistence_markers() called before
compress() returns; helper docstring notes the sweep is the authoritative guard
- tests/agent/test_context_compressor.py: structural regression — neuter the
per-site helper to a leaking copy, assert the terminal sweep still strips
- tests/run_agent/test_compression_persistence.py: pin the fixture assumption
behind the exact-equality row-count assertion
Shallow messages[i].copy() during context compression propagated the
_db_persisted marker from cached gateway incremental flushes into the
post-rotation compressed list. _flush_messages_to_session_db then skipped
every row when writing to the new child session, so gateway restarts
lost the compacted transcript (severe amnesia).
Strip the marker in _fresh_compaction_message_copy() and add regression
tests for rotation flush + compressor assembly.
Fixes#57491
The reaction-guard regression test defined a local _should_react lambda and
asserted it against itself — a tautology that would stay green even if the
production guard at _handle_slack_message reverted to (is_dm or is_mentioned),
re-introducing the unmentioned-MPIM reaction spam this PR fixes.
Replace it with a shared _reaction_guard helper plus a source-introspection
test that pins the production expression: asserts (is_one_to_one_dm or
is_mentioned) is present and (is_dm or is_mentioned) is absent. Mutation-checked
— reverting the adapter guard now fails the test.
Follow-up self-review finding on the salvage of #57339.
Group DMs (MPIMs) were classified as DMs and thereby exempted from every
operator control that shared surfaces are supposed to honor: allowed_channels,
require_mention, strict_mention, free_response_channels, and the reaction
guard. Symptom: the bot added 👀/✅ to unmentioned MPIM
messages and still invoked the agent (which then returned NO_REPLY) instead of
the gateway dropping the event before model execution. Removing an MPIM from
allowed_channels did not disable it.
Root cause is the DM classification at adapter.py:
is_dm = channel_type in {"im", "mpim"}
used for BOTH routing exemptions and reaction gating. An MPIM is a shared
surface (multiple humans can see and trigger the bot), not a private 1:1 DM,
so it must be gated like a channel.
This behavior was introduced/reinforced by a trail of Slack group-DM PRs:
- #4633 fix(slack): treat group DMs (mpim) like DMs + reaction guard
- #54632 fix(slack): subscribe to message.mpim + mpim scopes so group DMs work
- #54663 fix(slack): group DMs work OOTB + reinstall nudge
#54632/#54663 correctly made MPIM messages *reachable*; #4633 over-reached by
giving them the DM mention/reaction *exemptions*. This corrects only that
over-reach.
Fix (minimal): introduce `is_one_to_one_dm = channel_type == "im"` and key the
two EXEMPTION sites off it instead of `is_dm`:
- mention/allowlist gating block (`if not is_one_to_one_dm and bot_uid:`)
- reaction guard (`(is_one_to_one_dm or is_mentioned)`)
`is_dm` is intentionally retained for session/thread scoping and chat_type
labeling, where treating an MPIM as a persistent multi-party conversation is
correct — only the mention/reaction exemptions were wrong.
Docs: slack.md now distinguishes 1:1 DMs (mention-exempt) from group DMs
(shared surface; obey require_mention/strict_mention/allowed_channels/
free_response_channels; reactions only when @mentioned).
Tests: +7 in test_slack_mention.py (MPIM unmentioned dropped under
require_mention and strict_mention; MPIM mentioned processed; MPIM off
allowed_channels dropped; MPIM in free_response opted in; 1:1 IM still exempt;
reaction guard drops unmentioned MPIM). Updated _would_process to model the
is_one_to_one_dm gating + strict_mention. 72 passed.
hermes debug share reads os.getenv — the invoking terminal's environment — but
launchd/systemd and the desktop-spawned `serve` backend load credentials from
~/.hermes/.env, not the login shell. A key exported in the shell but absent
from .env is invisible to the backend, yet the dump printed a bare "set",
sending support down a phantom "the key is configured" path.
This was the actual trap behind a "Desktop has no web_search / no tools"
report: FIRECRAWL_API_KEY was a shell export (so `debug share` in a terminal
read "firecrawl set") but not in .env, so the launchd backend's
check_web_api_key returned False and web_search was gated off — which a
contributor then misdiagnosed as a missing `desktop` platform registration.
The dump now annotates any key set in-process but missing from ~/.hermes/.env
with "(shell only — not in .env; managed/desktop backend may not see it)" so
the mismatch is obvious instead of hidden behind "set".
The merged webhook session-close fix (#57370, salvaging #57322) wrapped
handle_message in a try/finally — but BasePlatformAdapter.handle_message
is fire-and-forget: it spawns _process_message_background and returns
before the agent run starts. The finally-close therefore ran BEFORE
get_or_create_session created the session row, found no session_id, and
silently no-op'd — the ghost-session leak persisted on the real path.
(The shipped test masked this by stubbing handle_message with a fake
that created the row synchronously.)
Move the close to an on_processing_complete override — the lifecycle
hook the base class fires at the TRUE end of the run, on the success,
failure, and cancellation paths alike. Empirically verified through the
real fire-and-forget pipeline: before, ended_at stayed NULL; after,
ended_at is set with end_reason=webhook_complete and the row is
prunable.
Tests now stub only the runner-side _message_handler (the seam the live
gateway injects) so handle_message / _process_message_background /
on_processing_complete all run for real; adds an AsyncSessionDB-facade
coverage test for the coroutine-await branch.
Follow-up to @helix4u's #57336 salvage. Two review findings:
- W1: model-picker grouped custom-provider rows by
(api_url, credential, api_mode) but NOT extra_headers. Entries sharing a
URL+credential+api_mode yet declaring different headers (e.g. per-tenant
routing behind one proxy) collapsed into one row and probed /models with
whichever header set was seen first (order-dependent). Fold a canonical
header identity into group_key so distinct header-authed endpoints stay
separate; drops the now-dead first-non-empty merge branch.
- W2: the extra_headers stringify+None-filter comprehension existed in 5
copies (config.py x2, runtime_provider.py, model_switch.py, models.py).
Extract one shared hermes_cli.config.normalize_extra_headers primitive;
all sites now call it.
Tests: +normalize_extra_headers unit tests, +regression test proving two
same-endpoint entries with different headers stay distinct and each probes
with its own headers. 223 targeted tests pass; ruff clean.
Follow-up to the cherry-picked #31856 fix. The contributor's guard defers
idle-TTL eviction until the session store reports the session expired, so the
expiry watcher can tear the agent down and fire MemoryProvider.on_session_end()
with the live transcript. Two gaps remained:
1. Memory-leak regression for mode='none' sessions. _is_session_expired()
returns False forever for the 'none' reset policy, so the naive guard would
never idle-evict those agents — reopening the unbounded-cache leak the idle
sweep (#11565) exists to relieve. Added SessionStore.is_session_finalizable()
(a public predicate: will the expiry watcher EVER finalize this session?) and
gate the deferral on it. mode='none' agents fall through to soft eviction as
before.
2. on_session_end still dropped on the LRU-cap path. Both cache-pressure paths
(_enforce_agent_cache_cap and _sweep_idle_cached_agents) soft-evict via
_release_evicted_agent_soft, which by design does NOT fire on_session_end.
If cache pressure evicts a finalizable-but-not-yet-expired agent before it
expires, the watcher later finds no cached agent and the hook is skipped.
Added _commit_memory_before_soft_evict(): at LRU eviction, if the session is
finalizable and not yet expired, commit end-of-session extraction via the
live agent's own (fully-scoped) memory manager using commit_memory_session()
— extraction WITHOUT provider teardown, so the eviction stays soft and a
resumed turn keeps working. Skipped for mode='none' (no missed boundary to
compensate) and expired sessions (the watcher tears those down directly).
This closes#11205 for ALL eviction paths and reset policies, not just the
idle-sweep + finite-policy case, while preserving the soft-eviction
resumability contract (never calls close() on a live session).
Tests: 5 new cases in test_agent_cache.py (mode='none' still reaped, LRU-cap
commits for finalizable / skips for none, real is_session_finalizable
predicate); all mutation-checked. Contributor's original 2 tests updated to
assert the finalizable path explicitly.
The idle-TTL sweep (_sweep_idle_cached_agents) was evicting agents
as soon as they passed _AGENT_CACHE_IDLE_TTL_SECS, even when the
session hadn't expired yet. In daily-reset mode the reset can fire
hours after the last user message — evicting the agent early means
the session-expiry watcher has no agent in cache to call
on_session_end() with, so memory providers miss the live transcript.
Now the sweep checks the session store before evicting: if the
session still exists and hasn't expired, the agent stays in cache
so the expiry watcher can tear it down properly later.
When the session store is unavailable or throws, falls back to the
original eviction behavior (safe default).
Fixes: #11205
Self-review (ruff+ty lint diff = 0 net-new; 2-agent deep review) surfaced one
Warning + comment-accuracy nits; no Critical:
- W1: the local-probe TTL cache memoized None (probe failure) for 30s, so a
probe that failed during a startup race would suppress a legit retry once
the server came up. Cache only positive results — still fully bounds the
hot-path probe rate (reachable servers cache their value) while an
unreachable one re-probes on the next call. Add a regression test asserting
a None result is NOT cached (retry re-probes); mutation-verified.
- Tighten the platform-guard comment: gateway/TUI/cron already construct with
quiet_mode=True (gated by `not agent.quiet_mode`), so the guard's active job
is CLI dedup vs show_banner, not "filling the gateway/TUI gap" as originally
worded.
Verified not-issues (per review): positive-value 30s cache does not break the
reconcile-after-restart freshness contract (restart = fresh process, empty
cache); cache key is collision-safe; platform guard is correct in both
directions (no runtime path leaves platform None on a non-CLI surface).
Tests: 149 passed. ruff clean; ty 0 net-new vs base.
Desktop/dashboard WebSocket connections drop during long agent operations
(delegate_task subagents, large model outputs) when the uvicorn event loop is
GIL-starved for minutes. Root cause: uvicorn's ws keepalive ping runs on the
SAME event loop as agent turns. A single synchronous GIL-holding call on a
worker thread (a regex/scrub over a large output, or a long subagent turn)
freezes the loop, so it cannot process the incoming pong within ws_ping_timeout
and uvicorn closes an otherwise-healthy connection (#53773: 'event loop stalled
226.3s'; #48445/#50005). Loosening the timeout only raises the threshold — a
multi-minute stall sails past any finite window.
The keepalive ping exists to detect half-open connections (reverse-proxy 524,
dropped tunnels), which cannot happen on loopback: there is no network or proxy
in the path, and a dead local client tears the socket down with a real FIN/RST
that starlette surfaces as WebSocketDisconnect regardless of the ping. So on
loopback the ping provides ~no liveness value while actively killing
recoverable stalls — disable it entirely (ws_ping_interval/timeout=None).
Non-loopback (public) binds sit behind a Cloudflare Tunnel where half-open IS a
real failure mode, so the ping stays at 20/20 to detect it.
Empirically verified (real uvicorn + websockets peer): with ws_ping=None the
server never closes a silent peer during an 8s window; with the pre-fix 2s/2s
window uvicorn closes it. A genuinely-dead client still fires the
WebSocketDisconnect reap path regardless of the ping.
Note: this fixes the local Desktop case (the OP's scenario). A remote Desktop
over an authenticated public dashboard route (McCalebTheSecond's comment) keeps
the ping and needs the deeper GIL-hotspot fix — tracked separately.
Closes#53773
Salvage review of #56431 surfaced one Critical + two Warning issues; fix
them on top of the contributor's cherry-picked commits:
1. Critical — duplicate non-agentic warning on the interactive CLI. The new
agent_init warning fires on every platform, but cli.py show_banner()
already warns on CLI (richer output + /model hint), so a CLI user saw the
warning twice per startup. Guard the agent_init emit to skip platform=="cli"
— it now fills exactly the gateway/TUI gap the PR intended, no duplication.
2. Warning — vLLM error-parse regex under-matched. The patterns required a
literal space before the number, so "max_model_len: 32768", "=32768",
"(32768)", and "... is 32768" all returned None. Broaden both patterns to
accept :/=/(/ 'is' delimiters. Add a parametrized test over all delimiter
variants.
3. Warning — per-call live probe latency on local endpoints. The new
reconcile-on-hit + pre-defaults step-7 probe made every local resolution
fire a synchronous network probe (banner + /model switch + compressor
update_model each within one startup). Add a 30s in-process TTL cache
keyed by (model, base_url) around _query_local_context_length so back-to-
back resolutions reuse one round-trip; not persisted to disk, so the
reconcile freshness contract (re-probe after restart) is preserved. Add an
autouse fixture clearing the cache between tests + TTL coverage.
Tests: 148 passed (was 138). ruff clean.