`handle_401` spawned a dedup'd recovery coroutine via
`asyncio.create_task(_do_handle())` and discarded the returned task
reference. Python's event loop only keeps weak references to tasks, so
the coroutine could be garbage-collected before it called
`pending.set_result(...)`. Every concurrent caller awaiting that future
would then hang forever, and the `finally: entry.pending_401.pop(...)`
cleanup would never run, so subsequent 401s for the same key latched
onto the dead future too. This is the same pattern the adapter-side
fixes address (#11997, #11998, #12000, #12001, #12006).
Hold the task in a process-wide set on the manager and discard it via
`add_done_callback` once it completes. Regression test covers both the
structural invariant (task tracked, then removed on completion) and a
concurrent dedup path with a forced `gc.collect()` between the handler's
await points.
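A minimal sketch of the task-tracking pattern described above. The
`RecoveryManager` class and its `_background_tasks` / `spawn` names are
hypothetical stand-ins, not the actual manager's API.

```python
import asyncio
import gc


class RecoveryManager:
    """Hypothetical manager that keeps strong references to background tasks."""

    def __init__(self) -> None:
        self._background_tasks: set = set()

    def spawn(self, coro) -> asyncio.Task:
        task = asyncio.create_task(coro)
        # Strong reference: the event loop itself only holds a weak one.
        self._background_tasks.add(task)
        # Drop the reference once the task finishes, whatever the outcome.
        task.add_done_callback(self._background_tasks.discard)
        return task


async def demo():
    manager = RecoveryManager()
    pending = asyncio.get_running_loop().create_future()

    async def _do_handle() -> None:
        await asyncio.sleep(0)
        gc.collect()  # a collection between await points must not kill the task
        await asyncio.sleep(0)
        pending.set_result("recovered")

    manager.spawn(_do_handle())
    tracked_while_running = len(manager._background_tasks)
    result = await pending
    await asyncio.sleep(0)  # give the done callback a chance to run
    return tracked_while_running, result, len(manager._background_tasks)
```

`asyncio.run(demo())` reports one tracked task while the handler is in
flight and none after it completes.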
A /learn request can mix the source(s) to gather (paths, URLs, "what we
just did") with requirements that shape the skill (focus, scope, what to
omit). When a request led with a path or link, the agent fetched it and
treated the trailing prose as incidental, dropping the user's stated
focus — the symptom @GrenFX reported.
The input layer was never the cause: both CLI (split(None, 1)) and
gateway (get_command_args()) capture the full free-text argument. The
gap was in build_learn_prompt, which dumped the request as one
undifferentiated source blob.
build_learn_prompt now tells the agent the request may mix sources and
requirements in any order, that prose after a path/link is authoring
guidance to honor (not noise), and to never fetch the first source and
ignore the rest. Adds step 1b: apply every requirement to what the
SKILL.md covers, not just which sources get read. Both surfaces inherit
it; no parser change, zero tool footprint.
Two resize fixes for a steadier TUI under aggressive terminal resizing.
1. Drag-resize flicker (useMainApp): `cols` was synced to
`stdout.columns` synchronously on every 'resize' event. Each distinct
width remounts the visible transcript rows (they're keyed on cols so
yoga re-measures off live geometry), so a drag — which fires a burst of
resize events — turned into a per-tick remount storm that flickers and
stutters. Throttle the sync with a leading+trailing edge: the first
event reflows immediately (stays responsive), the rest collapse to at
most one reflow per RESIZE_COALESCE_MS (~30fps), and the trailing edge
always applies the final width so the settled layout is exact.
2. Resize-burst heal coverage (#18449): the existing ink-resize test only
exercised a single same-dimension event. Add two regressions that drive
a rapid resize *burst* (wobbling dims that settle back to the start, and
an isolated same-dimension event with no tree change) and assert the
renderer converges to a clean erased repaint — screen erased, then
content repainted after — rather than a partial diff over drifted cells.
This also relaxes the pre-existing single-event assertion, which
hard-coded the exact bytes `ESC[2J ESC[H`; the heal legitimately
interposes `ESC[3J` (erase scrollback) on some recovery paths, so all
three tests now assert the semantic invariant instead of a byte run.
Pull the inline leading+trailing resize throttle out of useMainApp into
createResizeCoalescer (src/lib/resizeCoalescer.ts) and cover it directly
with fake-timer tests: leading-edge immediacy, burst collapse to one
trailing reflow, fresh leading edge after the window, cancel() dropping a
pending reflow, and sustained-drag staying ~one reflow per interval.
Also seed lastReflow at -Infinity instead of 0 so the leading edge fires on
the first event independent of the wall clock (the inline version only
worked because Date.now() is large at runtime).
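The leading+trailing behaviour can be sketched in Python with explicit
timestamps in place of real timers. This is an illustration only: the real
helper is the TypeScript createResizeCoalescer, and the `ResizeCoalescer`
class, its `tick()` method, and the 33 ms interval here are assumptions.

```python
import math
from typing import Callable, Optional


class ResizeCoalescer:
    """Leading+trailing throttle driven by caller-supplied timestamps."""

    def __init__(self, reflow: Callable[[int], None], interval_ms: float) -> None:
        self._reflow = reflow
        self._interval_ms = interval_ms
        # -inf (not 0) so the first event fires regardless of the clock value.
        self._last_reflow = -math.inf
        self._pending: Optional[int] = None  # newest width awaiting the trailing edge

    def on_resize(self, cols: int, now_ms: float) -> None:
        if now_ms - self._last_reflow >= self._interval_ms:
            # Leading edge: the window is open, so reflow immediately.
            self._pending = None
            self._last_reflow = now_ms
            self._reflow(cols)
        else:
            # Inside the window: remember only the newest width.
            self._pending = cols

    def tick(self, now_ms: float) -> None:
        """Stand-in for the trailing timer: apply the pending width once due."""
        if self._pending is not None and now_ms - self._last_reflow >= self._interval_ms:
            cols, self._pending = self._pending, None
            self._last_reflow = now_ms
            self._reflow(cols)

    def cancel(self) -> None:
        """Drop any pending trailing reflow."""
        self._pending = None


applied = []
coalescer = ResizeCoalescer(applied.append, interval_ms=33)
coalescer.on_resize(100, now_ms=0)   # leading edge, even though the clock reads 0
coalescer.on_resize(101, now_ms=5)   # collapsed
coalescer.on_resize(102, now_ms=10)  # collapsed, becomes the pending width
coalescer.tick(now_ms=33)            # trailing edge applies the final width
```

With `_last_reflow` seeded at 0, the first call at `now_ms=0` would have
fallen inside the window and been deferred, which is the clock dependence
the -Infinity seed removes.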
Tool call ids are used to name persisted large-result files. Treating that id as a raw path segment allowed traversal-like ids to resolve outside hermes-results even though the shell command quoted metacharacters.
Convert ids to single filename stems, preserve normal ids, and add a short hash when normalization is needed so unsafe ids do not collide silently.
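A minimal sketch of the normalization, assuming a conservative character whitelist and an 8-character SHA-256 suffix. The helper name `tool_call_id_to_stem`, the whitelist, and the hash length are illustrative assumptions, not the code in tools/tool_result_storage.py.

```python
import hashlib
import re

# Assumed whitelist: letters, digits, underscore, hyphen (no dots, no separators).
_SAFE_STEM = re.compile(r"[A-Za-z0-9_-]+")


def tool_call_id_to_stem(tool_call_id: str) -> str:
    """Map a tool call id to a single safe filename stem (hypothetical helper)."""
    if _SAFE_STEM.fullmatch(tool_call_id):
        return tool_call_id  # normal ids keep their existing file name
    # Collapse every run of unsafe characters into one underscore.
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", tool_call_id).strip("_") or "id"
    # A short hash of the original id keeps distinct unsafe ids from colliding.
    digest = hashlib.sha256(tool_call_id.encode("utf-8")).hexdigest()[:8]
    return f"{cleaned}_{digest}"
```

An id such as `../../etc/passwd` becomes a stem beginning with `etc_passwd_`, so it can no longer climb out of the results directory, while `a/b` and `a\b` map to different stems because their hashes differ.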
Constraint: Avoid new dependencies and preserve existing tool-result paths for normal tool call ids
Rejected: Quote only the path | shell quoting does not prevent ../ path traversal
Confidence: high
Scope-risk: narrow
Reversibility: clean
Tested: source /Users/peter/hermes-agent/venv/bin/activate && pytest tests/tools/test_tool_result_storage.py -q
Tested: source /Users/peter/hermes-agent/venv/bin/activate && python -m compileall tools/tool_result_storage.py tests/tools/test_tool_result_storage.py
Tested: git diff --check
Some legitimate @bot pings were dropped because the mention gates relied on
message.mentions alone, which does not always populate raw <@ID> / <@!ID>
forms (mobile, edited, relayed messages). A bare @bot with no other text
could also spawn a fake empty-text turn.
- add _self_is_explicitly_mentioned() / _raw_mentioned_user_ids() helpers that
treat the bot as mentioned via resolved mentions OR raw content forms
- use them at the allow_bots=mentions gate, multi-agent bot filtering, the
mention-strip/mention_prefix step, and the require_mention gate
- drop bare mention-only pings (no text, no media, no injection, no backfill
context) instead of injecting a placeholder empty turn
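A minimal sketch of the two checks, using hypothetical free functions rather
than the adapter's actual helpers. It covers only the text and media
conditions; the injection and backfill checks are left out.

```python
import re

# Matches both raw mention forms: <@ID> and <@!ID>.
_RAW_MENTION_RE = re.compile(r"<@!?(\d+)>")


def raw_mentioned_user_ids(content: str) -> set:
    """Collect user ids from raw mention markup in the message text."""
    return {int(user_id) for user_id in _RAW_MENTION_RE.findall(content)}


def self_is_explicitly_mentioned(bot_id: int, resolved_ids: set, content: str) -> bool:
    """Mentioned via the resolved mention list OR via raw content forms."""
    return bot_id in resolved_ids or bot_id in raw_mentioned_user_ids(content)


def is_bare_mention_ping(content: str, has_media: bool) -> bool:
    """True when nothing but mention markup (and whitespace) was sent."""
    return not has_media and _RAW_MENTION_RE.sub("", content).strip() == ""
```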
Co-authored-by: Teknium <teknium1@gmail.com>
Registers PowerShell (.ps1/.psm1/.psd1) in the LSP server registry,
spawning PowerShellEditorServices over stdio via a pwsh/powershell
host. PSES ships as a GitHub release zip (no npm/go/pip recipe), so it
sits in the manual install tier alongside rust-analyzer and clangd.
The spawn builder resolves the module bundle from (in order) the
lsp.servers.powershell.command override, init bundlePath, the
PSES_BUNDLE_PATH env var, or <HERMES_HOME>/lsp/PowerShellEditorServices,
then launches Start-EditorServices.ps1 -Stdio with a non-interactive,
no-profile host. hermes lsp status/list report it as manual-only until
pwsh is present.
Docs and tests included.
When the summary LLM hits a 429/transient failure, _generate_summary() sets
a cooldown and returns None; compress() inserts a static fallback marker and
returns. Tokens stayed above the threshold, so should_compress() kept returning
True and every subsequent agent turn re-fired _compress_context() — the CLI
appeared frozen until the cooldown expired.
Add a cooldown guard to should_compress(): return False while
_summary_failure_cooldown_until is in the future. Reuses the existing float;
no new state. Manual /compress (force=True) still clears the cooldown first.
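A minimal sketch of the guard, assuming a monotonic-clock timestamp. The
`Compressor` class and `note_summary_failure` are hypothetical stand-ins;
only `should_compress` and `_summary_failure_cooldown_until` are names from
this change.

```python
import time


class Compressor:
    """Hypothetical stand-in for the context compressor."""

    def __init__(self, threshold_tokens: int) -> None:
        self.threshold_tokens = threshold_tokens
        self._summary_failure_cooldown_until = 0.0  # assumed monotonic timestamp

    def note_summary_failure(self, cooldown_seconds: float) -> None:
        self._summary_failure_cooldown_until = time.monotonic() + cooldown_seconds

    def should_compress(self, prompt_tokens: int) -> bool:
        # Guard: while the summary cooldown is active, do not re-fire compression.
        if time.monotonic() < self._summary_failure_cooldown_until:
            return False
        return prompt_tokens >= self.threshold_tokens
```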
Fixes #11529
Generic provider:custom relays were force-routed to the OpenAI Responses
API whenever the model matched gpt-5*, and a stale persisted
model.api_mode=codex_responses survived /reset and upgrades. Some
OpenAI-compatible relays do not implement Responses semantics, which
surfaced as malformed function_call.name replay errors in gateway sessions.
- runtime_provider: route custom-provider api_mode through
_resolve_plain_custom_api_mode(), which drops a stale codex_responses
unless the URL is direct OpenAI/xAI
- run_agent: _provider_model_requires_responses_api returns False for
custom; direct api.openai.com / api.x.ai URLs still upgrade via
_is_direct_openai_url() / URL detection
- regression coverage for plain relays vs direct OpenAI/xAI URLs
Co-authored-by: HiddenPuppy <HiddenPuppy@users.noreply.github.com>
* fix(agent): drop tool_calls with empty function.name to prevent orphan 400
Salvage of #12807 by @melonboy312 — rebased onto current main (sanitizer
moved to agent_runtime_helpers), scoped to the sanitizer fix, with a
regression test that fails without it.
* fix(agent): repair (not drop) empty-name tool_calls to preserve anti-priming + prevent 400
Dropping empty-name tool_calls in the pre-call sanitizer collided with #47967,
which intentionally keeps an empty-name call paired with a synthesized
'tool name was empty' anti-priming result so weak models self-correct without
a full catalog dump. Dropping the call orphaned that result and stripped the
signal (breaking tests/agent/test_empty_tool_name_loop_dampening.py).
The actual HTTP 400 cause is an ORPHANED function_call_output (adapter drops
the empty-name function_call but keeps its output). Rename the blank name to a
non-empty sentinel instead: the call and its result stay paired, the adapter
no longer drops the function_call, no orphan, no 400 — and the anti-priming
result content the model needs is preserved.
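A minimal sketch of the repair step over OpenAI-style message dicts. The
sentinel value and the function name are assumptions for illustration; the
actual sentinel is not stated here.

```python
import copy

EMPTY_NAME_SENTINEL = "unknown_tool"  # hypothetical sentinel value


def repair_empty_tool_names(messages: list) -> list:
    """Rename blank tool-call names instead of dropping the calls."""
    repaired = copy.deepcopy(messages)  # leave the caller's history untouched
    for message in repaired:
        for call in message.get("tool_calls") or []:
            function = call.setdefault("function", {})
            if not (function.get("name") or "").strip():
                function["name"] = EMPTY_NAME_SENTINEL
    return repaired
```

Because the call is renamed rather than removed, the tool result that
references its id stays paired with it.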
---------
Co-authored-by: Bartok9 <danielrpike9@gmail.com>
The store-level search_facts() shared the same raw-MATCH bug class as
_fts_candidates (FTS5 AND-joins tokens, zeroing prose recall). Route it
through FactRetriever._sanitize_fts_query via a lazy import to keep the
store->retrieval layering acyclic. Also add cyb3rwr3n to release AUTHOR_MAP.
The FactRetriever's _fts_candidates passed the raw query string directly
to FTS5's MATCH operator. FTS5 defaults to AND-between-tokens, which
means any multi-word prose query like 'what happened with the deployment
rollback' required every single token to co-occur in a fact — dropping
recall to zero on the kind of queries agents actually issue via prefetch().
Fix: add _sanitize_fts_query() that:
- tokenizes the query and drops English stopwords
- strips FTS5 operator characters per token
- OR-joins the remaining content tokens as phrase literals
For pathological inputs (all stopwords, empty), the sanitizer falls back to
the raw query so the caller sees zero results instead of a SQL error.
This is a pure-retrieval-quality fix — the HRR + Jaccard reranking
stages still keep precision high. Ships with 10 tests covering the
sanitizer and retrieval integration.
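A minimal sketch of the three sanitizer steps. The stopword list is a small
illustrative subset and the function is a free-standing approximation, not
the FactRetriever method itself.

```python
import re

# Illustrative subset only; the real stopword list is not reproduced here.
_STOPWORDS = frozenset(
    {"a", "an", "and", "are", "did", "in", "is", "of", "on", "the", "to", "was", "what", "with"}
)


def sanitize_fts_query(query: str) -> str:
    tokens = []
    for raw in query.split():
        # Strip anything that is not a word character (quotes, *, ^, :, parens...).
        token = re.sub(r"[^\w]", "", raw)
        if token and token.lower() not in _STOPWORDS:
            # Quote each token so FTS5 treats it as a phrase literal.
            tokens.append(f'"{token}"')
    if not tokens:
        return query  # pathological input: fall back to the raw query
    return " OR ".join(tokens)
```

For the example query above this yields
`"happened" OR "deployment" OR "rollback"`, so a fact only needs to contain
one of the content tokens to become a candidate.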
The init snapshot dumped functions with a line-based filter:
declare -f | grep -vE '^_[^_]'
That strips a function's *header* line (e.g. `_foo () `) but leaves the
orphaned `{ ... }` body behind, corrupting the snapshot that is sourced
before every command. Sourcing the torn snapshot runs leftover body code
and breaks subsequent commands (intermittent exit 127).
- Filter private (`_`-prefixed) functions by NAME via `declare -F` and
dump only the wanted whole definitions, so a body is never torn. Guard
against an empty name list (bare `declare -f` dumps everything).
- Treat a non-zero bootstrap exit code as snapshot-init failure, so
execution safely falls back to login-shell-per-command mode.
- Add a regression test asserting snapshot_ready stays false when
bootstrap exits non-zero.
Preserves the atomic-write ($BASHPID temp + mv -f) machinery from #38249.
The polling heartbeat's pending-update probe treated a stopped updater
(running=False) as "someone else's job" and silently reset its counter,
so a long-poll task that disappears with no reconnect in flight was never
recovered. get_me() on the general request path stays healthy, so neither
PTB's error_callback nor the connectivity probe ever fires — the gateway
keeps running but stops receiving messages indefinitely (#55769).
Detect the stopped-updater case directly in _probe_pending_updates and feed
it into the existing _handle_polling_network_error ladder, debounced over two
consecutive probes so a just-starting updater or the brief stop()->start_polling()
window of an in-flight reconnect never trips it.
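A minimal sketch of the two-probe debounce. The `StoppedUpdaterDetector`
class is hypothetical; in the actual change the check lives inside
_probe_pending_updates.

```python
class StoppedUpdaterDetector:
    """Escalate only after the updater was seen stopped on consecutive probes."""

    def __init__(self, required_consecutive: int = 2) -> None:
        self._required = required_consecutive
        self._stopped_probes = 0

    def observe(self, updater_running: bool) -> bool:
        """Return True when the stopped state has persisted long enough."""
        if updater_running:
            self._stopped_probes = 0  # any healthy probe resets the streak
            return False
        self._stopped_probes += 1
        return self._stopped_probes >= self._required
```

A single stopped probe, such as one landing in the stop()->start_polling()
window, returns False; only a second consecutive one would be handed to the
network-error ladder.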
background_review hardcoded enabled_toolsets=["memory", "skills"] in the
review fork's whitelist, so a skill-review fork on a profile with
memory_enabled: false still granted the LLM the built-in MEMORY.md read/write
tool — contaminating a profile that opted out of built-in memory. The flag was
already in scope (review_agent._memory_enabled). Include "memory" only when
_memory_enabled or _user_profile_enabled (USER.md also needs the tool).
Layer 1 of #54937 (the path leak) is fixed by this PR's thread-context
propagation: get_memory_dir() is already per-call on main, so once the
bg-review thread inherits the profile override its writes land in the right
profile (verified). This commit closes the remaining whitelist layer.
The inbound-media validator _is_allowed_bridge_path() checked against
IMAGE_CACHE_DIR / AUDIO_CACHE_DIR / VIDEO_CACHE_DIR / DOCUMENT_CACHE_DIR
value-imported at module load. After the base.py cache-dir getters became
per-call resolvers, the bridge writes media into the active profile's cache
while the validator still matched the frozen launch-profile constants — so
media was rejected under a profile override (multi-profile gateway).
Resolve the cache roots per-call via the get_*_cache_dir() getters and drop
the now-unused frozen value-imports. Caught by automated review on #55867.
The reachability claim that single-process multi-profile leakage is
desktop-only is incomplete. gateway/run.py:_profile_runtime_scope shows a
SECOND such runtime: the multiplexed gateway (gateway.multiplex_profiles)
serves every profile from one process, scoping each inbound turn with the
same set_hermes_home_override ContextVar the desktop uses (and the
/p/<profile>/ URL prefix). The M1 (import-time path globals) and M2
(thread/executor context) leaks are reachable there identically.
- tests/gateway/test_multiplex_credential_isolation.py: add a class driving the
skills-dir + cache-dir resolvers and a propagated worker thread under the
real _profile_runtime_scope, asserting each resolves the active profile. Sits
beside the existing credential-isolation proofs for the same topology.
- Correct the inline comments in model_tools/run_agent/async_delegation/
rich_sent_store to name both runtimes (desktop tui_gateway AND the
multiplexed gateway) instead of implying desktop is the only surface.
(ACP runs one agent per subprocess and the kanban dispatcher Popens
'hermes -p <profile>' children, so neither is an in-process multi-profile
surface; desktop + multiplexed gateway are the two confirmed ones.)
- tools/skills_hub.py: the per-call resolvers now honor a test-injected real
module attribute (patch.object(hub, 'SKILLS_DIR', ...) / monkeypatch.setattr)
before falling back to dynamic profile resolution. PEP 562 __getattr__ only
fires when no real attribute exists, so an unpatched module resolves the
active profile and a patched one respects the test's value — keeping the
existing skills_hub test seam intact (5 tests had broken).
- tests/test_profile_isolation_runtime.py: real two-profile (no-mock) suite
driving each previously-leaking site under override A then B and asserting
the active profile's path/identity is used: skills_hub paths + derived
constants + default-arg resolution, gateway cache getters (incl. the
monkeypatch-still-wins seam), rich_sent_store path, and thread/executor
context propagation (raw-thread hazard documented; primitive + _run_async
worker proven to preserve the override).
A bare threading.Thread / ThreadPoolExecutor worker starts with an empty
contextvars.Context, so the context-local profile override
(_HERMES_HOME_OVERRIDE) does not cross the spawn boundary. In single-process
multi-profile runtimes (desktop tui_gateway) the worker then resolves
get_hermes_home() to the launch/default profile, leaking one profile's
reads/writes into another. The fix primitive
(tools.thread_context.propagate_context_to_thread, which copies the parent
context) already exists; the leaking spawns simply did not use it.
- model_tools.py _run_async: wrap the worker-thread loop runner. This is the
generic sync->async bridge for every async tool, so wrapping it here fixes
the leak for all async tools at once (verified: an async tool reading
get_hermes_home() under an override now resolves the active profile).
- run_agent.py bg-review thread: wrap so MEMORY.md / skill review writes land
in the spawning turn's profile (#54937 path).
- tools/async_delegation.py: wrap both single + batch executor.submit calls so
detached children resolve the dispatching profile's paths.
Scope: the vision CPU executor is intentionally left unwrapped — it runs pure
in-memory encode/resize and never resolves profile-scoped paths.
In single-process multi-profile runtimes (desktop tui_gateway), profile
scoping is a context-local ContextVar override, not a process env var. Three
subsystems froze their HERMES_HOME-derived paths at import time (or read
os.environ directly), pinning every later profile to whichever profile first
imported the module — a cross-profile data leak.
- tools/skills_hub.py: SKILLS_DIR/HUB_DIR/LOCK_FILE/etc. were module constants
frozen at import. Replace with per-call resolver functions; add a PEP 562
module __getattr__ so external 'from tools.skills_hub import SKILLS_DIR'
callers (all function-local) resolve dynamically with no call-site changes.
Convert default-arg bindings (HubLockFile/TapsManager) and the derived
HERMES_INDEX_CACHE_FILE constant too.
- gateway/platforms/base.py: image/audio/video/document cache-dir getters now
re-resolve via get_hermes_dir() per call, falling back to the module
constant when a test has monkeypatched it (preserves the existing test seam).
Media-delivery safe-roots already enumerate all profiles' cache dirs
(#31733), so per-profile resolution does not break delivery.
- gateway/rich_sent_store.py: _store_path() read os.environ['HERMES_HOME']
directly, bypassing the override entirely; route through get_hermes_home().
Adds gateway.platform_connect_timeout (default 30s) to DEFAULT_CONFIG and
bridges it to the internal HERMES_GATEWAY_PLATFORM_CONNECT_TIMEOUT env var
at gateway startup, following the existing gateway_timeout config->env
pattern. The env var remains the manual-override escape hatch and wins if
set explicitly; otherwise config.yaml supplies the value. This closes the
issue's documentation/config-surface request (#19776 suggestion 2) on top
of the adapter ready-wait fix, so users no longer need an undocumented env
var to raise the Discord connect timeout.
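A minimal sketch of the config->env bridge with the env var taking
precedence. The nested-dict config layout and the `bridge_connect_timeout`
name are assumptions for illustration.

```python
import os

ENV_VAR = "HERMES_GATEWAY_PLATFORM_CONNECT_TIMEOUT"
DEFAULT_TIMEOUT_SECONDS = 30


def bridge_connect_timeout(config: dict, environ=os.environ) -> str:
    """Export the configured timeout unless the env var is already set."""
    if environ.get(ENV_VAR):
        return environ[ENV_VAR]  # explicit env override wins
    value = config.get("gateway", {}).get("platform_connect_timeout", DEFAULT_TIMEOUT_SECONDS)
    environ[ENV_VAR] = str(value)
    return environ[ENV_VAR]
```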
Refs #19776
Dark-mode connector lines and ring outlines read too faint. Double the
two live knobs: MODE_DEFAULTS.dark.lineAlpha 0.12->0.24 and
RING_PARAMS.dark.ringAlpha 0.03->0.06. (MODE_DEFAULTS.ringAlpha is dead;
the outline is drawn from RING_PARAMS.)
The extension-less MEDIA delivery guards short-circuited on
"MEDIA: not in text and [[audio_as_voice]] not in text", so a
response carrying only [[as_document]] (an image-only reply requesting
unmodified document delivery) leaked the directive as visible text.
Add [[as_document]] to both guard conditions (_strip_media_tag_directives
and strip_media_directives_for_display) and cover it with a regression
test.
Files like Caddyfile or Makefile have no extension, so MEDIA_TAG_CLEANUP_RE
never matched them and Telegram showed the raw MEDIA: line as text. Extract
and strip validated extension-less tags via a second pass.
Follow-up to liuhao1024's #46924. Route plain-text approval replies
through the canonical /approve and /deny handlers (resolve thread, resume
typing, return localized confirmation) and deliver that confirmation back
to the user — previously a plain 'yes' resolved silently. Synthesize a
literal '/'-prefixed command so get_command_args() parses always/session
modifiers on every platform (is_command() only recognizes '/'). Add E2E
tests covering approve/deny/always/session vocab plus the no-pending and
unrelated-text fall-through cases.
When the agent is blocked waiting for a dangerous-command approval,
plain-text responses like "yes" or "approve" were being steered into
the running agent instead of being delivered to the approval handler.
This meant approval via messaging platforms (Signal, Telegram, etc.)
never succeeded — the user's response was consumed by the steer logic
and the approval timed out.
Add an early check in `_handle_active_session_busy_message` that routes
approval-like responses ("yes", "approve", "deny", etc.) to the
approval handler when `has_blocking_approval()` is true for the session.
Fixes #46866
(cherry picked from commit b37ec1e0fd0f191da47db8472bf97a8553864945)
The judge gate added for kanban_complete (Issue #38367, PR #38388) only
covers one of the two exit paths out of run_kanban_goal_loop(). The loop
treats status == "blocked" as terminal identically to "done" (and any
other status outside running/ready/done/blocked also stops the loop —
see goals.py's status dispatch). A goal_mode worker that has learned
kanban_complete is gated can simply call kanban_block(reason="anything")
to escape the loop with zero judge involvement, fully defeating the
intent of #38367's fix.
This is Issue #38696, filed as the explicit follow-up by a reviewer on
PR #38388: "kanban_complete is one way out; kanban_block is another...
A worker that learns the complete path is gated can shift to calling
block to escape the loop with the same effect."
Implements the issue's "Option B" (deterministic allowlist, no extra
judge LLM call) using the kind taxonomy that already exists in
kb.VALID_BLOCK_KINDS, rather than inventing a new judge_goal() outcome
type (judge_goal only returns done/continue/wait/skipped — there's no
"is this block legitimate" verdict to hook the issue's "Option A"
pseudocode onto without expanding the judge's contract).
goal_mode tasks may only block with kind in {dependency, needs_input} —
the two kinds that represent a genuine external blocker the worker
cannot resolve itself. `capability`, `transient`, and an unset kind are
rejected with a message directing the worker to kanban_complete instead,
which the judge now gates. Non-goal_mode tasks are completely unaffected.
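A minimal sketch of the allowlist check. The function name, return shape,
and rejection wording are hypothetical; only the two allowed kinds come from
the description above.

```python
GOAL_MODE_ALLOWED_BLOCK_KINDS = frozenset({"dependency", "needs_input"})


def check_block_allowed(goal_mode: bool, kind):
    """Return (allowed, message) for a kanban_block request."""
    if not goal_mode:
        return True, ""  # non-goal_mode tasks are unaffected
    if kind in GOAL_MODE_ALLOWED_BLOCK_KINDS:
        return True, ""
    # capability, transient, and an unset kind all land here.
    return False, (
        "goal_mode tasks may only block with kind dependency or needs_input; "
        "use kanban_complete instead"
    )
```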
Render the pet as an absolute overlay riding the bottom-right corner (just above
the status bar) instead of a full-width band that ate a whole row. It reserves
no layout rows; the transcript keeps its text clear of it responsively — a right
gutter on wide terminals (lines wrap to the pet's left) collapsing to reserved
bottom rows on narrow ones (full-width lines sit above it).
kitty fits an image to its cell rect preserving aspect, so a frame whose pixel
size isn't a whole multiple of the cell rounds up — clipping the bottom row
("clipped feet") and letterboxing a blank row. Trim each frame to its union
alpha bbox, then snap to an exact cell multiple before transmit so the sprite
hugs its box and renders full-body. (ratatui-image#57: render in multiples of
the font-size.)
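The snapping arithmetic can be sketched as follows, assuming the frame is
padded up to the next whole cell in each dimension. Whether the actual code
pads or rescales is not stated here, and `snap_to_cells` is a hypothetical
helper.

```python
import math


def snap_to_cells(width_px: int, height_px: int, cell_w: int, cell_h: int):
    """Return (cols, rows, snapped_width_px, snapped_height_px)."""
    cols = max(1, math.ceil(width_px / cell_w))
    rows = max(1, math.ceil(height_px / cell_h))
    # The snapped pixel size is an exact multiple of the cell size.
    return cols, rows, cols * cell_w, rows * cell_h
```

For a trimmed 50x37 px frame on 10x20 px cells, this gives 5 columns by
2 rows and a 50x40 px image, so the terminal has no fractional cell to round.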
The last welded composer engine. The `@`/`/` trigger state, detection
(refreshTrigger), the adapter-driven item list + its effects, popover selection,
closeTrigger, commitTypedSlashDirective, and the contentEditable chip insertion
(replaceTriggerWithChip) move verbatim into hooks/use-composer-trigger.ts behind a
hook that takes the editor refs + the two completion sources (at/slash). ChatBar's
input/keydown/keyup paths + the popover render consume the returned API; the
keydown navigation block stays in place (no key-handling restructure), and
triggerKeyConsumedRef is exposed so keyup still skips its post-consume refresh.
ChatBar: 1,248 → 1,047 lines. Behaviour-preserving: typecheck 0 errors, eslint clean,
and the composer DOM repro suite (slash-nav, enter-submit, IME composition,
trigger-popover) is green — the documented IME/caret/focus edge fixes ride along
verbatim. (The 1 attachments.test.tsx failure is pre-existing on main.)
fallback-model.ts (1,696) folded into assistant-ui/tool/fallback-model/ with
three cohesive, self-contained leaf modules extracted (verbatim moves):
- types.ts (83) — the shared tool-view types/interfaces.
- format.ts (133) — pure value formatting/parsing (isRecord, compactPreview,
clampForDisplay, prettyJson, parseMaybeObject, unwrapToolPayload, numberValue,
contextValue, formatDurationSeconds).
- targets.ts (75) — url/path/preview detection + disclosure ids (looksLikeUrl,
findFirstUrl, hostnameOf, isPreviewableTarget, toolPart/GroupDisclosureId).
index.ts (1,434) keeps the tool-specific assembler (TOOL_META, titles, the count
machinery, subtitle/detail/diff, buildToolView) and re-exports the leaf modules,
so consumers importing `./fallback-model` are unchanged (folder index resolution)
— no importer or channel edits needed. The count/result/detail helpers reach
across each other around buildToolView, so they stay together to avoid a circular
split; the three leaves are the clean cut.
Behaviour-preserving: typecheck 0 errors, eslint clean, fallback-model test 24/26
(the 2 browser_navigate title failures are pre-existing on main — `hostnameOf`
intentionally includes the pathname; verified identical on the un-split file).
Assert a journey edit leaves MEMORY.md byte-identical to MemoryStore's
own §-join (no trailing-newline drift) and round-trips through
MemoryStore._read_file, so the two surfaces can never diverge on format.
learning_mutations re-implemented the §-delimited read/write that
tools/memory_tool already owns, and its writer used a plain write_text
(truncate-then-write) — reintroducing exactly the partial-file race that
MemoryStore._write_file engineered away with atomic temp-file + rename.
Reuse MemoryStore._read_file/_write_file so the format is single-sourced,
the write is atomic against concurrent readers, and journey indices stay
aligned with the graph.
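A minimal sketch of the atomic temp-file + rename technique, contrasted with
a plain write_text. This illustrates the general pattern with a hypothetical
`atomic_write_text` helper; it is not MemoryStore._write_file.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path: Path, text: str) -> None:
    """Write text so readers see either the old file or the new one, never a partial."""
    # Create the temp file in the target directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)  # atomic swap into place
    except BaseException:
        # Clean up the temp file if anything failed before the swap.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```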
The merged #55859 left the star-map NodeContextMenu import and the
canvas onContextMenu prop out of perfectionist's required order, failing
`npm run lint` in the desktop workspace. Reorder both.
TUI /journey gets d/e with confirm + $EDITOR; desktop gets a right-click
context menu with inline edit modal. Both refresh the graph after mutation.
Extract openInEditor into the shared TUI editor helper.