`handle_401` spawned a dedup'd recovery coroutine via
`asyncio.create_task(_do_handle())` and discarded the returned task
reference. Python's event loop only keeps weak references to tasks, so
the coroutine could be garbage-collected before it called
`pending.set_result(...)`. Every concurrent caller awaiting that future
then hangs forever, and the `finally: entry.pending_401.pop(...)`
cleanup never runs — so subsequent 401s for the same key latch onto the
dead future too. Same pattern the adapter-side fixes address (#11997,
#11998, #12000, #12001, #12006).
Hold the task in a process-wide set on the manager and discard it via
`add_done_callback` once it completes. Regression test covers both the
structural invariant (task tracked, then removed on completion) and a
concurrent dedup path with a forced `gc.collect()` between the handler's
await points.
A /learn request can mix the source(s) to gather (paths, URLs, "what we
just did") with requirements that shape the skill (focus, scope, what to
omit). When a request led with a path or link, the agent fetched it and
treated the trailing prose as incidental, dropping the user's stated
focus — the symptom @GrenFX reported.
The input layer was never the cause: both CLI (split(None, 1)) and
gateway (get_command_args()) capture the full free-text argument. The
gap was in build_learn_prompt, which dumped the request as one
undifferentiated source blob.
build_learn_prompt now tells the agent the request may mix sources and
requirements in any order, that prose after a path/link is authoring
guidance to honor (not noise), and to never fetch the first source and
ignore the rest. Adds step 1b: apply every requirement to what the
SKILL.md covers, not just which sources get read. Both surfaces inherit
it; no parser change, zero tool footprint.
Tool call ids are used to name persisted large-result files. Treating that id as a raw path segment allowed traversal-like ids to resolve outside hermes-results even though the shell command quoted metacharacters.
Convert ids to single filename stems, preserve normal ids, and add a short hash when normalization is needed so unsafe ids do not collide silently.
Constraint: Avoid new dependencies and preserve existing tool-result paths for normal tool call ids
Rejected: Quote only the path | shell quoting does not prevent ../ path traversal
Confidence: high
Scope-risk: narrow
Reversibility: clean
Tested: source /Users/peter/hermes-agent/venv/bin/activate && pytest tests/tools/test_tool_result_storage.py -q
Tested: source /Users/peter/hermes-agent/venv/bin/activate && python -m compileall tools/tool_result_storage.py tests/tools/test_tool_result_storage.py
Tested: git diff --check
Some legitimate @bot pings were dropped because the mention gates relied on
message.mentions alone, which does not always populate raw <@ID> / <@!ID>
forms (mobile, edited, relayed messages). A bare @bot with no other text
could also spawn a fake empty-text turn.
- add _self_is_explicitly_mentioned() / _raw_mentioned_user_ids() helpers that
treat the bot as mentioned via resolved mentions OR raw content forms
- use them at the allow_bots=mentions gate, multi-agent bot filtering, the
mention-strip/mention_prefix step, and the require_mention gate
- drop bare mention-only pings (no text, no media, no injection, no backfill
context) instead of injecting a placeholder empty turn
Co-authored-by: Teknium <teknium1@gmail.com>
Registers PowerShell (.ps1/.psm1/.psd1) in the LSP server registry,
spawning PowerShellEditorServices over stdio via a pwsh/powershell
host. PSES ships as a GitHub release zip (no npm/go/pip recipe), so it
sits in the manual install tier alongside rust-analyzer and clangd.
The spawn builder resolves the module bundle from (in order) the
lsp.servers.powershell.command override, init bundlePath, the
PSES_BUNDLE_PATH env var, or <HERMES_HOME>/lsp/PowerShellEditorServices,
then launches Start-EditorServices.ps1 -Stdio with a non-interactive,
no-profile host. hermes lsp status/list report it as manual-only until
pwsh is present.
Docs and tests included.
When the summary LLM hits a 429/transient failure, _generate_summary() sets
a cooldown and returns None; compress() inserts a static fallback marker and
returns. Tokens stay above threshold, so should_compress() kept returning
True and every subsequent agent turn re-fired _compress_context() — the CLI
appeared frozen until the cooldown expired.
Add a cooldown guard to should_compress(): return False while
_summary_failure_cooldown_until is in the future. Reuses the existing float;
no new state. Manual /compress (force=True) still clears the cooldown first.
Fixes#11529
Generic provider:custom relays were force-routed to the OpenAI Responses
API whenever the model matched gpt-5*, and a stale persisted
model.api_mode=codex_responses survived /reset and upgrades. Some
OpenAI-compatible relays do not implement Responses semantics, which
surfaced as malformed function_call.name replay errors in gateway sessions.
- runtime_provider: route custom-provider api_mode through
_resolve_plain_custom_api_mode(), which drops a stale codex_responses
unless the URL is direct OpenAI/xAI
- run_agent: _provider_model_requires_responses_api returns False for
custom; direct api.openai.com / api.x.ai URLs still upgrade via
_is_direct_openai_url() / URL detection
- regression coverage for plain relays vs direct OpenAI/xAI URLs
Co-authored-by: HiddenPuppy <HiddenPuppy@users.noreply.github.com>
* fix(agent): drop tool_calls with empty function.name to prevent orphan 400
Salvage of #12807 by @melonboy312 — rebased onto current main (sanitizer
moved to agent_runtime_helpers), scoped to the sanitizer fix, with a
regression test that fails without it.
* fix(agent): repair (not drop) empty-name tool_calls to preserve anti-priming + prevent 400
Dropping empty-name tool_calls in the pre-call sanitizer collided with #47967,
which intentionally keeps an empty-name call paired with a synthesized
'tool name was empty' anti-priming result so weak models self-correct without
a full catalog dump. Dropping the call orphaned that result and stripped the
signal (breaking tests/agent/test_empty_tool_name_loop_dampening.py).
The actual HTTP 400 cause is an ORPHANED function_call_output (adapter drops
the empty-name function_call but keeps its output). Rename the blank name to a
non-empty sentinel instead: the call and its result stay paired, the adapter
no longer drops the function_call, no orphan, no 400 — and the anti-priming
result content the model needs is preserved.
---------
Co-authored-by: Bartok9 <danielrpike9@gmail.com>
The FactRetriever's _fts_candidates passed the raw query string directly
to FTS5's MATCH operator. FTS5 defaults to AND-between-tokens, which
means any multi-word prose query like 'what happened with the deployment
rollback' required every single token to co-occur in a fact — dropping
recall to zero on the kind of queries agents actually issue via prefetch().
Fix: add _sanitize_fts_query() that:
- tokenizes the query and drops English stopwords
- strips FTS5 operator characters per token
- OR-joins the remaining content tokens as phrase literals
For pathological inputs (all stopwords, empty), falls back to the raw
query so the caller sees zero results instead of a SQL error.
This is a pure-retrieval-quality fix — the HRR + Jaccard reranking
stages still keep precision high. Ships with 10 tests covering the
sanitizer and retrieval integration.
The init snapshot dumped functions with a line-based filter:
declare -f | grep -vE '^_[^_]'
That strips a function's *header* line (e.g. `_foo () `) but leaves the
orphaned `{ ... }` body behind, corrupting the snapshot that is sourced
before every command. Sourcing the torn snapshot runs leftover body code
and breaks subsequent commands (intermittent exit 127).
- Filter private (`_`-prefixed) functions by NAME via `declare -F` and
dump only the wanted whole definitions, so a body is never torn. Guard
against an empty name list (bare `declare -f` dumps everything).
- Treat a non-zero bootstrap exit code as snapshot-init failure, so
execution safely falls back to login-shell-per-command mode.
- Add a regression test asserting snapshot_ready stays false when
bootstrap exits non-zero.
Preserves the atomic-write ($BASHPID temp + mv -f) machinery from #38249.
The polling heartbeat's pending-update probe treated a stopped updater
(running=False) as "someone else's job" and silently reset its counter,
so a long-poll task that disappears with no reconnect in flight was never
recovered. get_me() on the general request path stays healthy, so neither
PTB's error_callback nor the connectivity probe ever fires — the gateway
keeps running but stops receiving messages indefinitely (#55769).
Detect the stopped-updater case directly in _probe_pending_updates and feed
it into the existing _handle_polling_network_error ladder, debounced over two
consecutive probes so a just-starting updater or the brief stop()->start_polling()
window of an in-flight reconnect never trips it.
background_review hardcoded enabled_toolsets=["memory", "skills"] in the
review fork's whitelist, so a skill-review fork on a profile with
memory_enabled: false still granted the LLM the built-in MEMORY.md read/write
tool — contaminating a profile that opted out of built-in memory. The flag was
already in scope (review_agent._memory_enabled). Include "memory" only when
_memory_enabled or _user_profile_enabled (USER.md also needs the tool).
Layer 1 of #54937 (the path leak) is fixed by this PR's thread-context
propagation: get_memory_dir() is already per-call on main, so once the
bg-review thread inherits the profile override its writes land in the right
profile (verified). This commit closes the remaining whitelist layer.
The inbound-media validator _is_allowed_bridge_path() checked against
IMAGE_CACHE_DIR / AUDIO_CACHE_DIR / VIDEO_CACHE_DIR / DOCUMENT_CACHE_DIR
value-imported at module load. After the base.py cache-dir getters became
per-call resolvers, the bridge writes media into the active profile's cache
while the validator still matched the frozen launch-profile constants — so
media was rejected under a profile override (multi-profile gateway).
Resolve the cache roots per-call via the get_*_cache_dir() getters and drop
the now-unused frozen value-imports. Caught by automated review on #55867.
The reachability claim that single-process multi-profile leakage is desktop-
only is incomplete. gateway/run.py:_profile_runtime_scope shows a SECOND such
runtime: the multiplexed gateway (gateway.multiplex_profiles) serves every
profile from one process, scoping each inbound turn with the same
set_hermes_home_override ContextVar the desktop uses (and the /p/<profile>/
URL prefix). The M1 (import-time path globals) and M2 (thread/executor
context) leaks are reachable there identically.
- tests/gateway/test_multiplex_credential_isolation.py: add a class driving the
skills-dir + cache-dir resolvers and a propagated worker thread under the
real _profile_runtime_scope, asserting each resolves the active profile. Sits
beside the existing credential-isolation proofs for the same topology.
- Correct the inline comments in model_tools/run_agent/async_delegation/
rich_sent_store to name both runtimes (desktop tui_gateway AND the
multiplexed gateway) instead of implying desktop is the only surface.
(ACP runs one agent per subprocess and the kanban dispatcher Popens
'hermes -p <profile>' children, so neither is an in-process multi-profile
surface; desktop + multiplexed gateway are the two confirmed ones.)
- tools/skills_hub.py: the per-call resolvers now honor a test-injected real
module attribute (patch.object(hub, 'SKILLS_DIR', ...) / monkeypatch.setattr)
before falling back to dynamic profile resolution. PEP 562 __getattr__ only
fires when no real attribute exists, so an unpatched module resolves the
active profile and a patched one respects the test's value — keeping the
existing skills_hub test seam intact (5 tests had broken).
- tests/test_profile_isolation_runtime.py: real two-profile (no-mock) suite
driving each previously-leaking site under override A then B and asserting
the active profile's path/identity is used: skills_hub paths + derived
constants + default-arg resolution, gateway cache getters (incl. the
monkeypatch-still-wins seam), rich_sent_store path, and thread/executor
context propagation (raw-thread hazard documented; primitive + _run_async
worker proven to preserve the override).
Adds gateway.platform_connect_timeout (default 30s) to DEFAULT_CONFIG and
bridges it to the internal HERMES_GATEWAY_PLATFORM_CONNECT_TIMEOUT env var
at gateway startup, following the existing gateway_timeout config->env
pattern. The env var remains the manual-override escape hatch and wins if
set explicitly; otherwise config.yaml supplies the value. This closes the
issue's documentation/config-surface request (#19776 suggestion 2) on top
of the adapter ready-wait fix, so users no longer need an undocumented env
var to raise the Discord connect timeout.
Refs #19776
The extension-less MEDIA delivery guards short-circuited on
"MEDIA: not in text and [[audio_as_voice]] not in text", so a
response carrying only [[as_document]] (an image-only reply requesting
unmodified document delivery) leaked the directive as visible text.
Add [[as_document]] to both guard conditions (_strip_media_tag_directives
and strip_media_directives_for_display) and cover it with a regression
test.
Follow-up to liuhao1024's #46924. Route plain-text approval replies
through the canonical /approve and /deny handlers (resolve thread, resume
typing, return localized confirmation) and deliver that confirmation back
to the user — previously a plain 'yes' resolved silently. Synthesize a
literal '/'-prefixed command so get_command_args() parses always/session
modifiers on every platform (is_command() only recognizes '/'). Add E2E
tests covering approve/deny/always/session vocab plus the no-pending and
unrelated-text fall-through cases.
The judge gate added for kanban_complete (Issue #38367, PR #38388) only
covers one of the two exit paths out of run_kanban_goal_loop(). The loop
treats status == "blocked" as terminal identically to "done" (and any
other status outside running/ready/done/blocked also stops the loop —
see goals.py's status dispatch). A goal_mode worker that has learned
kanban_complete is gated can simply call kanban_block(reason="anything")
to escape the loop with zero judge involvement, fully defeating the
intent of #38367's fix.
This is Issue #38696, filed as the explicit follow-up by a reviewer on
PR #38388: "kanban_complete is one way out; kanban_block is another...
A worker that learns the complete path is gated can shift to calling
block to escape the loop with the same effect."
Implements the issue's "Option B" (deterministic allowlist, no extra
judge LLM call) using the kind taxonomy that already exists in
kb.VALID_BLOCK_KINDS, rather than inventing a new judge_goal() outcome
type (judge_goal only returns done/continue/wait/skipped — there's no
"is this block legitimate" verdict to hook the issue's "Option A"
pseudocode onto without expanding the judge's contract).
goal_mode tasks may only block with kind in {dependency, needs_input} —
the two kinds that represent a genuine external blocker the worker
cannot resolve itself. `capability`, `transient`, and an unset kind are
rejected with a message directing the worker to kanban_complete instead,
which the judge now gates. Non-goal_mode tasks are completely unaffected.
kitty fits an image to its cell rect preserving aspect, so a frame whose pixel
size isn't a whole multiple of the cell rounds up — clipping the bottom row
("clipped feet") and letterboxing a blank row. Trim each frame to its union
alpha bbox, then snap to an exact cell multiple before transmit so the sprite
hugs its box and renders full-body. (ratatui-image#57: render in multiples of
the font-size.)
Assert a journey edit leaves MEMORY.md byte-identical to MemoryStore's
own §-join (no trailing-newline drift) and round-trips through
MemoryStore._read_file, so the two surfaces can never diverge on format.
MoA sessions could not stream: the gateway streaming toggle was a no-op for
provider "moa", so users saw nothing until the entire response finished — minutes
of silence on long turns. The aggregator's reply was always fetched whole.
Root cause was twofold:
1. conversation_loop hard-disabled streaming for provider in {"copilot-acp",
"moa"} (MoA grouped with the ACP client, whose facade isn't a stream).
2. MoAChatCompletions.create() fetched the aggregator response whole via
call_llm(), which had no streaming mode.
For provider "moa", _create_request_openai_client() returns the MoAClient facade
itself, so the existing streaming consumer already calls
MoAChatCompletions.create(stream=True). We reuse that battle-tested consumer
(text-delta delivery, tool_call reassembly, stale-stream detection, non-streaming
fallback) instead of adding a parallel streaming path.
Changes:
- call_llm() gains stream/stream_options. When streaming it returns the raw SDK
stream iterator directly, bypassing _validate_llm_response and the
temperature/max_tokens/payment fallback chain (which assume a complete
response). The caller owns reassembly and fallback.
- MoAChatCompletions.create() runs the references first (unchanged), then when
stream=True returns the aggregator's raw stream, forwarding stream_options and
the consumer's per-request read timeout. stream=False is byte-identical to
before (no stream/stream_options/timeout forwarded).
- conversation_loop streams MoA only when a display/TTS consumer is present;
quiet/subagent/health-check paths keep the complete-response path.
Tests: tests/run_agent/test_moa_streaming.py — create() stream/non-stream
branches, stream_options + timeout forwarding, call_llm raw-stream return vs
validated non-stream. Existing MoA tests unchanged (20 passed).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
_build_gemini_contents emitted one contents entry per source message and
never merged adjacent same-role entries. Gemini's generateContent requires
strict user/model alternation and rejects consecutive same-role turns with
HTTP 400 ("Please ensure that multiturn requests alternate between user and
model"). A parallel tool call turns into two tool results in a row, which
become two consecutive user functionResponse contents, so every multi-tool
turn produced an unsendable history.
Fold adjacent same-role contents into one by concatenating their parts after
the per-message loop, matching the Anthropic and Bedrock converters. For a
parallel call this yields the grouped multi-functionResponse user turn Gemini
expects.
Memories are the only drillable rows, so give them the primary "clickable"
ink and demote skills (dead-ends) to the muted complement — previously the
non-openable skills wore the link-looking primary color. Flipped in both
the TUI and CLI palettes for parity.
A single 'hermes update' / 'hermes -p' could rewrite a hand-curated config.yaml
into a near-full DEFAULT_CONFIG dump (the 'you blow up my profile config on one
tweak' reports). Root cause: migrate_config() had ~16 independent save_config()
call sites, each author deciding ad hoc whether to materialise a value, and many
persisted pure schema defaults with strip_defaults=False. Defaults already merge
transparently at read time via load_config(), so writing them is pure bloat that
also shadows future default changes (see save_config's docstring).
Architectural fix (not a per-site patch): introduce a single _persist_migration()
chokepoint that enforces one invariant — a migration may persist only values that
DIFFER from the current schema default, plus explicit removals/renames of user
data; pure defaults are never written. Every migration write (all 17 sites incl.
the version-bump finalizer) now routes through it. The invariant is mechanically
correct for all cases and verified empirically:
- pure-default seeds (timezone='', curator/auxiliary.curator blocks, interim
flag, curator.consolidate=False, empty plugins.enabled) are stripped → merged
in at read time;
- non-default values (write_approval=True, model_catalog.ttl_hours=1) preserved
via explicit-raw-path preservation;
- behaviour flips (agent.verify_on_stop=False, schema default still 'auto')
preserved because False != 'auto';
- data transforms (custom_providers->providers, stt.model relocation,
write_mode->write_approval, compression.summary_* removal, MCP-disable)
persist their removals/renames.
An explicitly user-set non-default value (e.g. matrix.require_mention: false) is
preserved across the bump.
Guard tests lock the architecture: an AST check asserts migrate_config() makes no
direct save_config() call (all writes go through _persist_migration), and a
full-range v1->latest test asserts a lean config is never dumped. Two existing
change-detector tests that froze the on-disk representation of default-valued
keys are rewritten to assert the effective value via load_config() (behaviour
contract, not snapshot).
Validation: lean v1->latest migration drops from ~567 bytes to ~196 bytes;
148 config+setup and 196 profile/curator/migrate tests pass on scripts/run_tests.sh.
Builds on the zero-match feedback fix (previous commit) to close the silent-hang
symptom: when memory is at capacity, a failed `add`/`replace`/`remove`
consolidation could loop the whole turn to iteration-budget exhaustion and
deliver no user-facing reply.
#41755 turned the at-capacity overflow error into a *commanded* in-turn retry
("...then retry this add — all in this turn"); combined with the fragile
substring-only `replace`/`remove` matching (LLMs can't reliably re-quote a long
entry verbatim), the model loops add↔replace on inexact guesses until the turn
dies. The existing tool_guardrails halt would catch this, but hard_stop_enabled
is opt-in (off by default), so a default install still hangs.
This fixes it at the memory layer without changing global guardrail behavior:
- MemoryStore tracks per-turn consolidation failures; after a cap (3) it drops
the "retry in this turn" instruction and returns a terminal "leave memory
unchanged, continue your reply" result, so a failed memory side effect can
never block the turn's reply.
- The counter resets on any successful write (progress) and at each turn
boundary (turn_context.reset_consolidation_failures, guarded via getattr so
plugin memory stores without the method are a no-op).
Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
Regression for #54820: a desktop-only helper with a failing check_fn must
not mark the whole terminal toolset unavailable when terminal/process
still pass their per-tool gates.
Interrupting the agent while an approval/clarify/sudo/secret prompt is up
left the overlay state dict set with no thread servicing it. The prompt's
worker thread is torn down on interrupt, but read_only (gated on
_command_running) plus the keypress filter kept the CLI input locked until
the prompt's own timeout expired — the terminal appeared frozen.
Drain and clear all four input-blocking overlays on interrupt via a single
helper (_clear_active_overlays_for_interrupt): approval -> deny,
clarify/sudo/secret -> cancel, each guarded so a dead queue can't block the
others; sudo restores the pre-modal draft. Wired into all three interrupt
paths — new-message interrupt, Ctrl+C, and Ctrl+Q. Blocking overlays now
clear AND fall through so one keypress both clears a stale overlay and
interrupts a still-running agent; the /model picker and slash-confirm
foreground prompts keep their cancel-and-return behavior.
Closes#13618.
_resolve_task_provider_model() flattened any explicit base_url to
provider=custom. Correct for bare/custom endpoints, but wrong for
provider-backed routes (anthropic, qwen-oauth, minimax-oauth,
openai-codex, etc.) whose provider branch adds auth refresh, transport,
or request shaping. MoA reference slots resolved through those providers
lost their identity before the aux call, so e.g. a Codex reference hit
chatgpt.com/backend-api/codex without its Cloudflare headers and got
HTML back (surfacing as a spurious rate-limit).
Keep first-class providers intact when paired with a resolved base_url
via _preserve_provider_with_base_url(); bare/custom/auto/unknown and the
direct openai alias still route through custom.
Co-authored-by: Hermes Agent <127238744+teknium1@users.noreply.github.com>
test_background_task_registers_thread_local_approval_callbacks polled a
2s wall-clock deadline waiting for the background daemon thread to pop
its entry from _background_tasks. Under loaded CI the thread's
finally-block cleanup could lag the deadline, flaking the final
'assert not cli._background_tasks'. Join the actual worker thread
(timeout=10) so the wait ends exactly when the thread finishes.
The STT-failure enrichment templates injected setup instructions —
"no STT provider is configured", "a direct message has already been
sent", and a "hermes-agent-setup" skill mention — into the LLM-visible
prompt. That text persists in conversation history, so after one STT
failure the model kept volunteering Whisper/Vosk setup advice on every
later voice turn, even after transcription started working (observed in
prod on gpt-5-nano). The gateway also fired a hardcoded English notice
via _stt_adapter.send(), producing a second, wrong-language reply that
TTS then spoke aloud.
- Neutralize all enrichment templates: success passes the transcript
through as a plain quoted line; every failure branch emits a single
[voice message could not be transcribed] marker.
- Move the operator-facing failure cause to logger.info so it stays
diagnosable in container logs without leaking into the prompt.
- Remove the hardcoded English _stt_adapter.send() notice; the LLM now
produces one coherent reply in the user's language.
- Update the gateway STT tests to assert the neutral contract.
Co-authored-by: Hermes Agent <noreply@nousresearch.com>
* fix(agent): merge consecutive assistant messages in repair_message_sequence
Strict OpenAI-compatible providers (DeepSeek v4, Moonshot/Kimi) reject a
replayed history where an assistant message carrying tool_calls is
immediately followed by another assistant message instead of its tool
results — HTTP 400 'An assistant message with tool_calls must be
followed by tool messages...'.
repair_message_sequence (the defensive belt run before every API call)
fixed orphan-tool and consecutive-user shapes but never merged
consecutive assistant messages. Adds a Pass 0 that collapses adjacent
assistant turns into one — union of tool_calls, concatenated content,
carried reasoning_content — covering both reported shapes:
- parallel tool calls split across two assistant turns (#29148)
- content-only assistant followed by tool_calls-only assistant (#49147)
A tool result or user turn between two assistants blocks the merge
(distinct, valid rounds). Runs before Pass 1 so the merged union of
tool_call ids is known to the orphan-tool filter.
Closes#29148, #49147.
Co-authored-by: Bartok9 <danielrpike9@gmail.com>
Co-authored-by: woaini30050 <woaini30050@users.noreply.github.com>
Co-authored-by: weidzhou <weidzhou@users.noreply.github.com>
* fix(agent): exempt codex Responses interim turns from assistant merge
The Pass 0 consecutive-assistant merge collapsed codex_responses interim
turns, which legitimately stay separate — each carries its own encrypted
continuation state (codex_reasoning_items / codex_message_items) that
must replay verbatim. Skip the merge when either side is a codex interim
(has codex_reasoning_items / codex_message_items / finish_reason=='incomplete').
Fixes the slice-2 regression in test_run_agent_codex_responses.py
(test_duplicate_detection_distinguishes_different_codex_{reasoning,message_items}).
---------
Co-authored-by: Bartok9 <danielrpike9@gmail.com>
Co-authored-by: woaini30050 <woaini30050@users.noreply.github.com>
Co-authored-by: weidzhou <weidzhou@users.noreply.github.com>
The concurrent-compression regression asserted the parent ends with exactly
one child. Under heavy CI write contention the lock winner's child
create_session can exhaust its SQLite retry budget, and _compress_context
deliberately rolls the live id back to the still-indexed parent rather than
orphaning a child (the create-failure rollback in
agent/conversation_compression.py). That safe rollback leaves zero children
and is correct — so the exact == 1 assertion flaked under load.
Assert the actual invariant instead: children <= 1 (a 2+ fork is the bug
Damien's incident is about), rotated <= 1, and rotated == n_children. A
mutation check (force the lock to always acquire) confirms the relaxed
assertion still fails hard on a real 2-child fork.
Two independent bugs evicted the cached gateway AIAgent on every turn,
preventing the prompt cache from ever warming:
1. Model normalization mismatch: the post-run fallback-eviction check
compared _agent.model (stripped in AIAgent.__init__) against the raw
_resolve_gateway_model() config string. For vendor-prefixed config on
native providers (e.g. 'deepseek/deepseek-v4-pro' vs 'deepseek-v4-pro')
this was always unequal, so the agent was evicted after every
successful run. Normalize _cfg_model the same way (skip aggregators).
2. Discord triggering message_id leaked into the cached system prompt via
build_session_context_prompt()'s Discord IDs block. message_id changes
every turn, so the agent-cache signature (computed from the ephemeral
prompt) changed every Discord turn -> rebuild every message. The id is
now injected per-turn into the user message (where per-turn content
belongs and does not touch the cache signature); the cached IDs block
carries a static pointer to it, preserving reply/react/pin via the
discord tools.
Adapted from #28846. Bug #1 fix is the contributor's; bug #2 reworked to
be non-destructive (keeps the triggering-id capability instead of deleting
it). Redundant auto-reset eviction (already on main via #9893/#48031) and
the wrong-premise reset_context_note plumbing from the original PR were
dropped.
Co-authored-by: Hermes Agent <hermes@nousresearch.com>
exact_moa_preset_name matched any bare model name equal to a preset key,
regardless of the preset's enabled flag. On the no-explicit-provider switch
path (PATH B in model_switch.py), a plain /model switch whose name collided
with a preset key (e.g. "default") silently pivoted the session onto the MoA
virtual provider — even when the user had set enabled: false to opt out
(issue #55187). The LLM driving a routine model switch could land on a broken
moa provider with empty default_preset / unconfigured aggregator credentials.
Gate the implicit bare-name match on the per-preset enabled flag. Explicit
selection via --provider moa / the model picker uses PATH A and does not go
through exact_moa_preset_name, so a disabled preset stays reachable when the
user explicitly asks for it.
Builds on memosr's sink-level opt-in gate (#29249). Enabling a
non-bundled plugin now surfaces the privileged allow_tool_override
decision at `hermes plugins enable` time instead of leaving the
operator to discover the config key after a runtime rejection.
- `hermes plugins enable <name>` prompts for non-bundled plugins:
'Allow this plugin to replace built-in tools?' Default is deny
(blank Enter / non-interactive stdin / EOF all fail closed).
- --allow-tool-override / --no-allow-tool-override flags for
non-interactive and scripted use (and a future desktop checkbox).
- Bundled plugins are trusted: never prompted, no entry written.
- Writes plugins.entries.<key>.allow_tool_override, the same key the
sink gate reads (manifest.key == discovery key), so consent and
enforcement compose end to end.
egilewski found the prior sink gate was transient: it only applied while
PluginManager executed register(ctx). A plugin could defer a direct
registry.register(..., override=True) to a post-load callback/thread, after
the scope was cleared, and still replace a built-in.
Make authorization durable by binding it to where the handler is DEFINED
(handler.__globals__['__name__']) rather than to call timing. At load, each
plugin's module namespace is mapped to its allow_tool_override opt-in in a
table that is never cleared. The sink resolves the handler's owning plugin
module and rejects an override from any plugin namespace without opt-in,
regardless of when or on which thread the call happens. Plugin namespaces
with no recorded policy are treated as not-opted-in (fail-closed). Built-in
and MCP handlers live outside the plugin namespace and are unaffected.
Adds a regression test for the delayed/post-load direct-registry override.
The opt-in gate lived only in PluginContext.register_tool, so a plugin
could bypass it by importing tools.registry and calling
registry.register(..., override=True) directly. Enforce the same gate at
the sink: during plugin load, the registry rejects an override from a
plugin without operator opt-in regardless of the path taken. Built-in and
MCP registrations (no active plugin scope) are unaffected.
Adds a regression test covering the direct-registry bypass.