When a model emits an inline <think>...</think> block but the opening
tag is dropped upstream (thinking-mode toggle, truncated stream, or
incomplete upstream filtering), the bare </think> close tag leaked
through to the user in the live progressive edit. The agent-side final
scrubber (agent/think_scrubber.py) already had _strip_orphan_close_tags;
this ports the same logic into GatewayStreamConsumer so the streaming
display stays clean too.
- _filter_and_accumulate: strip orphan close tags before appending the
'no-opening-tag' branch text to _accumulated.
- _flush_think_buffer: same on stream end for held-back partials.
- 14 regression tests (TestStripOrphanCloseTags): all 6 close-tag
variants, multi-tag, partial-tag-untouched, trailing whitespace,
and end-to-end through _filter_and_accumulate / _flush_think_buffer.
Only strips KNOWN close-tag names (case-insensitive) — never arbitrary
tag-shaped substrings — so comparison operators and unrelated prose are
preserved.
Salvaged from PR #43192 by @testingbuddies24.
scripts/release.py AUTHOR_MAP is greped by the Contributor Attribution
Check to resolve a commit author's email -> GitHub username. Add
huangsen365@gmail.com -> huangsen365 so this PR's commits pass the check.
(This commit originally also carried a gateway race-test flake fix; that
edit is now dropped because main independently hardened the same test with
a superior server._sessions snapshot/restore isolation, making ours
redundant.)
`@file` / `@folder` context-reference expansion enforced its own narrow
deny-list (`_ensure_reference_path_allowed` in `agent/context_references.py`)
that only covered `~/.ssh` keys, a handful of shell dotfiles, `~/.hermes/.env`,
and `skills/.hub`. It never blocked the credential stores that the canonical
read guard (`agent/file_safety.get_read_block_error`) protects: provider API
keys (`~/.hermes/auth.json`), Anthropic OAuth tokens
(`~/.hermes/.anthropic_oauth.json`), MCP OAuth material (`~/.hermes/mcp-tokens/`),
webhook HMAC secrets, and project-local `.env` files.
This matters because the messaging gateway feeds **untrusted** remote text
straight into reference expansion: `gateway/run.py` calls
`preprocess_context_references_async(..., allowed_root=_msg_cwd)` where
`_msg_cwd` defaults to the operator's HOME when `TERMINAL_CWD` is unset. A chat
peer (Telegram/Discord/Slack/...) could send `@file:~/.hermes/auth.json`, pass
the `allowed_root` check (it resolves under HOME), slip past the narrow list,
and have the operator's live keys read into the agent's context — where the
model would typically echo or act on them.
Rather than duplicate and re-sync a second secret list, this routes the guard
through the existing single source of truth. A reviewer might ask "why not just
add `auth.json` to the local list?" — because the local list has already drifted
once (a prior commit had to add `.config/gh`); anchoring to
`get_read_block_error` means every future addition there protects this path too.
The narrow checks are kept as a fallback since they also cover dirs that guard
does not (`.aws`, `.gnupg`, `.kube`, etc.), and the canonical lookup is wrapped
so it can never crash reference expansion.
N/A
- [x] 🔒 Security fix
- `agent/context_references.py`: `_ensure_reference_path_allowed` now also
consults `agent.file_safety.get_read_block_error` after its existing checks
and refuses the reference when that canonical guard flags the resolved path.
The lookup is wrapped so guard-resolution failures fall back to the explicit
checks instead of breaking expansion.
- `tests/agent/test_context_references.py`: added
`test_blocks_canonical_read_denylist_credential_stores`, asserting that
`@file` attaches for `auth.json`, `.anthropic_oauth.json`, `mcp-tokens/*`, and
a project-local `.env` are all refused and their secret bodies never reach the
expanded message.
- `scripts/release.py`: added the contributor email to `AUTHOR_MAP` (release
gate).
1. `scripts/run_tests.sh tests/agent/test_context_references.py` — all 15 tests
pass, including the new credential-store case.
2. Regression proof: stash `agent/context_references.py`, run the suite with
`-- -k canonical`, and confirm the new test fails (secrets leak into the
message) without the fix; restore and confirm it passes.
3. `ruff check agent/context_references.py tests/agent/test_context_references.py`
and `python scripts/check-windows-footguns.py agent/context_references.py
tests/agent/test_context_references.py` both pass.
- [x] I've read the Contributing Guide
- [x] My commit messages follow Conventional Commits (`fix(scope):`, etc.)
- [x] I searched for existing PRs to make sure this isn't a duplicate
- [x] My PR contains **only** changes related to this fix (plus the AUTHOR_MAP release gate)
- [x] I've run the test suite for the touched area and all tests pass
- [x] I've added tests for my changes (required for bug fixes)
- [x] I've tested on my platform: macOS 15 (Darwin 25.5)
- [x] I've updated relevant documentation (README, `docs/`, docstrings) — or N/A
- [x] I've updated `cli-config.yaml.example` if I added/changed config keys — or N/A
- [x] I've updated `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows — or N/A
- [x] I've considered cross-platform impact (Windows, macOS) — or N/A
- [x] I've updated tool descriptions/schemas if I changed tool behavior — or N/A
models_dev.py's fetch uses a synchronous requests.get(timeout=15). Called
from the async gateway message handlers, it blocked the event loop for up
to 15s, starving Discord heartbeats and causing ClientConnectionResetError
disconnects.
Adds get_model_context_length_async() which offloads the entire sync
resolution chain to a worker thread via asyncio.to_thread(), and switches
the two async gateway call sites (_prepare_inbound_message_text,
_handle_message_with_agent) to await it. The loop stays responsive; the
sync path remains the single source of truth for the cache.
Salvaged from PR #22753 by @itenev. Follow-up: dropped the unused
fetch_models_dev_async/lookup_models_dev_context_async aiohttp variants
from the original PR (dead code with zero callers that had drifted from
the sync cache logic) — the to_thread wrapper already runs the sync path
off-loop, so they were redundant.
Path(raw).name reduces '..'/'.'/'' to themselves, so basename
extraction alone still let a Graph-provided display_name of '..' or
'../' escape the temp recording directory (tmp_dir / '..' resolves to
the parent). Reject the dot-only basenames explicitly and fall back to
the artifact id. Extends @outsourc-e's regression coverage with the
dot-only cases.
When the primary provider's auth fails (expired token / 429 quota cap),
_resolve_runtime_agent_kwargs() falls through to the fallback provider
chain, whose runtime dict carries its own 'model' key. api_server's
_create_agent then did AIAgent(model=model, **runtime_kwargs), colliding
on 'model' and 500ing every /v1/chat/completions request while a fallback
was active. Pop the runtime model and let it override the config model,
mirroring the native gateway path (_resolve_session_agent_runtime).
Salvaged from #35716 by @ryo-solo (earliest submitter); the PR's second
half (Mistral reasoning_content strip) is already handled on main and
dropped.
Co-authored-by: Hermes Agent <noreply@nousresearch.com>
Ephemeral empty-response/prefill recovery scaffolding (the synthetic
assistant "(empty)" turn, the user nudge, the terminal "(empty)"
sentinel, and the thinking-only prefill placeholder) exists only to
drive the next API retry; the in-memory loop pops it before appending
the real response. The append-only flush did not mirror that, so a
mid-turn persist could commit scaffolding to the SQLite session store
(and JSON log), and a resumed session would replay synthetic
"(empty)"/nudge turns as genuine context — re-poisoning the empty-retry
boundary forever.
Filter ephemeral scaffolding at both durable-write sites
(_flush_messages_to_session_db + _save_session_log), by flag not
position, so buried scaffolding (an answered nudge leaves the synthetic
pair mid-list) is skipped too. Covers all three flags including
_thinking_prefill.
Adapted onto current main's identity-tracking flush.
Cherry-picked from #41281 by petrichor-op.
@janrenz's PR #35862 added prompt_caching.enabled=false at init only. But
_anthropic_prompt_cache_policy re-derives _use_prompt_caching on every /model
switch (agent_runtime_helpers) and fallback-model swap (chat_completion_helpers),
which re-enabled markers and re-broke the strict proxy the toggle was meant to fix.
Move the kill switch into anthropic_prompt_cache_policy so it returns (False, False)
on every path. Drop the now-redundant init-time override (kept @janrenz's isinstance
hardening on the cache_ttl read). Add policy-level tests + docs for the toggle.
Follow-up to salvaged PR #35862.
Mitigates indirect prompt injection (CWE-863) in Slack thread context.
When the bot is mentioned mid-thread for the first time, _fetch_thread_context
pulls the full thread via conversations.replies and prepends every reply to
the LLM prompt. Replies from senders not on the allowlist were rendered
identically to authorised senders, letting a third party in a shared channel
inject instructions the model might act on when answering the next authorised
message.
- BasePlatformAdapter.set_authorization_check / _is_sender_authorized, registered
by GatewayRunner._make_adapter_auth_check() with a closure over the existing
_is_user_authorized chain (platform/global/group allowlists, allow-all flags,
pairing store all stay the single source of truth — no env-var re-parsing).
- Tags non-bot thread messages whose sender fails the auth check with an
[unverified] prefix; strengthens the header with soft guidance only when at
least one unverified message is present, so setups without an allowlist see
no behaviour change.
- Wired into all three adapter-init sites in run.py (start, reconnect watcher,
restart) so the reconnect path is covered too.
Softened wording: adapted from the original [untrusted] tag to [unverified]
and non-accusatory header framing — the label reflects allowlist status, not
a judgment about the person. Adapter relocated to plugins/platforms/slack/
since the PR was authored.
Salvaged from #17059.
/queue rebuilt the queued MessageEvent with only text/type/source/
message_id/channel_prompt, silently dropping any photo, document, voice,
or reply context attached to the command. The deferred turn then ran with
the attachment lost. Carry the full payload through, and accept a /queue
that has media but no prompt text (e.g. "/queue" as an image caption).
Salvaged from #13913 by @ypwcharles — the gateway busy-session/queue
infrastructure was rewritten since that PR (Telegram moved to
plugins/platforms/, /queue now uses the FIFO chain), so the media fix is
reimplemented against the current handler; the PR's batching and
busy-bypass changes targeted code paths that no longer exist.
Co-authored-by: ypwcharles <92324143+ypwcharles@users.noreply.github.com>
The dangerous-command approval prompt renders the flagged command so the
user can decide whether to approve. If the agent constructed it with a
credential (curl -H 'Authorization: Bearer sk-...', psql postgres://user:pw@host,
an execute_code script with api_key = 'sk-...'), that secret hit stdout and,
via the gateway notify payload, Discord/Slack messages — which are
screenshottable and forwardable.
Apply the existing agent.redact.redact_sensitive_text() to every user-facing
approval surface. Redaction is display-only: the raw command still executes
after approval, and approval persistence keys off pattern_key (not the command
text), so the allowlist is unaffected. Decision context (URL, flags, command
structure) is preserved; only the secret value masks.
Covers all surfaces, including the execute_code path the original PR missed:
- prompt_dangerous_approval(): callback + stdout fallback
- check_all_command_guards(): gateway approval_data + cron/batch pending fallback
- check_execute_code_guard(): gateway approval_data + no-notifier pending fallback
(script body can embed credentials)
Adds TestApprovalPromptRedaction covering callback redaction, no-over-redaction
of clean commands, and the execute_code pending fallback.
Salvaged from PR #13139 by @sgabel; extended to the execute_code surface.
The store-level search_facts() shared the same raw-MATCH bug class as
_fts_candidates (FTS5 AND-joins tokens, zeroing prose recall). Route it
through FactRetriever._sanitize_fts_query via a lazy import to keep the
store->retrieval layering acyclic. Also add cyb3rwr3n to release AUTHOR_MAP.
Interrupting the agent while an approval/clarify/sudo/secret prompt is up
left the overlay state dict set with no thread servicing it. The prompt's
worker thread is torn down on interrupt, but read_only (gated on
_command_running) plus the keypress filter kept the CLI input locked until
the prompt's own timeout expired — the terminal appeared frozen.
Drain and clear all four input-blocking overlays on interrupt via a single
helper (_clear_active_overlays_for_interrupt): approval -> deny,
clarify/sudo/secret -> cancel, each guarded so a dead queue can't block the
others; sudo restores the pre-modal draft. Wired into all three interrupt
paths — new-message interrupt, Ctrl+C, and Ctrl+Q. Blocking overlays now
clear AND fall through so one keypress both clears a stale overlay and
interrupts a still-running agent; the /model picker and slash-confirm
foreground prompts keep their cancel-and-return behavior.
Closes#13618.
Two independent bugs evicted the cached gateway AIAgent on every turn,
preventing the prompt cache from ever warming:
1. Model normalization mismatch: the post-run fallback-eviction check
compared _agent.model (stripped in AIAgent.__init__) against the raw
_resolve_gateway_model() config string. For vendor-prefixed config on
native providers (e.g. 'deepseek/deepseek-v4-pro' vs 'deepseek-v4-pro')
this was always unequal, so the agent was evicted after every
successful run. Normalize _cfg_model the same way (skip aggregators).
2. Discord triggering message_id leaked into the cached system prompt via
build_session_context_prompt()'s Discord IDs block. message_id changes
every turn, so the agent-cache signature (computed from the ephemeral
prompt) changed every Discord turn -> rebuild every message. The id is
now injected per-turn into the user message (where per-turn content
belongs and does not touch the cache signature); the cached IDs block
carries a static pointer to it, preserving reply/react/pin via the
discord tools.
Adapted from #28846. Bug #1 fix is the contributor's; bug #2 reworked to
be non-destructive (keeps the triggering-id capability instead of deleting
it). Redundant auto-reset eviction (already on main via #9893/#48031) and
the wrong-premise reset_context_note plumbing from the original PR were
dropped.
Co-authored-by: Hermes Agent <hermes@nousresearch.com>
OpenRouter returns 429 in two shapes: an account-level throttle on the
user's key, and an upstream-provider throttle (DeepSeek/Anthropic/etc.
rate-limiting OpenRouter's aggregate traffic). The classifier treated
both identically and rotated/exhausted OPENROUTER_API_KEY on every 429 —
burning the key for ~24min and silently disabling auxiliary features
(compression, summarization, vision) on an upstream throttle where the
key was healthy.
Add a FailoverReason.upstream_rate_limit classified from OpenRouter's
unambiguous wrapper message "Provider returned error" (the same signal
the metadata-raw parser already trusts). Recovery skips credential
rotation and defers to the fallback chain to switch models instead.
Co-authored-by: Hermes Agent <127238744+teknium1@users.noreply.github.com>