hermes-agent

Author	SHA1	Message	Date
Ben Barclay	c71f816956	fix(compression): clear all per-session state in on_session_end, not just _previous_summary The original cross-session contamination fix (#38788) only cleared _previous_summary in on_session_end(), but on_session_reset() clears 14+ per-session variables. When a session ends (cron exit, gateway expiry, session-id rotation) and the compressor instance is reused, the surviving stale state causes: - _ineffective_compression_count surviving → next session skips compression prematurely (anti-thrashing guard misfires) - _summary_failure_cooldown_until surviving → next session blocks summary generation for an unrelated transient error - _last_compress_aborted surviving → callers think compression is still aborted - _last_aux_model_failure_* surviving → stale error warnings shown - _last_summary_dropped_count / _last_summary_fallback_used surviving → misleading user warnings - _context_probed / _context_probe_persistable surviving → stale context-probe state Also fix on_session_reset() which was missing _last_compress_aborted clearing — a /new or /reset would inherit the aborted flag from the prior conversation. Add 6 targeted tests covering the leak vectors and a parity test ensuring on_session_end and on_session_reset always clear the same surface.	2026-07-01 02:48:32 -07:00
ArthurZhang	fdb9620ac4	security(agent): redact Slack App-Level (xapp-) tokens The xapp-<num>-<hash> format used by Slack App-Level / Socket Mode tokens was missing from both agent/redact.py prefix patterns and gateway/run.py gateway secret patterns, so SLACK_APP_TOKEN values could leak through to chat users even with security.redact_secrets enabled. Adds an anchored xapp-\d+- pattern to both redaction paths.	2026-07-01 02:45:22 -07:00
Teknium	da6d5fcd13	fix(auth): serialize Codex OAuth pool refresh under the auth-store lock (#56233) The credential-pool Codex refresh path synced tokens from auth.json and then POSTed the refresh_token to OpenAI's token endpoint without holding the cross-process auth-store lock across the whole read->POST->write-back sequence. Because Codex refresh tokens are single-use, two concurrent Hermes processes could both adopt the same on-disk token and both POST it; the loser got refresh_token_reused / invalid_grant. Wrap the Codex OAuth branch of _refresh_entry in the existing shared _auth_store_lock (reentrant, cross-process flock) using the same extended-timeout pattern resolve_codex_runtime_credentials() already uses. A waiting process now blocks on the lock and, once inside, the in-lock re-sync picks up the rotated token the winner persisted and skips its own POST. Also send User-Agent: hermes-cli/<version> on the refresh request. Credit @cooper-oai (#34820) for identifying the concurrent-refresh reuse race; this ships the narrow lock-serialization fix without the separate Codex auth-store partition.	2026-07-01 02:45:07 -07:00
sprmn24	88d6e833f1	fix(agent): wrap list-type untrusted content in untrusted_tool_result _maybe_wrap_untrusted() only wrapped str-typed tool outputs. When a high-risk tool (web_extract, browser_*) returns a multimodal content list ([{type:text},{type:image_url}]) — which _tool_result_content_for_active_model() produces by unwrapping the _multimodal envelope for vision-capable providers — the text part reached the model completely unguarded. An attacker page that ships one image bypassed the entire untrusted-data wrapper. Extend the wrapper to handle list content: each {type:text} part is run through the same string-wrapping path (min-char threshold, delimiter neutralization, one well-formed block), image/video parts pass through untouched so the list stays valid for vision adapters. Recursing into the existing string branch means the list path inherits the delimiter defang and the no-forgeable-fast-path hardening from #56172 for free. The outer list is rebuilt (not returned by identity), so callers compare by value.	2026-07-01 02:44:09 -07:00
mrparker0980	10a54ccc2c	fix(security): anchor @file context refs to canonical read deny-list `@file` / `@folder` context-reference expansion enforced its own narrow deny-list (`_ensure_reference_path_allowed` in `agent/context_references.py`) that only covered `~/.ssh` keys, a handful of shell dotfiles, `~/.hermes/.env`, and `skills/.hub`. It never blocked the credential stores that the canonical read guard (`agent/file_safety.get_read_block_error`) protects: provider API keys (`~/.hermes/auth.json`), Anthropic OAuth tokens (`~/.hermes/.anthropic_oauth.json`), MCP OAuth material (`~/.hermes/mcp-tokens/`), webhook HMAC secrets, and project-local `.env` files. This matters because the messaging gateway feeds untrusted remote text straight into reference expansion: `gateway/run.py` calls `preprocess_context_references_async(..., allowed_root=_msg_cwd)` where `_msg_cwd` defaults to the operator's HOME when `TERMINAL_CWD` is unset. A chat peer (Telegram/Discord/Slack/...) could send `@file:~/.hermes/auth.json`, pass the `allowed_root` check (it resolves under HOME), slip past the narrow list, and have the operator's live keys read into the agent's context — where the model would typically echo or act on them. Rather than duplicate and re-sync a second secret list, this routes the guard through the existing single source of truth. A reviewer might ask "why not just add `auth.json` to the local list?" — because the local list has already drifted once (a prior commit had to add `.config/gh`); anchoring to `get_read_block_error` means every future addition there protects this path too. The narrow checks are kept as a fallback since they also cover dirs that guard does not (`.aws`, `.gnupg`, `.kube`, etc.), and the canonical lookup is wrapped so it can never crash reference expansion. 
- `agent/context_references.py`: `_ensure_reference_path_allowed` now also consults `agent.file_safety.get_read_block_error` after its existing checks and refuses the reference when that canonical guard flags the resolved path. The lookup is wrapped so guard-resolution failures fall back to the explicit checks instead of breaking expansion. - `tests/agent/test_context_references.py`: added `test_blocks_canonical_read_denylist_credential_stores`, asserting that `@file` attaches for `auth.json`, `.anthropic_oauth.json`, `mcp-tokens/`, and a project-local `.env` are all refused and their secret bodies never reach the expanded message. - `scripts/release.py`: added the contributor email to `AUTHOR_MAP` (release gate). 1. `scripts/run_tests.sh tests/agent/test_context_references.py` — all 15 tests pass, including the new credential-store case. 2. Regression proof: stash `agent/context_references.py`, run the suite with `-- -k canonical`, and confirm the new test fails (secrets leak into the message) without the fix; restore and confirm it passes. 3. `ruff check agent/context_references.py tests/agent/test_context_references.py` and `python scripts/check-windows-footguns.py agent/context_references.py tests/agent/test_context_references.py` both pass. Tested on macOS 15 (Darwin 25.5).	2026-07-01 02:43:49 -07:00
kshitijk4poor	22a137ed40	fix(agent): prefer late-completing real result over timeout message (review) Review follow-up on the concurrent-tool deadline salvage. timed_out_indices is snapshotted from not_done at the deadline; a worker can still finish and write results[i] in the window before the post-execution result loop reads it. The loop unconditionally replaced results[i] with a fabricated 'timed out' message for any snapshotted index, discarding a genuinely-successful (just-late) result. Gate the timeout message on 'and r is None' so a real result always wins. Add a regression test that forces the snapshot-vs-result-loop race deterministically (mutation-checked: reverting the guard fails it). Also document the intentional detached-worker leak at the executor abandon site.	2026-07-01 14:56:52 +05:30
Gustavo Mendes	c1784e9093	fix(agent): bound concurrent tool execution with a wall-clock deadline A tool with no internal interrupt check (read_file, web_search, or a wedged terminal backend) that never returns keeps the concurrent-tool poll loop alive forever: the loop only breaks when all futures finish or an interrupt is requested, and the 30s heartbeat resets the gateway idle monitor so idle-kill never fires. The ThreadPoolExecutor was also used as a context manager, so its __exit__ joined the hung worker with wait=True. Add a wall-clock batch deadline (HERMES_CONCURRENT_TOOL_TIMEOUT_S, default 420s — above the 360s web_extract timeout; 0/negative disables). When it fires: cancel pending futures, signal an interrupt to the worker threads, abandon the executor (shutdown wait=False, cancel_futures=True) so hung threads aren't joined, and return a per-tool 'timed out' result for the unfinished calls while still surfacing the finished ones. Also fixes the latent futures.index(f) lookup (ambiguous with duplicate futures) by tracking a future->index map. Salvaged from #54562. Co-authored-by: Gustavo Mendes <87918773+gustavosmendes@users.noreply.github.com>	2026-07-01 14:56:52 +05:30
Teknium	913e661a09	fix(cache): stop verification-loop synthetic nudges from persisting (#56194) verify_on_stop / pre_verify append a synthetic assistant "done" plus a synthetic user nudge to keep the agent going one more turn before it can claim completion. Both were flagged (_verification_stop_synthetic on the nudge only), but the flags were never registered in _EPHEMERAL_SCAFFOLDING_FLAGS, so the central _is_ephemeral_scaffolding() filter that guards both persistence sinks (SQLite flush + JSON snapshot) let them through. The resumed transcript then inherited loop-only scaffolding, invalidating the prompt-prefix cache on later turns. - add _verification_stop_synthetic and _pre_verify_synthetic to _EPHEMERAL_SCAFFOLDING_FLAGS (the single chokepoint both sinks use) - flag the blocked attempt assistant message too, not just the nudge, so the whole synthetic pair drops together and persistence does not keep a premature done with the nudge stripped (assistant to assistant adjacency) The API-payload leak claimed in the report is already handled: the chat_completions transport strips every underscore-prefixed message key before the wire, so the marker never reaches strict providers. Reported by patppham.	2026-07-01 02:26:06 -07:00
Teknium	18c61bb8cf	fix(provider): match api.anthropic.com host on fallback api_mode detection Widen the salvaged #32243 fix to the try_activate_fallback path: a custom provider pointed at the native api.anthropic.com host (no /anthropic path suffix, name != anthropic) fell through to chat_completions -> POST /v1/chat/completions -> 404. Match the host the same way determine_api_mode() and _detect_api_mode_for_url() now do. Absorbs #49247.	2026-07-01 02:18:56 -07:00
itenev	f981d47cb0	fix(gateway): prevent Discord disconnects from blocking event loop models_dev.py's fetch uses a synchronous requests.get(timeout=15). Called from the async gateway message handlers, it blocked the event loop for up to 15s, starving Discord heartbeats and causing ClientConnectionResetError disconnects. Adds get_model_context_length_async() which offloads the entire sync resolution chain to a worker thread via asyncio.to_thread(), and switches the two async gateway call sites (_prepare_inbound_message_text, _handle_message_with_agent) to await it. The loop stays responsive; the sync path remains the single source of truth for the cache. Salvaged from PR #22753 by @itenev. Follow-up: dropped the unused fetch_models_dev_async/lookup_models_dev_context_async aiohttp variants from the original PR (dead code with zero callers that had drifted from the sync cache logic) — the to_thread wrapper already runs the sync path off-loop, so they were redundant.	2026-07-01 02:17:35 -07:00
kshitijk4poor	a658f3b28b	fix(security): strip dynamic Hermes secrets from all subprocess spawn env Subprocesses spawned by the terminal tool, execute_code, Docker backend, and the codex app-server could inherit Hermes-internal secrets that the name-based `_HERMES_PROVIDER_ENV_BLOCKLIST` can't enumerate, because they're injected into `os.environ` at runtime under dynamic names: - `AUXILIARY_<TASK>_API_KEY` / `AUXILIARY_<TASK>_BASE_URL` — per-task side-LLM credentials bridged from `config.yaml[auxiliary]` by gateway/run.py and cli.py (vision, web_extract, approval, compression, plugin-registered tasks). Often separate, higher-spend keys plus base URLs pointing at private endpoints. - `GATEWAY_RELAY__SECRET` / `_KEY` / `_TOKEN` — relay-auth material provisioned by gateway/relay. Additionally, agent/transports/codex_app_server.py built its spawn env from a raw `os.environ.copy()`, bypassing the centralized `hermes_subprocess_env()` helper entirely — handing every codex subprocess the full Tier-1 secret set (GH_TOKEN, gateway bot tokens, Modal/Daytona infra tokens, dashboard session token) unfiltered. This is the #29157 sibling spawn-site gap; copilot_acp_client already routes through the helper. Fix — single chokepoint: - Add `_is_hermes_internal_secret(key)` in tools/environments/local.py as the single source of truth for the dynamic secret patterns. Matches AUXILIARY_<TASK>_API_KEY / _BASE_URL and GATEWAY_RELAY__SECRET/_KEY/_TOKEN; leaves non-secret AUXILIARY_<TASK>_PROVIDER/_MODEL and GATEWAY_RELAY routing hints visible. - Wire the predicate into every spawn path unconditionally (ignores skill env_passthrough opt-in AND inherit_credentials — a model-driving CLI never needs these): `_sanitize_subprocess_env` (both loops), `_make_run_env` (foreground), `hermes_subprocess_env` (Tier-1), and the Docker forward filter. - Add the static GATEWAY_RELAY_* names to `_HERMES_PROVIDER_ENV_BLOCKLIST` so the exact-match path catches them independently of the predicate. 
- Add the GATEWAY_RELAY_ID/_SECRET/_DELIVERY_KEY triplet to `_ALWAYS_STRIP_KEYS` (Tier-1) so it is stripped unconditionally on EVERY spawn surface — including the codex/copilot `inherit_credentials=True` path that skips the Tier-2 blocklist. `_SECRET`/`_DELIVERY_KEY` are already predicate-matched; `_ID` has no secret suffix, so enumerating it here is what closes its leak on the inherit path (self-review W1). - Defense in depth: env_passthrough.py `_is_hermes_provider_credential()` now consults the same predicate, so a skill can't register these names as passthrough and tunnel them into an execute_code / terminal child. - Route codex_app_server through `hermes_subprocess_env(inherit_credentials=True)` — strips Tier-1 + dynamic-internal secrets while provider creds (which codex needs to authenticate) still flow. Consolidates PRs #53715 (necoweb3 — the _is_hermes_internal_secret backbone + Docker filter), #53503 (srojk34 — env_passthrough guard), and #55709 (srojk34 — codex routing). Retires #52348 (claudlos): its copilot half is already on main, and its codex half used the full-strip `_sanitize_subprocess_env` which would break codex provider auth — the correct tier is `inherit_credentials=True`. Tests: TestHermesInternalDynamicSecrets (terminal + predicate + passthrough override), TestInternalDynamicSecrets (hermes_subprocess_env both tiers), TestSpawnEnvSecretStripping (codex spawn env), plus env_passthrough defense-in-depth cases. Co-authored-by: necoweb3 <sswdarius@gmail.com> Co-authored-by: srojk34 <286497132+srojk34@users.noreply.github.com> Co-authored-by: claudlos <claudlos@agentmail.to>	2026-07-01 14:37:22 +05:30
Omar Baradei	053424c486	fix(agent): preserve final_response on failure returns AIAgent.run_conversation() promises a dict with final_response, but 16 terminal-failure branches returned dicts that either omitted the key or set it to None. Callers that index result['final_response'] directly (run_agent.py chat() + the __main__ printer) turn a real provider/context failure into an opaque KeyError instead of surfacing the actionable error. Every offending branch already carried usable 'error' text, so this mirrors that text into final_response for all 16 sites (8 that omitted the key, 8 that returned None). Adds an AST regression test that fails if any run_conversation() dict return omits final_response or sets it to a literal None, and tightens the invalid-response test to assert final_response == error.	2026-07-01 02:04:28 -07:00
qWaitCrypto	e1ff736f26	fix(anthropic): preserve ordered replay cache markers	2026-07-01 02:03:40 -07:00
qWaitCrypto	80d71e8d2e	fix(anthropic): preserve tool use cache markers	2026-07-01 02:03:40 -07:00
Jeff Watts	a2d6f05d1b	fix(moa): append reference block at end of aggregator prompt for KV-cache reuse The MoA aggregator received the per-turn reference block merged into the most recent `user` message. In an agentic tool loop that message is the original task near the top of the context (everything after it is assistant/tool turns), so injecting text that changes every iteration diverges the prompt prefix early. The server's KV cache then cannot be reused and the entire conversation re-prefills on every tool-loop step — full prefill each step, which dominates latency on long contexts. Append the reference block at the end of the prompt instead (merging into the last message only when it is already a trailing user turn, i.e. plain chat). This keeps the [system][task][tool-history] prefix stable and cache-reusable so only the new block re-prefills, and gives the aggregator the references with recency. Extracted as `_attach_reference_guidance` with unit tests. Measured on a local llama.cpp aggregator over a long agentic task: KV-cache reuse on follow-up steps went from ~0.3% to ~93-95% and per-step prefill on an ~80k-token context dropped from ~44s to <1s, with no change to output. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-07-01 01:59:00 -07:00
sasquatch9818	020d263ef6	fix(agent): defang untrusted-tool-result delimiter against tag injection `_maybe_wrap_untrusted` is the architectural defense against indirect prompt injection. It wraps attacker-controllable tool output (web_extract, web_search, browser_*, mcp_*) in `<untrusted_tool_result>...</untrusted_tool_result>` so the model treats it as data. The content was interpolated verbatim, so the boundary was forgeable. Two holes. A poisoned page that embeds `</untrusted_tool_result>` closes the block early — everything after it reads as trusted instructions. And the `startswith("<untrusted_tool_result")` re-entrancy guard returned content that merely started with the opening tag completely unwrapped, so an attacker just prefixed the tag to drop all data framing. Fix neutralizes any embedded delimiter token (case-insensitive) before interpolation and drops the forgeable fast-path, so content is always sealed in exactly one well-formed block. Re-wrapping an already-wrapped forward is harmless — it stays framed as data. ## What does this PR do? Closes an indirect prompt-injection bypass in the untrusted-tool-result wrapper. Attacker content can no longer break out of, or forge, the trust boundary. ## Related Issue N/A ## Type of Change - [x] 🔒 Security fix ## Changes Made - `agent/tool_dispatch_helpers.py`: add `_neutralize_delimiters` (case-insensitive defang of the `untrusted_tool_result` token); `_maybe_wrap_untrusted` now always neutralizes then wraps, and the forgeable `startswith` re-entrancy guard is removed. - `tests/agent/test_tool_dispatch_helpers.py`: replace the double-wrap test (it encoded the bypass) with regression tests for embedded closing tag, leading opening tag, and a cased closing tag. ## How to Test 1. `scripts/run_tests.sh tests/agent/test_tool_dispatch_helpers.py` — 29 pass. 2. Embedded `</untrusted_tool_result>` mid-content: real closing delimiter appears once, at the end; payload trapped inside. 3. 
Content starting with the opening tag: data framing is applied, not skipped. Tested on macOS 15 (Darwin 25.5).	2026-07-01 01:54:45 -07:00
liuhao1024	8f4d195d5f	fix(compressor): pin summary role to user when only system prompt is protected (#52160) After the first compaction protect_first_n decays, so on a later compaction the only protected head message can be the system prompt. Adapters like Anthropic and Bedrock send the system prompt as a separate parameter, so the summary becomes the first message in messages[] — and Anthropic rejects any request whose first message is not role=user (HTTP 400). Pin the summary to role=user when the head is system-only, and stop the collision-flip logic from reverting it back to assistant. Salvaged from #52167. Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>	2026-07-01 14:24:41 +05:30
srojk34	82ac7e16b8	fix(compression): preserve network/auth abort flags across cooldown re-entry (#29559) compress() eagerly reset _last_summary_auth_failure and _last_summary_network_failure at the top of every call. On a second compress() during the failure cooldown, _generate_summary() returns None from the cooldown early-return WITHOUT re-asserting those flags, so the abort guard saw False and fell through to the destructive static-fallback that drops the middle window — the data-loss #29559/#25585 describe. Stop resetting them eagerly; a successful summary already clears both, so letting them persist across calls is safe and keeps the cooldown abort protection intact. Salvaged from #52056. Co-authored-by: srojk34 <286497132+srojk34@users.noreply.github.com>	2026-07-01 14:24:41 +05:30
liuhao1024	32b23bfb08	fix(compressor): strip orphan tool_calls instead of inserting stubs (#51218) _sanitize_tool_pairs inserted stub role="tool" results for orphaned tool_calls. The pre-API repair_message_sequence() tracks known call IDs by tc.get("id") while this sanitizer keys on call_id||id; when they disagree (Codex Responses API: id != call_id) the stubs are silently dropped by the repair pass, re-exposing the original orphans. Strip the orphaned tool_calls at the source instead (preserving any text content, adding a placeholder for an otherwise-empty assistant turn) to avoid the mismatch class entirely. Salvaged from #51225. Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>	2026-07-01 14:24:41 +05:30
Harish Kukreja	01bf61c865	fix(runtime): honor NOUS_INFERENCE_BASE_URL across pool/explicit/aux paths Upstream #52270 added `_nous_inference_env_override()` but wired it into only `resolve_nous_runtime_credentials`. Three sibling resolution paths still ignored the override, so a self-hosted Nous inference endpoint set via `NOUS_INFERENCE_BASE_URL` was silently dropped whenever credentials arrived through any of them: - the credential-pool path (`_resolve_runtime_from_pool_entry`) - the explicit-provider path (`_resolve_explicit_runtime`) - the auxiliary side-LLM client (`_pool_runtime_base_url`) Route all three through the same auth-layer reader so every `NOUS_INFERENCE_BASE_URL` read shares one normalization path (trailing-slash stripping, blank -> empty) and the documented trusted-bypass intent stays in one place. The override is live-only: it wins for the base URL returned this run but is never persisted to auth.json or the credential pool, so an ephemeral dev/staging value cannot poison durable auth state. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-07-01 01:52:06 -07:00
Alex Gierczyk	2296fec210	fix(auxiliary): treat aux <task>.model: auto as sentinel, not a literal model id When auxiliary.<task>.model is set to "auto" in config.yaml, _resolve_task_provider_model() was treating it as a truthy model id and propagating the literal string "auto" to the wire. The provider then returned a 200 OK with an error-text body (e.g. "the model auto does not exist, run --model to pick a different model"), which downstream consumers such as ContextCompressor accept as the compressed summary -- silent corruption with no exception raised. The provider-side auto-resolution path (_resolve_auto via main_runtime fallback) is already wired up and does the right thing when cfg_model is None. The fix is to normalize the auto sentinel at the resolver layer: when cfg_model.lower() == "auto", drop it to None so the resolver can fall through to main_runtime / auto-detect. Reproduction (pre-fix): >>> from agent.auxiliary_client import _resolve_task_provider_model >>> _resolve_task_provider_model("compression") # with model: auto in config ("auto", "auto", None, None, None) Post-fix: >>> _resolve_task_provider_model("compression") ("auto", None, None, None, None) Verified end-to-end: ContextCompressor.compress now produces a real summary (~4KB of compaction text) instead of swallowing the bridge error string. Aux compression on auto/auto config no longer silently corrupts the conversation summary.	2026-07-01 01:44:40 -07:00
teknium1	e00800fc89	feat(classifier): Anthropic-specific guidance for subscription exhaustion When an Anthropic Claude Pro/Max OAuth subscription hits the "out of extra usage" 400 (now classified as billing), surface actionable guidance pointing at claude.ai/settings/usage and the cycle-reset option instead of the generic "add credits with that provider" line — which does not apply to a subscription. Folds in the UX from #40073 (@harsh-matchmyflight) without the extra FailoverReason enum; the billing reclass already provides the recovery behavior.	2026-07-01 01:36:34 -07:00
charleneleong-ai	ea9e8d6e8c	fix(classifier): treat Anthropic "out of extra usage" 400 as billing Anthropic returns HTTP 400 with "You're out of extra usage. Add more at claude.ai/settings/usage and keep going." when the account's extra-usage allowance is depleted. The existing _BILLING_PATTERNS list did not include this wording, so classify_api_error fell through to generic format_error — non-retryable and should_fallback=False — causing the agent to abort instead of engaging the configured fallback chain. Add the pattern and a regression test covering the exact Anthropic body.	2026-07-01 01:36:34 -07:00
Frank Song	ee710db135	fix(compressor): skip context-summary markers as last-user tail anchor A context-compaction handoff banner is inserted with role="user" when the protected head ends in an assistant/tool message. On a resumed or multi-compaction session, _find_last_user_message_idx would return that banner as the latest user turn, so _ensure_last_user_message_in_tail anchored the tail to the summary and rolled the genuine last user message into the next compaction — the exact active-task loss the anchor exists to prevent (#10896/#22523). Reuse the existing _is_context_summary_content helper to skip summary banners when locating the last real user message. Salvaged from #36626 by Frank Song (issue #36624). The PR's other two changes (demoting completed tool results inside the protected tail; a preflight compression_exhausted result) are superseded on current main by the min_tail floor (#39170), the no-op compression counting (#40803), and the existing 413/disabled terminal-error paths.	2026-07-01 01:20:02 -07:00
kshitijk4poor	e09ff88d02	fix(browser): close remaining CDP-URL leak paths in supervisor (review) Review of the salvage found the timeout-message redaction left the more common failure mode unguarded: when the first websockets.connect(cdp_url) fails (bad URI / refused / TLS), the raw websockets exception -- which embeds the full cdp_url incl. ?token= and user:pass@ -- is stashed as _start_error and re-raised verbatim by start(), and two reconnect logger.warning sites log the same raw exception. Add a module-level _redact_cdp_error_text() chokepoint (delegating to agent.redact.redact_cdp_url) and route all four supervisor egress points through it: - start() TimeoutError message (already covered; kept) - start() _start_error re-raise -> now raises a redacted RuntimeError with 'from None' so no secret leaks via message OR traceback cause chain - connect-failed and session-dropped reconnect warnings Guard tests assert the re-raised message is redacted for both token and userinfo, the raw cause is suppressed, and the helper preserves non-secret context (host/reason). Verified with a mutation check: reverting to the raw 'raise err' fails the new tests. Correct the redact_cdp_url docstring to scope its guarantee to direct-URL redaction and point exception callers at the supervisor helper.	2026-07-01 13:43:58 +05:30
kshitijk4poor	c626dded13	refactor(redact): consolidate CDP-URL log redaction into one chokepoint The session-log fix (browser_tool._sanitize_url_for_logs) and the supervisor attach-timeout fix (CDPSupervisor.start) both composed the same three redactors (redact_sensitive_text -> _redact_url_query_params -> _redact_url_userinfo) to mask CDP endpoint credentials. Two copies of one policy drift: tune one site (e.g. add fragment masking) and the other silently re-leaks. Promote that composition to a single public helper redact_cdp_url() in agent/redact.py -- the one place the CDP-URL redaction policy lives -- and route both call sites through it (_sanitize_url_for_logs becomes a thin wrapper; the supervisor imports the helper instead of re-composing the private redactors). Add direct unit tests for the seam covering query tokens, multiple credentials, userinfo passwords, plain-URL passthrough, non-string/exception coercion, and None. No behavior change at the call sites; both leak paths remain closed.	2026-07-01 13:43:58 +05:30
kshitijk4poor	8db6ed7bd9	fix(context): clamp -1 post-compression sentinel in sibling status paths Whole-bug-class follow-up to the tui_gateway fix: the same -1 last_prompt_tokens sentinel (parked by conversation_compression after a compression) leaked into other status readers, producing a raw -1 or a NEGATIVE usage_percent on the transitional turn: - agent/context_engine.py get_status() (the ABC default every external context engine inherits) — highest blast radius - gateway/slash_commands.py /usage context line - cli.py session usage printout All clamped to >=0, mirroring cli.py _get_status_bar_snapshot and the tui_gateway fix. Adds an ABC get_status sentinel-clamp regression test.	2026-07-01 13:36:50 +05:30
Jace Nibarger	060779bb76	fix: bound threat-pattern/FTS5 regex input and cover V4A Move-File edits Salvaged from PR #35130 (the safe subset of jnibarger01's security pass): - threat_patterns.py: replace unbounded (?:\w+\s+)* filler with bounded {0,8} + cap scan input at MAX_SCAN_CHARS (64KiB), and bound the .* runs in the exfil/config-mod patterns. Kills catastrophic backtracking on adversarial near-misses. - hermes_state.py: cap FTS5 query length (MAX_FTS5_QUERY_CHARS) and extract quoted phrases with a linear scan instead of a regex so pathological quote runs can't induce backtracking. - acp_adapter/edit_approval.py + agent/tool_dispatch_helpers.py: recognize '*** Move File: src -> dst' V4A headers so patch-mode edits are permissioned/traversal-checked (previously only Update/Add/Delete), and surface a proposal for mode=patch V4A calls (previously replace-only). Tests: +ReDoS-bound + FTS5-cap + Move-File-target + V4A-approval cases.	2026-07-01 01:05:28 -07:00
zapabob	8e492b5567	fix(file): block credential paths from search results	2026-07-01 01:02:35 -07:00
H2KFORGIVEN	fc2fac73bd	fix(compressor): prevent orphan user turn after compaction via turn-pair preservation When the last user message sits exactly at head_end (the first compressible index), _ensure_last_user_message_in_tail's final max(last_user_idx, head_end + 1) clamp returns head_end + 1, pushing the user into the compressed region without its assistant reply. The summariser then records it as a pending ask, and the next session re-executes the already-completed task (lights off twice, file deleted twice, message re-sent). Fix: apply Causal Coupling — a compaction boundary must never split a (user -> assistant [-> tool results]) turn-pair. Add _find_turn_pair_end and, when the clamp would orphan the user, push the cut forward to pair_end so the completed pair is summarised together and marked done. 8 new tests in TestTurnPairPreservation; 133 compressor tests pass.	2026-07-01 00:27:09 -07:00
Teknium	8d78be5460	revert: back out prompt_caching.enabled toggle (#56105) for re-evaluation (#56126) * Revert "fix(caching): honor prompt_caching.enabled across model switch + fallback" This reverts commit `36f9f50145`. * Revert "fix: allow disabling prompt caching" This reverts commit `c1c1a12fe6`.	2026-07-01 00:20:32 -07:00
teknium1	36f9f50145	fix(caching): honor prompt_caching.enabled across model switch + fallback @janrenz's PR #35862 added prompt_caching.enabled=false at init only. But _anthropic_prompt_cache_policy re-derives _use_prompt_caching on every /model switch (agent_runtime_helpers) and fallback-model swap (chat_completion_helpers), which re-enabled markers and re-broke the strict proxy the toggle was meant to fix. Move the kill switch into anthropic_prompt_cache_policy so it returns (False, False) on every path. Drop the now-redundant init-time override (kept @janrenz's isinstance hardening on the cache_ttl read). Add policy-level tests + docs for the toggle. Follow-up to salvaged PR #35862.	2026-07-01 00:10:42 -07:00
Jan Renz	c1c1a12fe6	fix: allow disabling prompt caching	2026-07-01 00:10:42 -07:00
Teknium	2e8748ed22	feat(moa): opt-in full-turn trace persistence to JSONL (#56101) Adds moa.save_traces (default off). When on, every MoA turn that runs the reference fan-out appends one JSON line to <hermes_home>/moa-traces/<session_id>.jsonl capturing the TRUE FULL turn: each reference model's exact input messages (system advisory prompt + full advisory view, not the truncated display preview) + full output + usage + per-advisor cost, and the aggregator's exact input (including the injected reference-context guidance block) + output. Lets MoA runs be audited and improved offline — what every model saw, said, and cost. - agent/moa_trace.py: config-gated JSONL writer, profile-aware path via get_hermes_home(), best-effort (never breaks a turn), moa.trace_dir override. - agent/moa_loop.py: _RefAccounting now carries full input/output/model/provider/temperature; create() stashes the full turn on a cache MISS (once per turn, never on the cache-HIT repeat iterations); non-streaming aggregator output captured inline, streaming marked + pointed at the session assistant message. consume_and_save_trace(session_id) flushes it. - agent/conversation_loop.py: flushes the trace with the live session_id right after MoA usage consumption. No-op for non-MoA clients. - hermes_cli/config.py: moa.save_traces + moa.trace_dir defaults. Traces are a side channel — NOT the messages table, never in replay, safe to delete. Off by default; only overhead when off is one config read on a MoA cache-MISS turn. Tests: full-trace-when-enabled (per-ref input+output+cost, aggregator input-with-guidance + output), nothing-when-disabled. Live E2E through run_conversation confirmed the loop wiring writes the file.	2026-07-01 00:09:42 -07:00
Teknium	3bdb23de10	fix(moa): count reference (advisor) fan-out token usage + cost (#56087) MoA ran the reference models before the aggregator but returned only the aggregator's usage to the loop — _run_reference discarded each advisor response's .usage entirely. Session accounting (state.db, /insights, cost) therefore undercounted every MoA turn by the whole reference fan-out, which is usually the bulk of the spend and scales with advisor count. - _run_reference normalizes each advisor's usage with ITS OWN resolved provider/api_mode and prices it at ITS OWN model rate (correct cache-read/cache-write split), returning a _RefAccounting(usage, cost). - create() sums advisor usage + cost once per turn (cache MISS only, so a repeat tool-iteration reusing cached advice does not double-charge) and exposes it via MoAClient.consume_reference_usage(). - conversation_loop folds advisor tokens into the reported/persisted token counts and adds advisor cost (priced per-advisor) on top of the aggregator cost, in both the in-memory session totals and the state.db per-call delta. Aggregator cost is still priced on aggregator-only usage so advisor tokens are never repriced at the aggregator rate. - CanonicalUsage gains __add__ for per-bucket summing. Tests: advisor usage/cost capture, per-turn sum + consume-clears + cache-hit no-double-charge, CanonicalUsage.__add__.	2026-06-30 23:08:37 -07:00
Teknium	a653bb0cbe	refactor(moa): unify slot provider-identity on the single call_llm chokepoint (#55991) _slot_runtime maintained a hand-listed name-preservation set ({nous, anthropic, openai-codex, xai-oauth, bedrock}) that returned bare provider+model to avoid call_llm collapsing an explicit base_url to the generic 'custom' route. That duplicated _resolve_task_provider_model's _preserve_provider_with_base_url guard (a provider-catalog capability check) and had to be extended by hand for every provider with custom auth/signing — the exact drift that produced the anthropic (#54609) and bedrock (#54912) 429/empty-response bugs. Removes the whitelist: _slot_runtime now forwards the resolved base_url/api_key/api_mode for every slot, and the single chokepoint (_resolve_task_provider_model -> _preserve_provider_with_base_url) decides identity preservation. Behavior is unchanged for the five providers — their provider branches (codex Responses+Cloudflare, xai-oauth, bedrock SigV4, anthropic OAuth Bearer+anthropic-beta, nous Portal tags) re-resolve their own credentials by name and ignore a forwarded base_url/api_key, so forwarding is safe even for bedrock's placeholder 'aws-sdk' key. Verified via real-import E2E: _slot_runtime -> _resolve_task_provider_model preserves openai-codex/xai-oauth/bedrock/anthropic/nous (+openrouter control) — none collapse to custom. Tests updated to assert the pipeline invariant against the real resolver instead of the removed whitelist's bare-return shape.	2026-06-30 18:59:45 -07:00
iizotov	6eca917631	fix(moa): route bedrock MoA slots through signed bedrock branch _slot_runtime() resolved a bedrock slot to its bedrock-runtime base_url plus the placeholder api_key "aws-sdk" and forwarded both to call_llm. call_llm then treated it as a plain OpenAI-compatible endpoint and issued an UNSIGNED bearer POST (no AWS SigV4 / IAM signing), so Bedrock returned an empty/malformed ChatCompletion (choices=None) and the MoA aggregator turn failed validation. Add 'bedrock' to the name-preserve set alongside nous/openai-codex/ xai-oauth so bedrock slots are passed by provider name only, routing through call_llm's dedicated SigV4-signed bedrock branch. Affects any MoA preset using a bedrock aggregator or bedrock reference.	2026-06-30 17:45:45 -07:00
Chufeng Fan	4d43669921	fix(moa): route native anthropic OAuth references through provider branch MoA's _slot_runtime() whitelists providers that must keep their provider identity (so call_llm runs their provider branch) instead of being treated as a plain custom endpoint via forwarded base_url/api_key. Native anthropic was missing from this set. Native anthropic subscription OAuth setup-tokens (sk-ant-oat) require Bearer auth plus the 'anthropic-beta: oauth-' header, which only the anthropic provider branch adds. Without the whitelist entry, the slot's base_url/api_key were forwarded and call_llm sent the OAuth token as x-api-key, which Anthropic rejects with a bare 429 (rate_limit_error with no quota details). This made anthropic references in MoA presets fail every time. Add 'anthropic' to the whitelist so native anthropic reference/aggregator slots route through the provider branch. Extends upstream `9229d0db1` which added 'nous' for the same reason.	2026-06-30 17:45:45 -07:00
teknium1	508156fd42	test(credential_pool): cover Anthropic env auth_type classification Add regression tests for the sk-ant-oat OAuth heuristic and shorten the inline comment. Verifies admin keys (sk-ant-admin-*) and standard API keys classify as api_key, only sk-ant-oat- tokens flow into the OAuth refresh path.	2026-06-30 17:29:03 -07:00
charliekerfoot	18966b6244	fix(credential_pool): match Anthropic OAuth tokens by sk-ant-oat prefix	2026-06-30 17:29:03 -07:00
Teknium	b5267671f2	fix(bg-review): scope stdout/stderr silencing to the worker thread (#55966) The background memory/skill review thread wrapped its whole body in process-global contextlib.redirect_stdout/stderr(devnull). Those rebind sys.stdout/sys.stderr for the ENTIRE process, so for the full duration of the review (tens of seconds) every other thread — including a gateway event-loop thread driving a Telegram long-poll — also wrote to devnull. Any bare print/sys.stderr.write from those threads during the window was silently lost (#55769 / #55925). Replace the global redirect with thread_scoped_silence(): a per-thread routing proxy installed once as sys.stdout/sys.stderr that sends only the registered (bg-review) thread's writes to devnull and passes every other thread through to the real stream. Depth-counted so nested use composes. Verified: a concurrent thread writing while the bg-review thread is inside the silence window keeps its output on the real stream.	2026-06-30 17:28:33 -07:00
teknium1	36bfe3a449	fix(anthropic+feishu): model-gate max_tokens fallback; wire Feishu channel_prompt Two independent fixes salvaged from #12811 (closing it; one of its three bundled fixes — Discord free_response — is already on main). Anthropic max_tokens (#12790): the chat-completions max_tokens fallback only fired for OpenRouter/Nous URLs, so any other proxy serving a Claude model (AWS Bedrock, NVIDIA, LiteLLM, vLLM, corporate gateways) shipped requests with no max_tokens and inherited the proxy's low default (Bedrock: 4096), exhausting on thinking + large tool calls. Changed the gate in chat_completion_helpers.build_api_kwargs from URL-gated to model-gated: fires whenever the model matches an _ANTHROPIC_OUTPUT_LIMITS key. This also fixes a latent miss — the old 'claude' substring gate skipped MiniMax and Qwen3 even on OpenRouter. Remains a last-resort fallback (build_kwargs only applies it after ephemeral/user/profile max_tokens), so it never overrides an explicit value, and only touches the chat-completions transport (native Anthropic Messages API is a separate path). Feishu channel_prompt (#12805): the Feishu adapter never resolved channel_prompts config, unlike Discord/Slack, so per-channel role prompts were silently ignored. Added _resolve_channel_prompt() (delegating to the shared gateway.platforms.base.resolve_channel_prompt) and wired it into all three MessageEvent construction sites — inbound message, reaction routing, and card-action routing. Tests: tests/gateway/test_feishu_channel_prompts.py (6 cases) covering exact match, parent-thread fallback, no-match, missing-config safety, and event propagation.	2026-06-30 17:20:41 -07:00
Teknium	d431dfc448	fix(learn): honor requirements mixed with sources in /learn requests (#55956) A /learn request can mix the source(s) to gather (paths, URLs, "what we just did") with requirements that shape the skill (focus, scope, what to omit). When a request led with a path or link, the agent fetched it and treated the trailing prose as incidental, dropping the user's stated focus — the symptom @GrenFX reported. The input layer was never the cause: both CLI (split(None, 1)) and gateway (get_command_args()) capture the full free-text argument. The gap was in build_learn_prompt, which dumped the request as one undifferentiated source blob. build_learn_prompt now tells the agent the request may mix sources and requirements in any order, that prose after a path/link is authoring guidance to honor (not noise), and to never fetch the first source and ignore the rest. Adds step 1b: apply every requirement to what the SKILL.md covers, not just which sources get read. Both surfaces inherit it; no parser change, zero tool footprint.	2026-06-30 16:56:01 -07:00
LeonSGP43	ff4c17411c	fix(streaming): handle adapters that return final responses # Conflicts: # run_agent.py	2026-06-30 16:41:09 -07:00
Teknium	97e0bbef53	feat(lsp): add PowerShellEditorServices language server (#55930) Registers PowerShell (.ps1/.psm1/.psd1) in the LSP server registry, spawning PowerShellEditorServices over stdio via a pwsh/powershell host. PSES ships as a GitHub release zip (no npm/go/pip recipe), so it sits in the manual install tier alongside rust-analyzer and clangd. The spawn builder resolves the module bundle from (in order) the lsp.servers.powershell.command override, init bundlePath, the PSES_BUNDLE_PATH env var, or <HERMES_HOME>/lsp/PowerShellEditorServices, then launches Start-EditorServices.ps1 -Stdio with a non-interactive, no-profile host. hermes lsp status/list report it as manual-only until pwsh is present. Docs and tests included.	2026-06-30 16:22:18 -07:00
ygd58	812236bff8	fix(compressor): skip compression during summary LLM cooldown to prevent CLI freeze When the summary LLM hits a 429/transient failure, _generate_summary() sets a cooldown and returns None; compress() inserts a static fallback marker and returns. Tokens stay above threshold, so should_compress() kept returning True and every subsequent agent turn re-fired _compress_context() — the CLI appeared frozen until the cooldown expired. Add a cooldown guard to should_compress(): return False while _summary_failure_cooldown_until is in the future. Reuses the existing float; no new state. Manual /compress (force=True) still clears the cooldown first. Fixes #11529	2026-06-30 15:57:59 -07:00
Teknium	0cebf994c9	fix(agent): repair empty-name tool_calls in sanitizer to prevent Responses 400 (salvage #12807/#52893) (#55922) * fix(agent): drop tool_calls with empty function.name to prevent orphan 400 Salvage of #12807 by @melonboy312 — rebased onto current main (sanitizer moved to agent_runtime_helpers), scoped to the sanitizer fix, with a regression test that fails without it. * fix(agent): repair (not drop) empty-name tool_calls to preserve anti-priming + prevent 400 Dropping empty-name tool_calls in the pre-call sanitizer collided with #47967, which intentionally keeps an empty-name call paired with a synthesized 'tool name was empty' anti-priming result so weak models self-correct without a full catalog dump. Dropping the call orphaned that result and stripped the signal (breaking tests/agent/test_empty_tool_name_loop_dampening.py). The actual HTTP 400 cause is an ORPHANED function_call_output (adapter drops the empty-name function_call but keeps its output). Rename the blank name to a non-empty sentinel instead: the call and its result stay paired, the adapter no longer drops the function_call, no orphan, no 400 — and the anti-priming result content the model needs is preserved. --------- Co-authored-by: Bartok9 <danielrpike9@gmail.com>	2026-06-30 15:57:46 -07:00
kyssta-exe	20871c1d94	fix(skills): require review forks to read before writing skills	2026-06-30 15:49:36 -07:00
Erosika	437dcacbbf	fix(profile): gate bg-review memory tool on memory_enabled (#54937 layer 2) background_review hardcoded enabled_toolsets=["memory", "skills"] in the review fork's whitelist, so a skill-review fork on a profile with memory_enabled: false still granted the LLM the built-in MEMORY.md read/write tool — contaminating a profile that opted out of built-in memory. The flag was already in scope (review_agent._memory_enabled). Include "memory" only when _memory_enabled or _user_profile_enabled (USER.md also needs the tool). Layer 1 of #54937 (the path leak) is fixed by this PR's thread-context propagation: get_memory_dir() is already per-call on main, so once the bg-review thread inherits the profile override its writes land in the right profile (verified). This commit closes the remaining whitelist layer.	2026-06-30 15:30:06 -07:00
brooklyn!	d8083221a8	Merge pull request #55865 from NousResearch/bb/pet-pane-layout fix(tui): float petdex pet on the status bar + responsive text reservation	2026-06-30 15:46:41 -05:00
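
Note on b5267671f2 (thread-scoped silencing): the mechanism that commit describes, a routing proxy that drops writes only from registered threads and is depth-counted for nesting, can be sketched roughly as below. This is an illustrative sketch, not the code in the repository. The class name `ThreadRoutingStream`, the `silence()` method, the `demo()` helper, and the choice to pass the proxy explicitly (instead of installing it as sys.stdout/sys.stderr) are all assumptions made for the example; only the name `thread_scoped_silence` comes from the commit message, and its real signature may differ.

```python
import contextlib
import io
import threading


class ThreadRoutingStream:
    """Hypothetical proxy: drops writes from silenced threads, forwards the rest."""

    def __init__(self, real):
        self._real = real
        self._depth = {}  # thread id -> nesting depth of active silences
        self._lock = threading.Lock()

    def silence(self, delta):
        # Adjust the calling thread's depth; drop the entry once it reaches zero.
        tid = threading.get_ident()
        with self._lock:
            depth = self._depth.get(tid, 0) + delta
            if depth > 0:
                self._depth[tid] = depth
            else:
                self._depth.pop(tid, None)

    def write(self, text):
        if threading.get_ident() in self._depth:
            return len(text)  # swallow the write, report it as successful
        return self._real.write(text)

    def flush(self):
        self._real.flush()


@contextlib.contextmanager
def thread_scoped_silence(proxy):
    # Assumed signature: the real helper may install the proxy on sys.stdout itself.
    proxy.silence(+1)
    try:
        yield
    finally:
        proxy.silence(-1)


def demo():
    real = io.StringIO()
    proxy = ThreadRoutingStream(real)
    inside = threading.Event()
    done = threading.Event()

    def worker():
        with thread_scoped_silence(proxy):
            with thread_scoped_silence(proxy):
                proxy.write("nested hidden\n")
            proxy.write("hidden\n")  # still silenced: outer depth is 1
            inside.set()
            done.wait(timeout=5)
        proxy.write("worker after\n")

    t = threading.Thread(target=worker)
    t.start()
    inside.wait(timeout=5)
    proxy.write("main during\n")  # another thread, so it passes through
    done.set()
    t.join()
    return real.getvalue()
```

Calling `demo()` returns only the main thread's line written during the silence window plus the worker's line written after it left the window. That matches the behavior the commit says it verified: a concurrent thread keeps its output on the real stream.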

1591 commits