hermes-agent

Author	SHA1	Message	Date
srojk34	7f64cce96d	security(vertex): route credential/project/region resolution through the profile secret scope agent/vertex_adapter.py resolved VERTEX_CREDENTIALS_PATH, GOOGLE_APPLICATION_CREDENTIALS, VERTEX_PROJECT_ID, and VERTEX_REGION via raw os.environ.get() instead of the profile-scoped get_secret() every other credential lookup in hermes_cli/runtime_provider.py uses. In a multiplex gateway serving several profiles from one process, os.environ still holds whichever profile's .env python-dotenv loaded at boot — so a raw read here let one profile's turn silently mint a Vertex OAuth2 token from, and get billed against, a different profile's GCP service account. No error, no fail-closed guard: the multiplex UnscopedSecretError protection was bypassed entirely because these reads never went through get_secret(). - _resolve_credentials_path/_resolve_project_override/_resolve_region now call agent.secret_scope.get_secret(), matching the _getenv() pattern already used for every other provider's credentials. - get_vertex_credentials()'s ADC fallback (google.auth.default()) reads GOOGLE_APPLICATION_CREDENTIALS from os.environ internally, bypassing get_secret() entirely — closed with a narrow guard: when multiplexing is active and this profile's scope has no Vertex credentials of its own, but os.environ still carries a value (left by a different profile's boot-time dotenv load), refuse ADC rather than silently authenticate as a stranger. - Zero behavior change for single-profile installs: get_secret() falls through to os.environ transparently whenever multiplexing is off. Same bug class as the already-fixed _HERMES_OAUTH_FILE/_AUTH_JSON_PATH/ HOOKS_DIR cross-profile leaks, now closed for Vertex's OAuth2 credential path.	2026-07-02 06:07:56 +05:30
kshitijk4poor	676236bb1d	fix(agent): honor custom CA certs on aux client + harden TLS resolution The salvaged fix wired per-provider ssl_ca_cert / ssl_verify (and HERMES_CA_BUNDLE) into the MAIN OpenAI client. This follow-up: - Auxiliary client parity: process_bootstrap.build_keepalive_http_client accepts and forwards verify; auxiliary_client._resolve_aux_verify mirrors the main-client TLS resolution (via load_config_readonly, the read-only fast path) so compression/vision/web_extract/title-gen/session_search honor the same per-provider CA. Without this, chat worked against a private-CA endpoint but every auxiliary call still failed APIConnectionError. - switch_model now reads custom_providers from live config (load_config_readonly) instead of the init-time agent._custom_providers snapshot, so ssl_ca_cert / ssl_verify edits are honored on mid-session model switch — matching the context-length reload (#15779). - Drop the dead client-level verify= where a custom httpx transport is used (httpx ignores it there); verify lives on the transport. Fix docstrings. Applies to both run_agent._build_keepalive_http_client and process_bootstrap. - resolve_httpx_verify: add CURL_CA_BUNDLE to the env chain (consistency with agent/ssl_guard._CA_BUNDLE_ENV_VARS) and emit a loud logger.warning naming the endpoint whenever ssl_verify:false disables verification. - get_custom_provider_tls_settings: case-insensitive base_url match (config dedup already lowercases; scheme/host are case-insensitive) so a mixed-case entry doesn't silently drop its CA. Exact match preserved — no prefix bypass. - Demote best-effort except Exception: pass in agent_init/switch_model to logger.debug(exc_info=True). - Tests for aux verify forwarding, _resolve_aux_verify, case-insensitive match, and prefix-bypass rejection.	2026-07-02 04:51:56 +05:30
HexLab98	3a2ba959ce	fix(agent): honor custom CA certs for custom_providers HTTPS endpoints Wire ssl_ca_cert and ssl_verify through custom_providers config and env vars into the keepalive httpx client, fixing APIConnectionError against mkcert/self-signed Ollama proxies behind HTTPS.	2026-07-02 04:51:56 +05:30
HexLab98	7e957cbd0b	feat(agent): add resolve_httpx_verify for custom CA bundle TLS Introduce a shared helper that maps HERMES_CA_BUNDLE, SSL_CERT_FILE, and per-provider ssl_ca_cert settings to httpx verify contexts.	2026-07-02 04:51:56 +05:30
Brooklyn Nicholson	ec319e4e3e	fix(learning_graph): guard non-dict metadata so /journey can't crash parse_frontmatter's malformed-YAML fallback stores every value as a string, so a skill's `metadata` can be a str. `_category`/`_related` chained `.get("metadata", {}).get("hermes", {})` and blew up with `'str' object has no attribute 'get'`, taking down `build_learning_graph()` (and thus /journey and `hermes journey`) whenever any installed skill had bad frontmatter. Extract a `_hermes_meta()` helper that returns the nested dict only when it really is one. Fixes the whole class, not just the two call sites.	2026-07-01 16:25:48 -05:00
kshitijk4poor	b23e1c3077	refactor(approval): extract is_approval_bypass_active(); use frozen-env bypass in codex routing Self-review follow-up on the salvaged approval-routing fix. The initial adaptation re-read os.getenv("HERMES_YOLO_MODE") at session-build time. That diverges from the repo's security invariant: HERMES_YOLO_MODE is frozen into tools.approval._YOLO_MODE_FROZEN at import time precisely so a skill running mid-process cannot set the env var and instantly flip the approval bypass (a prompt-injection escalation path). A live re-read re-opened that hole for the codex routing path. - Add tools.approval.is_approval_bypass_active() — the canonical three-source bypass check (frozen --yolo/HERMES_YOLO_MODE + session /yolo + approvals.mode off) in one place. This is the 4th inline copy of that OR-chain (the three sites in approval.py and tui_gateway/server.py:3121 all use the same idiom); the helper is the shared chokepoint they can collapse onto. - codex_runtime.py now calls is_approval_bypass_active() instead of the hand-rolled mode-or-session check plus a runtime env re-read. - Update the env-yolo test to patch _YOLO_MODE_FROZEN (the canonical test pattern, e.g. tests/tools/test_yolo_mode.py) rather than setenv, which is dead-on-arrival against the frozen constant. Fail-closed default preserved on every branch; 28 integration + 77 session/yolo tests pass; E2E confirms the real exec decision flips decline->accept only when bypass is active.	2026-07-01 22:58:37 +05:30
snav	0b8e81996f	fix(codex-app-server): honor approvals.mode/yolo for gateway-context approval routing On gateway/cron/non-CLI contexts the codex app-server runtime has no UI to surface codex's exec/apply_patch approval requests, so they fail closed (silently decline) — the bot appears responsive but cannot write files, with no approval prompt anywhere ("patch rejected by user"). When the user has explicitly opted out of Hermes approvals (approvals.mode: off, the /yolo session toggle, or HERMES_YOLO_MODE=1), collapse to codex's own sandbox permission profile (~/.codex/config.toml) as the policy gate by passing _ServerRequestRouting(auto_approve_exec=True, auto_approve_apply_patch=True) to the session. Defaults (manual/smart/unset) preserve the current fail-closed behavior — a no-op for users who have not opted out. Reads the mode via the canonical tools.approval._get_approval_mode() (which already normalizes the YAML-1.1 bare-'off'->False case) at session-build time, so a mid-session /yolo toggle is honored too. 5 integration tests: each opt-out mechanism (config off, YAML False, env var, session yolo) plus the default fail-closed regression guard. Closes #26530 Co-authored-by: snav <jake@nousresearch.com>	2026-07-01 22:58:37 +05:30
Teknium	eae3700b16	fix(moa): raise aux timeouts to 900s and give the Codex aux path a stable prompt_cache_key (#56395 ) Two independent MoA auxiliary-call fixes: #53866 — auxiliary.moa_reference.timeout and auxiliary.moa_aggregator.timeout were 600s while moa_agent was 120s. Raise both to 900s so a genuinely long reference/aggregator turn (mixed providers, deep reasoning, long tool chains) has headroom instead of being cut mid-generation. #53735 — _CodexCompletionsAdapter (the Codex/Responses auxiliary path used by the MoA acting-aggregator, compression, web_extract, session_search, etc.) never set prompt_cache_key, so it stayed cache-cold while the MAIN Responses transport (agent/transports/codex.py) was warm. Derive the same content-addressed key via the shared _content_cache_key(instructions, tools) helper and set it on the aux Responses request, with the same host guards the main transport uses (xAI carries the key in extra_body; GitHub/Copilot opts out of cache-key routing). Tests: 5 new prompt_cache_key cases (set+prefixed, stable across identical prefix, differs on different instructions, skipped for xai/github hosts). tests/agent/test_auxiliary_client.py 279 pass; tests/hermes_cli/test_config.py 130 pass.	2026-07-01 06:02:40 -07:00
Teknium	aa605b66c8	fix(moa): price aggregator turn at its real model so session cost isn't advisor-only (#56394 ) On the MoA path agent.model/provider are the virtual preset name (e.g. "closed") and "moa", which have no pricing entry. estimate_usage_cost() returned None for the aggregator turn, so the `if amount_usd is not None` guard skipped it and the session's estimated_cost_usd reflected only the advisor fan-out — a ~50% undercount when the aggregator does the full acting loop (verified: $0.91 advisor-only vs $1.96 true, aggregator = 54%). MoAChatCompletions.create() now stashes the resolved aggregator slot as last_aggregator_slot (exposed via MoAClient); conversation_loop reads it to price the aggregator turn at its real model/provider. cost_source flips from 'none' to 'provider_models_api'.	2026-07-01 06:02:33 -07:00
kshitijk4poor	b795a45b8d	fix(compaction): detect and strip merge-into-tail summaries past the delimiter Follow-up to the END-MARKER reorder: moving the summary prefix after the [PRIOR CONTEXT] wrapper meant _is_context_summary_content (prefix-at-start) no longer recognized a merged-tail summary. That silently broke three consumers — the last-real-user anchor (would pick the merged summary as a real user turn, causing active-task loss), the carry-forward summary find, and the auto-focus skip. _strip_summary_prefix would also carry the wrapper + stale tail content forward as the next summary body. Extract the two delimiter strings into _MERGED_PRIOR_CONTEXT_HEADER / _MERGED_SUMMARY_DELIMITER constants (writer + detector stay in sync), teach _is_context_summary_content and _strip_summary_prefix to look past the delimiter, and add a regression test. Standalone summaries unchanged.	2026-07-01 18:23:01 +05:30
Gromykoss	a1a8a967e1	fix(compaction): place END MARKER last in merge-into-tail summaries When the compression summary is merged into the first tail message (the alternation corner case where a standalone summary role would collide with both head and tail), the old format was SUMMARY + END_MARKER + OLD_TAIL_CONTENT — so the preserved tail content appeared AFTER the end marker and the model could read it as a fresh message to respond to. Reorder so the END MARKER is always last: old tail content is wrapped in [PRIOR CONTEXT ...][END OF PRIOR CONTEXT — COMPACTION SUMMARY BELOW] delimiters, then the summary, then the END MARKER. _append_text_to_content handles both string and multimodal-list content. Salvaged from #56372 by @Gromykoss. Only the END-MARKER reorder half is carried over. The PR's second change (a post-compaction pass that strips user-role messages before the first summary marker on compression_count>=2) was dropped: on 2nd+ compactions the protected head decays to system-only (_effective_protect_first_n -> 0, #11996) so the targeted 'ghost head user' does not occur, and where the strip does fire it deletes legitimate recent tail user turns (data loss) and can leave consecutive assistant messages (role-alternation violation).	2026-07-01 18:23:01 +05:30
Steve Lawton	c73e74386b	feat(vertex): add Google Vertex AI provider for Gemini (OAuth2) Adds Vertex AI as a first-class provider for Gemini models via Vertex's OpenAI-compatible endpoint. Vertex authenticates with short-lived OAuth2 access tokens (service-account JSON or ADC), not a static API key — the missing piece behind the recurring requests (#13484, #12639, #56259). - agent/vertex_adapter.py: OAuth2 token minting + refresh-on-expiry (5-min margin), ADC->service-account fallback, global vs regional endpoint URLs. Config precedence: env var > config.yaml > default. - plugins/model-providers/vertex/: provider profile (auth_type=vertex), reuses Gemini's extra_body.google.thinking_config translation. - runtime_provider: vertex short-circuit BEFORE the credential pool so a credentials-file path is never mistaken for a static API key; mints a fresh token + computes base_url per resolve. - run_agent + conversation_loop: _try_refresh_vertex_client_credentials() re-mints the token and rebuilds the client on a mid-session 401, so a long-lived gateway agent survives token expiry (~1h). - auxiliary_client: vertex auth_type branch for side-LLM tasks. - config.yaml: vertex.project_id / vertex.region (non-secret, bridged to env); credential path stays in .env (VERTEX_CREDENTIALS_PATH). - setup wizard + model picker: dedicated _model_flow_vertex; curated google/gemini-* model list; --provider choices. - pricing/metadata: Vertex prices off the gemini docs snapshot; endpoint host auto-maps to the vertex provider (no probe spam). - lazy_deps + pyproject [vertex] extra: google-auth, opt-in only. - docs: guides/google-vertex.md + providers page; tests for adapter + runtime resolution. Salvages and modernizes #8427 by @slawt onto current main: rewired from the legacy PROVIDER_REGISTRY path to the provider-profile architecture, moved non-secret config out of .env into config.yaml, and added the per-turn 401 token-refresh the original lacked.	2026-07-01 05:25:33 -07:00
HODLCLONE	6ed2f5d76f	fix: make Nous Portal access token resolution resilient - Track auth store source path on Nous state reads and write rotated OAuth refresh tokens back to the same store, preventing stale-token replays when Hermes falls back to a global/root auth.json. - Skip Nous fallback entries locally when no access/refresh token is present, suppressing repeated failed resolution attempts within a session. - Sync session model metadata after fallback switches so the gateway DB reflects the backend that actually served the latest turn.	2026-07-01 05:06:00 -07:00
ud	c126a99fc1	fix(subdirectory_hints): catch RuntimeError from Path.expanduser() `pathlib.Path('~user').expanduser()` raises RuntimeError when the tilde-expansion can't resolve the user (e.g. `~500-700` where the LLM meant "approximately 500-700" rather than a path). The hint walker's existing `except (OSError, ValueError):` clauses do not catch RuntimeError, so it escapes through the tool dispatcher and surfaces in the conversation loop as a misleading Error during OpenAI-compatible API call #N: Could not determine home directory. Reproduced across three unrelated models (openai/gpt-5-mini, openai/gpt-5.1-codex, deepseek/deepseek-v4-flash) on terminal-tool commands containing literal tildes in non-path contexts — common in LLM output ("~500 agencies", "~45,000 CVEs", "~80/hr blended rate"). Reproduction (one-liner): >>> from pathlib import Path >>> Path("~500-700").expanduser() RuntimeError: Could not determine home directory. Fix: extend the three `except` clauses in agent/subdirectory_hints.py to also catch RuntimeError: line 138 (_add_path_candidate's outer catch around the Path().expanduser() call) lines 198+202 (_load_hints_for_directory's nested catches around hint_path.relative_to(Path.home())) Tests: tests/agent/test_subdirectory_hints_tilde.py adds three cases covering: tilde-as-approximately in heredoc commands, ~unknown_user paths, and a regression guard that legitimate ~/path expansion still works.	2026-07-01 04:55:15 -07:00
JabberELF	18a9467fca	fix(tui): prevent killpg suicide during MCP shutdown Root cause: gateway spawns LSP servers (jdtls/pyright/yaml-ls) and slash_worker without start_new_session=True, so they inherit the gateway process group (= TUI parent PID). When mcp_tool _snapshot_child_pids() races with these spawns during stdio MCP server startup, non-MCP children leak into _stdio_pgids with the TUI parent PGID. shutdown_mcp_servers() then killpg(tui_parent_pid, SIGTERM), killing the TUI itself. Evidence: tui_gateway_crash.log shows recurring SIGTERM stacks: shutdown_mcp_servers -> _kill_orphaned_mcp_children -> _send_signal -> killpg(pgid, sig) -> SIGTERM received Fix (3 layers): 1. agent/lsp/client.py: add start_new_session=True to LSP server spawn so each LSP server gets its own process group/session. 2. tui_gateway/server.py: same fix for slash_worker spawn, the symmetric root-cause patch so no gateway direct child shares the TUI parent pgid. 3. tools/mcp_tool.py: add _filter_mcp_children() defense-in-depth that drops non-MCP children (slash_worker, jdtls/eclipse LSP) from the PID delta before they can poison _stdio_pgids.	2026-07-01 04:54:46 -07:00
kshitijk4poor	dc1ea005d9	fix+test(codex): self-persist projected turns; keep agent_persisted=True Follow-up correcting the salvaged fix's persistence approach to avoid a duplicate user-message write (verified via E2E — the #860/#42039 bug class the original diff aimed to avoid). Root cause: in gateway mode the AIAgent is built WITH a session_db, so the inbound user turn is already flushed at turn start (turn_context. _persist_session). The original fix returned agent_persisted=False, making the gateway re-write the whole new-message slice via append_to_transcript -> append_message (a raw INSERT with no dedup), duplicating the already-flushed user turn. Corrected approach (single writer): run_codex_app_server_turn now flushes its OWN projected assistant/tool messages via _flush_messages_to_session_db (which dedups the already-persisted user turn through _DB_PERSISTED_MARKER) and returns agent_persisted=True so the gateway skips its write. Net result: session_search/distill see the full codex conversation, each message persisted exactly once. Adds regression coverage asserting exactly-once persistence on a real SessionDB, agent_persisted=True, FTS visibility, and standard-runtime skip-db behaviour preserved. Co-authored-by: Lubos Buracinsky <lubos@komfi.health>	2026-07-01 17:08:59 +05:30
Lubos Buracinsky	5558382457	fix(codex): persist app-server turns to session DB (fixes starved recall) The codex_app_server runtime path (run_codex_app_server_turn in agent/codex_runtime.py) is an early-return that bypasses conversation_loop and never calls _flush_messages_to_session_db(). Meanwhile, gateway/run.py sets: agent_persisted = self._session_db is not None # always True and passes skip_db=agent_persisted to every append_to_transcript call, assuming the agent self-persisted (correct for the standard runtime, wrong for codex). The result: codex turn messages are persisted nowhere. state.db accumulates only session_meta rows; session_search (full-text search over state.db) and conversation-distill are blind to real gateway conversations, causing 'the agent has no memory of what we discussed'. Fix (three-part, all backward-compatible): 1. agent/codex_runtime.py — run_codex_app_server_turn success return now includes 'agent_persisted': False, signalling that the codex path did NOT self-persist its turn. 2. gateway/run.py — the agent_persisted assignment now reads: agent_result.get('agent_persisted', self._session_db is not None) For the standard runtime (which does not set the key) the default (self._session_db is not None) preserves the existing skip-db behaviour so no duplicate-write regression (#860 / #42039) occurs. For the codex runtime the flag is False, so the gateway writes the new turn's messages to state.db and FTS index. 3. gateway/run.py — the rebuilt result dict (run_agent return, which becomes agent_result upstream) now includes agent_persisted passed through from result_holder[0], with a safe True default. Without this passthrough the flag set in step 1 was discarded when the result was reconstructed, causing agent_result.get('agent_persisted', ...) to always see the default True and never write codex turns.	2026-07-01 17:08:59 +05:30
Dutch Dim	154c382d65	fix(gateway): recover from truncated responses	2026-07-01 17:08:50 +05:30
kshitijk4poor	9cf47fef54	fix(auxiliary_client): demote the 2 sibling routing fall-throughs too (review) Phase 2c review flagged that only 2 of the 4 structurally-identical resolve_provider_client routing dead-ends were demoted. Complete the bug-class: also demote+dedup the external-process ('not directly supported') and OAuth ('not directly supported, try auto') fall-throughs, keyed by provider name, so none of the four dead-ends spam WARNING on a retry loop. Add direct tests for the unhandled-auth_type and OAuth dedup paths via a monkeypatched PROVIDER_REGISTRY (the review noted these were unverified). Mutation-checked: reverting either sibling demotion fails its test.	2026-07-01 17:00:30 +05:30
kshitijk4poor	c0d3ceb17e	fix(auxiliary_client): dedup resolve_provider_client fall-through warnings The two fall-through branches in resolve_provider_client (unknown provider, unhandled auth_type) logged at WARNING on every retry of a misconfigured provider, spamming logs during retry loops. Demote both to logger.debug with per-process dedup: the first occurrence still surfaces (a provider-name typo or PROVIDER_REGISTRY/auth_type-drift bug is worth seeing once), while identical repeats are suppressed for the process lifetime. Salvaged from #56283 (extracting only the stated auxiliary_client fix; the original PR also bundled ~2800 lines of unrelated changes across 10 other files, which are dropped).	2026-07-01 17:00:30 +05:30
shawchanshek	3b739b990b	fix(title_generator): strip think blocks from LLM output before extracting title Think-enabled models (MiniMax M2.7, DeepSeek, etc.) emit inline <think>...</think> reasoning even for simple prompts like title generation, and the raw XML was leaking into session titles. Route the title-model response through the canonical strip_think_blocks scrubber before cleanup so every tag variant — closed pairs, unterminated blocks, orphan closes, mixed case — is handled, not just a single literal <think> pair. - 2 regression tests: closed <think> pair stripped, unterminated block at start yields no title. Salvaged from PR #44126 by @shawchanshek.	2026-07-01 04:18:48 -07:00
shandian64	5126902f1d	fix(title): honor configured auxiliary timeout	2026-07-01 16:41:43 +05:30
Teknium	5de65624d1	fix(moa): capture streamed aggregator output into full-turn traces (#56312 ) MoA full-turn traces (moa.save_traces) recorded the aggregator's acting output only on the non-streaming path, where it's captured inline at call time. On the streaming path — which every hermes chat --query run and every live gateway/CLI turn takes — the aggregator's raw token stream is handed to the live consumer, so the trace left output=null and only pointed at the session-db assistant row. An offline audit of a benchmark run (HermesBench drives --query) then couldn't see what the aggregator produced without hand-joining to state.db. Capture the resolved streamed acting text at trace-flush time (the agent already holds it in _current_streamed_assistant_text) and fold it into the trace, so the record is self-contained in both modes. New output_location value inline_from_stream marks a streamed turn whose text was captured this way; a genuinely empty acting turn (pure tool call) still points at the session db, matching state.db exactly. Touches only the trace side-channel — no change to the acting path, message history, role alternation, or prompt cache. - agent/moa_loop.py: consume_and_save_trace(..., aggregator_output_fallback) on both the facade and the MoAClient wrapper; prefer inline capture, fall back to the resolved streamed text. - agent/moa_trace.py: embed the fallback; add inline_from_stream location. - agent/conversation_loop.py: pass _current_streamed_assistant_text at flush. - tests: 5 cases across streaming / non-streaming / empty-fallback / no-double-write.	2026-07-01 04:07:46 -07:00
arminanton	e2fa509bf3	fix(review): isolate the background-review fork from the canonical session The forked skill/memory review agent shares the parent's session_id for prompt-cache warmth. Without isolation it wrote its harness turn ('Review the conversation above and update the skill library…') plus its curator-mode reply straight into the user's REAL session in state.db; the next live turn re-read that injected user message as a standing instruction and the agent 'became' the curator, refusing the actual task. Root fix: a _persist_disabled flag on the fork that hard-stops every DB write and lazy-open path (_flush_messages_to_session_db, _ensure_db_session, _get_session_db_for_recall) — the review writes only to the skill/memory stores via its tools. Defense-in-depth: _strip_background_review_harness drops any stray harness message (and the assistant reply that followed) at load time in get_messages_as_conversation, so an already-polluted session resumes clean. Salvaged from #50296. Co-authored-by: arminanton <29869547+arminanton@users.noreply.github.com>	2026-07-01 16:21:39 +05:30
pefontana	a04b7024ff	fix(error-classifier): route 5xx context-overflow into compression Local inference servers (llama.cpp/llama-server, vLLM/Ollama behind a Cloudflare/Tailscale hop) report context overflow with HTTP 500/502/503/529 instead of 400/413. _classify_by_status returned server_error/overloaded and retried blindly, then dropped the turn with no compaction. Route explicit _CONTEXT_OVERFLOW_PATTERNS matches on those 5xx codes to context_overflow (should_compress=True); plain 500 stays server_error, plain 503 overloaded.	2026-07-01 16:14:16 +05:30
WXBR	59e7e9d007	fix(agent): persist recovered final responses Close a recovery/fallback final_response with an assistant transcript entry before session persistence so durable history cannot end at a tool/user message after the caller receives a final answer. Adds a regression for a tool-tail transcript with a non-empty final_response. Related to #46071 / #46053, but covers the adjacent case where the assistant message was never appended before persistence.	2026-07-01 03:34:49 -07:00
Tranquil-Flow	122e5bc037	fix(agent): retry 413 after stripping vision payloads (#47339 ) When text compression can't reduce a 413 request further, evict base64 image parts from tool messages and retry once instead of dead-ending with 'Payload too large and cannot compress further.' A 413 is a request-body byte-size limit, not a token limit. browser_vision screenshots (2-5MB base64 each) keep the HTTP body oversized even after aggressive summarization. The strip pass passes remember_model=False so a 413 does not poison _no_list_tool_content_models — that set is for providers that reject list-type tool content, a distinct failure mode. Cherry-picked from #47397 by Tranquil-Flow; placed onto main's current token-aware 413 recovery else branch.	2026-07-01 03:18:41 -07:00
Tyler Merritt	320c587256	fix(context): parse vLLM's token-based output-cap error format vLLM (and other OpenAI-compatible servers) report context overflow with both the window and the prompt in tokens: "This model's maximum context length is 131072 tokens. However, you requested 65536 output tokens and your prompt contains at least 65537 input tokens, for a total of at least 131073 tokens." parse_available_output_tokens_from_error() already classified this as an output-cap error (the "requested N output tokens" gate), but none of the extraction patterns matched the "prompt contains [at least] N input tokens" phrasing, so it returned None. The recovery path then misclassified the failure as prompt-too-long and looped through compression — which frees little while each retry keeps requesting the same oversized max_tokens — terminating in "cannot compress further" even though simply lowering the output cap would have succeeded. Add an extraction branch for the token-based phrasing: available output = window - reported input. When the input alone is at or over the window it still returns None, so the caller correctly falls through to compression. Relates to #43547. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-07-01 03:17:48 -07:00
DhivinX	49e129e495	fix(anthropic): use claude-code/ UA prefix for OAuth to avoid 404 (#48534 ) Anthropic's OAuth endpoints 404 for the claude-cli/ User-Agent prefix. Switch all three OAuth UA sites (build_anthropic_client, refresh_anthropic_oauth_pure, run_hermes_oauth_login_pure) to the claude-code/ prefix Anthropic expects. Salvaged from #51948. Co-authored-by: DhivinX <20087092+DhivinX@users.noreply.github.com>	2026-07-01 15:42:15 +05:30
fsaad1984	5881791adc	fix(adapter): enforce tool_use/tool_result adjacency in _strip_orphaned_tool_blocks _strip_orphaned_tool_blocks collected tool_result ids across ALL user messages and kept any assistant tool_use whose id appeared anywhere, rather than requiring the result to be in the immediately-following user message. A stale match elsewhere in the transcript could keep a genuinely-orphaned tool_use, which Anthropic rejects. Rewrite to adjacency-checked two-pass logic so a tool_use is kept only when its result immediately follows. Salvaged from #52145. Co-authored-by: fsaad1984 <38867992+fsaad1984@users.noreply.github.com>	2026-07-01 15:42:15 +05:30
Ben Barclay	c71f816956	fix(compression): clear all per-session state in on_session_end, not just _previous_summary The original cross-session contamination fix (#38788) only cleared _previous_summary in on_session_end(), but on_session_reset() clears 14+ per-session variables. When a session ends (cron exit, gateway expiry, session-id rotation) and the compressor instance is reused, the surviving stale state causes: - _ineffective_compression_count surviving → next session skips compression prematurely (anti-thrashing guard misfires) - _summary_failure_cooldown_until surviving → next session blocks summary generation for an unrelated transient error - _last_compress_aborted surviving → callers think compression is still aborted - _last_aux_model_failure_* surviving → stale error warnings shown - _last_summary_dropped_count / _last_summary_fallback_used surviving → misleading user warnings - _context_probed / _context_probe_persistable surviving → stale context-probe state Also fix on_session_reset() which was missing _last_compress_aborted clearing — a /new or /reset would inherit the aborted flag from the prior conversation. Add 6 targeted tests covering the leak vectors and a parity test ensuring on_session_end and on_session_reset always clear the same surface.	2026-07-01 02:48:32 -07:00
ArthurZhang	fdb9620ac4	security(agent): redact Slack App-Level (xapp-) tokens The xapp-<num>-<hash> format used by Slack App-Level / Socket Mode tokens was missing from both agent/redact.py prefix patterns and gateway/run.py gateway secret patterns, so SLACK_APP_TOKEN values could leak through to chat users even with security.redact_secrets enabled. Adds an anchored xapp-\d+- pattern to both redaction paths.	2026-07-01 02:45:22 -07:00
Teknium	da6d5fcd13	fix(auth): serialize Codex OAuth pool refresh under the auth-store lock (#56233 ) The credential-pool Codex refresh path synced tokens from auth.json and then POSTed the refresh_token to OpenAI's token endpoint without holding the cross-process auth-store lock across the whole read->POST->write-back sequence. Because Codex refresh tokens are single-use, two concurrent Hermes processes could both adopt the same on-disk token and both POST it; the loser got refresh_token_reused / invalid_grant. Wrap the Codex OAuth branch of _refresh_entry in the existing shared _auth_store_lock (reentrant, cross-process flock) using the same extended-timeout pattern resolve_codex_runtime_credentials() already uses. A waiting process now blocks on the lock and, once inside, the in-lock re-sync picks up the rotated token the winner persisted and skips its own POST. Also send User-Agent: hermes-cli/<version> on the refresh request. Credit @cooper-oai (#34820) for identifying the concurrent-refresh reuse race; this ships the narrow lock-serialization fix without the separate Codex auth-store partition.	2026-07-01 02:45:07 -07:00
sprmn24	88d6e833f1	fix(agent): wrap list-type untrusted content in untrusted_tool_result _maybe_wrap_untrusted() only wrapped str-typed tool outputs. When a high-risk tool (web_extract, browser_*) returns a multimodal content list ([{type:text},{type:image_url}]) — which _tool_result_content_for _active_model() produces by unwrapping the _multimodal envelope for vision-capable providers — the text part reached the model completely unguarded. An attacker page that ships one image bypassed the entire untrusted-data wrapper. Extend the wrapper to handle list content: each {type:text} part is run through the same string-wrapping path (min-char threshold, delimiter neutralization, one well-formed block), image/video parts pass through untouched so the list stays valid for vision adapters. Recursing into the existing string branch means the list path inherits the delimiter defang and the no-forgeable-fast-path hardening from #56172 for free. The outer list is rebuilt (not returned by identity), so callers compare by value.	2026-07-01 02:44:09 -07:00
mrparker0980	10a54ccc2c	fix(security): anchor @file context refs to canonical read deny-list `@file` / `@folder` context-reference expansion enforced its own narrow deny-list (`_ensure_reference_path_allowed` in `agent/context_references.py`) that only covered `~/.ssh` keys, a handful of shell dotfiles, `~/.hermes/.env`, and `skills/.hub`. It never blocked the credential stores that the canonical read guard (`agent/file_safety.get_read_block_error`) protects: provider API keys (`~/.hermes/auth.json`), Anthropic OAuth tokens (`~/.hermes/.anthropic_oauth.json`), MCP OAuth material (`~/.hermes/mcp-tokens/`), webhook HMAC secrets, and project-local `.env` files. This matters because the messaging gateway feeds untrusted remote text straight into reference expansion: `gateway/run.py` calls `preprocess_context_references_async(..., allowed_root=_msg_cwd)` where `_msg_cwd` defaults to the operator's HOME when `TERMINAL_CWD` is unset. A chat peer (Telegram/Discord/Slack/...) could send `@file:~/.hermes/auth.json`, pass the `allowed_root` check (it resolves under HOME), slip past the narrow list, and have the operator's live keys read into the agent's context — where the model would typically echo or act on them. Rather than duplicate and re-sync a second secret list, this routes the guard through the existing single source of truth. A reviewer might ask "why not just add `auth.json` to the local list?" — because the local list has already drifted once (a prior commit had to add `.config/gh`); anchoring to `get_read_block_error` means every future addition there protects this path too. The narrow checks are kept as a fallback since they also cover dirs that guard does not (`.aws`, `.gnupg`, `.kube`, etc.), and the canonical lookup is wrapped so it can never crash reference expansion. N/A - [x] 🔒 Security fix - `agent/context_references.py`: `_ensure_reference_path_allowed` now also consults `agent.file_safety.get_read_block_error` after its existing checks and refuses the reference when that canonical guard flags the resolved path. The lookup is wrapped so guard-resolution failures fall back to the explicit checks instead of breaking expansion. - `tests/agent/test_context_references.py`: added `test_blocks_canonical_read_denylist_credential_stores`, asserting that `@file` attaches for `auth.json`, `.anthropic_oauth.json`, `mcp-tokens/`, and a project-local `.env` are all refused and their secret bodies never reach the expanded message. - `scripts/release.py`: added the contributor email to `AUTHOR_MAP` (release gate). 1. `scripts/run_tests.sh tests/agent/test_context_references.py` — all 15 tests pass, including the new credential-store case. 2. Regression proof: stash `agent/context_references.py`, run the suite with `-- -k canonical`, and confirm the new test fails (secrets leak into the message) without the fix; restore and confirm it passes. 3. `ruff check agent/context_references.py tests/agent/test_context_references.py` and `python scripts/check-windows-footguns.py agent/context_references.py tests/agent/test_context_references.py` both pass. - [x] I've read the Contributing Guide - [x] My commit messages follow Conventional Commits (`fix(scope):`, etc.) - [x] I searched for existing PRs to make sure this isn't a duplicate - [x] My PR contains only* changes related to this fix (plus the AUTHOR_MAP release gate) - [x] I've run the test suite for the touched area and all tests pass - [x] I've added tests for my changes (required for bug fixes) - [x] I've tested on my platform: macOS 15 (Darwin 25.5) - [x] I've updated relevant documentation (README, `docs/`, docstrings) — or N/A - [x] I've updated `cli-config.yaml.example` if I added/changed config keys — or N/A - [x] I've updated `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows — or N/A - [x] I've considered cross-platform impact (Windows, macOS) — or N/A - [x] I've updated tool descriptions/schemas if I changed tool behavior — or N/A	2026-07-01 02:43:49 -07:00
kshitijk4poor	22a137ed40	fix(agent): prefer late-completing real result over timeout message (review) Review follow-up on the concurrent-tool deadline salvage. timed_out_indices is snapshotted from not_done at the deadline; a worker can still finish and write results[i] in the window before the post-execution result loop reads it. The loop unconditionally replaced results[i] with a fabricated 'timed out' message for any snapshotted index, discarding a genuinely-successful (just-late) result. Gate the timeout message on 'and r is None' so a real result always wins. Add a regression test that forces the snapshot-vs-result-loop race deterministically (mutation-checked: reverting the guard fails it). Also document the intentional detached-worker leak at the executor abandon site.	2026-07-01 14:56:52 +05:30
Gustavo Mendes	c1784e9093	fix(agent): bound concurrent tool execution with a wall-clock deadline A tool with no internal interrupt check (read_file, web_search, or a wedged terminal backend) that never returns keeps the concurrent-tool poll loop alive forever: the loop only breaks when all futures finish or an interrupt is requested, and the 30s heartbeat resets the gateway idle monitor so idle-kill never fires. The ThreadPoolExecutor was also used as a context manager, so its __exit__ joined the hung worker with wait=True. Add a wall-clock batch deadline (HERMES_CONCURRENT_TOOL_TIMEOUT_S, default 420s — above the 360s web_extract timeout; 0/negative disables). When it fires: cancel pending futures, signal an interrupt to the worker threads, abandon the executor (shutdown wait=False, cancel_futures=True) so hung threads aren't joined, and return a per-tool 'timed out' result for the unfinished calls while still surfacing the finished ones. Also fixes the latent futures.index(f) lookup (ambiguous with duplicate futures) by tracking a future->index map. Salvaged from #54562. Co-authored-by: Gustavo Mendes <87918773+gustavosmendes@users.noreply.github.com>	2026-07-01 14:56:52 +05:30
Teknium	913e661a09	fix(cache): stop verification-loop synthetic nudges from persisting (#56194 ) verify_on_stop / pre_verify append a synthetic assistant "done" plus a synthetic user nudge to keep the agent going one more turn before it can claim completion. Both were flagged (_verification_stop_synthetic on the nudge only), but the flags were never registered in _EPHEMERAL_SCAFFOLDING_FLAGS, so the central _is_ephemeral_scaffolding() filter that guards both persistence sinks (SQLite flush + JSON snapshot) let them through. The resumed transcript then inherited loop-only scaffolding, invalidating the prompt-prefix cache on later turns. - add _verification_stop_synthetic and _pre_verify_synthetic to _EPHEMERAL_SCAFFOLDING_FLAGS (the single chokepoint both sinks use) - flag the blocked attempt assistant message too, not just the nudge, so the whole synthetic pair drops together and persistence does not keep a premature done with the nudge stripped (assistant to assistant adjacency) The API-payload leak claimed in the report is already handled: the chat_completions transport strips every underscore-prefixed message key before the wire, so the marker never reaches strict providers. Reported by patppham.	2026-07-01 02:26:06 -07:00
Teknium	18c61bb8cf	fix(provider): match api.anthropic.com host on fallback api_mode detection Widen the salvaged #32243 fix to the try_activate_fallback path: a custom provider pointed at the native api.anthropic.com host (no /anthropic path suffix, name != anthropic) fell through to chat_completions -> POST /v1/chat/completions -> 404. Match the host the same way determine_api_mode() and _detect_api_mode_for_url() now do. Absorbs #49247.	2026-07-01 02:18:56 -07:00
itenev	f981d47cb0	fix(gateway): prevent Discord disconnects from blocking event loop models_dev.py's fetch uses a synchronous requests.get(timeout=15). Called from the async gateway message handlers, it blocked the event loop for up to 15s, starving Discord heartbeats and causing ClientConnectionResetError disconnects. Adds get_model_context_length_async() which offloads the entire sync resolution chain to a worker thread via asyncio.to_thread(), and switches the two async gateway call sites (_prepare_inbound_message_text, _handle_message_with_agent) to await it. The loop stays responsive; the sync path remains the single source of truth for the cache. Salvaged from PR #22753 by @itenev. Follow-up: dropped the unused fetch_models_dev_async/lookup_models_dev_context_async aiohttp variants from the original PR (dead code with zero callers that had drifted from the sync cache logic) — the to_thread wrapper already runs the sync path off-loop, so they were redundant.	2026-07-01 02:17:35 -07:00
kshitijk4poor	a658f3b28b	fix(security): strip dynamic Hermes secrets from all subprocess spawn env Subprocesses spawned by the terminal tool, execute_code, Docker backend, and the codex app-server could inherit Hermes-internal secrets that the name-based `_HERMES_PROVIDER_ENV_BLOCKLIST` can't enumerate, because they're injected into `os.environ` at runtime under dynamic names: - `AUXILIARY_<TASK>_API_KEY` / `AUXILIARY_<TASK>_BASE_URL` — per-task side-LLM credentials bridged from `config.yaml[auxiliary]` by gateway/run.py and cli.py (vision, web_extract, approval, compression, plugin-registered tasks). Often separate, higher-spend keys plus base URLs pointing at private endpoints. - `GATEWAY_RELAY__SECRET` / `_KEY` / `_TOKEN` — relay-auth material provisioned by gateway/relay. Additionally, agent/transports/codex_app_server.py built its spawn env from a raw `os.environ.copy()`, bypassing the centralized `hermes_subprocess_env()` helper entirely — handing every codex subprocess the full Tier-1 secret set (GH_TOKEN, gateway bot tokens, Modal/Daytona infra tokens, dashboard session token) unfiltered. This is the #29157 sibling spawn-site gap; copilot_acp_client already routes through the helper. Fix — single chokepoint: - Add `_is_hermes_internal_secret(key)` in tools/environments/local.py as the single source of truth for the dynamic secret patterns. Matches AUXILIARY__API_KEY / _BASE_URL and GATEWAY_RELAY__SECRET/_KEY/_TOKEN; leaves non-secret AUXILIARY__PROVIDER/_MODEL and GATEWAY_RELAY routing hints visible. - Wire the predicate into every spawn path unconditionally (ignores skill env_passthrough opt-in AND inherit_credentials — a model-driving CLI never needs these): `_sanitize_subprocess_env` (both loops), `_make_run_env` (foreground), `hermes_subprocess_env` (Tier-1), and the Docker forward filter. - Add the static GATEWAY_RELAY_* names to `_HERMES_PROVIDER_ENV_BLOCKLIST` so the exact-match path catches them independently of the predicate. - Add the GATEWAY_RELAY_ID/_SECRET/_DELIVERY_KEY triplet to `_ALWAYS_STRIP_KEYS` (Tier-1) so it is stripped unconditionally on EVERY spawn surface — including the codex/copilot `inherit_credentials=True` path that skips the Tier-2 blocklist. `_SECRET`/`_DELIVERY_KEY` are already predicate-matched; `_ID` has no secret suffix, so enumerating it here is what closes its leak on the inherit path (self-review W1). - Defense in depth: env_passthrough.py `_is_hermes_provider_credential()` now consults the same predicate, so a skill can't register these names as passthrough and tunnel them into an execute_code / terminal child. - Route codex_app_server through `hermes_subprocess_env(inherit_credentials=True)` — strips Tier-1 + dynamic-internal secrets while provider creds (which codex needs to authenticate) still flow. Consolidates PRs #53715 (necoweb3 — the _is_hermes_internal_secret backbone + Docker filter), #53503 (srojk34 — env_passthrough guard), and #55709 (srojk34 — codex routing). Retires #52348 (claudlos): its copilot half is already on main, and its codex half used the full-strip `_sanitize_subprocess_env` which would break codex provider auth — the correct tier is `inherit_credentials=True`. Tests: TestHermesInternalDynamicSecrets (terminal + predicate + passthrough override), TestInternalDynamicSecrets (hermes_subprocess_env both tiers), TestSpawnEnvSecretStripping (codex spawn env), plus env_passthrough defense-in-depth cases. Co-authored-by: necoweb3 <sswdarius@gmail.com> Co-authored-by: srojk34 <286497132+srojk34@users.noreply.github.com> Co-authored-by: claudlos <claudlos@agentmail.to>	2026-07-01 14:37:22 +05:30
Omar Baradei	053424c486	fix(agent): preserve final_response on failure returns AIAgent.run_conversation() promises a dict with final_response, but 16 terminal-failure branches returned dicts that either omitted the key or set it to None. Callers that index result['final_response'] directly (run_agent.py chat() + the __main__ printer) turn a real provider/context failure into an opaque KeyError instead of surfacing the actionable error. Every offending branch already carried usable 'error' text, so this mirrors that text into final_response for all 16 sites (8 that omitted the key, 8 that returned None). Adds an AST regression test that fails if any run_conversation() dict return omits final_response or sets it to a literal None, and tightens the invalid-response test to assert final_response == error.	2026-07-01 02:04:28 -07:00
qWaitCrypto	e1ff736f26	fix(anthropic): preserve ordered replay cache markers	2026-07-01 02:03:40 -07:00
qWaitCrypto	80d71e8d2e	fix(anthropic): preserve tool use cache markers	2026-07-01 02:03:40 -07:00
Jeff Watts	a2d6f05d1b	fix(moa): append reference block at end of aggregator prompt for KV-cache reuse The MoA aggregator received the per-turn reference block merged into the most recent `user` message. In an agentic tool loop that message is the original task near the top of the context (everything after it is assistant/tool turns), so injecting text that changes every iteration diverges the prompt prefix early. The server's KV cache then cannot be reused and the entire conversation re-prefills on every tool-loop step — full prefill each step, which dominates latency on long contexts. Append the reference block at the end of the prompt instead (merging into the last message only when it is already a trailing user turn, i.e. plain chat). This keeps the [system][task][tool-history] prefix stable and cache-reusable so only the new block re-prefills, and gives the aggregator the references with recency. Extracted as `_attach_reference_guidance` with unit tests. Measured on a local llama.cpp aggregator over a long agentic task: KV-cache reuse on follow-up steps went from ~0.3% to ~93-95% and per-step prefill on an ~80k-token context dropped from ~44s to <1s, with no change to output. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-07-01 01:59:00 -07:00
sasquatch9818	020d263ef6	fix(agent): defang untrusted-tool-result delimiter against tag injection `_maybe_wrap_untrusted` is the architectural defense against indirect prompt injection. It wraps attacker-controllable tool output (web_extract, web_search, browser_, mcp_) in `<untrusted_tool_result>...</untrusted_tool_result>` so the model treats it as data. The content was interpolated verbatim, so the boundary was forgeable. Two holes. A poisoned page that embeds `</untrusted_tool_result>` closes the block early — everything after it reads as trusted instructions. And the `startswith("<untrusted_tool_result")` re-entrancy guard returned content that merely started with the opening tag completely unwrapped, so an attacker just prefixed the tag to drop all data framing. Fix neutralizes any embedded delimiter token (case-insensitive) before interpolation and drops the forgeable fast-path, so content is always sealed in exactly one well-formed block. Re-wrapping an already-wrapped forward is harmless — it stays framed as data. ## What does this PR do? Closes an indirect prompt-injection bypass in the untrusted-tool-result wrapper. Attacker content can no longer break out of, or forge, the trust boundary. ## Related Issue N/A ## Type of Change - [x] 🔒 Security fix ## Changes Made - `agent/tool_dispatch_helpers.py`: add `_neutralize_delimiters` (case-insensitive defang of the `untrusted_tool_result` token); `_maybe_wrap_untrusted` now always neutralizes then wraps, and the forgeable `startswith` re-entrancy guard is removed. - `tests/agent/test_tool_dispatch_helpers.py`: replace the double-wrap test (it encoded the bypass) with regression tests for embedded closing tag, leading opening tag, and a cased closing tag. ## How to Test 1. `scripts/run_tests.sh tests/agent/test_tool_dispatch_helpers.py` — 29 pass. 2. Embedded `</untrusted_tool_result>` mid-content: real closing delimiter appears once, at the end; payload trapped inside. 3. Content starting with the opening tag: data framing is applied, not skipped. ## Checklist ### Code - [x] I've read the Contributing Guide - [x] My commit messages follow Conventional Commits - [x] I searched for existing PRs to make sure this isn't a duplicate - [x] My PR contains only changes related to this fix - [x] I've run the affected tests and they pass - [x] I've added tests for my changes - [x] I've tested on my platform: macOS 15 (Darwin 25.5) ### Documentation & Housekeeping - [x] I've updated relevant documentation (docstrings) — or N/A - [x] cli-config.yaml.example — N/A - [x] CONTRIBUTING.md / AGENTS.md — N/A - [x] Cross-platform impact — N/A (pure-Python, stdlib `re`) - [x] Tool descriptions/schemas — N/A	2026-07-01 01:54:45 -07:00
liuhao1024	8f4d195d5f	fix(compressor): pin summary role to user when only system prompt is protected (#52160 ) After the first compaction protect_first_n decays, so on a later compaction the only protected head message can be the system prompt. Adapters like Anthropic and Bedrock send the system prompt as a separate parameter, so the summary becomes the first message in messages[] — and Anthropic rejects any request whose first message is not role=user (HTTP 400). Pin the summary to role=user when the head is system-only, and stop the collision-flip logic from reverting it back to assistant. Salvaged from #52167. Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>	2026-07-01 14:24:41 +05:30
srojk34	82ac7e16b8	fix(compression): preserve network/auth abort flags across cooldown re-entry (#29559 ) compress() eagerly reset _last_summary_auth_failure and _last_summary_network_failure at the top of every call. On a second compress() during the failure cooldown, _generate_summary() returns None from the cooldown early-return WITHOUT re-asserting those flags, so the abort guard saw False and fell through to the destructive static-fallback that drops the middle window — the data-loss #29559/#25585 describe. Stop resetting them eagerly; a successful summary already clears both, so letting them persist across calls is safe and keeps the cooldown abort protection intact. Salvaged from #52056. Co-authored-by: srojk34 <286497132+srojk34@users.noreply.github.com>	2026-07-01 14:24:41 +05:30
liuhao1024	32b23bfb08	fix(compressor): strip orphan tool_calls instead of inserting stubs (#51218 ) _sanitize_tool_pairs inserted stub role="tool" results for orphaned tool_calls. The pre-API repair_message_sequence() tracks known call IDs by tc.get("id") while this sanitizer keys on call_id\|\|id; when they disagree (Codex Responses API: id != call_id) the stubs are silently dropped by the repair pass, re-exposing the original orphans. Strip the orphaned tool_calls at the source instead (preserving any text content, adding a placeholder for an otherwise-empty assistant turn) to avoid the mismatch class entirely. Salvaged from #51225. Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>	2026-07-01 14:24:41 +05:30
Harish Kukreja	01bf61c865	fix(runtime): honor NOUS_INFERENCE_BASE_URL across pool/explicit/aux paths Upstream #52270 added `_nous_inference_env_override()` but wired it into only `resolve_nous_runtime_credentials`. Three sibling resolution paths still ignored the override, so a self-hosted Nous inference endpoint set via `NOUS_INFERENCE_BASE_URL` was silently dropped whenever credentials arrived through any of them: - the credential-pool path (`_resolve_runtime_from_pool_entry`) - the explicit-provider path (`_resolve_explicit_runtime`) - the auxiliary side-LLM client (`_pool_runtime_base_url`) Route all three through the same auth-layer reader so every `NOUS_INFERENCE_BASE_URL` read shares one normalization path (trailing-slash stripping, blank -> empty) and the documented trusted-bypass intent stays in one place. The override is live-only: it wins for the base URL returned this run but is never persisted to auth.json or the credential pool, so an ephemeral dev/staging value cannot poison durable auth state. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-07-01 01:52:06 -07:00

1 2 3 4 5 ...

1621 commits