Commit graph

1621 commits

Author SHA1 Message Date
srojk34
7f64cce96d security(vertex): route credential/project/region resolution through the profile secret scope
agent/vertex_adapter.py resolved VERTEX_CREDENTIALS_PATH,
GOOGLE_APPLICATION_CREDENTIALS, VERTEX_PROJECT_ID, and VERTEX_REGION via raw
os.environ.get() instead of the profile-scoped get_secret() every other
credential lookup in hermes_cli/runtime_provider.py uses. In a multiplex
gateway serving several profiles from one process, os.environ still holds
whichever profile's .env python-dotenv loaded at boot — so a raw read here
let one profile's turn silently mint a Vertex OAuth2 token from, and get
billed against, a different profile's GCP service account. No error, no
fail-closed guard: the multiplex UnscopedSecretError protection was bypassed
entirely because these reads never went through get_secret().

- _resolve_credentials_path/_resolve_project_override/_resolve_region now
  call agent.secret_scope.get_secret(), matching the _getenv() pattern
  already used for every other provider's credentials.
- get_vertex_credentials()'s ADC fallback (google.auth.default()) reads
  GOOGLE_APPLICATION_CREDENTIALS from os.environ internally, bypassing
  get_secret() entirely — closed with a narrow guard: when multiplexing is
  active and this profile's scope has no Vertex credentials of its own, but
  os.environ still carries a value (left by a different profile's boot-time
  dotenv load), refuse ADC rather than silently authenticate as a stranger.
- Zero behavior change for single-profile installs: get_secret() falls
  through to os.environ transparently whenever multiplexing is off.

Same bug class as the already-fixed _HERMES_OAUTH_FILE/_AUTH_JSON_PATH/
HOOKS_DIR cross-profile leaks, now closed for Vertex's OAuth2 credential
path.
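A minimal Python sketch of the scoping rule and the ADC guard. `get_secret` here is a hypothetical stand-in for agent.secret_scope.get_secret(); the explicit `profile_scope` dict, the `multiplex_active` flag and the `UnscopedSecretError` constructor are assumptions, not the real module's API.

```python
import os


class UnscopedSecretError(RuntimeError):
    """Raised instead of silently using another profile's credentials."""


def get_secret(name, profile_scope, multiplex_active, environ=os.environ):
    # Hypothetical stand-in for agent.secret_scope.get_secret().
    if name in profile_scope:
        return profile_scope[name]
    if not multiplex_active:
        return environ.get(name)  # single-profile: transparent fall-through
    return None  # multiplexed: never read another profile's boot-time env


_VERTEX_KEYS = ("VERTEX_CREDENTIALS_PATH", "GOOGLE_APPLICATION_CREDENTIALS")


def check_adc_allowed(profile_scope, multiplex_active, environ=os.environ):
    """Guard before google.auth.default(), which reads os.environ directly."""
    if not multiplex_active:
        return True
    has_own = any(k in profile_scope for k in _VERTEX_KEYS)
    if not has_own and environ.get("GOOGLE_APPLICATION_CREDENTIALS"):
        raise UnscopedSecretError(
            "refusing ADC: GOOGLE_APPLICATION_CREDENTIALS was left by another profile"
        )
    return True
```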
2026-07-02 06:07:56 +05:30
kshitijk4poor
676236bb1d fix(agent): honor custom CA certs on aux client + harden TLS resolution
The salvaged fix wired per-provider ssl_ca_cert / ssl_verify (and
HERMES_CA_BUNDLE) into the MAIN OpenAI client. This follow-up:

- Auxiliary client parity: process_bootstrap.build_keepalive_http_client
  accepts and forwards verify; auxiliary_client._resolve_aux_verify mirrors
  the main-client TLS resolution (via load_config_readonly, the read-only
  fast path) so compression/vision/web_extract/title-gen/session_search
  honor the same per-provider CA. Without this, chat worked against a
  private-CA endpoint but every auxiliary call still failed with APIConnectionError.
- switch_model now reads custom_providers from live config (load_config_readonly)
  instead of the init-time agent._custom_providers snapshot, so ssl_ca_cert /
  ssl_verify edits are honored on mid-session model switch — matching the
  context-length reload (#15779).
- Drop the dead client-level verify= where a custom httpx transport is used
  (httpx ignores it there); verify lives on the transport. Fix docstrings.
  Applies to both run_agent._build_keepalive_http_client and process_bootstrap.
- resolve_httpx_verify: add CURL_CA_BUNDLE to the env chain (consistency with
  agent/ssl_guard._CA_BUNDLE_ENV_VARS) and emit a loud logger.warning naming
  the endpoint whenever ssl_verify:false disables verification.
- get_custom_provider_tls_settings: case-insensitive base_url match (config
  dedup already lowercases; scheme/host are case-insensitive) so a mixed-case
  entry doesn't silently drop its CA. Exact match preserved — no prefix bypass.
- Demote the silent best-effort `except Exception: pass` handlers in
  agent_init/switch_model to logger.debug(exc_info=True).
- Tests for aux verify forwarding, _resolve_aux_verify, case-insensitive
  match, and prefix-bypass rejection.
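The case-insensitive but still exact base_url match can be sketched as below. The function name mirrors the one in the message, but the signature and the (ssl_ca_cert, ssl_verify) return shape are assumptions for illustration.

```python
def get_custom_provider_tls_settings(base_url, custom_providers):
    """Return (ssl_ca_cert, ssl_verify) for the entry whose base_url equals
    base_url ignoring case; (None, True) when nothing matches."""
    wanted = base_url.lower()
    for entry in custom_providers:
        # Exact comparison after lowercasing: no startswith(), so a URL that
        # merely begins with a configured base_url cannot borrow its TLS settings.
        if str(entry.get("base_url", "")).lower() == wanted:
            return entry.get("ssl_ca_cert"), entry.get("ssl_verify", True)
    return None, True
```

A mixed-case config entry still matches, while `https://host/v1.evil.example` does not inherit the CA configured for `https://host/v1`.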
2026-07-02 04:51:56 +05:30
HexLab98
3a2ba959ce fix(agent): honor custom CA certs for custom_providers HTTPS endpoints
Wire ssl_ca_cert and ssl_verify through custom_providers config and env
vars into the keepalive httpx client, fixing APIConnectionError against
mkcert/self-signed Ollama proxies behind HTTPS.
2026-07-02 04:51:56 +05:30
HexLab98
7e957cbd0b feat(agent): add resolve_httpx_verify for custom CA bundle TLS
Introduce a shared helper that maps HERMES_CA_BUNDLE, SSL_CERT_FILE, and
per-provider ssl_ca_cert settings to httpx verify contexts.
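A sketch of how such a helper could map settings to an httpx `verify=` value (False, True, or an `ssl.SSLContext`). The env var names come from these messages; the precedence order, the split into `pick_ca_bundle`, and the keyword parameters are assumptions.

```python
import logging
import os
import ssl

logger = logging.getLogger(__name__)

# Env chain named in the messages; the order here is an assumption.
_CA_BUNDLE_ENV_CHAIN = ("HERMES_CA_BUNDLE", "SSL_CERT_FILE", "CURL_CA_BUNDLE")


def pick_ca_bundle(ssl_ca_cert=None, environ=os.environ):
    """Per-provider ssl_ca_cert wins; otherwise the first non-empty env var."""
    if ssl_ca_cert:
        return ssl_ca_cert
    for var in _CA_BUNDLE_ENV_CHAIN:
        if environ.get(var):
            return environ[var]
    return None


def resolve_httpx_verify(ssl_ca_cert=None, ssl_verify=True, endpoint="", environ=os.environ):
    """Return a value for httpx's verify=: False, True, or an SSLContext."""
    if ssl_verify is False:
        logger.warning("TLS verification DISABLED for %s (ssl_verify: false)", endpoint)
        return False
    bundle = pick_ca_bundle(ssl_ca_cert, environ)
    if bundle is None:
        return True  # httpx default trust store
    return ssl.create_default_context(cafile=bundle)
```

When a custom transport is used, the result belongs on the transport (e.g. `httpx.HTTPTransport(verify=...)`) rather than on the client, in line with the follow-up commit above.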
2026-07-02 04:51:56 +05:30
Brooklyn Nicholson
ec319e4e3e fix(learning_graph): guard non-dict metadata so /journey can't crash
parse_frontmatter's malformed-YAML fallback stores every value as a string,
so a skill's `metadata` can be a str. `_category`/`_related` chained
`.get("metadata", {}).get("hermes", {})` and blew up with `'str' object has
no attribute 'get'`, taking down `build_learning_graph()` (and thus /journey
and `hermes journey`) whenever any installed skill had bad frontmatter.

Extract a `_hermes_meta()` helper that returns the nested dict only when it
really is one. Fixes the whole class, not just the two call sites.
2026-07-01 16:25:48 -05:00
kshitijk4poor
b23e1c3077 refactor(approval): extract is_approval_bypass_active(); use frozen-env bypass in codex routing
Self-review follow-up on the salvaged approval-routing fix.

The initial adaptation re-read os.getenv("HERMES_YOLO_MODE") at session-build
time. That diverges from the repo's security invariant: HERMES_YOLO_MODE is
frozen into tools.approval._YOLO_MODE_FROZEN at import time precisely so a skill
running mid-process cannot set the env var and instantly flip the approval
bypass (a prompt-injection escalation path). A live re-read re-opened that hole
for the codex routing path.

- Add tools.approval.is_approval_bypass_active() — the canonical three-source
  bypass check (frozen --yolo/HERMES_YOLO_MODE + session /yolo + approvals.mode
  off) in one place. This is the 4th inline copy of that OR-chain (the three
  sites in approval.py and tui_gateway/server.py:3121 all use the same idiom);
  the helper is the shared chokepoint they can collapse onto.
- codex_runtime.py now calls is_approval_bypass_active() instead of the
  hand-rolled mode-or-session check plus a runtime env re-read.
- Update the env-yolo test to patch _YOLO_MODE_FROZEN (the canonical test
  pattern, e.g. tests/tools/test_yolo_mode.py) rather than setenv, which is
  dead-on-arrival against the frozen constant.

Fail-closed default preserved on every branch; 28 integration + 77 session/yolo
tests pass; E2E confirms the real exec decision flips decline->accept only when
bypass is active.
2026-07-01 22:58:37 +05:30
snav
0b8e81996f fix(codex-app-server): honor approvals.mode/yolo for gateway-context approval routing
On gateway/cron/non-CLI contexts the codex app-server runtime has no UI to
surface codex's exec/apply_patch approval requests, so they fail closed
(silently decline) — the bot appears responsive but cannot write files, with
no approval prompt anywhere ("patch rejected by user").

When the user has explicitly opted out of Hermes approvals (approvals.mode: off,
the /yolo session toggle, or HERMES_YOLO_MODE=1), collapse to codex's own
sandbox permission profile (~/.codex/config.toml) as the policy gate by passing
_ServerRequestRouting(auto_approve_exec=True, auto_approve_apply_patch=True) to
the session. Defaults (manual/smart/unset) preserve the current fail-closed
behavior — a no-op for users who have not opted out.

Reads the mode via the canonical tools.approval._get_approval_mode() (which
already normalizes the YAML-1.1 bare-'off'->False case) at session-build time,
so a mid-session /yolo toggle is honored too.
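A sketch of the mode normalization and the routing decision. `ServerRequestRouting` is a hypothetical mirror of `_ServerRequestRouting`'s two flags, and treating an unset mode as "manual" is an assumption; the point is that only an explicit opt-out auto-approves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerRequestRouting:
    # Hypothetical mirror of _ServerRequestRouting's two flags.
    auto_approve_exec: bool = False
    auto_approve_apply_patch: bool = False


def normalize_approval_mode(raw):
    # YAML 1.1 loaders (e.g. PyYAML) parse a bare `off` as boolean False.
    if raw is False:
        return "off"
    if raw is None:
        return "manual"  # unset: assumed to mean the fail-closed default
    return str(raw).strip().lower()


def routing_for_session(raw_mode, session_yolo=False, env_yolo=False):
    if normalize_approval_mode(raw_mode) == "off" or session_yolo or env_yolo:
        # Defer to codex's own sandbox permission profile as the policy gate.
        return ServerRequestRouting(auto_approve_exec=True, auto_approve_apply_patch=True)
    return ServerRequestRouting()  # fail closed: requests are declined
```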

5 integration tests: each opt-out mechanism (config off, YAML False, env var,
session yolo) plus the default fail-closed regression guard.

Closes #26530

Co-authored-by: snav <jake@nousresearch.com>
2026-07-01 22:58:37 +05:30
Teknium
eae3700b16 fix(moa): raise aux timeouts to 900s and give the Codex aux path a stable prompt_cache_key (#56395)
Two independent MoA auxiliary-call fixes:

#53866 — auxiliary.moa_reference.timeout and auxiliary.moa_aggregator.timeout
were 600s while moa_agent was 120s. Raise both to 900s so a genuinely long
reference/aggregator turn (mixed providers, deep reasoning, long tool chains)
has headroom instead of being cut mid-generation.

#53735 — _CodexCompletionsAdapter (the Codex/Responses auxiliary path used by
the MoA acting-aggregator, compression, web_extract, session_search, etc.)
never set prompt_cache_key, so it stayed cache-cold while the MAIN Responses
transport (agent/transports/codex.py) was warm. Derive the same
content-addressed key via the shared _content_cache_key(instructions, tools)
helper and set it on the aux Responses request, with the same host guards the
main transport uses (xAI carries the key in extra_body; GitHub/Copilot opts out
of cache-key routing).
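A sketch of a content-addressed key plus the host guards. `content_cache_key` is a hypothetical stand-in for `_content_cache_key`; the `hermes-` prefix, the SHA-256 digest, the truncation length and the substring host checks are assumptions.

```python
import hashlib
import json


def content_cache_key(instructions, tools, prefix="hermes-"):
    """Same instructions + tools -> same key (content-addressed)."""
    blob = json.dumps({"instructions": instructions, "tools": tools},
                      sort_keys=True, separators=(",", ":"))
    return prefix + hashlib.sha256(blob.encode("utf-8")).hexdigest()[:32]


def apply_prompt_cache_key(request, host, instructions, tools):
    host = host.lower()
    if "github" in host:
        return request  # GitHub/Copilot opts out of cache-key routing
    key = content_cache_key(instructions, tools)
    if host == "x.ai" or host.endswith(".x.ai"):
        request.setdefault("extra_body", {})["prompt_cache_key"] = key  # xAI
    else:
        request["prompt_cache_key"] = key
    return request
```

Because the key depends only on the stable prefix, the aux path and the main transport land on the same cache entry for identical instructions and tools.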

Tests: 5 new prompt_cache_key cases (set+prefixed, stable across identical
prefix, differs on different instructions, skipped for xai/github hosts).
tests/agent/test_auxiliary_client.py 279 pass; tests/hermes_cli/test_config.py
130 pass.
2026-07-01 06:02:40 -07:00
Teknium
aa605b66c8 fix(moa): price aggregator turn at its real model so session cost isn't advisor-only (#56394)
On the MoA path agent.model/provider are the virtual preset name (e.g.
"closed") and "moa", which have no pricing entry. estimate_usage_cost()
returned None for the aggregator turn, so the `if amount_usd is not None`
guard skipped it and the session's estimated_cost_usd reflected only the
advisor fan-out — a ~50% undercount when the aggregator does the full acting
loop (verified: $0.91 advisor-only vs $1.96 true, aggregator = 54%).

MoAChatCompletions.create() now stashes the resolved aggregator slot as
last_aggregator_slot (exposed via MoAClient); conversation_loop reads it to
price the aggregator turn at its real model/provider. cost_source flips from
'none' to 'provider_models_api'.
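The undercount follows from the None-skipping sum. The sketch below reuses the figures quoted above; the $1.05 aggregator share is derived as $1.96 minus $0.91, and `session_cost` is a hypothetical simplification of the accumulation.

```python
def session_cost(turn_costs_usd):
    """Sum per-turn estimates; a turn whose model has no pricing entry is None
    and is skipped (the `if amount_usd is not None` guard)."""
    return sum(c for c in turn_costs_usd if c is not None)


advisor_usd = 0.91
aggregator_usd = round(1.96 - 0.91, 2)  # 1.05, derived from the quoted totals

before = session_cost([advisor_usd, None])           # aggregator priced as "closed"/"moa"
after = session_cost([advisor_usd, aggregator_usd])  # priced at the real slot
```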
2026-07-01 06:02:33 -07:00
kshitijk4poor
b795a45b8d fix(compaction): detect and strip merge-into-tail summaries past the delimiter
Follow-up to the END-MARKER reorder: moving the summary prefix after the
[PRIOR CONTEXT] wrapper meant _is_context_summary_content (prefix-at-start)
no longer recognized a merged-tail summary. That silently broke three
consumers — the last-real-user anchor (would pick the merged summary as a
real user turn, causing active-task loss), the carry-forward summary find,
and the auto-focus skip. _strip_summary_prefix would also carry the wrapper
+ stale tail content forward as the next summary body.

Extract the two delimiter strings into _MERGED_PRIOR_CONTEXT_HEADER /
_MERGED_SUMMARY_DELIMITER constants (writer + detector stay in sync), teach
_is_context_summary_content and _strip_summary_prefix to look past the
delimiter, and add a regression test. Standalone summaries unchanged.
2026-07-01 18:23:01 +05:30
Gromykoss
a1a8a967e1 fix(compaction): place END MARKER last in merge-into-tail summaries
When the compression summary is merged into the first tail message
(the alternation corner case where a standalone summary role would
collide with both head and tail), the old format was
SUMMARY + END_MARKER + OLD_TAIL_CONTENT — so the preserved tail content
appeared AFTER the end marker and the model could read it as a fresh
message to respond to.

Reorder so the END MARKER is always last: old tail content is wrapped in
[PRIOR CONTEXT ...][END OF PRIOR CONTEXT — COMPACTION SUMMARY BELOW]
delimiters, then the summary, then the END MARKER. _append_text_to_content
handles both string and multimodal-list content.
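A sketch of the reordered layout for both content shapes. The delimiter and END_MARKER strings are illustrative placeholders, and `merge_summary_into_tail` is a hypothetical name; the invariant shown is that the END MARKER is always the last text.

```python
END_MARKER = "[END OF COMPACTION SUMMARY]"        # hypothetical wording
_MERGED_PRIOR_CONTEXT_HEADER = "[PRIOR CONTEXT]"  # illustrative wording
_MERGED_SUMMARY_DELIMITER = "[END OF PRIOR CONTEXT - COMPACTION SUMMARY BELOW]"


def merge_summary_into_tail(tail_content, summary):
    """Wrapped old tail content, then the summary, then END_MARKER last."""
    head = _MERGED_PRIOR_CONTEXT_HEADER + "\n"
    closing = f"\n{_MERGED_SUMMARY_DELIMITER}\n{summary}\n{END_MARKER}"
    if isinstance(tail_content, str):
        return head + tail_content + closing
    # Multimodal list: keep image parts in place, add text parts around them.
    return [{"type": "text", "text": head}, *tail_content, {"type": "text", "text": closing}]
```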

Salvaged from #56372 by @Gromykoss. Only the END-MARKER reorder half is
carried over. The PR's second change (a post-compaction pass that strips
user-role messages before the first summary marker on compression_count>=2)
was dropped: on 2nd+ compactions the protected head decays to system-only
(_effective_protect_first_n -> 0, #11996) so the targeted 'ghost head user'
does not occur, and where the strip does fire it deletes legitimate recent
tail user turns (data loss) and can leave consecutive assistant messages
(role-alternation violation).
2026-07-01 18:23:01 +05:30
Steve Lawton
c73e74386b feat(vertex): add Google Vertex AI provider for Gemini (OAuth2)
Adds Vertex AI as a first-class provider for Gemini models via Vertex's
OpenAI-compatible endpoint. Vertex authenticates with short-lived OAuth2
access tokens (service-account JSON or ADC), not a static API key — the
missing piece behind the recurring requests (#13484, #12639, #56259).

- agent/vertex_adapter.py: OAuth2 token minting + refresh-on-expiry
  (5-min margin), ADC->service-account fallback, global vs regional
  endpoint URLs. Config precedence: env var > config.yaml > default.
- plugins/model-providers/vertex/: provider profile (auth_type=vertex),
  reuses Gemini's extra_body.google.thinking_config translation.
- runtime_provider: vertex short-circuit BEFORE the credential pool so a
  credentials-file path is never mistaken for a static API key; mints a
  fresh token + computes base_url per resolve.
- run_agent + conversation_loop: _try_refresh_vertex_client_credentials()
  re-mints the token and rebuilds the client on a mid-session 401, so a
  long-lived gateway agent survives token expiry (~1h).
- auxiliary_client: vertex auth_type branch for side-LLM tasks.
- config.yaml: vertex.project_id / vertex.region (non-secret, bridged to
  env); credential path stays in .env (VERTEX_CREDENTIALS_PATH).
- setup wizard + model picker: dedicated _model_flow_vertex; curated
  google/gemini-* model list; --provider choices.
- pricing/metadata: Vertex prices off the gemini docs snapshot; endpoint
  host auto-maps to the vertex provider (no probe spam).
- lazy_deps + pyproject [vertex] extra: google-auth, opt-in only.
- docs: guides/google-vertex.md + providers page; tests for adapter +
  runtime resolution.
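A sketch of refresh-on-expiry with the 5-minute margin, the env > config.yaml > default precedence, and the global vs regional endpoint split. The `mint` callable, the "global" default and the exact URL shape are assumptions; real minting would go through google-auth.

```python
import time

REFRESH_MARGIN_S = 5 * 60  # re-mint 5 minutes before expiry


class VertexTokenCache:
    """`mint` is a hypothetical callable returning (access_token, expires_at_epoch_s)."""

    def __init__(self, mint, clock=time.time):
        self._mint = mint
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def token(self):
        if self._token is None or self._clock() >= self._expires_at - REFRESH_MARGIN_S:
            self._token, self._expires_at = self._mint()
        return self._token


def resolve_region(environ, config, default="global"):
    """Precedence: env var > config.yaml vertex.region > default (default assumed)."""
    return environ.get("VERTEX_REGION") or (config.get("vertex") or {}).get("region") or default


def vertex_base_url(project_id, region):
    # Assumed URL shape for Vertex's OpenAI-compatible endpoint.
    host = "aiplatform.googleapis.com" if region == "global" else f"{region}-aiplatform.googleapis.com"
    return f"https://{host}/v1/projects/{project_id}/locations/{region}/endpoints/openapi"
```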

Salvages and modernizes #8427 by @slawt onto current main: rewired from
the legacy PROVIDER_REGISTRY path to the provider-profile architecture,
moved non-secret config out of .env into config.yaml, and added the
per-turn 401 token-refresh the original lacked.
2026-07-01 05:25:33 -07:00
HODLCLONE
6ed2f5d76f fix: make Nous Portal access token resolution resilient
- Track auth store source path on Nous state reads and write rotated
  OAuth refresh tokens back to the same store, preventing stale-token
  replays when Hermes falls back to a global/root auth.json.
- Skip Nous fallback entries locally when no access/refresh token is
  present, suppressing repeated failed resolution attempts within a
  session.
- Sync session model metadata after fallback switches so the gateway
  DB reflects the backend that actually served the latest turn.
2026-07-01 05:06:00 -07:00
ud
c126a99fc1 fix(subdirectory_hints): catch RuntimeError from Path.expanduser()
`pathlib.Path('~user').expanduser()` raises RuntimeError when the
tilde-expansion can't resolve the user (e.g. `~500-700` where the LLM
meant "approximately 500-700" rather than a path). The hint walker's
existing `except (OSError, ValueError):` clauses do not catch
RuntimeError, so it escapes through the tool dispatcher and surfaces
in the conversation loop as a misleading

    Error during OpenAI-compatible API call #N:
    Could not determine home directory.

Reproduced across three unrelated models (openai/gpt-5-mini,
openai/gpt-5.1-codex, deepseek/deepseek-v4-flash) on terminal-tool
commands containing literal tildes in non-path contexts — common in
LLM output ("~500 agencies", "~45,000 CVEs", "~80/hr blended rate").

Reproduction (one-liner):
    >>> from pathlib import Path
    >>> Path("~500-700").expanduser()
    Traceback (most recent call last):
      ...
    RuntimeError: Could not determine home directory.

Fix: extend the three `except` clauses in
agent/subdirectory_hints.py to also catch RuntimeError:

  line 138 (_add_path_candidate's outer catch around the Path().expanduser() call)
  lines 198+202 (_load_hints_for_directory's nested catches around hint_path.relative_to(Path.home()))

Tests: tests/agent/test_subdirectory_hints_tilde.py adds three cases
covering: tilde-as-approximately in heredoc commands, ~unknown_user paths,
and a regression guard that legitimate ~/path expansion still works.
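The widened catch, reduced to a standalone helper. `expand_candidate` is a hypothetical name; the real code patches the existing clauses in agent/subdirectory_hints.py.

```python
from pathlib import Path


def expand_candidate(raw):
    """Expand a tilde path, or return None when the text isn't really a path.
    Path('~500-700').expanduser() raises RuntimeError, not OSError/ValueError."""
    try:
        return Path(raw).expanduser()
    except (OSError, ValueError, RuntimeError):
        return None
```

`"~500-700"` yields None instead of escaping into the conversation loop, while `"~/notes"` still expands normally.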
2026-07-01 04:55:15 -07:00
JabberELF
18a9467fca fix(tui): prevent killpg suicide during MCP shutdown
Root cause: gateway spawns LSP servers (jdtls/pyright/yaml-ls) and
slash_worker without start_new_session=True, so they inherit the
gateway process group (= TUI parent PID). When mcp_tool's
_snapshot_child_pids() races with these spawns during stdio MCP
server startup, non-MCP children leak into _stdio_pgids with the
TUI parent PGID. shutdown_mcp_servers() then calls
killpg(tui_parent_pid, SIGTERM), killing the TUI itself.

Evidence: tui_gateway_crash.log shows recurring SIGTERM stacks:
  shutdown_mcp_servers -> _kill_orphaned_mcp_children ->
  _send_signal -> killpg(pgid, sig) -> SIGTERM received

Fix (3 layers):
1. agent/lsp/client.py: add start_new_session=True to LSP server
   spawn so each LSP server gets its own process group/session.
2. tui_gateway/server.py: same fix for slash_worker spawn, the
   symmetric root-cause patch so no gateway direct child shares
   the TUI parent pgid.
3. tools/mcp_tool.py: add _filter_mcp_children() defense-in-depth
   that drops non-MCP children (slash_worker, jdtls/eclipse LSP)
   from the PID delta before they can poison _stdio_pgids.
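A POSIX-only sketch of the three layers. The non-MCP marker list, the `{pid: cmdline}` input shape and `safe_killpg` are hypothetical; only `start_new_session=True` is the documented `subprocess.Popen` parameter the fix uses.

```python
import os
import signal
import subprocess

# Hypothetical markers for gateway children that are not MCP servers.
_NON_MCP_MARKERS = ("slash_worker", "jdtls", "org.eclipse", "pyright", "yaml-language-server")


def spawn_isolated(argv, **kwargs):
    # start_new_session=True makes the child call setsid(): it leads a new
    # session and process group, so its pgid is its own PID, not the TUI's.
    return subprocess.Popen(argv, start_new_session=True, **kwargs)


def filter_mcp_children(new_children):
    """new_children: {pid: cmdline}. Drop known non-MCP children from the delta."""
    return {pid for pid, cmd in new_children.items()
            if not any(marker in cmd for marker in _NON_MCP_MARKERS)}


def safe_killpg(pgid, sig=signal.SIGTERM):
    """Refuse to signal our own process group; that would kill the caller."""
    if pgid == os.getpgrp():
        return False
    os.killpg(pgid, sig)
    return True
```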
2026-07-01 04:54:46 -07:00
kshitijk4poor
dc1ea005d9 fix+test(codex): self-persist projected turns; keep agent_persisted=True
Follow-up correcting the salvaged fix's persistence approach to avoid a
duplicate user-message write (verified via E2E — the #860/#42039 bug class
the original diff aimed to avoid).

Root cause: in gateway mode the AIAgent is built WITH a session_db, so the
inbound user turn is already flushed at turn start
(turn_context._persist_session). The original fix returned agent_persisted=False, making the
gateway re-write the whole new-message slice via append_to_transcript ->
append_message (a raw INSERT with no dedup), duplicating the already-flushed
user turn.

Corrected approach (single writer): run_codex_app_server_turn now flushes its
OWN projected assistant/tool messages via _flush_messages_to_session_db (which
dedups the already-persisted user turn through _DB_PERSISTED_MARKER) and
returns agent_persisted=True so the gateway skips its write. Net result:
session_search/distill see the full codex conversation, each message persisted
exactly once.

Adds regression coverage asserting exactly-once persistence on a real
SessionDB, agent_persisted=True, FTS visibility, and standard-runtime skip-db
behaviour preserved.

Co-authored-by: Lubos Buracinsky <lubos@komfi.health>
2026-07-01 17:08:59 +05:30
Lubos Buracinsky
5558382457 fix(codex): persist app-server turns to session DB (fixes starved recall)
The codex_app_server runtime path (run_codex_app_server_turn in
agent/codex_runtime.py) is an early-return that bypasses
conversation_loop and never calls _flush_messages_to_session_db().

Meanwhile, gateway/run.py sets:

  agent_persisted = self._session_db is not None   # always True

and passes skip_db=agent_persisted to every append_to_transcript call,
assuming the agent self-persisted (correct for the standard runtime,
wrong for codex). The result: codex turn messages are persisted nowhere.
state.db accumulates only session_meta rows; session_search (full-text
search over state.db) and conversation-distill are blind to real gateway
conversations, causing 'the agent has no memory of what we discussed'.

Fix (three-part, all backward-compatible):

1. agent/codex_runtime.py — run_codex_app_server_turn success return
   now includes 'agent_persisted': False, signalling that the codex path
   did NOT self-persist its turn.

2. gateway/run.py — the agent_persisted assignment now reads:

     agent_result.get('agent_persisted', self._session_db is not None)

   For the standard runtime (which does not set the key) the default
   (self._session_db is not None) preserves the existing skip-db
   behaviour so no duplicate-write regression (#860 / #42039) occurs.
   For the codex runtime the flag is False, so the gateway writes the
   new turn's messages to state.db and FTS index.

3. gateway/run.py — the rebuilt result dict (run_agent return, which
   becomes agent_result upstream) now includes agent_persisted passed
   through from result_holder[0], with a safe True default.  Without
   this passthrough the flag set in step 1 was discarded when the result
   was reconstructed, causing agent_result.get('agent_persisted', ...)
   to always see the default True and never write codex turns.
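The flag plumbing from parts 2 and 3 can be sketched as below. The follow-up commit dc1ea005d9 later had codex return True, but the defaulting and passthrough behave the same way. The `rebuild_result` name and its fields are hypothetical.

```python
def resolve_agent_persisted(agent_result, session_db):
    # Runtimes that omit the key keep the old default (skip the gateway write).
    return agent_result.get("agent_persisted", session_db is not None)


def rebuild_result(raw):
    # Pass the flag through; dropping it here made the default above always win.
    return {
        "final_response": raw.get("final_response"),
        "agent_persisted": raw.get("agent_persisted", True),
    }
```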
2026-07-01 17:08:59 +05:30
Dutch Dim
154c382d65 fix(gateway): recover from truncated responses
2026-07-01 17:08:50 +05:30
kshitijk4poor
9cf47fef54 fix(auxiliary_client): demote the 2 sibling routing fall-throughs too (review)
Phase 2c review flagged that only 2 of the 4 structurally-identical
resolve_provider_client routing dead-ends were demoted. Complete the bug-class:
also demote+dedup the external-process ('not directly supported') and OAuth
('not directly supported, try auto') fall-throughs, keyed by provider name, so
none of the four dead-ends spam WARNING on a retry loop.

Add direct tests for the unhandled-auth_type and OAuth dedup paths via a
monkeypatched PROVIDER_REGISTRY (the review noted these were unverified).
Mutation-checked: reverting either sibling demotion fails its test.
2026-07-01 17:00:30 +05:30
kshitijk4poor
c0d3ceb17e fix(auxiliary_client): dedup resolve_provider_client fall-through warnings
The two fall-through branches in resolve_provider_client (unknown provider,
unhandled auth_type) logged at WARNING on every retry of a misconfigured
provider, spamming logs during retry loops. Demote both to logger.debug with
per-process dedup: the first occurrence still surfaces (a provider-name typo or
PROVIDER_REGISTRY/auth_type-drift bug is worth seeing once), while identical
repeats are suppressed for the process lifetime.

Salvaged from #56283 (extracting only the stated auxiliary_client fix; the
original PR also bundled ~2800 lines of unrelated changes across 10 other
files, which are dropped).
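Per-process dedup of the fall-through messages can be sketched with a module-level set. The key shape (kind, provider) and the message text are assumptions.

```python
import logging

logger = logging.getLogger(__name__)
_seen_fallthroughs = set()  # lives for the process lifetime


def log_fallthrough_once(kind, provider):
    """Log the first (kind, provider) occurrence at DEBUG; suppress repeats."""
    key = (kind, provider)
    if key in _seen_fallthroughs:
        return False
    _seen_fallthroughs.add(key)
    logger.debug("resolve_provider_client: %s for provider %r", kind, provider)
    return True
```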
2026-07-01 17:00:30 +05:30
shawchanshek
3b739b990b fix(title_generator): strip think blocks from LLM output before extracting title
Think-enabled models (MiniMax M2.7, DeepSeek, etc.) emit inline
<think>...</think> reasoning even for simple prompts like title
generation, and the raw XML was leaking into session titles. Route the
title-model response through the canonical strip_think_blocks scrubber
before cleanup so every tag variant — closed pairs, unterminated blocks,
orphan closes, mixed case — is handled, not just a single literal
<think> pair.
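A regex sketch covering the listed tag variants. This is not the canonical strip_think_blocks; in particular, treating everything before an orphan `</think>` as leaked reasoning is an assumption.

```python
import re

_CLOSED = re.compile(r"<think\b[^>]*>.*?</think\s*>", re.IGNORECASE | re.DOTALL)
_UNTERMINATED = re.compile(r"<think\b[^>]*>.*\Z", re.IGNORECASE | re.DOTALL)
# Assumption: text before an orphan </think> is leaked reasoning, so drop it too.
_ORPHAN_CLOSE = re.compile(r"\A.*</think\s*>", re.IGNORECASE | re.DOTALL)


def strip_think_blocks(text):
    text = _CLOSED.sub("", text)        # closed pairs, any case
    text = _UNTERMINATED.sub("", text)  # open tag with no close: drop to the end
    text = _ORPHAN_CLOSE.sub("", text)  # stray close with no opener
    return text.strip()
```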

- 2 regression tests: closed <think> pair stripped, unterminated block
  at start yields no title.

Salvaged from PR #44126 by @shawchanshek.
2026-07-01 04:18:48 -07:00
shandian64
5126902f1d fix(title): honor configured auxiliary timeout
2026-07-01 16:41:43 +05:30
Teknium
5de65624d1 fix(moa): capture streamed aggregator output into full-turn traces (#56312)
MoA full-turn traces (moa.save_traces) recorded the aggregator's acting
output only on the non-streaming path, where it's captured inline at
call time. On the streaming path — which every hermes chat --query run
and every live gateway/CLI turn takes — the aggregator's raw token
stream is handed to the live consumer, so the trace left output=null and
only pointed at the session-db assistant row. An offline audit of a
benchmark run (HermesBench drives --query) then couldn't see what the
aggregator produced without hand-joining to state.db.

Capture the resolved streamed acting text at trace-flush time (the agent
already holds it in _current_streamed_assistant_text) and fold it into
the trace, so the record is self-contained in both modes. New
output_location value inline_from_stream marks a streamed turn whose text
was captured this way; a genuinely empty acting turn (pure tool call)
still points at the session db, matching state.db exactly.

Touches only the trace side-channel — no change to the acting path,
message history, role alternation, or prompt cache.

- agent/moa_loop.py: consume_and_save_trace(..., aggregator_output_fallback)
  on both the facade and the MoAClient wrapper; prefer inline capture,
  fall back to the resolved streamed text.
- agent/moa_trace.py: embed the fallback; add inline_from_stream location.
- agent/conversation_loop.py: pass _current_streamed_assistant_text at flush.
- tests: 5 cases across streaming / non-streaming / empty-fallback / no-double-write.
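The preference order can be sketched as a small selector. Only `inline_from_stream` is named in the message; the "inline" and "session_db" location values and the function name are hypothetical.

```python
def aggregator_trace_output(inline_output, streamed_text):
    """Return (output, output_location) for the aggregator's acting turn."""
    if inline_output is not None:
        return inline_output, "inline"  # non-streaming: captured at call time
    if streamed_text:
        return streamed_text, "inline_from_stream"  # folded in at flush time
    return None, "session_db"  # empty acting turn (pure tool call)
```

Preferring the inline capture means a turn is never written twice when both sources are present.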
2026-07-01 04:07:46 -07:00
arminanton
e2fa509bf3 fix(review): isolate the background-review fork from the canonical session
The forked skill/memory review agent shares the parent's session_id for
prompt-cache warmth. Without isolation it wrote its harness turn ('Review the
conversation above and update the skill library…') plus its curator-mode reply
straight into the user's REAL session in state.db; the next live turn re-read
that injected user message as a standing instruction and the agent 'became' the
curator, refusing the actual task.

Root fix: a _persist_disabled flag on the fork that hard-stops every DB write
and lazy-open path (_flush_messages_to_session_db, _ensure_db_session,
_get_session_db_for_recall) — the review writes only to the skill/memory stores
via its tools. Defense-in-depth: _strip_background_review_harness drops any
stray harness message (and the assistant reply that followed) at load time in
get_messages_as_conversation, so an already-polluted session resumes clean.
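Both layers can be sketched as follows. `SessionWriter` is a hypothetical stand-in for the agent's DB write paths, and matching harness turns by the quoted prompt prefix is an assumption about how they are detected.

```python
class SessionWriter:
    def __init__(self, persist_disabled=False):
        self._persist_disabled = persist_disabled
        self.rows = []

    def flush(self, messages):
        if self._persist_disabled:  # review fork: hard stop, never touch state.db
            return 0
        self.rows.extend(messages)
        return len(messages)


HARNESS_PREFIX = "Review the conversation above and update the skill library"


def strip_background_review_harness(messages):
    """Drop injected harness user turns plus the assistant reply that follows."""
    out, skip_reply = [], False
    for msg in messages:
        if msg["role"] == "user" and str(msg.get("content", "")).startswith(HARNESS_PREFIX):
            skip_reply = True
            continue
        if skip_reply and msg["role"] == "assistant":
            skip_reply = False
            continue
        skip_reply = False
        out.append(msg)
    return out
```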

Salvaged from #50296.

Co-authored-by: arminanton <29869547+arminanton@users.noreply.github.com>
2026-07-01 16:21:39 +05:30
pefontana
a04b7024ff fix(error-classifier): route 5xx context-overflow into compression
Local inference servers (llama.cpp/llama-server, vLLM/Ollama behind a
Cloudflare/Tailscale hop) report context overflow with HTTP 500/502/503/529
instead of 400/413. _classify_by_status returned server_error/overloaded and
retried blindly, then dropped the turn with no compaction. Route explicit
_CONTEXT_OVERFLOW_PATTERNS matches on those 5xx codes to context_overflow
(should_compress=True); plain 500 stays server_error, plain 503 overloaded.
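A sketch of the 5xx routing. The patterns are illustrative, not the real _CONTEXT_OVERFLOW_PATTERNS, and mapping a plain 529 to overloaded is an assumption (the message only pins down plain 500 and plain 503).

```python
import re

# Illustrative patterns only; the real _CONTEXT_OVERFLOW_PATTERNS list differs.
_CONTEXT_OVERFLOW_PATTERNS = (
    re.compile(r"context (length|size|window)", re.IGNORECASE),
    re.compile(r"too many tokens", re.IGNORECASE),
)


def classify_5xx(status, message):
    """Return (reason, should_compress) for a 500/502/503/529 response."""
    if any(p.search(message) for p in _CONTEXT_OVERFLOW_PATTERNS):
        return "context_overflow", True  # compress instead of retrying blindly
    if status in (503, 529):
        return "overloaded", False
    return "server_error", False
```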
2026-07-01 16:14:16 +05:30
WXBR
59e7e9d007 fix(agent): persist recovered final responses
Close a recovery/fallback final_response with an assistant transcript entry before session persistence so durable history cannot end at a tool/user message after the caller receives a final answer.

Adds a regression for a tool-tail transcript with a non-empty final_response. Related to #46071 / #46053, but covers the adjacent case where the assistant message was never appended before persistence.
2026-07-01 03:34:49 -07:00
Tranquil-Flow
122e5bc037 fix(agent): retry 413 after stripping vision payloads (#47339)
When text compression can't reduce a 413 request further, evict base64
image parts from tool messages and retry once instead of dead-ending
with 'Payload too large and cannot compress further.'

A 413 is a request-body byte-size limit, not a token limit. browser_vision
screenshots (2-5MB base64 each) keep the HTTP body oversized even after
aggressive summarization. The strip pass passes remember_model=False so a
413 does not poison _no_list_tool_content_models — that set is for providers
that reject list-type tool content, a distinct failure mode.
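A sketch of the strip-and-retry-once step. The function names, the data-URL test for "base64 image", and the placeholder text used when every part is evicted are assumptions.

```python
def _is_inline_image(part):
    if not (isinstance(part, dict) and part.get("type") == "image_url"):
        return False
    ref = part.get("image_url")
    url = ref.get("url", "") if isinstance(ref, dict) else str(ref or "")
    return url.startswith("data:")  # base64 payloads are what inflate the body


def strip_image_parts(messages):
    """Drop base64 image parts from tool messages; return (new_messages, removed)."""
    removed, out = 0, []
    for msg in messages:
        content = msg.get("content")
        if msg.get("role") == "tool" and isinstance(content, list):
            kept = [p for p in content if not _is_inline_image(p)]
            removed += len(content) - len(kept)
            # Placeholder keeps the tool message non-empty (assumed policy).
            msg = {**msg, "content": kept or [{"type": "text", "text": "[image removed]"}]}
        out.append(msg)
    return out, removed


def retry_413_without_images(messages, send):
    """One retry after eviction; give up if there was nothing to evict."""
    stripped, removed = strip_image_parts(messages)
    if removed == 0:
        raise RuntimeError("Payload too large and cannot compress further.")
    return send(stripped)
```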

Cherry-picked from #47397 by Tranquil-Flow; placed onto main's current
token-aware 413 recovery else branch.
2026-07-01 03:18:41 -07:00
Tyler Merritt
320c587256 fix(context): parse vLLM's token-based output-cap error format
vLLM (and other OpenAI-compatible servers) report context overflow with
both the window and the prompt in tokens:

  "This model's maximum context length is 131072 tokens. However, you
   requested 65536 output tokens and your prompt contains at least 65537
   input tokens, for a total of at least 131073 tokens."

parse_available_output_tokens_from_error() already classified this as an
output-cap error (the "requested N output tokens" gate), but none of the
extraction patterns matched the "prompt contains [at least] N input
tokens" phrasing, so it returned None. The recovery path then
misclassified the failure as prompt-too-long and looped through
compression — which frees little while each retry keeps requesting the
same oversized max_tokens — terminating in "cannot compress further"
even though simply lowering the output cap would have succeeded.

Add an extraction branch for the token-based phrasing: available output
= window - reported input. When the input alone is at or over the
window it still returns None, so the caller correctly falls through to
compression.
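The new extraction branch can be sketched with two regexes: available output equals the window minus the reported input. This is a simplified stand-in; the real parse_available_output_tokens_from_error() also gates on the "requested N output tokens" phrasing first.

```python
import re

_WINDOW = re.compile(r"maximum context length is (\d+) tokens", re.IGNORECASE)
_INPUT = re.compile(r"prompt contains (?:at least )?(\d+) input tokens", re.IGNORECASE)


def available_output_tokens(message):
    """window - input, or None when unparseable or the input alone fills the window."""
    window, prompt = _WINDOW.search(message), _INPUT.search(message)
    if not (window and prompt):
        return None
    available = int(window.group(1)) - int(prompt.group(1))
    return available if available > 0 else None  # None -> fall through to compression
```

For the vLLM message quoted above this gives 131072 - 65537 = 65535 output tokens.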

Relates to #43547.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-01 03:17:48 -07:00
DhivinX
49e129e495 fix(anthropic): use claude-code/ UA prefix for OAuth to avoid 404 (#48534)
Anthropic's OAuth endpoints 404 for the claude-cli/ User-Agent prefix. Switch
all three OAuth UA sites (build_anthropic_client, refresh_anthropic_oauth_pure,
run_hermes_oauth_login_pure) to the claude-code/ prefix Anthropic expects.

Salvaged from #51948.

Co-authored-by: DhivinX <20087092+DhivinX@users.noreply.github.com>
2026-07-01 15:42:15 +05:30
fsaad1984
5881791adc fix(adapter): enforce tool_use/tool_result adjacency in _strip_orphaned_tool_blocks
_strip_orphaned_tool_blocks collected tool_result ids across ALL user messages
and kept any assistant tool_use whose id appeared anywhere, rather than
requiring the result to be in the immediately-following user message. A stale
match elsewhere in the transcript could keep a genuinely-orphaned tool_use,
which Anthropic rejects. Rewrite to adjacency-checked two-pass logic so a
tool_use is kept only when its result immediately follows.
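A two-pass sketch of the adjacency rule over Anthropic-style content blocks. The function is a hypothetical simplification; the real one may also handle string content and empty messages differently.

```python
def strip_orphaned_tool_blocks(messages):
    """Keep a tool_use only if the NEXT message is a user turn holding its
    tool_result; drop tool_results whose tool_use did not survive."""
    # Pass 1: decide surviving tool_use ids by adjacency only.
    kept_ids = set()
    for idx, msg in enumerate(messages):
        if msg["role"] != "assistant" or not isinstance(msg["content"], list):
            continue
        nxt = messages[idx + 1] if idx + 1 < len(messages) else None
        next_results = set()
        if nxt and nxt["role"] == "user" and isinstance(nxt["content"], list):
            next_results = {b.get("tool_use_id") for b in nxt["content"] if b.get("type") == "tool_result"}
        kept_ids |= {b["id"] for b in msg["content"]
                     if b.get("type") == "tool_use" and b["id"] in next_results}
    # Pass 2: rebuild messages without the orphans.
    out = []
    for msg in messages:
        if isinstance(msg["content"], list):
            blocks = [b for b in msg["content"]
                      if not (b.get("type") == "tool_use" and b["id"] not in kept_ids)
                      and not (b.get("type") == "tool_result" and b.get("tool_use_id") not in kept_ids)]
            msg = {**msg, "content": blocks}
        out.append(msg)
    return out
```

The stale-match case: a result for `a` that only appears two messages later no longer keeps `a` alive.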

Salvaged from #52145.

Co-authored-by: fsaad1984 <38867992+fsaad1984@users.noreply.github.com>
2026-07-01 15:42:15 +05:30
Ben Barclay
c71f816956 fix(compression): clear all per-session state in on_session_end, not just _previous_summary
The original cross-session contamination fix (#38788) only cleared
_previous_summary in on_session_end(), but on_session_reset() clears
14+ per-session variables. When a session ends (cron exit, gateway
expiry, session-id rotation) and the compressor instance is reused,
the surviving stale state causes:

- _ineffective_compression_count surviving → next session skips
  compression prematurely (anti-thrashing guard misfires)
- _summary_failure_cooldown_until surviving → next session blocks
  summary generation for an unrelated transient error
- _last_compress_aborted surviving → callers think compression is
  still aborted
- _last_aux_model_failure_* surviving → stale error warnings shown
- _last_summary_dropped_count / _last_summary_fallback_used
  surviving → misleading user warnings
- _context_probed / _context_probe_persistable surviving → stale
  context-probe state

Also fix on_session_reset(), which never cleared _last_compress_aborted:
a /new or /reset would inherit the aborted flag from the prior
conversation.

Add 6 targeted tests covering the leak vectors and a parity test
ensuring on_session_end and on_session_reset always clear the same
surface.
2026-07-01 02:48:32 -07:00
ArthurZhang
fdb9620ac4 security(agent): redact Slack App-Level (xapp-) tokens
The xapp-<num>-<hash> format used by Slack App-Level / Socket Mode
tokens was missing from both agent/redact.py prefix patterns and
gateway/run.py gateway secret patterns, so SLACK_APP_TOKEN values could
leak through to chat users even with security.redact_secrets enabled.

Adds an anchored xapp-\d+- pattern to both redaction paths.
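A sketch of the added pattern. The anchored `xapp-\d+-` head comes from the message; the character class for the remainder and the `xapp-***` mask are assumptions.

```python
import re

# Slack App-Level / Socket Mode tokens: xapp-<num>-<rest>.
_XAPP_RE = re.compile(r"\bxapp-\d+-[A-Za-z0-9-]+")


def redact_xapp(text):
    return _XAPP_RE.sub("xapp-***", text)
```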
2026-07-01 02:45:22 -07:00
Teknium
da6d5fcd13 fix(auth): serialize Codex OAuth pool refresh under the auth-store lock (#56233)
The credential-pool Codex refresh path synced tokens from auth.json and
then POSTed the refresh_token to OpenAI's token endpoint without holding
the cross-process auth-store lock across the whole read->POST->write-back
sequence. Because Codex refresh tokens are single-use, two concurrent
Hermes processes could both adopt the same on-disk token and both POST
it; the loser got refresh_token_reused / invalid_grant.

Wrap the Codex OAuth branch of _refresh_entry in the existing shared
_auth_store_lock (reentrant, cross-process flock) using the same
extended-timeout pattern resolve_codex_runtime_credentials() already
uses. A waiting process now blocks on the lock and, once inside, the
in-lock re-sync picks up the rotated token the winner persisted and
skips its own POST. Also send User-Agent: hermes-cli/<version> on the
refresh request.

Credit @cooper-oai (#34820) for identifying the concurrent-refresh
reuse race; this ships the narrow lock-serialization fix without the
separate Codex auth-store partition.
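The read, POST, write-back sequence can be sketched as below. This is an in-process illustration only: a `threading.RLock` stands in for the cross-process flock, `TokenStore` stands in for auth.json, and `refresh_entry`/`post_refresh` are hypothetical names.

```python
import threading


class TokenStore:
    """In-memory stand-in for auth.json (the real store is on disk)."""

    def __init__(self, refresh_token):
        self.refresh_token = refresh_token


# Reentrant, like the shared auth-store lock; in-process here for brevity.
_auth_store_lock = threading.RLock()


def refresh_entry(entry, store, post_refresh):
    """Refresh a single-use token without racing other refreshers.

    entry: dict holding this caller's cached "refresh_token".
    post_refresh: callable(old_token) -> new_token (the network POST).
    """
    with _auth_store_lock:
        # Re-sync inside the lock: if a concurrent refresher already rotated
        # the token, adopt the persisted one and skip our own POST.
        if store.refresh_token != entry["refresh_token"]:
            entry["refresh_token"] = store.refresh_token
            return entry["refresh_token"]
        new_token = post_refresh(entry["refresh_token"])
        store.refresh_token = new_token  # write back before releasing
        entry["refresh_token"] = new_token
        return new_token
```

Two callers holding the same stale token now produce exactly one POST; the second adopts the rotated token.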
2026-07-01 02:45:07 -07:00
sprmn24
88d6e833f1 fix(agent): wrap list-type untrusted content in untrusted_tool_result
_maybe_wrap_untrusted() only wrapped str-typed tool outputs. When a
high-risk tool (web_extract, browser_*) returned a multimodal content
list ([{type:text},{type:image_url}]), the text part reached the model
completely unguarded. _tool_result_content_for_active_model() produces
exactly such a list when it unwraps the _multimodal envelope for
vision-capable providers, so an attacker page only had to include a
single image to bypass the entire untrusted-data wrapper.

Extend the wrapper to handle list content: each {type:text} part is run
through the same string-wrapping path (min-char threshold, delimiter
neutralization, one well-formed block), image/video parts pass through
untouched so the list stays valid for vision adapters. Recursing into
the existing string branch means the list path inherits the delimiter
defang and the no-forgeable-fast-path hardening from #56172 for free.

The outer list is rebuilt rather than returned by identity, so callers
should compare results by value.
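A minimal sketch of the list branch, assuming OpenAI-style content parts. `wrap_text` is a hypothetical stand-in for the existing string path (its min-char threshold and delimiter defang are omitted), and `maybe_wrap_untrusted` is illustrative, not the real helper.

```python
def wrap_text(text: str) -> str:
    # Stand-in for the string-wrapping path; threshold and defang omitted.
    return f"<untrusted_tool_result>\n{text}\n</untrusted_tool_result>"


def maybe_wrap_untrusted(content):
    if isinstance(content, str):
        return wrap_text(content)
    if isinstance(content, list):
        wrapped = []
        for part in content:
            if isinstance(part, dict) and part.get("type") == "text":
                # Copy the part so the caller's list is never mutated.
                wrapped.append({**part, "text": wrap_text(part.get("text", ""))})
            else:
                wrapped.append(part)  # image/video parts pass through as-is
        return wrapped  # a new outer list: compare by value, not identity
    return content
```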
2026-07-01 02:44:09 -07:00
mrparker0980
10a54ccc2c fix(security): anchor @file context refs to canonical read deny-list
`@file` / `@folder` context-reference expansion enforced its own narrow
deny-list (`_ensure_reference_path_allowed` in `agent/context_references.py`)
that only covered `~/.ssh` keys, a handful of shell dotfiles, `~/.hermes/.env`,
and `skills/.hub`. It never blocked the credential stores that the canonical
read guard (`agent/file_safety.get_read_block_error`) protects: provider API
keys (`~/.hermes/auth.json`), Anthropic OAuth tokens
(`~/.hermes/.anthropic_oauth.json`), MCP OAuth material (`~/.hermes/mcp-tokens/`),
webhook HMAC secrets, and project-local `.env` files.

This matters because the messaging gateway feeds **untrusted** remote text
straight into reference expansion: `gateway/run.py` calls
`preprocess_context_references_async(..., allowed_root=_msg_cwd)` where
`_msg_cwd` defaults to the operator's HOME when `TERMINAL_CWD` is unset. A chat
peer (Telegram/Discord/Slack/...) could send `@file:~/.hermes/auth.json`, pass
the `allowed_root` check (it resolves under HOME), slip past the narrow list,
and have the operator's live keys read into the agent's context — where the
model would typically echo or act on them.

Rather than duplicate and re-sync a second secret list, this routes the guard
through the existing single source of truth. A reviewer might ask "why not just
add `auth.json` to the local list?" — because the local list has already drifted
once (a prior commit had to add `.config/gh`); anchoring to
`get_read_block_error` means every future addition there protects this path too.
The narrow checks are kept as a fallback, since they also cover directories the
canonical guard does not (`.aws`, `.gnupg`, `.kube`, etc.), and the canonical
lookup is wrapped so it can never crash reference expansion.
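The layering described above can be sketched as follows. Everything here is hypothetical: `reference_block_error`, `local_block_error`, and the `canonical_guard` callable stand in for `_ensure_reference_path_allowed` and `get_read_block_error`, whose real signatures differ.

```python
from pathlib import Path

# Illustrative subset of the narrow local deny-list.
_LOCAL_BLOCKED_DIRS = (".ssh", ".aws", ".gnupg", ".kube")


def local_block_error(path: Path):
    if any(part in _LOCAL_BLOCKED_DIRS for part in path.parts):
        return f"blocked by local deny-list: {path}"
    return None


def reference_block_error(path: Path, canonical_guard):
    """Return an error string if either deny-list flags path, else None.

    canonical_guard(str) -> str | None plays the role of a
    get_read_block_error-style lookup; if it raises, only the local
    checks apply, so expansion itself never crashes.
    """
    error = local_block_error(path)
    if error is None:
        try:
            error = canonical_guard(str(path))
        except Exception:
            error = None
    return error
```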


- [x] 🔒 Security fix

- `agent/context_references.py`: `_ensure_reference_path_allowed` now also
  consults `agent.file_safety.get_read_block_error` after its existing checks
  and refuses the reference when that canonical guard flags the resolved path.
  The lookup is wrapped so guard-resolution failures fall back to the explicit
  checks instead of breaking expansion.
- `tests/agent/test_context_references.py`: added
  `test_blocks_canonical_read_denylist_credential_stores`, asserting that
  `@file` attaches for `auth.json`, `.anthropic_oauth.json`, `mcp-tokens/*`, and
  a project-local `.env` are all refused and their secret bodies never reach the
  expanded message.
- `scripts/release.py`: added the contributor email to `AUTHOR_MAP` (release
  gate).

1. `scripts/run_tests.sh tests/agent/test_context_references.py` — all 15 tests
   pass, including the new credential-store case.
2. Regression proof: stash `agent/context_references.py`, run the suite with
   `-- -k canonical`, and confirm the new test fails (secrets leak into the
   message) without the fix; restore and confirm it passes.
3. `ruff check agent/context_references.py tests/agent/test_context_references.py`
   and `python scripts/check-windows-footguns.py agent/context_references.py
   tests/agent/test_context_references.py` both pass.

- [x] I've read the Contributing Guide
- [x] My commit messages follow Conventional Commits (`fix(scope):`, etc.)
- [x] I searched for existing PRs to make sure this isn't a duplicate
- [x] My PR contains **only** changes related to this fix (plus the AUTHOR_MAP release gate)
- [x] I've run the test suite for the touched area and all tests pass
- [x] I've added tests for my changes (required for bug fixes)
- [x] I've tested on my platform: macOS 15 (Darwin 25.5)

- [x] I've updated relevant documentation (README, `docs/`, docstrings) — or N/A
- [x] I've updated `cli-config.yaml.example` if I added/changed config keys — or N/A
- [x] I've updated `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows — or N/A
- [x] I've considered cross-platform impact (Windows, macOS) — or N/A
- [x] I've updated tool descriptions/schemas if I changed tool behavior — or N/A
2026-07-01 02:43:49 -07:00
kshitijk4poor
22a137ed40 fix(agent): prefer late-completing real result over timeout message (review)
Review follow-up on the concurrent-tool deadline salvage. timed_out_indices is
snapshotted from not_done at the deadline; a worker can still finish and write
results[i] in the window before the post-execution result loop reads it. The
loop unconditionally replaced results[i] with a fabricated 'timed out' message
for any snapshotted index, discarding a genuinely-successful (just-late) result.

Gate the timeout message on 'and r is None' so a real result always wins. Add a
regression test that forces the snapshot-vs-result-loop race deterministically
(mutation-checked: reverting the guard fails it). Also document the intentional
detached-worker leak at the executor abandon site.
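The guard amounts to one extra condition in the result loop. A minimal sketch, with `finalize_results` and the message text as hypothetical names:

```python
def finalize_results(results, timed_out_indices, tool_names):
    """Build final tool results after the deadline fired.

    timed_out_indices is the snapshot taken at the deadline; a worker may
    still have written results[i] after that snapshot.
    """
    final = []
    for i, r in enumerate(results):
        if i in timed_out_indices and r is None:
            # Only fabricate a timeout when no real result ever arrived.
            final.append(f"Error: {tool_names[i]} timed out")
        else:
            final.append(r)  # a late-but-real result wins over the snapshot
    return final
```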
2026-07-01 14:56:52 +05:30
Gustavo Mendes
c1784e9093 fix(agent): bound concurrent tool execution with a wall-clock deadline
A tool with no internal interrupt check (read_file, web_search, or a wedged
terminal backend) that never returns keeps the concurrent-tool poll loop alive
forever: the loop only breaks when all futures finish or an interrupt is
requested, and the 30s heartbeat resets the gateway idle monitor so idle-kill
never fires. The ThreadPoolExecutor was also used as a context manager, so its
__exit__ joined the hung worker with wait=True.

Add a wall-clock batch deadline (HERMES_CONCURRENT_TOOL_TIMEOUT_S, default 420s
— above the 360s web_extract timeout; 0/negative disables). When it fires:
cancel pending futures, signal an interrupt to the worker threads, abandon the
executor (shutdown wait=False, cancel_futures=True) so hung threads aren't
joined, and return a per-tool 'timed out' result for the unfinished calls while
still surfacing the finished ones. Also fixes the latent futures.index(f)
lookup (ambiguous with duplicate futures) by tracking a future->index map.
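A stdlib-only sketch of the deadline and abandon pattern. `run_batch` is a hypothetical name; the timeout is passed in rather than read from HERMES_CONCURRENT_TOOL_TIMEOUT_S, and the interrupt signal to workers is omitted. `shutdown(cancel_futures=True)` requires Python 3.9+.

```python
import concurrent.futures as cf


def run_batch(calls, timeout_s):
    """Run zero-arg callables concurrently under a wall-clock deadline.

    Returns results aligned with calls; unfinished entries get a
    'timed out' message. timeout_s <= 0 disables the deadline.
    """
    # Not a context manager: __exit__ would join hung workers.
    executor = cf.ThreadPoolExecutor(max_workers=max(len(calls), 1))
    # future -> index map avoids ambiguous futures.index(f) lookups.
    index_of = {executor.submit(fn): i for i, fn in enumerate(calls)}
    done, not_done = cf.wait(index_of, timeout=timeout_s if timeout_s > 0 else None)
    results = [None] * len(calls)
    for f in done:
        results[index_of[f]] = f.result()
    for f in not_done:
        f.cancel()  # only succeeds for futures that never started
    # Abandon the executor: do not wait on hung worker threads.
    executor.shutdown(wait=False, cancel_futures=True)
    for f in not_done:
        results[index_of[f]] = f"call {index_of[f]} timed out after {timeout_s}s"
    return results
```

Finished calls are still surfaced; only the stragglers are replaced.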

Salvaged from #54562.

Co-authored-by: Gustavo Mendes <87918773+gustavosmendes@users.noreply.github.com>
2026-07-01 14:56:52 +05:30
Teknium
913e661a09
fix(cache): stop verification-loop synthetic nudges from persisting (#56194)
verify_on_stop / pre_verify append a synthetic assistant "done" plus a
synthetic user nudge to keep the agent going one more turn before it can
claim completion. Both were flagged (_verification_stop_synthetic on the
nudge only), but the flags were never registered in
_EPHEMERAL_SCAFFOLDING_FLAGS, so the central _is_ephemeral_scaffolding()
filter that guards both persistence sinks (SQLite flush + JSON snapshot)
let them through. The resumed transcript then inherited loop-only
scaffolding, invalidating the prompt-prefix cache on later turns.

- add _verification_stop_synthetic and _pre_verify_synthetic to
  _EPHEMERAL_SCAFFOLDING_FLAGS (the single chokepoint both sinks use)
- flag the blocked-attempt assistant message too, not just the nudge, so
  the whole synthetic pair drops together and persistence never keeps a
  premature "done" whose nudge was stripped (leaving two adjacent assistant
  messages)
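A minimal sketch of the single-chokepoint filter. The two flag names come from this commit; the `persistable` helper and the exact filter shape are assumptions.

```python
_EPHEMERAL_SCAFFOLDING_FLAGS = frozenset({
    "_verification_stop_synthetic",
    "_pre_verify_synthetic",
})


def _is_ephemeral_scaffolding(message: dict) -> bool:
    return any(message.get(flag) for flag in _EPHEMERAL_SCAFFOLDING_FLAGS)


def persistable(messages):
    """Messages allowed to reach the SQLite flush and the JSON snapshot."""
    return [m for m in messages if not _is_ephemeral_scaffolding(m)]
```

Because both the synthetic "done" and the nudge carry a flag, they drop as a pair and the persisted roles still alternate.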

The API-payload leak claimed in the report is already handled: the
chat_completions transport strips every underscore-prefixed message key
before the wire, so the marker never reaches strict providers.

Reported by patppham.
2026-07-01 02:26:06 -07:00
Teknium
18c61bb8cf fix(provider): match api.anthropic.com host on fallback api_mode detection
Widen the salvaged #32243 fix to the try_activate_fallback path: a custom
provider pointed at the native api.anthropic.com host (no /anthropic path
suffix, name != anthropic) fell through to chat_completions -> POST
/v1/chat/completions -> 404. Match the host the same way determine_api_mode()
and _detect_api_mode_for_url() now do. Absorbs #49247.
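A sketch of host-based matching, assuming detection keys on either the api.anthropic.com hostname or an `/anthropic` path suffix. Both helper names and the `"anthropic_messages"` mode string are hypothetical; only `"chat_completions"` appears in the commit.

```python
from urllib.parse import urlparse


def is_native_anthropic_url(base_url: str) -> bool:
    """True for the native api.anthropic.com host or an /anthropic path."""
    parsed = urlparse(base_url)
    host = (parsed.hostname or "").lower()
    return host == "api.anthropic.com" or parsed.path.rstrip("/").endswith("/anthropic")


def fallback_api_mode(base_url: str) -> str:
    # "anthropic_messages" is an assumed mode name for illustration.
    return "anthropic_messages" if is_native_anthropic_url(base_url) else "chat_completions"
```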
2026-07-01 02:18:56 -07:00
itenev
f981d47cb0 fix(gateway): prevent Discord disconnects from blocking event loop
models_dev.py's fetch uses a synchronous requests.get(timeout=15). Called
from the async gateway message handlers, it blocked the event loop for up
to 15s, starving Discord heartbeats and causing ClientConnectionResetError
disconnects.

Adds get_model_context_length_async() which offloads the entire sync
resolution chain to a worker thread via asyncio.to_thread(), and switches
the two async gateway call sites (_prepare_inbound_message_text,
_handle_message_with_agent) to await it. The loop stays responsive; the
sync path remains the single source of truth for the cache.
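The wrapper pattern is small. In this sketch, `get_model_context_length` is a hypothetical stand-in whose `time.sleep` simulates the blocking `requests.get`; only the `asyncio.to_thread` offload reflects the technique described above.

```python
import asyncio
import time


def get_model_context_length(model: str) -> int:
    """Stand-in for the sync resolver (blocking HTTP fetch in reality)."""
    time.sleep(0.1)  # simulates requests.get(..., timeout=15)
    return 200_000


async def get_model_context_length_async(model: str) -> int:
    # Run the whole sync chain in a worker thread so the event loop, and
    # with it the Discord heartbeat, keeps running during the fetch.
    return await asyncio.to_thread(get_model_context_length, model)
```

Keeping the sync function as the only implementation means the cache logic cannot drift between two code paths.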

Salvaged from PR #22753 by @itenev. Follow-up: dropped the unused
fetch_models_dev_async/lookup_models_dev_context_async aiohttp variants
from the original PR (dead code with zero callers that had drifted from
the sync cache logic) — the to_thread wrapper already runs the sync path
off-loop, so they were redundant.
2026-07-01 02:17:35 -07:00
kshitijk4poor
a658f3b28b fix(security): strip dynamic Hermes secrets from all subprocess spawn env
Subprocesses spawned by the terminal tool, execute_code, Docker backend, and
the codex app-server could inherit Hermes-internal secrets that the name-based
`_HERMES_PROVIDER_ENV_BLOCKLIST` can't enumerate, because they're injected into
`os.environ` at runtime under dynamic names:

- `AUXILIARY_<TASK>_API_KEY` / `AUXILIARY_<TASK>_BASE_URL` — per-task side-LLM
  credentials bridged from `config.yaml[auxiliary]` by gateway/run.py and cli.py
  (vision, web_extract, approval, compression, plugin-registered tasks). Often
  separate, higher-spend keys plus base URLs pointing at private endpoints.
- `GATEWAY_RELAY_*_SECRET` / `_KEY` / `_TOKEN` — relay-auth material provisioned
  by gateway/relay.

Additionally, agent/transports/codex_app_server.py built its spawn env from a
raw `os.environ.copy()`, bypassing the centralized `hermes_subprocess_env()`
helper entirely — handing every codex subprocess the full Tier-1 secret set
(GH_TOKEN, gateway bot tokens, Modal/Daytona infra tokens, dashboard session
token) unfiltered. This is the #29157 sibling spawn-site gap; copilot_acp_client
already routes through the helper.

Fix — single chokepoint:
- Add `_is_hermes_internal_secret(key)` in tools/environments/local.py as the
  single source of truth for the dynamic secret patterns. Matches
  AUXILIARY_*_API_KEY / _BASE_URL and GATEWAY_RELAY_*_SECRET/_KEY/_TOKEN; leaves
  non-secret AUXILIARY_*_PROVIDER/_MODEL and GATEWAY_RELAY routing hints visible.
- Wire the predicate into every spawn path unconditionally (ignores skill
  env_passthrough opt-in AND inherit_credentials — a model-driving CLI never
  needs these): `_sanitize_subprocess_env` (both loops), `_make_run_env`
  (foreground), `hermes_subprocess_env` (Tier-1), and the Docker forward filter.
- Add the static GATEWAY_RELAY_* names to `_HERMES_PROVIDER_ENV_BLOCKLIST` so the
  exact-match path catches them independently of the predicate.
- Add the GATEWAY_RELAY_ID/_SECRET/_DELIVERY_KEY triplet to `_ALWAYS_STRIP_KEYS`
  (Tier-1) so it is stripped unconditionally on EVERY spawn surface — including
  the codex/copilot `inherit_credentials=True` path that skips the Tier-2
  blocklist. `_SECRET`/`_DELIVERY_KEY` are already predicate-matched; `_ID` has
  no secret suffix, so enumerating it here is what closes its leak on the
  inherit path (self-review W1).
- Defense in depth: env_passthrough.py `_is_hermes_provider_credential()` now
  consults the same predicate, so a skill can't register these names as
  passthrough and tunnel them into an execute_code / terminal child.
- Route codex_app_server through `hermes_subprocess_env(inherit_credentials=True)`
  — strips Tier-1 + dynamic-internal secrets while provider creds (which codex
  needs to authenticate) still flow.

Consolidates PRs #53715 (necoweb3 — the _is_hermes_internal_secret backbone +
Docker filter), #53503 (srojk34 — env_passthrough guard), and #55709 (srojk34 —
codex routing). Retires #52348 (claudlos): its copilot half is already on main,
and its codex half used the full-strip `_sanitize_subprocess_env` which would
break codex provider auth — the correct tier is `inherit_credentials=True`.

Tests: TestHermesInternalDynamicSecrets (terminal + predicate + passthrough
override), TestInternalDynamicSecrets (hermes_subprocess_env both tiers),
TestSpawnEnvSecretStripping (codex spawn env), plus env_passthrough
defense-in-depth cases.

Co-authored-by: necoweb3 <sswdarius@gmail.com>
Co-authored-by: srojk34 <286497132+srojk34@users.noreply.github.com>
Co-authored-by: claudlos <claudlos@agentmail.to>
2026-07-01 14:37:22 +05:30
Omar Baradei
053424c486 fix(agent): preserve final_response on failure returns
AIAgent.run_conversation() promises a dict with final_response, but 16
terminal-failure branches returned dicts that either omitted the key or
set it to None. Callers that index result['final_response'] directly
(run_agent.py chat() + the __main__ printer) turn a real provider/context
failure into an opaque KeyError instead of surfacing the actionable error.

Every offending branch already carried usable 'error' text, so this
mirrors that text into final_response for all 16 sites (8 that omitted the
key, 8 that returned None). Adds an AST regression test that fails if any
run_conversation() dict return omits final_response or sets it to a literal
None, and tightens the invalid-response test to assert final_response == error.
2026-07-01 02:04:28 -07:00
qWaitCrypto
e1ff736f26 fix(anthropic): preserve ordered replay cache markers 2026-07-01 02:03:40 -07:00
qWaitCrypto
80d71e8d2e fix(anthropic): preserve tool use cache markers 2026-07-01 02:03:40 -07:00
Jeff Watts
a2d6f05d1b fix(moa): append reference block at end of aggregator prompt for KV-cache reuse
The MoA aggregator received the per-turn reference block merged into the most
recent `user` message. In an agentic tool loop that message is the original
task near the top of the context (everything after it is assistant/tool turns),
so injecting text that changes every iteration diverges the prompt prefix early.
The server's KV cache then cannot be reused and the entire conversation
re-prefills on every tool-loop step — full prefill each step, which dominates
latency on long contexts.

Append the reference block at the end of the prompt instead (merging into the
last message only when it is already a trailing user turn, i.e. plain chat).
This keeps the [system][task][tool-history] prefix stable and cache-reusable so
only the new block re-prefills, and gives the aggregator the references with
recency. Extracted as `_attach_reference_guidance` with unit tests.

Measured on a local llama.cpp aggregator over a long agentic task: KV-cache
reuse on follow-up steps went from ~0.3% to ~93-95% and per-step prefill on an
~80k-token context dropped from ~44s to <1s, with no change to output.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-01 01:59:00 -07:00
sasquatch9818
020d263ef6 fix(agent): defang untrusted-tool-result delimiter against tag injection
`_maybe_wrap_untrusted` is the architectural defense against indirect
prompt injection. It wraps attacker-controllable tool output
(web_extract, web_search, browser_*, mcp_*) in
`<untrusted_tool_result>...</untrusted_tool_result>` so the model treats
it as data. The content was interpolated verbatim, so the boundary was
forgeable.

Two holes. A poisoned page that embeds `</untrusted_tool_result>` closes
the block early — everything after it reads as trusted instructions. And
the `startswith("<untrusted_tool_result")` re-entrancy guard returned
content that merely started with the opening tag completely unwrapped, so
an attacker could simply prefix the tag to drop all data framing.

Fix neutralizes any embedded delimiter token (case-insensitive) before
interpolation and drops the forgeable fast-path, so content is always
sealed in exactly one well-formed block. Re-wrapping an already-wrapped
forward is harmless — it stays framed as data.
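A minimal sketch of neutralize-then-wrap. The hyphen substitution is a hypothetical defang chosen for illustration; the real `_neutralize_delimiters` may rewrite the token differently.

```python
import re

_DELIM_RE = re.compile(r"untrusted_tool_result", re.IGNORECASE)


def _neutralize_delimiters(text: str) -> str:
    # Hypothetical defang: underscores become hyphens, so an embedded tag
    # can no longer match the real delimiter in any letter case.
    return _DELIM_RE.sub("untrusted-tool-result", text)


def wrap_untrusted(text: str) -> str:
    # No startswith() fast path: every input is neutralized and sealed in
    # exactly one well-formed block.
    body = _neutralize_delimiters(text)
    return f"<untrusted_tool_result>\n{body}\n</untrusted_tool_result>"
```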

## What does this PR do?

Closes an indirect prompt-injection bypass in the untrusted-tool-result
wrapper. Attacker content can no longer break out of, or forge, the
trust boundary.

## Related Issue

N/A

## Type of Change

- [x] 🔒 Security fix

## Changes Made

- `agent/tool_dispatch_helpers.py`: add `_neutralize_delimiters` (case-insensitive defang of the `untrusted_tool_result` token); `_maybe_wrap_untrusted` now always neutralizes then wraps, and the forgeable `startswith` re-entrancy guard is removed.
- `tests/agent/test_tool_dispatch_helpers.py`: replace the double-wrap test (it encoded the bypass) with regression tests for embedded closing tag, leading opening tag, and a cased closing tag.

## How to Test

1. `scripts/run_tests.sh tests/agent/test_tool_dispatch_helpers.py` — 29 pass.
2. Embedded `</untrusted_tool_result>` mid-content: real closing delimiter appears once, at the end; payload trapped inside.
3. Content starting with the opening tag: data framing is applied, not skipped.

## Checklist

### Code

- [x] I've read the Contributing Guide
- [x] My commit messages follow Conventional Commits
- [x] I searched for existing PRs to make sure this isn't a duplicate
- [x] My PR contains only changes related to this fix
- [x] I've run the affected tests and they pass
- [x] I've added tests for my changes
- [x] I've tested on my platform: macOS 15 (Darwin 25.5)

### Documentation & Housekeeping

- [x] I've updated relevant documentation (docstrings) — or N/A
- [x] cli-config.yaml.example — N/A
- [x] CONTRIBUTING.md / AGENTS.md — N/A
- [x] Cross-platform impact — N/A (pure-Python, stdlib `re`)
- [x] Tool descriptions/schemas — N/A
2026-07-01 01:54:45 -07:00
liuhao1024
8f4d195d5f fix(compressor): pin summary role to user when only system prompt is protected (#52160)
After the first compaction, protect_first_n decays, so on a later compaction
the only protected head message can be the system prompt. Adapters like
Anthropic and Bedrock send the system prompt as a separate parameter, so the
summary becomes the first message in messages[] — and Anthropic rejects any
request whose first message is not role=user (HTTP 400). Pin the summary to
role=user when the head is system-only, and stop the collision-flip logic from
reverting it back to assistant.
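A hypothetical sketch of the role choice. `choose_summary_role` and the collision-flip rule shown are assumptions; the point is that the system-only case returns early, before any flip can turn the summary back into an assistant turn.

```python
def choose_summary_role(head, first_tail_role):
    """head: protected head messages; first_tail_role: role after the summary."""
    if all(m["role"] == "system" for m in head):
        # The system prompt is sent out-of-band on Anthropic/Bedrock, so the
        # summary becomes messages[0] and must be a user turn. No flip.
        return "user"
    role = "user" if head[-1]["role"] == "assistant" else "assistant"
    if role == first_tail_role:
        # Collision flip: avoid two consecutive same-role messages.
        role = "assistant" if role == "user" else "user"
    return role
```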

Salvaged from #52167.

Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
2026-07-01 14:24:41 +05:30
srojk34
82ac7e16b8 fix(compression): preserve network/auth abort flags across cooldown re-entry (#29559)
compress() eagerly reset _last_summary_auth_failure and
_last_summary_network_failure at the top of every call. On a second
compress() during the failure cooldown, _generate_summary() returns None from
the cooldown early-return WITHOUT re-asserting those flags, so the abort guard
saw False and fell through to the destructive static fallback that drops the
middle window: the data loss described in #29559/#25585. Stop resetting them
eagerly; a successful summary already clears both, so letting them persist
across calls is safe and keeps the cooldown abort protection intact.
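The flag lifecycle can be illustrated with a toy class. `Compressor` here is hypothetical and models only the two flags, the cooldown early return, and the abort guard.

```python
class Compressor:
    """Hypothetical sketch of the abort-flag lifecycle, not the real class."""

    def __init__(self):
        self._last_summary_auth_failure = False
        self._last_summary_network_failure = False
        self._cooldown_active = False

    def _generate_summary(self, fail_with=None):
        if self._cooldown_active:
            return None  # cooldown early return: flags are NOT re-asserted
        if fail_with == "network":
            self._last_summary_network_failure = True
            self._cooldown_active = True
            return None
        # A successful summary clears both flags, so no eager reset is needed.
        self._last_summary_auth_failure = False
        self._last_summary_network_failure = False
        return "summary"

    def compress(self, fail_with=None):
        # Deliberately no reset of the failure flags at the top of the call.
        summary = self._generate_summary(fail_with)
        if summary is None and (self._last_summary_auth_failure
                                or self._last_summary_network_failure):
            return "aborted"  # keep the middle window intact
        if summary is None:
            return "static-fallback"  # the destructive path
        return summary
```

Had `compress()` reset both flags first, the second call below would fall through to `"static-fallback"`.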

Salvaged from #52056.

Co-authored-by: srojk34 <286497132+srojk34@users.noreply.github.com>
2026-07-01 14:24:41 +05:30
liuhao1024
32b23bfb08 fix(compressor): strip orphan tool_calls instead of inserting stubs (#51218)
_sanitize_tool_pairs inserted stub role="tool" results for orphaned
tool_calls. The pre-API repair_message_sequence() tracks known call IDs by
tc.get("id") while this sanitizer keys on call_id||id; when they disagree
(Codex Responses API: id != call_id) the stubs are silently dropped by the
repair pass, re-exposing the original orphans. Strip the orphaned tool_calls
at the source instead (preserving any text content, adding a placeholder for
an otherwise-empty assistant turn) to avoid the mismatch class entirely.
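A minimal sketch of stripping at the source, keyed on `call_id` with a fallback to `id` as described above. `strip_orphan_tool_calls` and the placeholder text are hypothetical.

```python
def strip_orphan_tool_calls(messages, placeholder="(tool call removed)"):
    """Drop assistant tool_calls that have no matching role="tool" result."""
    answered = {m.get("tool_call_id") for m in messages if m.get("role") == "tool"}
    out = []
    for m in messages:
        calls = m.get("tool_calls")
        if m.get("role") == "assistant" and calls:
            kept = [tc for tc in calls if (tc.get("call_id") or tc.get("id")) in answered]
            if len(kept) != len(calls):
                m = {k: v for k, v in m.items() if k != "tool_calls"}
                if kept:
                    m["tool_calls"] = kept
                elif not m.get("content"):
                    m["content"] = placeholder  # avoid an empty assistant turn
        out.append(m)
    return out
```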

Salvaged from #51225.

Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
2026-07-01 14:24:41 +05:30
Harish Kukreja
01bf61c865 fix(runtime): honor NOUS_INFERENCE_BASE_URL across pool/explicit/aux paths
Upstream #52270 added `_nous_inference_env_override()` but wired it into
only `resolve_nous_runtime_credentials`. Three sibling resolution paths
still ignored the override, so a self-hosted Nous inference endpoint set
via `NOUS_INFERENCE_BASE_URL` was silently dropped whenever credentials
arrived through any of them:

- the credential-pool path (`_resolve_runtime_from_pool_entry`)
- the explicit-provider path (`_resolve_explicit_runtime`)
- the auxiliary side-LLM client (`_pool_runtime_base_url`)

Route all three through the same auth-layer reader so every
`NOUS_INFERENCE_BASE_URL` read shares one normalization path
(trailing-slash stripping, blank -> empty) and the documented
trusted-bypass intent stays in one place. The override is live-only: it
wins for the base URL returned this run but is never persisted to
auth.json or the credential pool, so an ephemeral dev/staging value
cannot poison durable auth state.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-01 01:52:06 -07:00