hermes-agent

Author	SHA1	Message	Date
Jneeee	b98baa3039	feat(config): extra HTTP headers for LLM API calls (#3526 salvage) Named providers / custom_providers entries in config.yaml now accept an extra_headers dict scoped to that endpoint — for reverse proxies, API gateways, and custom auth schemes (e.g. Cloudflare Access service tokens). - hermes_cli/config.py: normalize extra_headers on provider entries (_normalize_custom_provider_entry + providers-dict translation), add get_custom_provider_extra_headers / apply_custom_provider_extra_headers_to_client_kwargs helpers keyed on base_url (case/trailing-slash insensitive, no substring bypass — mirrors the TLS helpers) - hermes_cli/runtime_provider.py: surface extra_headers in the resolved runtime for named custom providers (providers dict, legacy custom_providers list, and the credential-pool path) - run_agent.py / agent/agent_init.py: merge per-provider extra_headers onto the OpenAI client default_headers at construction and on every _apply_client_headers_for_base_url re-application (credential swaps, rebuilds), most-specific level wins; OpenAI-wire only (native Anthropic/Bedrock scoped out) - agent/auxiliary_client.py: accept model.extra_headers as an alias of model.default_headers for the global variant - cli-config.yaml.example: documented commented example - Header values are treated as secrets and never logged Salvaged from PR #3526 by @jneeee, reimplemented against current main. Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>	2026-07-02 05:33:25 -07:00
kshitijk4poor	0a2d4a6eea	docs(codex): clarify stale-floor docstring reflects the 10k gate The helper docstring described the typical ~15-25k gateway payload but read as if that were the trigger range; the floor actually engages above 10k tokens. Clarify the prose to match the gate.	2026-07-02 17:05:05 +05:30
HexLab98	cb1ccc57e6	fix(codex): extend stale timeout for gateway-scale tool payloads Lower the openai-codex stale-timeout floor from 25k to 10k estimated tokens so Telegram/gateway sessions (~20k tools+instructions) are not aborted at the generic 90s cutoff while Codex is still prefilling.	2026-07-02 17:05:05 +05:30
Teknium	3f2a56d1a4	fix(cli): reliable interrupts, bounded exit, and exit feedback (#57000) Three CLI reliability fixes: 1. Interrupt reliability: chat() only re-queued the user's interrupt message when the turn result carried interrupted=True. When the agent thread raced past its last interrupt check (or finished) before the interrupt landed, the message was silently dropped — and the stale _interrupt_requested flag left on the agent instantly aborted the NEXT turn. Un-acknowledged interrupt messages are now re-queued as the next turn and the stale flag is cleared (only when the agent thread actually exited). The clarify-race path also parks the message in _pending_input instead of dropping it. 2. Slow exit (5+ min): stdlib ThreadPoolExecutor workers are non-daemon and joined unconditionally by concurrent.futures' atexit hook — even after shutdown(wait=False). One wedged tool worker (abandoned after interrupt/timeout) held the process open forever. Promoted async_delegation's daemon executor to a shared tools/daemon_pool module and adopted it in tool_executor (concurrent tool batches), memory_manager (background sync), delegate_tool (child timeout wrapper + batch fan-out), and skills_hub (source fan-out). Added a 30s exit watchdog (HERMES_EXIT_WATCHDOG_S) armed at _run_cleanup start as a backstop for wedged cleanup steps. 3. Exit jank: after prompt_toolkit tears down the input/status bars the terminal sat silent for the whole cleanup window, looking hung. Print 'Shutting down… (finalizing session)' immediately at exit start. E2E: live PTY interrupt of a foreground 'sleep 120' terminal tool now aborts in ~1s and the typed message runs as the next turn; wedged-worker + wedged-cleanup subprocess exits in 5.8s (watchdog) instead of hanging.	2026-07-02 04:20:43 -07:00
kshitijk4poor	b837f07dcd	fix(agent): route restore custom-pool match through canonical helper Follow-up on the salvaged #56392 guard. The cherry-picked change matched custom:<name> pool entries against the primary by raw base_url string equality, which (a) can't disambiguate two named custom providers sharing one gateway base_url and (b) left a latent bare-"custom" entry bypass. Route the match through get_custom_provider_pool_key(rt[base_url]) compared against the entry's custom:<name> key, mirroring the sibling guard in recover_with_credential_pool. Use CUSTOM_POOL_PREFIX instead of the literal. Add regression tests for the custom same-endpoint (swap) and cross-endpoint (skip) branches, plus the plain-provider fallback-pool case from #56885.	2026-07-02 13:41:53 +05:30
openhands	820a052575	fix(agent): keep primary runtime restore on matching credential pool (#56374)	2026-07-02 13:41:53 +05:30
Teknium	fb403a3a73	fix(auxiliary): retry transient blips harder + isolate client cache per model (#56889) Two related hardening fixes for auxiliary calls (which include MoA reference advisors — a pinned-model path where provider fallback is not a meaningful recovery): 1. Transient-transport retries: the same-provider retry on a connection reset / timeout / 5xx / 408 was a single attempt, then fallback. For a pinned aux call a second blip silently loses the call (root of the run2 double-advisor 'Connection error' collapse — a genuine upstream blip). Now retries N times with exponential backoff, N = auxiliary.transient_retries (default 2 -> 3 total attempts, clamped [0,6]). Compression-on-timeout fast-fail carve-out preserved. 2. Per-model client-cache isolation: _client_cache_key excluded the model, so two concurrent auxiliary calls to the same provider/base_url/key but different models (e.g. an opus + gpt-5.5 MoA fan-out) shared one cache entry and could race each other's client lifecycle. Model now participates in the key -> distinct clients, no cross-call races. Same-model reuse unchanged. - agent/auxiliary_client.py: _transient_retry_count() + backoff loop; model in _client_cache_key and both call sites. - hermes_cli/config.py: auxiliary.transient_retries default (2). - tests: new retry/isolation tests; updated 2 stale-expectation tests to the corrected behavior (per-model resolve; N-retry escalation). Backoff base is overridable (_TRANSIENT_RETRY_BACKOFF_BASE) so tests don't sleep.	2026-07-02 01:09:37 -07:00
Teknium	543d305bbb	feat(moa): add reference_max_tokens to cap advisor output and cut turn latency (#56756) MoA per-turn latency is dominated by advisor GENERATION: turn wall time correlates ~0.88 with output tokens and ~-0.03 with input tokens (measured over 52 turns). Each turn waits for the slowest advisor to finish writing, and advisors were uncapped — writing multi-thousand-token essays the aggregator only needs the gist of. Add an opt-in per-preset reference_max_tokens knob (mirrors reference_temperature) that caps ADVISOR output only; the acting aggregator is never capped. Default None = uncapped, so existing presets are byte-for-byte unchanged (no regression). Wired through both MoA execution paths (MoAChatCompletions.create and aggregate_moa_context). E2E: same task, closed preset uncapped vs reference_max_tokens=600 -> 59s to 33s (~44% faster), final answer identical/correct. - hermes_cli/moa_config.py: _coerce_int_or_none helper + reference_max_tokens in _normalize_preset/_default_preset/flattened view - agent/moa_loop.py: read preset.reference_max_tokens, pass to reference fan-out - agent/conversation_loop.py: pass reference_max_tokens on the per-turn path - tests + docs	2026-07-02 00:16:35 -07:00
kshitij	88d1d6206f	fix(streaming): handle completed responses with empty/None choices (#55933) (#56713) * fix(streaming): handle completed responses with empty/None choices The streaming fallback guard added in #55932 recognized a completed response object only when its `choices` was a non-empty list. But an adapter can return a completed response whose `choices` is `None` or an empty list (an error / content-filter / terminal frame) — still a whole, non-iterable response, not a token stream. Those shapes fell through to `for chunk in stream` and crashed with 'types.SimpleNamespace' object is not iterable which is exactly issue #55933 (MoA `openai-codex` aggregator on TUI/Desktop, where a stream consumer forces the streaming path). Broaden the guard to discriminate on the PRESENCE of a `choices` attribute (a genuine provider Stream object exposes none), disable streaming for the session, and return the completed object so the outer loop's normal invalid-response validation handles empty/None choices via its retry path instead of iterating. Based on the diagnosis in #56525 by @spiky02plateau (that PR normalized the MoA aggregator return with a one-shot chunk iterator; the common text/tool-call crash was already fixed at this seam by #55932, so this extends the existing guard to cover only the remaining empty/None-choices gap). Fixes #55933 * refactor(streaming): simplify empty-choices guard body and parametrize tests Post-review cleanup (no behavior change): - Inline the single-use `response_choices` local and drop the redundant `if first_choice is not None else None` guard (getattr(None, ...) already returns the default safely). - Collapse the two near-identical empty/None-choices regression tests into one `@pytest.mark.parametrize` case. Mutation-verified: reverting the guard to the old non-empty-list condition still makes both parametrized cases fail with the historical 'types.SimpleNamespace' object is not iterable. --------- Co-authored-by: spiky02plateau <155588579+spiky02plateau@users.noreply.github.com>	2026-07-02 06:36:20 +05:30
helix4u	7951250947	fix(moa): lift hidden Anthropic aux output cap	2026-07-02 06:31:18 +05:30
srojk34	7f64cce96d	security(vertex): route credential/project/region resolution through the profile secret scope agent/vertex_adapter.py resolved VERTEX_CREDENTIALS_PATH, GOOGLE_APPLICATION_CREDENTIALS, VERTEX_PROJECT_ID, and VERTEX_REGION via raw os.environ.get() instead of the profile-scoped get_secret() every other credential lookup in hermes_cli/runtime_provider.py uses. In a multiplex gateway serving several profiles from one process, os.environ still holds whichever profile's .env python-dotenv loaded at boot — so a raw read here let one profile's turn silently mint a Vertex OAuth2 token from, and get billed against, a different profile's GCP service account. No error, no fail-closed guard: the multiplex UnscopedSecretError protection was bypassed entirely because these reads never went through get_secret(). - _resolve_credentials_path/_resolve_project_override/_resolve_region now call agent.secret_scope.get_secret(), matching the _getenv() pattern already used for every other provider's credentials. - get_vertex_credentials()'s ADC fallback (google.auth.default()) reads GOOGLE_APPLICATION_CREDENTIALS from os.environ internally, bypassing get_secret() entirely — closed with a narrow guard: when multiplexing is active and this profile's scope has no Vertex credentials of its own, but os.environ still carries a value (left by a different profile's boot-time dotenv load), refuse ADC rather than silently authenticate as a stranger. - Zero behavior change for single-profile installs: get_secret() falls through to os.environ transparently whenever multiplexing is off. Same bug class as the already-fixed _HERMES_OAUTH_FILE/_AUTH_JSON_PATH/HOOKS_DIR cross-profile leaks, now closed for Vertex's OAuth2 credential path.	2026-07-02 06:07:56 +05:30
kshitijk4poor	676236bb1d	fix(agent): honor custom CA certs on aux client + harden TLS resolution The salvaged fix wired per-provider ssl_ca_cert / ssl_verify (and HERMES_CA_BUNDLE) into the MAIN OpenAI client. This follow-up: - Auxiliary client parity: process_bootstrap.build_keepalive_http_client accepts and forwards verify; auxiliary_client._resolve_aux_verify mirrors the main-client TLS resolution (via load_config_readonly, the read-only fast path) so compression/vision/web_extract/title-gen/session_search honor the same per-provider CA. Without this, chat worked against a private-CA endpoint but every auxiliary call still failed APIConnectionError. - switch_model now reads custom_providers from live config (load_config_readonly) instead of the init-time agent._custom_providers snapshot, so ssl_ca_cert / ssl_verify edits are honored on mid-session model switch — matching the context-length reload (#15779). - Drop the dead client-level verify= where a custom httpx transport is used (httpx ignores it there); verify lives on the transport. Fix docstrings. Applies to both run_agent._build_keepalive_http_client and process_bootstrap. - resolve_httpx_verify: add CURL_CA_BUNDLE to the env chain (consistency with agent/ssl_guard._CA_BUNDLE_ENV_VARS) and emit a loud logger.warning naming the endpoint whenever ssl_verify:false disables verification. - get_custom_provider_tls_settings: case-insensitive base_url match (config dedup already lowercases; scheme/host are case-insensitive) so a mixed-case entry doesn't silently drop its CA. Exact match preserved — no prefix bypass. - Demote best-effort except Exception: pass in agent_init/switch_model to logger.debug(exc_info=True). - Tests for aux verify forwarding, _resolve_aux_verify, case-insensitive match, and prefix-bypass rejection.	2026-07-02 04:51:56 +05:30
HexLab98	3a2ba959ce	fix(agent): honor custom CA certs for custom_providers HTTPS endpoints Wire ssl_ca_cert and ssl_verify through custom_providers config and env vars into the keepalive httpx client, fixing APIConnectionError against mkcert/self-signed Ollama proxies behind HTTPS.	2026-07-02 04:51:56 +05:30
HexLab98	7e957cbd0b	feat(agent): add resolve_httpx_verify for custom CA bundle TLS Introduce a shared helper that maps HERMES_CA_BUNDLE, SSL_CERT_FILE, and per-provider ssl_ca_cert settings to httpx verify contexts.	2026-07-02 04:51:56 +05:30
Brooklyn Nicholson	ec319e4e3e	fix(learning_graph): guard non-dict metadata so /journey can't crash parse_frontmatter's malformed-YAML fallback stores every value as a string, so a skill's `metadata` can be a str. `_category`/`_related` chained `.get("metadata", {}).get("hermes", {})` and blew up with `'str' object has no attribute 'get'`, taking down `build_learning_graph()` (and thus /journey and `hermes journey`) whenever any installed skill had bad frontmatter. Extract a `_hermes_meta()` helper that returns the nested dict only when it really is one. Fixes the whole class, not just the two call sites.	2026-07-01 16:25:48 -05:00
kshitijk4poor	b23e1c3077	refactor(approval): extract is_approval_bypass_active(); use frozen-env bypass in codex routing Self-review follow-up on the salvaged approval-routing fix. The initial adaptation re-read os.getenv("HERMES_YOLO_MODE") at session-build time. That diverges from the repo's security invariant: HERMES_YOLO_MODE is frozen into tools.approval._YOLO_MODE_FROZEN at import time precisely so a skill running mid-process cannot set the env var and instantly flip the approval bypass (a prompt-injection escalation path). A live re-read re-opened that hole for the codex routing path. - Add tools.approval.is_approval_bypass_active() — the canonical three-source bypass check (frozen --yolo/HERMES_YOLO_MODE + session /yolo + approvals.mode off) in one place. This is the 4th inline copy of that OR-chain (the three sites in approval.py and tui_gateway/server.py:3121 all use the same idiom); the helper is the shared chokepoint they can collapse onto. - codex_runtime.py now calls is_approval_bypass_active() instead of the hand-rolled mode-or-session check plus a runtime env re-read. - Update the env-yolo test to patch _YOLO_MODE_FROZEN (the canonical test pattern, e.g. tests/tools/test_yolo_mode.py) rather than setenv, which is dead-on-arrival against the frozen constant. Fail-closed default preserved on every branch; 28 integration + 77 session/yolo tests pass; E2E confirms the real exec decision flips decline->accept only when bypass is active.	2026-07-01 22:58:37 +05:30
snav	0b8e81996f	fix(codex-app-server): honor approvals.mode/yolo for gateway-context approval routing On gateway/cron/non-CLI contexts the codex app-server runtime has no UI to surface codex's exec/apply_patch approval requests, so they fail closed (silently decline) — the bot appears responsive but cannot write files, with no approval prompt anywhere ("patch rejected by user"). When the user has explicitly opted out of Hermes approvals (approvals.mode: off, the /yolo session toggle, or HERMES_YOLO_MODE=1), collapse to codex's own sandbox permission profile (~/.codex/config.toml) as the policy gate by passing _ServerRequestRouting(auto_approve_exec=True, auto_approve_apply_patch=True) to the session. Defaults (manual/smart/unset) preserve the current fail-closed behavior — a no-op for users who have not opted out. Reads the mode via the canonical tools.approval._get_approval_mode() (which already normalizes the YAML-1.1 bare-'off'->False case) at session-build time, so a mid-session /yolo toggle is honored too. 5 integration tests: each opt-out mechanism (config off, YAML False, env var, session yolo) plus the default fail-closed regression guard. Closes #26530 Co-authored-by: snav <jake@nousresearch.com>	2026-07-01 22:58:37 +05:30
Teknium	eae3700b16	fix(moa): raise aux timeouts to 900s and give the Codex aux path a stable prompt_cache_key (#56395) Two independent MoA auxiliary-call fixes: #53866 — auxiliary.moa_reference.timeout and auxiliary.moa_aggregator.timeout were 600s while moa_agent was 120s. Raise both to 900s so a genuinely long reference/aggregator turn (mixed providers, deep reasoning, long tool chains) has headroom instead of being cut mid-generation. #53735 — _CodexCompletionsAdapter (the Codex/Responses auxiliary path used by the MoA acting-aggregator, compression, web_extract, session_search, etc.) never set prompt_cache_key, so it stayed cache-cold while the MAIN Responses transport (agent/transports/codex.py) was warm. Derive the same content-addressed key via the shared _content_cache_key(instructions, tools) helper and set it on the aux Responses request, with the same host guards the main transport uses (xAI carries the key in extra_body; GitHub/Copilot opts out of cache-key routing). Tests: 5 new prompt_cache_key cases (set+prefixed, stable across identical prefix, differs on different instructions, skipped for xai/github hosts). tests/agent/test_auxiliary_client.py 279 pass; tests/hermes_cli/test_config.py 130 pass.	2026-07-01 06:02:40 -07:00
Teknium	aa605b66c8	fix(moa): price aggregator turn at its real model so session cost isn't advisor-only (#56394) On the MoA path agent.model/provider are the virtual preset name (e.g. "closed") and "moa", which have no pricing entry. estimate_usage_cost() returned None for the aggregator turn, so the `if amount_usd is not None` guard skipped it and the session's estimated_cost_usd reflected only the advisor fan-out — a ~50% undercount when the aggregator does the full acting loop (verified: $0.91 advisor-only vs $1.96 true, aggregator = 54%). MoAChatCompletions.create() now stashes the resolved aggregator slot as last_aggregator_slot (exposed via MoAClient); conversation_loop reads it to price the aggregator turn at its real model/provider. cost_source flips from 'none' to 'provider_models_api'.	2026-07-01 06:02:33 -07:00
kshitijk4poor	b795a45b8d	fix(compaction): detect and strip merge-into-tail summaries past the delimiter Follow-up to the END-MARKER reorder: moving the summary prefix after the [PRIOR CONTEXT] wrapper meant _is_context_summary_content (prefix-at-start) no longer recognized a merged-tail summary. That silently broke three consumers — the last-real-user anchor (would pick the merged summary as a real user turn, causing active-task loss), the carry-forward summary find, and the auto-focus skip. _strip_summary_prefix would also carry the wrapper + stale tail content forward as the next summary body. Extract the two delimiter strings into _MERGED_PRIOR_CONTEXT_HEADER / _MERGED_SUMMARY_DELIMITER constants (writer + detector stay in sync), teach _is_context_summary_content and _strip_summary_prefix to look past the delimiter, and add a regression test. Standalone summaries unchanged.	2026-07-01 18:23:01 +05:30
Gromykoss	a1a8a967e1	fix(compaction): place END MARKER last in merge-into-tail summaries When the compression summary is merged into the first tail message (the alternation corner case where a standalone summary role would collide with both head and tail), the old format was SUMMARY + END_MARKER + OLD_TAIL_CONTENT — so the preserved tail content appeared AFTER the end marker and the model could read it as a fresh message to respond to. Reorder so the END MARKER is always last: old tail content is wrapped in [PRIOR CONTEXT ...][END OF PRIOR CONTEXT — COMPACTION SUMMARY BELOW] delimiters, then the summary, then the END MARKER. _append_text_to_content handles both string and multimodal-list content. Salvaged from #56372 by @Gromykoss. Only the END-MARKER reorder half is carried over. The PR's second change (a post-compaction pass that strips user-role messages before the first summary marker on compression_count>=2) was dropped: on 2nd+ compactions the protected head decays to system-only (_effective_protect_first_n -> 0, #11996) so the targeted 'ghost head user' does not occur, and where the strip does fire it deletes legitimate recent tail user turns (data loss) and can leave consecutive assistant messages (role-alternation violation).	2026-07-01 18:23:01 +05:30
Steve Lawton	c73e74386b	feat(vertex): add Google Vertex AI provider for Gemini (OAuth2) Adds Vertex AI as a first-class provider for Gemini models via Vertex's OpenAI-compatible endpoint. Vertex authenticates with short-lived OAuth2 access tokens (service-account JSON or ADC), not a static API key — the missing piece behind the recurring requests (#13484, #12639, #56259). - agent/vertex_adapter.py: OAuth2 token minting + refresh-on-expiry (5-min margin), ADC->service-account fallback, global vs regional endpoint URLs. Config precedence: env var > config.yaml > default. - plugins/model-providers/vertex/: provider profile (auth_type=vertex), reuses Gemini's extra_body.google.thinking_config translation. - runtime_provider: vertex short-circuit BEFORE the credential pool so a credentials-file path is never mistaken for a static API key; mints a fresh token + computes base_url per resolve. - run_agent + conversation_loop: _try_refresh_vertex_client_credentials() re-mints the token and rebuilds the client on a mid-session 401, so a long-lived gateway agent survives token expiry (~1h). - auxiliary_client: vertex auth_type branch for side-LLM tasks. - config.yaml: vertex.project_id / vertex.region (non-secret, bridged to env); credential path stays in .env (VERTEX_CREDENTIALS_PATH). - setup wizard + model picker: dedicated _model_flow_vertex; curated google/gemini-* model list; --provider choices. - pricing/metadata: Vertex prices off the gemini docs snapshot; endpoint host auto-maps to the vertex provider (no probe spam). - lazy_deps + pyproject [vertex] extra: google-auth, opt-in only. - docs: guides/google-vertex.md + providers page; tests for adapter + runtime resolution. Salvages and modernizes #8427 by @slawt onto current main: rewired from the legacy PROVIDER_REGISTRY path to the provider-profile architecture, moved non-secret config out of .env into config.yaml, and added the per-turn 401 token-refresh the original lacked.	2026-07-01 05:25:33 -07:00
HODLCLONE	6ed2f5d76f	fix: make Nous Portal access token resolution resilient - Track auth store source path on Nous state reads and write rotated OAuth refresh tokens back to the same store, preventing stale-token replays when Hermes falls back to a global/root auth.json. - Skip Nous fallback entries locally when no access/refresh token is present, suppressing repeated failed resolution attempts within a session. - Sync session model metadata after fallback switches so the gateway DB reflects the backend that actually served the latest turn.	2026-07-01 05:06:00 -07:00
ud	c126a99fc1	fix(subdirectory_hints): catch RuntimeError from Path.expanduser() `pathlib.Path('~user').expanduser()` raises RuntimeError when the tilde-expansion can't resolve the user (e.g. `~500-700` where the LLM meant "approximately 500-700" rather than a path). The hint walker's existing `except (OSError, ValueError):` clauses do not catch RuntimeError, so it escapes through the tool dispatcher and surfaces in the conversation loop as a misleading Error during OpenAI-compatible API call #N: Could not determine home directory. Reproduced across three unrelated models (openai/gpt-5-mini, openai/gpt-5.1-codex, deepseek/deepseek-v4-flash) on terminal-tool commands containing literal tildes in non-path contexts — common in LLM output ("~500 agencies", "~45,000 CVEs", "~80/hr blended rate"). Reproduction (one-liner): >>> from pathlib import Path >>> Path("~500-700").expanduser() RuntimeError: Could not determine home directory. Fix: extend the three `except` clauses in agent/subdirectory_hints.py to also catch RuntimeError: line 138 (_add_path_candidate's outer catch around the Path().expanduser() call) lines 198+202 (_load_hints_for_directory's nested catches around hint_path.relative_to(Path.home())) Tests: tests/agent/test_subdirectory_hints_tilde.py adds three cases covering: tilde-as-approximately in heredoc commands, ~unknown_user paths, and a regression guard that legitimate ~/path expansion still works.	2026-07-01 04:55:15 -07:00
JabberELF	18a9467fca	fix(tui): prevent killpg suicide during MCP shutdown Root cause: gateway spawns LSP servers (jdtls/pyright/yaml-ls) and slash_worker without start_new_session=True, so they inherit the gateway process group (= TUI parent PID). When mcp_tool _snapshot_child_pids() races with these spawns during stdio MCP server startup, non-MCP children leak into _stdio_pgids with the TUI parent PGID. shutdown_mcp_servers() then killpg(tui_parent_pid, SIGTERM), killing the TUI itself. Evidence: tui_gateway_crash.log shows recurring SIGTERM stacks: shutdown_mcp_servers -> _kill_orphaned_mcp_children -> _send_signal -> killpg(pgid, sig) -> SIGTERM received Fix (3 layers): 1. agent/lsp/client.py: add start_new_session=True to LSP server spawn so each LSP server gets its own process group/session. 2. tui_gateway/server.py: same fix for slash_worker spawn, the symmetric root-cause patch so no gateway direct child shares the TUI parent pgid. 3. tools/mcp_tool.py: add _filter_mcp_children() defense-in-depth that drops non-MCP children (slash_worker, jdtls/eclipse LSP) from the PID delta before they can poison _stdio_pgids.	2026-07-01 04:54:46 -07:00
kshitijk4poor	dc1ea005d9	fix+test(codex): self-persist projected turns; keep agent_persisted=True Follow-up correcting the salvaged fix's persistence approach to avoid a duplicate user-message write (verified via E2E — the #860/#42039 bug class the original diff aimed to avoid). Root cause: in gateway mode the AIAgent is built WITH a session_db, so the inbound user turn is already flushed at turn start (turn_context._persist_session). The original fix returned agent_persisted=False, making the gateway re-write the whole new-message slice via append_to_transcript -> append_message (a raw INSERT with no dedup), duplicating the already-flushed user turn. Corrected approach (single writer): run_codex_app_server_turn now flushes its OWN projected assistant/tool messages via _flush_messages_to_session_db (which dedups the already-persisted user turn through _DB_PERSISTED_MARKER) and returns agent_persisted=True so the gateway skips its write. Net result: session_search/distill see the full codex conversation, each message persisted exactly once. Adds regression coverage asserting exactly-once persistence on a real SessionDB, agent_persisted=True, FTS visibility, and standard-runtime skip-db behaviour preserved. Co-authored-by: Lubos Buracinsky <lubos@komfi.health>	2026-07-01 17:08:59 +05:30
Lubos Buracinsky	5558382457	fix(codex): persist app-server turns to session DB (fixes starved recall) The codex_app_server runtime path (run_codex_app_server_turn in agent/codex_runtime.py) is an early-return that bypasses conversation_loop and never calls _flush_messages_to_session_db(). Meanwhile, gateway/run.py sets: agent_persisted = self._session_db is not None # always True and passes skip_db=agent_persisted to every append_to_transcript call, assuming the agent self-persisted (correct for the standard runtime, wrong for codex). The result: codex turn messages are persisted nowhere. state.db accumulates only session_meta rows; session_search (full-text search over state.db) and conversation-distill are blind to real gateway conversations, causing 'the agent has no memory of what we discussed'. Fix (three-part, all backward-compatible): 1. agent/codex_runtime.py — run_codex_app_server_turn success return now includes 'agent_persisted': False, signalling that the codex path did NOT self-persist its turn. 2. gateway/run.py — the agent_persisted assignment now reads: agent_result.get('agent_persisted', self._session_db is not None) For the standard runtime (which does not set the key) the default (self._session_db is not None) preserves the existing skip-db behaviour so no duplicate-write regression (#860 / #42039) occurs. For the codex runtime the flag is False, so the gateway writes the new turn's messages to state.db and FTS index. 3. gateway/run.py — the rebuilt result dict (run_agent return, which becomes agent_result upstream) now includes agent_persisted passed through from result_holder[0], with a safe True default. Without this passthrough the flag set in step 1 was discarded when the result was reconstructed, causing agent_result.get('agent_persisted', ...) to always see the default True and never write codex turns.	2026-07-01 17:08:59 +05:30
Dutch Dim	154c382d65	fix(gateway): recover from truncated responses	2026-07-01 17:08:50 +05:30
kshitijk4poor	9cf47fef54	fix(auxiliary_client): demote the 2 sibling routing fall-throughs too (review) Phase 2c review flagged that only 2 of the 4 structurally-identical resolve_provider_client routing dead-ends were demoted. Complete the bug-class: also demote+dedup the external-process ('not directly supported') and OAuth ('not directly supported, try auto') fall-throughs, keyed by provider name, so none of the four dead-ends spam WARNING on a retry loop. Add direct tests for the unhandled-auth_type and OAuth dedup paths via a monkeypatched PROVIDER_REGISTRY (the review noted these were unverified). Mutation-checked: reverting either sibling demotion fails its test.	2026-07-01 17:00:30 +05:30
kshitijk4poor	c0d3ceb17e	fix(auxiliary_client): dedup resolve_provider_client fall-through warnings The two fall-through branches in resolve_provider_client (unknown provider, unhandled auth_type) logged at WARNING on every retry of a misconfigured provider, spamming logs during retry loops. Demote both to logger.debug with per-process dedup: the first occurrence still surfaces (a provider-name typo or PROVIDER_REGISTRY/auth_type-drift bug is worth seeing once), while identical repeats are suppressed for the process lifetime. Salvaged from #56283 (extracting only the stated auxiliary_client fix; the original PR also bundled ~2800 lines of unrelated changes across 10 other files, which are dropped).	2026-07-01 17:00:30 +05:30
shawchanshek	3b739b990b	fix(title_generator): strip think blocks from LLM output before extracting title Think-enabled models (MiniMax M2.7, DeepSeek, etc.) emit inline <think>...</think> reasoning even for simple prompts like title generation, and the raw XML was leaking into session titles. Route the title-model response through the canonical strip_think_blocks scrubber before cleanup so every tag variant — closed pairs, unterminated blocks, orphan closes, mixed case — is handled, not just a single literal <think> pair. - 2 regression tests: closed <think> pair stripped, unterminated block at start yields no title. Salvaged from PR #44126 by @shawchanshek.	2026-07-01 04:18:48 -07:00
shandian64	5126902f1d	fix(title): honor configured auxiliary timeout	2026-07-01 16:41:43 +05:30
Teknium	5de65624d1	fix(moa): capture streamed aggregator output into full-turn traces (#56312) MoA full-turn traces (moa.save_traces) recorded the aggregator's acting output only on the non-streaming path, where it's captured inline at call time. On the streaming path, which every hermes chat --query run and every live gateway/CLI turn takes, the aggregator's raw token stream is handed to the live consumer, so the trace left output=null and only pointed at the session-db assistant row. An offline audit of a benchmark run (HermesBench drives --query) then couldn't see what the aggregator produced without hand-joining to state.db. Capture the resolved streamed acting text at trace-flush time (the agent already holds it in _current_streamed_assistant_text) and fold it into the trace, so the record is self-contained in both modes. New output_location value inline_from_stream marks a streamed turn whose text was captured this way; a genuinely empty acting turn (pure tool call) still points at the session db, matching state.db exactly. Touches only the trace side-channel: no change to the acting path, message history, role alternation, or prompt cache. - agent/moa_loop.py: consume_and_save_trace(..., aggregator_output_fallback) on both the facade and the MoAClient wrapper; prefer inline capture, fall back to the resolved streamed text. - agent/moa_trace.py: embed the fallback; add inline_from_stream location. - agent/conversation_loop.py: pass _current_streamed_assistant_text at flush. - tests: 5 cases across streaming / non-streaming / empty-fallback / no-double-write.	2026-07-01 04:07:46 -07:00
arminanton	e2fa509bf3	fix(review): isolate the background-review fork from the canonical session The forked skill/memory review agent shares the parent's session_id for prompt-cache warmth. Without isolation it wrote its harness turn ('Review the conversation above and update the skill library…') plus its curator-mode reply straight into the user's REAL session in state.db; the next live turn re-read that injected user message as a standing instruction and the agent 'became' the curator, refusing the actual task. Root fix: a _persist_disabled flag on the fork that hard-stops every DB write and lazy-open path (_flush_messages_to_session_db, _ensure_db_session, _get_session_db_for_recall) — the review writes only to the skill/memory stores via its tools. Defense-in-depth: _strip_background_review_harness drops any stray harness message (and the assistant reply that followed) at load time in get_messages_as_conversation, so an already-polluted session resumes clean. Salvaged from #50296. Co-authored-by: arminanton <29869547+arminanton@users.noreply.github.com>	2026-07-01 16:21:39 +05:30
pefontana	a04b7024ff	fix(error-classifier): route 5xx context-overflow into compression Local inference servers (llama.cpp/llama-server, vLLM/Ollama behind a Cloudflare/Tailscale hop) report context overflow with HTTP 500/502/503/529 instead of 400/413. _classify_by_status returned server_error/overloaded and retried blindly, then dropped the turn with no compaction. Route explicit _CONTEXT_OVERFLOW_PATTERNS matches on those 5xx codes to context_overflow (should_compress=True); plain 500 stays server_error, plain 503 overloaded.	2026-07-01 16:14:16 +05:30
WXBR	59e7e9d007	fix(agent): persist recovered final responses Close a recovery/fallback final_response with an assistant transcript entry before session persistence so durable history cannot end at a tool/user message after the caller receives a final answer. Adds a regression for a tool-tail transcript with a non-empty final_response. Related to #46071 / #46053, but covers the adjacent case where the assistant message was never appended before persistence.	2026-07-01 03:34:49 -07:00
Tranquil-Flow	122e5bc037	fix(agent): retry 413 after stripping vision payloads (#47339) When text compression can't reduce a 413 request further, evict base64 image parts from tool messages and retry once instead of dead-ending with 'Payload too large and cannot compress further.' A 413 is a request-body byte-size limit, not a token limit. browser_vision screenshots (2-5MB base64 each) keep the HTTP body oversized even after aggressive summarization. The strip pass is invoked with remember_model=False so a 413 does not poison _no_list_tool_content_models; that set is for providers that reject list-type tool content, a distinct failure mode. Cherry-picked from #47397 by Tranquil-Flow; placed onto main's current token-aware 413 recovery else branch.	2026-07-01 03:18:41 -07:00
Tyler Merritt	320c587256	fix(context): parse vLLM's token-based output-cap error format vLLM (and other OpenAI-compatible servers) report context overflow with both the window and the prompt in tokens: "This model's maximum context length is 131072 tokens. However, you requested 65536 output tokens and your prompt contains at least 65537 input tokens, for a total of at least 131073 tokens." parse_available_output_tokens_from_error() already classified this as an output-cap error (the "requested N output tokens" gate), but none of the extraction patterns matched the "prompt contains [at least] N input tokens" phrasing, so it returned None. The recovery path then misclassified the failure as prompt-too-long and looped through compression — which frees little while each retry keeps requesting the same oversized max_tokens — terminating in "cannot compress further" even though simply lowering the output cap would have succeeded. Add an extraction branch for the token-based phrasing: available output = window - reported input. When the input alone is at or over the window it still returns None, so the caller correctly falls through to compression. Relates to #43547. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-07-01 03:17:48 -07:00
DhivinX	49e129e495	fix(anthropic): use claude-code/ UA prefix for OAuth to avoid 404 (#48534) Anthropic's OAuth endpoints 404 for the claude-cli/ User-Agent prefix. Switch all three OAuth UA sites (build_anthropic_client, refresh_anthropic_oauth_pure, run_hermes_oauth_login_pure) to the claude-code/ prefix Anthropic expects. Salvaged from #51948. Co-authored-by: DhivinX <20087092+DhivinX@users.noreply.github.com>	2026-07-01 15:42:15 +05:30
fsaad1984	5881791adc	fix(adapter): enforce tool_use/tool_result adjacency in _strip_orphaned_tool_blocks _strip_orphaned_tool_blocks collected tool_result ids across ALL user messages and kept any assistant tool_use whose id appeared anywhere, rather than requiring the result to be in the immediately-following user message. A stale match elsewhere in the transcript could keep a genuinely-orphaned tool_use, which Anthropic rejects. Rewrite to adjacency-checked two-pass logic so a tool_use is kept only when its result immediately follows. Salvaged from #52145. Co-authored-by: fsaad1984 <38867992+fsaad1984@users.noreply.github.com>	2026-07-01 15:42:15 +05:30
Ben Barclay	c71f816956	fix(compression): clear all per-session state in on_session_end, not just _previous_summary The original cross-session contamination fix (#38788) only cleared _previous_summary in on_session_end(), but on_session_reset() clears 14+ per-session variables. When a session ends (cron exit, gateway expiry, session-id rotation) and the compressor instance is reused, the surviving stale state causes: - _ineffective_compression_count surviving → next session skips compression prematurely (anti-thrashing guard misfires) - _summary_failure_cooldown_until surviving → next session blocks summary generation for an unrelated transient error - _last_compress_aborted surviving → callers think compression is still aborted - _last_aux_model_failure_* surviving → stale error warnings shown - _last_summary_dropped_count / _last_summary_fallback_used surviving → misleading user warnings - _context_probed / _context_probe_persistable surviving → stale context-probe state Also fix on_session_reset() which was missing _last_compress_aborted clearing — a /new or /reset would inherit the aborted flag from the prior conversation. Add 6 targeted tests covering the leak vectors and a parity test ensuring on_session_end and on_session_reset always clear the same surface.	2026-07-01 02:48:32 -07:00
ArthurZhang	fdb9620ac4	security(agent): redact Slack App-Level (xapp-) tokens The xapp-<num>-<hash> format used by Slack App-Level / Socket Mode tokens was missing from both agent/redact.py prefix patterns and gateway/run.py gateway secret patterns, so SLACK_APP_TOKEN values could leak through to chat users even with security.redact_secrets enabled. Adds an anchored xapp-\d+- pattern to both redaction paths.	2026-07-01 02:45:22 -07:00
Teknium	da6d5fcd13	fix(auth): serialize Codex OAuth pool refresh under the auth-store lock (#56233) The credential-pool Codex refresh path synced tokens from auth.json and then POSTed the refresh_token to OpenAI's token endpoint without holding the cross-process auth-store lock across the whole read->POST->write-back sequence. Because Codex refresh tokens are single-use, two concurrent Hermes processes could both adopt the same on-disk token and both POST it; the loser got refresh_token_reused / invalid_grant. Wrap the Codex OAuth branch of _refresh_entry in the existing shared _auth_store_lock (reentrant, cross-process flock) using the same extended-timeout pattern resolve_codex_runtime_credentials() already uses. A waiting process now blocks on the lock and, once inside, the in-lock re-sync picks up the rotated token the winner persisted and skips its own POST. Also send User-Agent: hermes-cli/<version> on the refresh request. Credit @cooper-oai (#34820) for identifying the concurrent-refresh reuse race; this ships the narrow lock-serialization fix without the separate Codex auth-store partition.	2026-07-01 02:45:07 -07:00
sprmn24	88d6e833f1	fix(agent): wrap list-type untrusted content in untrusted_tool_result _maybe_wrap_untrusted() only wrapped str-typed tool outputs. When a high-risk tool (web_extract, browser_*) returns a multimodal content list ([{type:text},{type:image_url}]), which _tool_result_content_for_active_model() produces by unwrapping the _multimodal envelope for vision-capable providers, the text part reached the model completely unguarded. An attacker page that ships one image bypassed the entire untrusted-data wrapper. Extend the wrapper to handle list content: each {type:text} part is run through the same string-wrapping path (min-char threshold, delimiter neutralization, one well-formed block); image/video parts pass through untouched so the list stays valid for vision adapters. Recursing into the existing string branch means the list path inherits the delimiter defang and the no-forgeable-fast-path hardening from #56172 for free. The outer list is rebuilt (not returned by identity), so callers compare by value.	2026-07-01 02:44:09 -07:00
mrparker0980	10a54ccc2c	fix(security): anchor @file context refs to canonical read deny-list `@file` / `@folder` context-reference expansion enforced its own narrow deny-list (`_ensure_reference_path_allowed` in `agent/context_references.py`) that only covered `~/.ssh` keys, a handful of shell dotfiles, `~/.hermes/.env`, and `skills/.hub`. It never blocked the credential stores that the canonical read guard (`agent/file_safety.get_read_block_error`) protects: provider API keys (`~/.hermes/auth.json`), Anthropic OAuth tokens (`~/.hermes/.anthropic_oauth.json`), MCP OAuth material (`~/.hermes/mcp-tokens/`), webhook HMAC secrets, and project-local `.env` files. This matters because the messaging gateway feeds untrusted remote text straight into reference expansion: `gateway/run.py` calls `preprocess_context_references_async(..., allowed_root=_msg_cwd)` where `_msg_cwd` defaults to the operator's HOME when `TERMINAL_CWD` is unset. A chat peer (Telegram/Discord/Slack/...) could send `@file:~/.hermes/auth.json`, pass the `allowed_root` check (it resolves under HOME), slip past the narrow list, and have the operator's live keys read into the agent's context — where the model would typically echo or act on them. Rather than duplicate and re-sync a second secret list, this routes the guard through the existing single source of truth. A reviewer might ask "why not just add `auth.json` to the local list?" — because the local list has already drifted once (a prior commit had to add `.config/gh`); anchoring to `get_read_block_error` means every future addition there protects this path too. The narrow checks are kept as a fallback since they also cover dirs that guard does not (`.aws`, `.gnupg`, `.kube`, etc.), and the canonical lookup is wrapped so it can never crash reference expansion. 
- `agent/context_references.py`: `_ensure_reference_path_allowed` now also consults `agent.file_safety.get_read_block_error` after its existing checks and refuses the reference when that canonical guard flags the resolved path. The lookup is wrapped so guard-resolution failures fall back to the explicit checks instead of breaking expansion. - `tests/agent/test_context_references.py`: added `test_blocks_canonical_read_denylist_credential_stores`, asserting that `@file` references to `auth.json`, `.anthropic_oauth.json`, `mcp-tokens/`, and a project-local `.env` are all refused and their secret bodies never reach the expanded message. - `scripts/release.py`: added the contributor email to `AUTHOR_MAP` (release gate). 1. `scripts/run_tests.sh tests/agent/test_context_references.py`: all 15 tests pass, including the new credential-store case. 2. Regression proof: stash `agent/context_references.py`, run the suite with `-- -k canonical`, and confirm the new test fails (secrets leak into the message) without the fix; restore and confirm it passes. 3. `ruff check agent/context_references.py tests/agent/test_context_references.py` and `python scripts/check-windows-footguns.py agent/context_references.py tests/agent/test_context_references.py` both pass. Tested on macOS 15 (Darwin 25.5).	2026-07-01 02:43:49 -07:00
kshitijk4poor	22a137ed40	fix(agent): prefer late-completing real result over timeout message (review) Review follow-up on the concurrent-tool deadline salvage. timed_out_indices is snapshotted from not_done at the deadline; a worker can still finish and write results[i] in the window before the post-execution result loop reads it. The loop unconditionally replaced results[i] with a fabricated 'timed out' message for any snapshotted index, discarding a genuinely-successful (just-late) result. Gate the timeout message on 'and r is None' so a real result always wins. Add a regression test that forces the snapshot-vs-result-loop race deterministically (mutation-checked: reverting the guard fails it). Also document the intentional detached-worker leak at the executor abandon site.	2026-07-01 14:56:52 +05:30
Gustavo Mendes	c1784e9093	fix(agent): bound concurrent tool execution with a wall-clock deadline A tool with no internal interrupt check (read_file, web_search, or a wedged terminal backend) that never returns keeps the concurrent-tool poll loop alive forever: the loop only breaks when all futures finish or an interrupt is requested, and the 30s heartbeat resets the gateway idle monitor so idle-kill never fires. The ThreadPoolExecutor was also used as a context manager, so its __exit__ joined the hung worker with wait=True. Add a wall-clock batch deadline (HERMES_CONCURRENT_TOOL_TIMEOUT_S, default 420s — above the 360s web_extract timeout; 0/negative disables). When it fires: cancel pending futures, signal an interrupt to the worker threads, abandon the executor (shutdown wait=False, cancel_futures=True) so hung threads aren't joined, and return a per-tool 'timed out' result for the unfinished calls while still surfacing the finished ones. Also fixes the latent futures.index(f) lookup (ambiguous with duplicate futures) by tracking a future->index map. Salvaged from #54562. Co-authored-by: Gustavo Mendes <87918773+gustavosmendes@users.noreply.github.com>	2026-07-01 14:56:52 +05:30
Teknium	913e661a09	fix(cache): stop verification-loop synthetic nudges from persisting (#56194) verify_on_stop / pre_verify append a synthetic assistant "done" plus a synthetic user nudge to keep the agent going one more turn before it can claim completion. Both were flagged (_verification_stop_synthetic on the nudge only), but the flags were never registered in _EPHEMERAL_SCAFFOLDING_FLAGS, so the central _is_ephemeral_scaffolding() filter that guards both persistence sinks (SQLite flush + JSON snapshot) let them through. The resumed transcript then inherited loop-only scaffolding, invalidating the prompt-prefix cache on later turns. - add _verification_stop_synthetic and _pre_verify_synthetic to _EPHEMERAL_SCAFFOLDING_FLAGS (the single chokepoint both sinks use) - flag the blocked attempt assistant message too, not just the nudge, so the whole synthetic pair drops together and persistence does not keep a premature done with the nudge stripped (assistant to assistant adjacency) The API-payload leak claimed in the report is already handled: the chat_completions transport strips every underscore-prefixed message key before the wire, so the marker never reaches strict providers. Reported by patppham.	2026-07-01 02:26:06 -07:00
Teknium	18c61bb8cf	fix(provider): match api.anthropic.com host on fallback api_mode detection Widen the salvaged #32243 fix to the try_activate_fallback path: a custom provider pointed at the native api.anthropic.com host (no /anthropic path suffix, name != anthropic) fell through to chat_completions -> POST /v1/chat/completions -> 404. Match the host the same way determine_api_mode() and _detect_api_mode_for_url() now do. Absorbs #49247.	2026-07-01 02:18:56 -07:00
itenev	f981d47cb0	fix(gateway): prevent Discord disconnects from blocking event loop models_dev.py's fetch uses a synchronous requests.get(timeout=15). Called from the async gateway message handlers, it blocked the event loop for up to 15s, starving Discord heartbeats and causing ClientConnectionResetError disconnects. Adds get_model_context_length_async() which offloads the entire sync resolution chain to a worker thread via asyncio.to_thread(), and switches the two async gateway call sites (_prepare_inbound_message_text, _handle_message_with_agent) to await it. The loop stays responsive; the sync path remains the single source of truth for the cache. Salvaged from PR #22753 by @itenev. Follow-up: dropped the unused fetch_models_dev_async/lookup_models_dev_context_async aiohttp variants from the original PR (dead code with zero callers that had drifted from the sync cache logic) — the to_thread wrapper already runs the sync path off-loop, so they were redundant.	2026-07-01 02:17:35 -07:00
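
The last entry above (f981d47cb0) describes moving a blocking lookup off the event loop with asyncio.to_thread(). A minimal sketch of that pattern follows. Only the name get_model_context_length_async comes from the commit message; the sync function body, the heartbeat helper, the model name, and the returned value are hypothetical stand-ins, not the project's code.

```python
import asyncio
import time


def get_model_context_length(model: str) -> int:
    """Hypothetical stand-in for a synchronous lookup that blocks on I/O."""
    time.sleep(0.5)  # simulates a blocking HTTP fetch
    return 128_000  # placeholder value, not a real model limit


async def get_model_context_length_async(model: str) -> int:
    """Run the sync lookup in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(get_model_context_length, model)


async def heartbeat(ticks: list) -> None:
    """Record a timestamp every 50 ms; ticks are delayed if the loop is blocked."""
    for _ in range(3):
        await asyncio.sleep(0.05)
        ticks.append(time.monotonic())


async def main() -> tuple:
    ticks: list = []
    start = time.monotonic()
    # Both coroutines run concurrently; the lookup sleeps in a thread,
    # so the heartbeat keeps ticking while it is in flight.
    length, _ = await asyncio.gather(
        get_model_context_length_async("example-model"),
        heartbeat(ticks),
    )
    return length, ticks[0] - start


if __name__ == "__main__":
    length, first_tick_delay = asyncio.run(main())
    print(length, first_tick_delay)
```

If the async wrapper called get_model_context_length() directly instead of going through asyncio.to_thread(), the first heartbeat tick could not fire until the blocking call returned, which is the starvation the commit message describes. asyncio.to_thread() requires Python 3.9 or later.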

1631 commits