hermes-agent

Author	SHA1	Message	Date
Jacky Zeng	25aa626cb4	fix(vision): forward custom-endpoint credentials in vision auto-detect A custom:<name> main provider resolves at runtime to the bare provider id "custom". In the vision auto-detect chain, the main-provider branch called resolve_provider_client("custom", ...) WITHOUT explicit_base_url/api_key, so it returned (None, None) ("no endpoint credentials found") and the whole chain fell through to OpenRouter/Nous. A user on a custom endpoint with no aggregator configured then got "No LLM provider configured for task=vision provider=auto" on every image, even though their main model fully supports vision. Recover the live endpoint that set_runtime_main() records each turn (_RUNTIME_MAIN_BASE_URL/_API_KEY/_API_MODE) and forward it to Step 1, with a fallback to _resolve_custom_runtime() for non-gateway callers. Mirrors the existing explicit-base_url branch directly above. Adds TestResolveVisionCustomProvider covering custom, custom:<name>, and the no-runtime fallback path.	2026-07-03 03:54:01 -07:00
Jiahui-Gu	8bf797f1c2	fix(agent): prefer native vision over auxiliary fallback in auto mode (#29135)	2026-07-03 03:43:35 -07:00
liuhao1024	5e11628546	fix(image_routing): check stripped custom:<name> provider key for vision override When model.provider is set to custom:<name>, _supports_vision_override() previously tried only the runtime provider key ('custom') and the raw config value ('custom:my-proxy'). It did not try the stripped name ('my-proxy'), which is the actual key under providers: in config.yaml. This caused native image routing to fall back to text mode even when the user explicitly declared supports_vision: true on the named provider's model entry. Fixes #39963	2026-07-03 03:33:06 -07:00
kshitijk4poor	e1a1dac848	fix(agent): enforce marker-strip invariant with a single terminal sweep (#57491) Follow-up to the per-site strips from the review gate. The two copy-site strips are correct but positional: a copy site added after the assembly loops would re-leak _db_persisted into the child-session flush. Add a single terminal sweep (_strip_persistence_markers) run once on the fully-assembled compressed list so the invariant 'no compacted message leaves compress() carrying a persistence marker' is structural, not dependent on copy-site order. - agent/context_compressor.py: _strip_persistence_markers() called before compress() returns; helper docstring notes the sweep is the authoritative guard - tests/agent/test_context_compressor.py: structural regression (neuter the per-site helper to a leaking copy, assert the terminal sweep still strips) - tests/run_agent/test_compression_persistence.py: pin the fixture assumption behind the exact-equality row-count assertion	2026-07-03 12:51:12 +05:30
nankingjing	3e204bd771	fix(agent): strip _db_persisted when assembling rotation compression transcript (#57491) Shallow messages[i].copy() during context compression propagated the _db_persisted marker from cached gateway incremental flushes into the post-rotation compressed list. _flush_messages_to_session_db then skipped every row when writing to the new child session, so gateway restarts lost the compacted transcript (severe amnesia). Strip the marker in _fresh_compaction_message_copy() and add regression tests for rotation flush + compressor assembly. Fixes #57491	2026-07-03 12:51:12 +05:30
kshitijk4poor	0950dae2fa	Merge remote-tracking branch 'upstream/main' into HEAD # Conflicts: # scripts/release.py	2026-07-03 03:52:15 +05:30
kshitijk4poor	1c93799b49	fix(agent): self-review follow-ups on vLLM local-context salvage Self-review (ruff+ty lint diff = 0 net-new; 2-agent deep review) surfaced one Warning + comment-accuracy nits; no Critical: - W1: the local-probe TTL cache memoized None (probe failure) for 30s, so a probe that failed during a startup race would suppress a legit retry once the server came up. Cache only positive results — still fully bounds the hot-path probe rate (reachable servers cache their value) while an unreachable one re-probes on the next call. Add a regression test asserting a None result is NOT cached (retry re-probes); mutation-verified. - Tighten the platform-guard comment: gateway/TUI/cron already construct with quiet_mode=True (gated by `not agent.quiet_mode`), so the guard's active job is CLI dedup vs show_banner, not "filling the gateway/TUI gap" as originally worded. Verified not-issues (per review): positive-value 30s cache does not break the reconcile-after-restart freshness contract (restart = fresh process, empty cache); cache key is collision-safe; platform guard is correct in both directions (no runtime path leaves platform None on a non-CLI surface). Tests: 149 passed. ruff clean; ty 0 net-new vs base.	2026-07-03 03:36:22 +05:30
kshitijk4poor	b9a197ec59	fix(agent): resolve review findings on vLLM local-context salvage Salvage review of #56431 surfaced one Critical + two Warning issues; fix them on top of the contributor's cherry-picked commits: 1. Critical: duplicate non-agentic warning on the interactive CLI. The new agent_init warning fires on every platform, but cli.py show_banner() already warns on CLI (richer output + /model hint), so a CLI user saw the warning twice per startup. Guard the agent_init emit to skip platform=="cli"; it now fills exactly the gateway/TUI gap the PR intended, with no duplication. 2. Warning: vLLM error-parse regex under-matched. The patterns required a literal space before the number, so "max_model_len: 32768", "=32768", "(32768)", and "... is 32768" all returned None. Broaden both patterns to accept :/=/(/ 'is' delimiters. Add a parametrized test over all delimiter variants. 3. Warning: per-call live probe latency on local endpoints. The new reconcile-on-hit + pre-defaults step-7 probe made every local resolution fire a synchronous network probe (banner + /model switch + compressor update_model each within one startup). Add a 30s in-process TTL cache keyed by (model, base_url) around _query_local_context_length so back-to-back resolutions reuse one round-trip; not persisted to disk, so the reconcile freshness contract (re-probe after restart) is preserved. Add an autouse fixture clearing the cache between tests + TTL coverage. Tests: 148 passed (was 138). ruff clean.	2026-07-03 03:27:13 +05:30
infinitycrew39	53063d92b0	test(agent): cover local vLLM context-length resolution Add regression tests for vLLM max_model_len error parsing, stale local cache reconciliation, live probes over llama defaults, and the 64K minimum guard on persistent cache writes. (cherry picked from commit 1cb47ef437de7ce289cb358e8d6b89e9194b43ed)	2026-07-03 03:22:51 +05:30
Jaaneek	5ef0b8acb0	feat(auth): make xAI Grok OAuth device-code-only, drop loopback login Replace the loopback/PKCE-callback server and manual-paste fallback with the RFC 8628 device-code flow as the only xAI Grok OAuth login path. The flow works in headless/SSH/container sessions with no 127.0.0.1 listener, shrinking the local attack surface. - Poll the token endpoint with server-provided interval, honoring slow_down and expires_in; store tokens with auth_mode oauth_device_code. - Adaptive proactive refresh skew for short-lived device-code JWTs; rotated tokens sync back to auth.json, the global root store, and the credential pool (no refresh-token replay). - Clear source suppression on successful re-login (CLI + dashboard) and drop the duplicate dashboard pool entry so exactly one seeded device_code entry exists. - Use the shared device_code source name for consistency with the nous/codex device-code providers. - Desktop: remove the loopback OAuth flow states and dead type variants; pkce providers' sign-in URL selection is unchanged. - Docs (EN + zh-Hans) rewritten for device-code login; drop the deleted --manual-paste flag from documented commands.	2026-07-02 13:17:41 -07:00
HexLab98	ede4d12561	test(codex): cover gateway-scale stale timeout floor and TTFB gate	2026-07-02 17:05:05 +05:30
Teknium	fb403a3a73	fix(auxiliary): retry transient blips harder + isolate client cache per model (#56889) Two related hardening fixes for auxiliary calls (which include MoA reference advisors, a pinned-model path where provider fallback is not a meaningful recovery): 1. Transient-transport retries: the same-provider retry on a connection reset / timeout / 5xx / 408 was a single attempt, then fallback. For a pinned aux call, a second blip silently loses the call (root of the run2 double-advisor 'Connection error' collapse, which was a genuine upstream blip). Now retries N times with exponential backoff, N = auxiliary.transient_retries (default 2 -> 3 total attempts, clamped [0,6]). Compression-on-timeout fast-fail carve-out preserved. 2. Per-model client-cache isolation: _client_cache_key excluded the model, so two concurrent auxiliary calls to the same provider/base_url/key but different models (e.g. an opus + gpt-5.5 MoA fan-out) shared one cache entry and could race each other's client lifecycle. Model now participates in the key -> distinct clients, no cross-call races. Same-model reuse unchanged. - agent/auxiliary_client.py: _transient_retry_count() + backoff loop; model in _client_cache_key and both call sites. - hermes_cli/config.py: auxiliary.transient_retries default (2). - tests: new retry/isolation tests; updated 2 stale-expectation tests to the corrected behavior (per-model resolve; N-retry escalation). Backoff base is overridable (_TRANSIENT_RETRY_BACKOFF_BASE) so tests don't sleep.	2026-07-02 01:09:37 -07:00
kshitijk4poor	76be770091	test(moa): assert aux cap against model resolver, not frozen literal Follow-up to the salvaged fix: the regression test asserted a frozen max_tokens == 128_000 literal, coupling it to the Opus-4-8 model table. Assert against _get_anthropic_max_output("claude-opus-4-8") plus > 2000 instead, so the test survives model-table churn while still catching a regression to the old `or 2000` fallback.	2026-07-02 06:31:18 +05:30
helix4u	7951250947	fix(moa): lift hidden Anthropic aux output cap	2026-07-02 06:31:18 +05:30
srojk34	7f64cce96d	security(vertex): route credential/project/region resolution through the profile secret scope agent/vertex_adapter.py resolved VERTEX_CREDENTIALS_PATH, GOOGLE_APPLICATION_CREDENTIALS, VERTEX_PROJECT_ID, and VERTEX_REGION via raw os.environ.get() instead of the profile-scoped get_secret() every other credential lookup in hermes_cli/runtime_provider.py uses. In a multiplex gateway serving several profiles from one process, os.environ still holds whichever profile's .env python-dotenv loaded at boot, so a raw read here let one profile's turn silently mint a Vertex OAuth2 token from, and get billed against, a different profile's GCP service account. No error, no fail-closed guard: the multiplex UnscopedSecretError protection was bypassed entirely because these reads never went through get_secret(). - _resolve_credentials_path/_resolve_project_override/_resolve_region now call agent.secret_scope.get_secret(), matching the _getenv() pattern already used for every other provider's credentials. - get_vertex_credentials()'s ADC fallback (google.auth.default()) reads GOOGLE_APPLICATION_CREDENTIALS from os.environ internally, bypassing get_secret() entirely; this is closed with a narrow guard: when multiplexing is active and this profile's scope has no Vertex credentials of its own, but os.environ still carries a value (left by a different profile's boot-time dotenv load), refuse ADC rather than silently authenticate as a stranger. - Zero behavior change for single-profile installs: get_secret() falls through to os.environ transparently whenever multiplexing is off. Same bug class as the already-fixed _HERMES_OAUTH_FILE/_AUTH_JSON_PATH/HOOKS_DIR cross-profile leaks, now closed for Vertex's OAuth2 credential path.	2026-07-02 06:07:56 +05:30
kshitijk4poor	676236bb1d	fix(agent): honor custom CA certs on aux client + harden TLS resolution The salvaged fix wired per-provider ssl_ca_cert / ssl_verify (and HERMES_CA_BUNDLE) into the MAIN OpenAI client. This follow-up: - Auxiliary client parity: process_bootstrap.build_keepalive_http_client accepts and forwards verify; auxiliary_client._resolve_aux_verify mirrors the main-client TLS resolution (via load_config_readonly, the read-only fast path) so compression/vision/web_extract/title-gen/session_search honor the same per-provider CA. Without this, chat worked against a private-CA endpoint but every auxiliary call still failed with APIConnectionError. - switch_model now reads custom_providers from live config (load_config_readonly) instead of the init-time agent._custom_providers snapshot, so ssl_ca_cert / ssl_verify edits are honored on mid-session model switch, matching the context-length reload (#15779). - Drop the dead client-level verify= where a custom httpx transport is used (httpx ignores it there); verify lives on the transport. Fix docstrings. Applies to both run_agent._build_keepalive_http_client and process_bootstrap. - resolve_httpx_verify: add CURL_CA_BUNDLE to the env chain (consistency with agent/ssl_guard._CA_BUNDLE_ENV_VARS) and emit a loud logger.warning naming the endpoint whenever ssl_verify:false disables verification. - get_custom_provider_tls_settings: case-insensitive base_url match (config dedup already lowercases; scheme/host are case-insensitive) so a mixed-case entry doesn't silently drop its CA. Exact match preserved; no prefix bypass. - Demote best-effort `except Exception: pass` handlers in agent_init/switch_model to logger.debug(exc_info=True). - Tests for aux verify forwarding, _resolve_aux_verify, case-insensitive match, and prefix-bypass rejection.	2026-07-02 04:51:56 +05:30
HexLab98	7e957cbd0b	feat(agent): add resolve_httpx_verify for custom CA bundle TLS Introduce a shared helper that maps HERMES_CA_BUNDLE, SSL_CERT_FILE, and per-provider ssl_ca_cert settings to httpx verify contexts.	2026-07-02 04:51:56 +05:30
Brooklyn Nicholson	ec319e4e3e	fix(learning_graph): guard non-dict metadata so /journey can't crash parse_frontmatter's malformed-YAML fallback stores every value as a string, so a skill's `metadata` can be a str. `_category`/`_related` chained `.get("metadata", {}).get("hermes", {})` and blew up with `'str' object has no attribute 'get'`, taking down `build_learning_graph()` (and thus /journey and `hermes journey`) whenever any installed skill had bad frontmatter. Extract a `_hermes_meta()` helper that returns the nested dict only when it really is one. Fixes the whole class, not just the two call sites.	2026-07-01 16:25:48 -05:00
Teknium	eae3700b16	fix(moa): raise aux timeouts to 900s and give the Codex aux path a stable prompt_cache_key (#56395) Two independent MoA auxiliary-call fixes: #53866: auxiliary.moa_reference.timeout and auxiliary.moa_aggregator.timeout were 600s while moa_agent was 120s. Raise both to 900s so a genuinely long reference/aggregator turn (mixed providers, deep reasoning, long tool chains) has headroom instead of being cut mid-generation. #53735: _CodexCompletionsAdapter (the Codex/Responses auxiliary path used by the MoA acting-aggregator, compression, web_extract, session_search, etc.) never set prompt_cache_key, so it stayed cache-cold while the MAIN Responses transport (agent/transports/codex.py) was warm. Derive the same content-addressed key via the shared _content_cache_key(instructions, tools) helper and set it on the aux Responses request, with the same host guards the main transport uses (xAI carries the key in extra_body; GitHub/Copilot opts out of cache-key routing). Tests: 5 new prompt_cache_key cases (set+prefixed, stable across identical prefix, differs on different instructions, skipped for xai/github hosts). tests/agent/test_auxiliary_client.py 279 pass; tests/hermes_cli/test_config.py 130 pass.	2026-07-01 06:02:40 -07:00
Teknium	aa605b66c8	fix(moa): price aggregator turn at its real model so session cost isn't advisor-only (#56394) On the MoA path, agent.model/provider are the virtual preset name (e.g. "closed") and "moa", which have no pricing entry. estimate_usage_cost() returned None for the aggregator turn, so the `if amount_usd is not None` guard skipped it and the session's estimated_cost_usd reflected only the advisor fan-out, a ~50% undercount when the aggregator does the full acting loop (verified: $0.91 advisor-only vs $1.96 true, aggregator = 54%). MoAChatCompletions.create() now stashes the resolved aggregator slot as last_aggregator_slot (exposed via MoAClient); conversation_loop reads it to price the aggregator turn at its real model/provider. cost_source flips from 'none' to 'provider_models_api'.	2026-07-01 06:02:33 -07:00
kshitijk4poor	b795a45b8d	fix(compaction): detect and strip merge-into-tail summaries past the delimiter Follow-up to the END-MARKER reorder: moving the summary prefix after the [PRIOR CONTEXT] wrapper meant _is_context_summary_content (prefix-at-start) no longer recognized a merged-tail summary. That silently broke three consumers — the last-real-user anchor (would pick the merged summary as a real user turn, causing active-task loss), the carry-forward summary find, and the auto-focus skip. _strip_summary_prefix would also carry the wrapper + stale tail content forward as the next summary body. Extract the two delimiter strings into _MERGED_PRIOR_CONTEXT_HEADER / _MERGED_SUMMARY_DELIMITER constants (writer + detector stay in sync), teach _is_context_summary_content and _strip_summary_prefix to look past the delimiter, and add a regression test. Standalone summaries unchanged.	2026-07-01 18:23:01 +05:30
Gromykoss	a1a8a967e1	fix(compaction): place END MARKER last in merge-into-tail summaries When the compression summary is merged into the first tail message (the alternation corner case where a standalone summary role would collide with both head and tail), the old format was SUMMARY + END_MARKER + OLD_TAIL_CONTENT — so the preserved tail content appeared AFTER the end marker and the model could read it as a fresh message to respond to. Reorder so the END MARKER is always last: old tail content is wrapped in [PRIOR CONTEXT ...][END OF PRIOR CONTEXT — COMPACTION SUMMARY BELOW] delimiters, then the summary, then the END MARKER. _append_text_to_content handles both string and multimodal-list content. Salvaged from #56372 by @Gromykoss. Only the END-MARKER reorder half is carried over. The PR's second change (a post-compaction pass that strips user-role messages before the first summary marker on compression_count>=2) was dropped: on 2nd+ compactions the protected head decays to system-only (_effective_protect_first_n -> 0, #11996) so the targeted 'ghost head user' does not occur, and where the strip does fire it deletes legitimate recent tail user turns (data loss) and can leave consecutive assistant messages (role-alternation violation).	2026-07-01 18:23:01 +05:30
Steve Lawton	c73e74386b	feat(vertex): add Google Vertex AI provider for Gemini (OAuth2) Adds Vertex AI as a first-class provider for Gemini models via Vertex's OpenAI-compatible endpoint. Vertex authenticates with short-lived OAuth2 access tokens (service-account JSON or ADC), not a static API key — the missing piece behind the recurring requests (#13484, #12639, #56259). - agent/vertex_adapter.py: OAuth2 token minting + refresh-on-expiry (5-min margin), ADC->service-account fallback, global vs regional endpoint URLs. Config precedence: env var > config.yaml > default. - plugins/model-providers/vertex/: provider profile (auth_type=vertex), reuses Gemini's extra_body.google.thinking_config translation. - runtime_provider: vertex short-circuit BEFORE the credential pool so a credentials-file path is never mistaken for a static API key; mints a fresh token + computes base_url per resolve. - run_agent + conversation_loop: _try_refresh_vertex_client_credentials() re-mints the token and rebuilds the client on a mid-session 401, so a long-lived gateway agent survives token expiry (~1h). - auxiliary_client: vertex auth_type branch for side-LLM tasks. - config.yaml: vertex.project_id / vertex.region (non-secret, bridged to env); credential path stays in .env (VERTEX_CREDENTIALS_PATH). - setup wizard + model picker: dedicated _model_flow_vertex; curated google/gemini-* model list; --provider choices. - pricing/metadata: Vertex prices off the gemini docs snapshot; endpoint host auto-maps to the vertex provider (no probe spam). - lazy_deps + pyproject [vertex] extra: google-auth, opt-in only. - docs: guides/google-vertex.md + providers page; tests for adapter + runtime resolution. Salvages and modernizes #8427 by @slawt onto current main: rewired from the legacy PROVIDER_REGISTRY path to the provider-profile architecture, moved non-secret config out of .env into config.yaml, and added the per-turn 401 token-refresh the original lacked.	2026-07-01 05:25:33 -07:00
ud	c126a99fc1	fix(subdirectory_hints): catch RuntimeError from Path.expanduser() `pathlib.Path('~user').expanduser()` raises RuntimeError when the tilde-expansion can't resolve the user (e.g. `~500-700` where the LLM meant "approximately 500-700" rather than a path). The hint walker's existing `except (OSError, ValueError):` clauses do not catch RuntimeError, so it escapes through the tool dispatcher and surfaces in the conversation loop as a misleading `Error during OpenAI-compatible API call #N: Could not determine home directory.` Reproduced across three unrelated models (openai/gpt-5-mini, openai/gpt-5.1-codex, deepseek/deepseek-v4-flash) on terminal-tool commands containing literal tildes in non-path contexts, which are common in LLM output ("~500 agencies", "~45,000 CVEs", "~80/hr blended rate"). Reproduction (one-liner): `from pathlib import Path; Path("~500-700").expanduser()` raises `RuntimeError: Could not determine home directory.` Fix: extend the three `except` clauses in agent/subdirectory_hints.py to also catch RuntimeError: line 138 (_add_path_candidate's outer catch around the Path().expanduser() call) and lines 198 and 202 (_load_hints_for_directory's nested catches around hint_path.relative_to(Path.home())). Tests: tests/agent/test_subdirectory_hints_tilde.py adds three cases covering tilde-as-approximately in heredoc commands, ~unknown_user paths, and a regression guard that legitimate ~/path expansion still works.	2026-07-01 04:55:15 -07:00
kshitijk4poor	dc1ea005d9	fix+test(codex): self-persist projected turns; keep agent_persisted=True Follow-up correcting the salvaged fix's persistence approach to avoid a duplicate user-message write (verified via E2E; this is the #860/#42039 bug class the original diff aimed to avoid). Root cause: in gateway mode, the AIAgent is built WITH a session_db, so the inbound user turn is already flushed at turn start (turn_context._persist_session). The original fix returned agent_persisted=False, making the gateway re-write the whole new-message slice via append_to_transcript -> append_message (a raw INSERT with no dedup), duplicating the already-flushed user turn. Corrected approach (single writer): run_codex_app_server_turn now flushes its OWN projected assistant/tool messages via _flush_messages_to_session_db (which dedups the already-persisted user turn through _DB_PERSISTED_MARKER) and returns agent_persisted=True so the gateway skips its write. Net result: session_search/distill see the full codex conversation, each message persisted exactly once. Adds regression coverage asserting exactly-once persistence on a real SessionDB, agent_persisted=True, FTS visibility, and standard-runtime skip-db behaviour preserved. Co-authored-by: Lubos Buracinsky <lubos@komfi.health>	2026-07-01 17:08:59 +05:30
kshitijk4poor	9cf47fef54	fix(auxiliary_client): demote the 2 sibling routing fall-throughs too (review) Phase 2c review flagged that only 2 of the 4 structurally-identical resolve_provider_client routing dead-ends were demoted. Complete the bug-class: also demote+dedup the external-process ('not directly supported') and OAuth ('not directly supported, try auto') fall-throughs, keyed by provider name, so none of the four dead-ends spam WARNING on a retry loop. Add direct tests for the unhandled-auth_type and OAuth dedup paths via a monkeypatched PROVIDER_REGISTRY (the review noted these were unverified). Mutation-checked: reverting either sibling demotion fails its test.	2026-07-01 17:00:30 +05:30
kshitijk4poor	c0d3ceb17e	fix(auxiliary_client): dedup resolve_provider_client fall-through warnings The two fall-through branches in resolve_provider_client (unknown provider, unhandled auth_type) logged at WARNING on every retry of a misconfigured provider, spamming logs during retry loops. Demote both to logger.debug with per-process dedup: the first occurrence still surfaces (a provider-name typo or PROVIDER_REGISTRY/auth_type-drift bug is worth seeing once), while identical repeats are suppressed for the process lifetime. Salvaged from #56283 (extracting only the stated auxiliary_client fix; the original PR also bundled ~2800 lines of unrelated changes across 10 other files, which are dropped).	2026-07-01 17:00:30 +05:30
shawchanshek	3b739b990b	fix(title_generator): strip think blocks from LLM output before extracting title Think-enabled models (MiniMax M2.7, DeepSeek, etc.) emit inline <think>...</think> reasoning even for simple prompts like title generation, and the raw XML was leaking into session titles. Route the title-model response through the canonical strip_think_blocks scrubber before cleanup so every tag variant — closed pairs, unterminated blocks, orphan closes, mixed case — is handled, not just a single literal <think> pair. - 2 regression tests: closed <think> pair stripped, unterminated block at start yields no title. Salvaged from PR #44126 by @shawchanshek.	2026-07-01 04:18:48 -07:00
shandian64	5126902f1d	fix(title): honor configured auxiliary timeout	2026-07-01 16:41:43 +05:30
Teknium	5de65624d1	fix(moa): capture streamed aggregator output into full-turn traces (#56312) MoA full-turn traces (moa.save_traces) recorded the aggregator's acting output only on the non-streaming path, where it's captured inline at call time. On the streaming path (which every hermes chat --query run and every live gateway/CLI turn takes), the aggregator's raw token stream is handed to the live consumer, so the trace left output=null and only pointed at the session-db assistant row. An offline audit of a benchmark run (HermesBench drives --query) then couldn't see what the aggregator produced without hand-joining to state.db. Capture the resolved streamed acting text at trace-flush time (the agent already holds it in _current_streamed_assistant_text) and fold it into the trace, so the record is self-contained in both modes. New output_location value inline_from_stream marks a streamed turn whose text was captured this way; a genuinely empty acting turn (pure tool call) still points at the session db, matching state.db exactly. Touches only the trace side-channel: no change to the acting path, message history, role alternation, or prompt cache. - agent/moa_loop.py: consume_and_save_trace(..., aggregator_output_fallback) on both the facade and the MoAClient wrapper; prefer inline capture, fall back to the resolved streamed text. - agent/moa_trace.py: embed the fallback; add inline_from_stream location. - agent/conversation_loop.py: pass _current_streamed_assistant_text at flush. - tests: 5 cases across streaming / non-streaming / empty-fallback / no-double-write.	2026-07-01 04:07:46 -07:00
kshitijk4poor	b7adad1a72	test(error-classifier): parametrize 5xx overflow test over 500/502/503/529 Review nit (helix4u): the fix covers 500/502/503/529 but the positive tests only asserted 500 and 503. Parametrize over all four so 502/529 are covered too; keep the plain-5xx negatives.	2026-07-01 16:14:16 +05:30
pefontana	a04b7024ff	fix(error-classifier): route 5xx context-overflow into compression Local inference servers (llama.cpp/llama-server, vLLM/Ollama behind a Cloudflare/Tailscale hop) report context overflow with HTTP 500/502/503/529 instead of 400/413. _classify_by_status returned server_error/overloaded and retried blindly, then dropped the turn with no compaction. Route explicit _CONTEXT_OVERFLOW_PATTERNS matches on those 5xx codes to context_overflow (should_compress=True); plain 500 stays server_error, plain 503 overloaded.	2026-07-01 16:14:16 +05:30
WXBR	59e7e9d007	fix(agent): persist recovered final responses Close a recovery/fallback final_response with an assistant transcript entry before session persistence so durable history cannot end at a tool/user message after the caller receives a final answer. Adds a regression for a tool-tail transcript with a non-empty final_response. Related to #46071 / #46053, but covers the adjacent case where the assistant message was never appended before persistence.	2026-07-01 03:34:49 -07:00
kshitijk4poor	e3819a4143	test(anthropic): add adjacency behavior test for #52145 + fix vacuous refresh-UA test (review) Review follow-up on the anthropic_adapter batch salvage: 1. #52145 shipped no behavior test for the adjacency rewrite. Add test_strips_tool_use_when_result_not_immediately_adjacent (a tool_use whose result appears later but NOT in the immediately-following user message must be stripped; this is the exact case the old global id-match got wrong) plus an adjacent-pair control. Mutation-checked: reverting to a global match fails the non-adjacent test. 2. test_token_refresh_ua_prefix was vacuous: it bound to _refresh_oauth_token (a wrapper with no urllib.request.Request), so its assert never ran and it did NOT guard the real refresh UA site. Retarget it at refresh_anthropic_oauth_pure (:1048) with the header-scoped check. Mutation-checked: reverting :1048 to claude-cli/ now fails it.	2026-07-01 15:42:15 +05:30
kshitijk4poor	5efbd7cb05	test(anthropic): scope OAuth-UA source check to header lines, not any mention The salvaged test_token_exchange_ua_prefix did a naive whole-function substring check for 'claude-cli/', which false-positives on an explanatory comment that references the old (blocked) UA. Scope it to actual User-Agent header lines — mirroring the sibling test_no_claude_cli_in_source — so a comment documenting why claude-cli/ is avoided doesn't trip it. Mutation-checked: an actual claude-cli/ UA header still fails the test.	2026-07-01 15:42:15 +05:30
DhivinX	49e129e495	fix(anthropic): use claude-code/ UA prefix for OAuth to avoid 404 (#48534) Anthropic's OAuth endpoints return 404 for the claude-cli/ User-Agent prefix. Switch all three OAuth UA sites (build_anthropic_client, refresh_anthropic_oauth_pure, run_hermes_oauth_login_pure) to the claude-code/ prefix Anthropic expects. Salvaged from #51948. Co-authored-by: DhivinX <20087092+DhivinX@users.noreply.github.com>	2026-07-01 15:42:15 +05:30
Ben Barclay	c71f816956	fix(compression): clear all per-session state in on_session_end, not just _previous_summary The original cross-session contamination fix (#38788) only cleared _previous_summary in on_session_end(), but on_session_reset() clears 14+ per-session variables. When a session ends (cron exit, gateway expiry, session-id rotation) and the compressor instance is reused, the surviving stale state causes: - _ineffective_compression_count surviving → next session skips compression prematurely (anti-thrashing guard misfires) - _summary_failure_cooldown_until surviving → next session blocks summary generation for an unrelated transient error - _last_compress_aborted surviving → callers think compression is still aborted - _last_aux_model_failure_* surviving → stale error warnings shown - _last_summary_dropped_count / _last_summary_fallback_used surviving → misleading user warnings - _context_probed / _context_probe_persistable surviving → stale context-probe state Also fix on_session_reset() which was missing _last_compress_aborted clearing — a /new or /reset would inherit the aborted flag from the prior conversation. Add 6 targeted tests covering the leak vectors and a parity test ensuring on_session_end and on_session_reset always clear the same surface.	2026-07-01 02:48:32 -07:00
Ruzzgar	e13b6ce1c6	test(redact): cover Slack App-Level (xapp-) token redaction	2026-07-01 02:45:22 -07:00
Teknium	da6d5fcd13	fix(auth): serialize Codex OAuth pool refresh under the auth-store lock (#56233) The credential-pool Codex refresh path synced tokens from auth.json and then POSTed the refresh_token to OpenAI's token endpoint without holding the cross-process auth-store lock across the whole read->POST->write-back sequence. Because Codex refresh tokens are single-use, two concurrent Hermes processes could both adopt the same on-disk token and both POST it; the loser got refresh_token_reused / invalid_grant. Wrap the Codex OAuth branch of _refresh_entry in the existing shared _auth_store_lock (reentrant, cross-process flock) using the same extended-timeout pattern resolve_codex_runtime_credentials() already uses. A waiting process now blocks on the lock and, once inside, the in-lock re-sync picks up the rotated token the winner persisted and skips its own POST. Also send User-Agent: hermes-cli/<version> on the refresh request. Credit @cooper-oai (#34820) for identifying the concurrent-refresh reuse race; this ships the narrow lock-serialization fix without the separate Codex auth-store partition.	2026-07-01 02:45:07 -07:00
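The read->POST->write-back serialization can be sketched as follows, under explicit assumptions. A `threading.RLock` stands in for the real reentrant, cross-process flock `_auth_store_lock`. A module-level dict stands in for auth.json, and `post_refresh` is a hypothetical callable that replaces the HTTP call to the token endpoint. The key step is the re-sync inside the lock: a process that waited sees the rotated token and skips its own POST.

```python
import threading
from typing import Callable

# Stand-ins (assumptions): the real code uses a cross-process flock and auth.json.
_auth_store_lock = threading.RLock()
auth_store: dict[str, str] = {"refresh_token": "rt-0"}


def refresh_entry(entry: dict[str, str], post_refresh: Callable[[str], str]) -> None:
    """Refresh a pool entry, holding the lock across read -> POST -> write-back."""
    with _auth_store_lock:
        on_disk = auth_store["refresh_token"]
        if on_disk != entry["refresh_token"]:
            # Another process already rotated the single-use token: adopt it and
            # skip our own POST, which would fail with refresh_token_reused.
            entry["refresh_token"] = on_disk
            return
        new_token = post_refresh(entry["refresh_token"])
        entry["refresh_token"] = new_token
        auth_store["refresh_token"] = new_token  # persist for waiting processes
```

Because the comparison and the POST happen under the same lock, the second caller can never POST a refresh token that the first caller has already spent.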
sprmn24	88d6e833f1	fix(agent): wrap list-type untrusted content in untrusted_tool_result _maybe_wrap_untrusted() only wrapped str-typed tool outputs. When a high-risk tool (web_extract, browser_*) returns a multimodal content list ([{type:text},{type:image_url}]) — which _tool_result_content_for_active_model() produces by unwrapping the _multimodal envelope for vision-capable providers — the text part reached the model completely unguarded. An attacker page that ships one image bypassed the entire untrusted-data wrapper. Extend the wrapper to handle list content: each {type:text} part is run through the same string-wrapping path (min-char threshold, delimiter neutralization, one well-formed block), while image/video parts pass through untouched so the list stays valid for vision adapters. Recursing into the existing string branch means the list path inherits the delimiter defang and the no-forgeable-fast-path hardening from #56172 for free. The outer list is rebuilt (not returned by identity), so callers compare by value.	2026-07-01 02:44:09 -07:00
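Here is a minimal sketch of the list handling: text parts go through the same string path, other parts pass through, and a new list is returned. `wrap_text` and `wrap_untrusted` are hypothetical names. The dict shape (`{"type": "text", "text": ...}`) follows the `{type:text}`/`{type:image_url}` parts named in the message. The min-char threshold and delimiter defang are deliberately left out of the stand-in string path.

```python
from typing import Any


def wrap_text(text: str) -> str:
    # Stand-in for the existing string branch. The real path also applies a
    # min-char threshold and delimiter neutralization; both are omitted here.
    return f"<untrusted_tool_result>\n{text}\n</untrusted_tool_result>"


def wrap_untrusted(content: Any) -> Any:
    """Wrap str content, or each text part of a multimodal content list."""
    if isinstance(content, str):
        return wrap_text(content)
    if isinstance(content, list):
        out = []  # a new list, so callers must compare by value, not identity
        for part in content:
            if isinstance(part, dict) and part.get("type") == "text":
                # Route text parts through the same string path.
                out.append({**part, "text": wrap_text(part.get("text", ""))})
            else:
                out.append(part)  # image/video parts pass through untouched
        return out
    return content
```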
mrparker0980	10a54ccc2c	fix(security): anchor @file context refs to canonical read deny-list `@file` / `@folder` context-reference expansion enforced its own narrow deny-list (`_ensure_reference_path_allowed` in `agent/context_references.py`) that only covered `~/.ssh` keys, a handful of shell dotfiles, `~/.hermes/.env`, and `skills/.hub`. It never blocked the credential stores that the canonical read guard (`agent/file_safety.get_read_block_error`) protects: provider API keys (`~/.hermes/auth.json`), Anthropic OAuth tokens (`~/.hermes/.anthropic_oauth.json`), MCP OAuth material (`~/.hermes/mcp-tokens/`), webhook HMAC secrets, and project-local `.env` files. This matters because the messaging gateway feeds untrusted remote text straight into reference expansion: `gateway/run.py` calls `preprocess_context_references_async(..., allowed_root=_msg_cwd)` where `_msg_cwd` defaults to the operator's HOME when `TERMINAL_CWD` is unset. A chat peer (Telegram/Discord/Slack/...) could send `@file:~/.hermes/auth.json`, pass the `allowed_root` check (it resolves under HOME), slip past the narrow list, and have the operator's live keys read into the agent's context — where the model would typically echo or act on them. Rather than duplicate and re-sync a second secret list, this routes the guard through the existing single source of truth. A reviewer might ask "why not just add `auth.json` to the local list?" — because the local list has already drifted once (a prior commit had to add `.config/gh`); anchoring to `get_read_block_error` means every future addition there protects this path too. The narrow checks are kept as a fallback since they also cover dirs that guard does not (`.aws`, `.gnupg`, `.kube`, etc.), and the canonical lookup is wrapped so it can never crash reference expansion. 
N/A - [x] 🔒 Security fix - `agent/context_references.py`: `_ensure_reference_path_allowed` now also consults `agent.file_safety.get_read_block_error` after its existing checks and refuses the reference when that canonical guard flags the resolved path. The lookup is wrapped so guard-resolution failures fall back to the explicit checks instead of breaking expansion. - `tests/agent/test_context_references.py`: added `test_blocks_canonical_read_denylist_credential_stores`, asserting that `@file` attaches for `auth.json`, `.anthropic_oauth.json`, `mcp-tokens/`, and a project-local `.env` are all refused and their secret bodies never reach the expanded message. - `scripts/release.py`: added the contributor email to `AUTHOR_MAP` (release gate). 1. `scripts/run_tests.sh tests/agent/test_context_references.py` — all 15 tests pass, including the new credential-store case. 2. Regression proof: stash `agent/context_references.py`, run the suite with `-- -k canonical`, and confirm the new test fails (secrets leak into the message) without the fix; restore and confirm it passes. 3. `ruff check agent/context_references.py tests/agent/test_context_references.py` and `python scripts/check-windows-footguns.py agent/context_references.py tests/agent/test_context_references.py` both pass. - [x] I've read the Contributing Guide - [x] My commit messages follow Conventional Commits (`fix(scope):`, etc.) 
- [x] I searched for existing PRs to make sure this isn't a duplicate - [x] My PR contains only changes related to this fix (plus the AUTHOR_MAP release gate) - [x] I've run the test suite for the touched area and all tests pass - [x] I've added tests for my changes (required for bug fixes) - [x] I've tested on my platform: macOS 15 (Darwin 25.5) - [x] I've updated relevant documentation (README, `docs/`, docstrings) — or N/A - [x] I've updated `cli-config.yaml.example` if I added/changed config keys — or N/A - [x] I've updated `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows — or N/A - [x] I've considered cross-platform impact (Windows, macOS) — or N/A - [x] I've updated tool descriptions/schemas if I changed tool behavior — or N/A	2026-07-01 02:43:49 -07:00
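The guard ordering described in this PR can be sketched as follows. The existing narrow checks run first, then the canonical deny-list, and any failure of the canonical lookup is swallowed. The real canonical check is `agent.file_safety.get_read_block_error`. Its signature is not shown in the commit, so this sketch takes a hypothetical `canonical_block_error(path) -> Optional[str]` callable instead. `NARROW_DENY_DIRS` and `reference_block_reason` are also hypothetical.

```python
from pathlib import Path
from typing import Callable, Optional

# Hypothetical narrow list: directories the canonical guard may not cover.
NARROW_DENY_DIRS = {".ssh", ".aws", ".gnupg", ".kube"}


def reference_block_reason(
    path: Path,
    canonical_block_error: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return why an @file reference must be refused, or None if allowed."""
    if NARROW_DENY_DIRS.intersection(path.parts):
        return f"blocked sensitive path: {path}"
    try:
        # Consult the canonical read deny-list (single source of truth).
        return canonical_block_error(str(path)) or None
    except Exception:
        # A guard-resolution failure must not crash reference expansion;
        # the explicit checks above remain the fallback.
        return None
```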
Teknium	913e661a09	fix(cache): stop verification-loop synthetic nudges from persisting (#56194) verify_on_stop / pre_verify append a synthetic assistant "done" plus a synthetic user nudge to keep the agent going one more turn before it can claim completion. Both were flagged (_verification_stop_synthetic on the nudge only), but the flags were never registered in _EPHEMERAL_SCAFFOLDING_FLAGS, so the central _is_ephemeral_scaffolding() filter that guards both persistence sinks (SQLite flush + JSON snapshot) let them through. The resumed transcript then inherited loop-only scaffolding, invalidating the prompt-prefix cache on later turns. - add _verification_stop_synthetic and _pre_verify_synthetic to _EPHEMERAL_SCAFFOLDING_FLAGS (the single chokepoint both sinks use) - flag the blocked-attempt assistant message too, not just the nudge, so the whole synthetic pair drops together and persistence does not keep a premature "done" with the nudge stripped (assistant-to-assistant adjacency) The API-payload leak claimed in the report is already handled: the chat_completions transport strips every underscore-prefixed message key before the wire, so the marker never reaches strict providers. Reported by patppham.	2026-07-01 02:26:06 -07:00
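Here is a minimal sketch of the single-chokepoint filter. Any message carrying a registered flag is dropped before either sink writes it. Because both the synthetic assistant "done" and the nudge are flagged, the pair drops together. Only the two flag names from the commit appear, and the real `_EPHEMERAL_SCAFFOLDING_FLAGS` may hold others. `EPHEMERAL_FLAGS` and `persistable` are hypothetical names.

```python
from typing import Any

# Hypothetical stand-in for the flag registry; only the two flags this
# commit registers are listed.
EPHEMERAL_FLAGS = frozenset({"_verification_stop_synthetic", "_pre_verify_synthetic"})


def is_ephemeral_scaffolding(msg: dict[str, Any]) -> bool:
    """True if the message carries any registered scaffolding flag."""
    return any(msg.get(flag) for flag in EPHEMERAL_FLAGS)


def persistable(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """The single filter both sinks (SQLite flush, JSON snapshot) would share."""
    return [m for m in messages if not is_ephemeral_scaffolding(m)]
```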
kshitijk4poor	a658f3b28b	fix(security): strip dynamic Hermes secrets from all subprocess spawn env Subprocesses spawned by the terminal tool, execute_code, Docker backend, and the codex app-server could inherit Hermes-internal secrets that the name-based `_HERMES_PROVIDER_ENV_BLOCKLIST` can't enumerate, because they're injected into `os.environ` at runtime under dynamic names: - `AUXILIARY_<TASK>_API_KEY` / `AUXILIARY_<TASK>_BASE_URL` — per-task side-LLM credentials bridged from `config.yaml[auxiliary]` by gateway/run.py and cli.py (vision, web_extract, approval, compression, plugin-registered tasks). Often separate, higher-spend keys plus base URLs pointing at private endpoints. - `GATEWAY_RELAY__SECRET` / `_KEY` / `_TOKEN` — relay-auth material provisioned by gateway/relay. Additionally, agent/transports/codex_app_server.py built its spawn env from a raw `os.environ.copy()`, bypassing the centralized `hermes_subprocess_env()` helper entirely — handing every codex subprocess the full Tier-1 secret set (GH_TOKEN, gateway bot tokens, Modal/Daytona infra tokens, dashboard session token) unfiltered. This is the #29157 sibling spawn-site gap; copilot_acp_client already routes through the helper. Fix — single chokepoint: - Add `_is_hermes_internal_secret(key)` in tools/environments/local.py as the single source of truth for the dynamic secret patterns. Matches AUXILIARY_<TASK>_API_KEY / _BASE_URL and GATEWAY_RELAY__SECRET/_KEY/_TOKEN; leaves non-secret AUXILIARY_<TASK>_PROVIDER/_MODEL and GATEWAY_RELAY routing hints visible. - Wire the predicate into every spawn path unconditionally (ignores skill env_passthrough opt-in AND inherit_credentials — a model-driving CLI never needs these): `_sanitize_subprocess_env` (both loops), `_make_run_env` (foreground), `hermes_subprocess_env` (Tier-1), and the Docker forward filter. - Add the static GATEWAY_RELAY_* names to `_HERMES_PROVIDER_ENV_BLOCKLIST` so the exact-match path catches them independently of the predicate.
- Add the GATEWAY_RELAY_ID/_SECRET/_DELIVERY_KEY triplet to `_ALWAYS_STRIP_KEYS` (Tier-1) so it is stripped unconditionally on EVERY spawn surface — including the codex/copilot `inherit_credentials=True` path that skips the Tier-2 blocklist. `_SECRET`/`_DELIVERY_KEY` are already predicate-matched; `_ID` has no secret suffix, so enumerating it here is what closes its leak on the inherit path (self-review W1). - Defense in depth: env_passthrough.py `_is_hermes_provider_credential()` now consults the same predicate, so a skill can't register these names as passthrough and tunnel them into an execute_code / terminal child. - Route codex_app_server through `hermes_subprocess_env(inherit_credentials=True)` — strips Tier-1 + dynamic-internal secrets while provider creds (which codex needs to authenticate) still flow. Consolidates PRs #53715 (necoweb3 — the _is_hermes_internal_secret backbone + Docker filter), #53503 (srojk34 — env_passthrough guard), and #55709 (srojk34 — codex routing). Retires #52348 (claudlos): its copilot half is already on main, and its codex half used the full-strip `_sanitize_subprocess_env` which would break codex provider auth — the correct tier is `inherit_credentials=True`. Tests: TestHermesInternalDynamicSecrets (terminal + predicate + passthrough override), TestInternalDynamicSecrets (hermes_subprocess_env both tiers), TestSpawnEnvSecretStripping (codex spawn env), plus env_passthrough defense-in-depth cases. Co-authored-by: necoweb3 <sswdarius@gmail.com> Co-authored-by: srojk34 <286497132+srojk34@users.noreply.github.com> Co-authored-by: claudlos <claudlos@agentmail.to>	2026-07-01 14:37:22 +05:30
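The following is a regex approximation of the predicate. The patterns are assumptions inferred from the names in the message (`AUXILIARY_<TASK>_API_KEY`, `GATEWAY_RELAY_SECRET`, `GATEWAY_RELAY_DELIVERY_KEY`). They are not the real `_is_hermes_internal_secret` in tools/environments/local.py, and `strip_internal_secrets` is hypothetical.

```python
import re

# Assumed patterns: per-task auxiliary credentials and GATEWAY_RELAY_* names
# ending in a secret-looking suffix (SECRET, KEY, TOKEN).
_AUX_SECRET = re.compile(r"^AUXILIARY_[A-Z0-9_]+_(?:API_KEY|BASE_URL)$")
_RELAY_SECRET = re.compile(r"^GATEWAY_RELAY_(?:[A-Z0-9_]*_)?(?:SECRET|KEY|TOKEN)$")


def is_hermes_internal_secret(key: str) -> bool:
    """True for dynamic Hermes-internal secret names (approximation)."""
    return bool(_AUX_SECRET.match(key) or _RELAY_SECRET.match(key))


def strip_internal_secrets(env: dict[str, str]) -> dict[str, str]:
    """Copy of env without dynamic internal secrets, applied unconditionally."""
    return {k: v for k, v in env.items() if not is_hermes_internal_secret(k)}
```

`GATEWAY_RELAY_ID` is not matched because it has no secret suffix. This is why the commit enumerates it separately in `_ALWAYS_STRIP_KEYS`.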
qWaitCrypto	e1ff736f26	fix(anthropic): preserve ordered replay cache markers	2026-07-01 02:03:40 -07:00
qWaitCrypto	80d71e8d2e	fix(anthropic): preserve tool use cache markers	2026-07-01 02:03:40 -07:00
sasquatch9818	020d263ef6	fix(agent): defang untrusted-tool-result delimiter against tag injection `_maybe_wrap_untrusted` is the architectural defense against indirect prompt injection. It wraps attacker-controllable tool output (web_extract, web_search, browser_*, mcp_*) in `<untrusted_tool_result>...</untrusted_tool_result>` so the model treats it as data. The content was interpolated verbatim, so the boundary was forgeable. Two holes. A poisoned page that embeds `</untrusted_tool_result>` closes the block early — everything after it reads as trusted instructions. And the `startswith("<untrusted_tool_result")` re-entrancy guard returned content that merely started with the opening tag completely unwrapped, so an attacker just prefixed the tag to drop all data framing. Fix neutralizes any embedded delimiter token (case-insensitive) before interpolation and drops the forgeable fast-path, so content is always sealed in exactly one well-formed block. Re-wrapping an already-wrapped forward is harmless — it stays framed as data. ## What does this PR do? Closes an indirect prompt-injection bypass in the untrusted-tool-result wrapper. Attacker content can no longer break out of, or forge, the trust boundary. ## Related Issue N/A ## Type of Change - [x] 🔒 Security fix ## Changes Made - `agent/tool_dispatch_helpers.py`: add `_neutralize_delimiters` (case-insensitive defang of the `untrusted_tool_result` token); `_maybe_wrap_untrusted` now always neutralizes then wraps, and the forgeable `startswith` re-entrancy guard is removed. - `tests/agent/test_tool_dispatch_helpers.py`: replace the double-wrap test (it encoded the bypass) with regression tests for embedded closing tag, leading opening tag, and a cased closing tag. ## How to Test 1. `scripts/run_tests.sh tests/agent/test_tool_dispatch_helpers.py` — 29 pass. 2. Embedded `</untrusted_tool_result>` mid-content: real closing delimiter appears once, at the end; payload trapped inside. 3.
Content starting with the opening tag: data framing is applied, not skipped. ## Checklist ### Code - [x] I've read the Contributing Guide - [x] My commit messages follow Conventional Commits - [x] I searched for existing PRs to make sure this isn't a duplicate - [x] My PR contains only changes related to this fix - [x] I've run the affected tests and they pass - [x] I've added tests for my changes - [x] I've tested on my platform: macOS 15 (Darwin 25.5) ### Documentation & Housekeeping - [x] I've updated relevant documentation (docstrings) — or N/A - [x] cli-config.yaml.example — N/A - [x] CONTRIBUTING.md / AGENTS.md — N/A - [x] Cross-platform impact — N/A (pure-Python, stdlib `re`) - [x] Tool descriptions/schemas — N/A	2026-07-01 01:54:45 -07:00
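The defang-then-always-wrap fix can be sketched as follows. The replacement text `untrusted-tool-result` is an assumption, because the commit does not say what `_neutralize_delimiters` substitutes. Any rewrite works as long as no case variant of the token survives to close or open the real block. The function names here are hypothetical.

```python
import re

_DELIM = re.compile(r"untrusted_tool_result", re.IGNORECASE)


def neutralize_delimiters(text: str) -> str:
    # Assumed defang: rewrite every case variant of the token so embedded
    # open/close tags can no longer match the real delimiter.
    return _DELIM.sub("untrusted-tool-result", text)


def maybe_wrap_untrusted(text: str) -> str:
    # No startswith("<untrusted_tool_result") fast path: always neutralize,
    # then seal the content in exactly one well-formed block.
    return f"<untrusted_tool_result>\n{neutralize_delimiters(text)}\n</untrusted_tool_result>"
```

A forged opening tag is neutralized rather than trusted. Re-wrapping already-wrapped content therefore just nests a defanged copy inside a fresh block, which stays framed as data.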
kshitijk4poor	6e97f5c3f8	test(compressor): tidy blank-line spacing + assert placeholder never overwrites text Review follow-up on the batch salvage: normalize the inter-class spacing to two blank lines (PEP8) between the three new test classes, and add an explicit assertion in test_sanitizer_strips_orphaned_preserves_text_content that the '(tool call removed)' placeholder does NOT overwrite existing assistant text. No production change.	2026-07-01 14:24:41 +05:30
liuhao1024	8f4d195d5f	fix(compressor): pin summary role to user when only system prompt is protected (#52160) After the first compaction, protect_first_n decays, so on a later compaction the only protected head message can be the system prompt. Adapters like Anthropic and Bedrock send the system prompt as a separate parameter, so the summary becomes the first message in messages[] — and Anthropic rejects any request whose first message is not role=user (HTTP 400). Pin the summary to role=user when the head is system-only, and stop the collision-flip logic from reverting it back to assistant. Salvaged from #52167. Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>	2026-07-01 14:24:41 +05:30
srojk34	82ac7e16b8	fix(compression): preserve network/auth abort flags across cooldown re-entry (#29559) compress() eagerly reset _last_summary_auth_failure and _last_summary_network_failure at the top of every call. On a second compress() during the failure cooldown, _generate_summary() returns None from the cooldown early-return WITHOUT re-asserting those flags, so the abort guard saw False and fell through to the destructive static-fallback that drops the middle window — the data-loss #29559/#25585 describe. Stop resetting them eagerly; a successful summary already clears both, so letting them persist across calls is safe and keeps the cooldown abort protection intact. Salvaged from #52056. Co-authored-by: srojk34 <286497132+srojk34@users.noreply.github.com>	2026-07-01 14:24:41 +05:30
liuhao1024	32b23bfb08	fix(compressor): strip orphan tool_calls instead of inserting stubs (#51218) _sanitize_tool_pairs inserted stub role="tool" results for orphaned tool_calls. The pre-API repair_message_sequence() tracks known call IDs by tc.get("id") while this sanitizer keys on call_id || id; when they disagree (Codex Responses API: id != call_id) the stubs are silently dropped by the repair pass, re-exposing the original orphans. Strip the orphaned tool_calls at the source instead (preserving any text content, adding a placeholder for an otherwise-empty assistant turn) to avoid the mismatch class entirely. Salvaged from #51225. Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>	2026-07-01 14:24:41 +05:30
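Here is a minimal sketch of stripping orphans at the source. It assumes OpenAI-style message dicts (`tool_calls` on assistant messages, `tool_call_id` on tool results) and that each tool result references the same `call_id || id` value the sanitizer keys on. The placeholder string is the `(tool call removed)` text asserted in the test-tidy commit above. `strip_orphan_tool_calls` is a hypothetical name.

```python
from typing import Any

PLACEHOLDER = "(tool call removed)"


def strip_orphan_tool_calls(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop assistant tool_calls that have no matching role="tool" result."""
    answered = {m.get("tool_call_id") for m in messages if m.get("role") == "tool"}
    out = []
    for msg in messages:
        calls = msg.get("tool_calls")
        if msg.get("role") != "assistant" or not calls:
            out.append(msg)
            continue
        # Key each call on call_id || id, as the commit's sanitizer does.
        kept = [tc for tc in calls if (tc.get("call_id") or tc.get("id")) in answered]
        if len(kept) == len(calls):
            out.append(msg)
            continue
        fixed = {k: v for k, v in msg.items() if k != "tool_calls"}
        if kept:
            fixed["tool_calls"] = kept
        elif not fixed.get("content"):
            fixed["content"] = PLACEHOLDER  # avoid an otherwise-empty assistant turn
        out.append(fixed)  # any existing text content is preserved as-is
    return out
```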

939 commits