Commit graph

1057 commits

Author SHA1 Message Date
Jaaneek
5ef0b8acb0 feat(auth): make xAI Grok OAuth device-code-only, drop loopback login
Replace the loopback/PKCE-callback server and manual-paste fallback with
the RFC 8628 device-code flow as the only xAI Grok OAuth login path. The
flow works in headless/SSH/container sessions with no 127.0.0.1 listener,
shrinking the local attack surface.

- Poll the token endpoint with server-provided interval, honoring
  slow_down and expires_in; store tokens with auth_mode
  oauth_device_code.
- Adaptive proactive refresh skew for short-lived device-code JWTs;
  rotated tokens sync back to auth.json, the global root store, and the
  credential pool (no refresh-token replay).
- Clear source suppression on successful re-login (CLI + dashboard) and
  drop the duplicate dashboard pool entry so exactly one seeded
  device_code entry exists.
- Use the shared device_code source name for consistency with the
  nous/codex device-code providers.
- Desktop: remove the loopback OAuth flow states and dead type variants;
  pkce providers' sign-in URL selection is unchanged.
- Docs (EN + zh-Hans) rewritten for device-code login; drop the deleted
  --manual-paste flag from documented commands.
2026-07-02 13:17:41 -07:00
Jneeee
b98baa3039 feat(config): extra HTTP headers for LLM API calls (#3526 salvage)
Named providers / custom_providers entries in config.yaml now accept an
extra_headers dict scoped to that endpoint — for reverse proxies, API
gateways, and custom auth schemes (e.g. Cloudflare Access service tokens).

- hermes_cli/config.py: normalize extra_headers on provider entries
  (_normalize_custom_provider_entry + providers-dict translation), add
  get_custom_provider_extra_headers /
  apply_custom_provider_extra_headers_to_client_kwargs helpers keyed on
  base_url (case/trailing-slash insensitive, no substring bypass —
  mirrors the TLS helpers)
- hermes_cli/runtime_provider.py: surface extra_headers in the resolved
  runtime for named custom providers (providers dict, legacy
  custom_providers list, and the credential-pool path)
- run_agent.py / agent/agent_init.py: merge per-provider extra_headers
  onto the OpenAI client default_headers at construction and on every
  _apply_client_headers_for_base_url re-application (credential swaps,
  rebuilds), most-specific level wins; OpenAI-wire only (native
  Anthropic/Bedrock scoped out)
- agent/auxiliary_client.py: accept model.extra_headers as an alias of
  model.default_headers for the global variant
- cli-config.yaml.example: documented commented example
- Header values are treated as secrets and never logged

Salvaged from PR #3526 by @jneeee, reimplemented against current main.

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-07-02 05:33:25 -07:00
kshitijk4poor
676236bb1d fix(agent): honor custom CA certs on aux client + harden TLS resolution
The salvaged fix wired per-provider ssl_ca_cert / ssl_verify (and
HERMES_CA_BUNDLE) into the MAIN OpenAI client. This follow-up:

- Auxiliary client parity: process_bootstrap.build_keepalive_http_client
  accepts and forwards verify; auxiliary_client._resolve_aux_verify mirrors
  the main-client TLS resolution (via load_config_readonly, the read-only
  fast path) so compression/vision/web_extract/title-gen/session_search
  honor the same per-provider CA. Without this, chat worked against a
  private-CA endpoint but every auxiliary call still failed APIConnectionError.
- switch_model now reads custom_providers from live config (load_config_readonly)
  instead of the init-time agent._custom_providers snapshot, so ssl_ca_cert /
  ssl_verify edits are honored on mid-session model switch — matching the
  context-length reload (#15779).
- Drop the dead client-level verify= where a custom httpx transport is used
  (httpx ignores it there); verify lives on the transport. Fix docstrings.
  Applies to both run_agent._build_keepalive_http_client and process_bootstrap.
- resolve_httpx_verify: add CURL_CA_BUNDLE to the env chain (consistency with
  agent/ssl_guard._CA_BUNDLE_ENV_VARS) and emit a loud logger.warning naming
  the endpoint whenever ssl_verify:false disables verification.
- get_custom_provider_tls_settings: case-insensitive base_url match (config
  dedup already lowercases; scheme/host are case-insensitive) so a mixed-case
  entry doesn't silently drop its CA. Exact match preserved — no prefix bypass.
- Demote best-effort except Exception: pass in agent_init/switch_model to
  logger.debug(exc_info=True).
- Tests for aux verify forwarding, _resolve_aux_verify, case-insensitive
  match, and prefix-bypass rejection.
2026-07-02 04:51:56 +05:30
HexLab98
3a2ba959ce fix(agent): honor custom CA certs for custom_providers HTTPS endpoints
Wire ssl_ca_cert and ssl_verify through custom_providers config and env
vars into the keepalive httpx client, fixing APIConnectionError against
mkcert/self-signed Ollama proxies behind HTTPS.
2026-07-02 04:51:56 +05:30
Teknium
ba0bc01d1f
feat(delegate): remove model-facing toolsets arg — subagents always inherit parent's (#56386)
The model could pass `toolsets` (top-level and per-task) to delegate_task,
letting it choose which toolsets a subagent got. Toolset selection is a
capability-scoping decision the model should not control; subagents inherit
the parent's enabled toolsets, period.

- Remove `toolsets` from the delegate_task() signature, the registry handler,
  the top-level + per-task JSON schema, and the live dispatch path
  (run_agent._dispatch_delegate_task — this forwarded it on every model call).
- Single-task and per-task child builds now pass toolsets=None so
  _build_child_agent resolves to pure parent inheritance.
- Drop the now-dead _SUBAGENT_TOOLSETS / _TOOLSET_LIST_STR schema-hint block.
- _build_child_agent keeps its internal toolsets param + intersection helpers
  (internal API; fed the inherited value only).
- Tests: schema assertions flipped to assertNotIn; added a regression test
  proving the dispatch path never forwards a smuggled model `toolsets`.
- Docs: update delegate_task signature refs in the autonomous-ai-agents skill.
2026-07-01 05:35:26 -07:00
Steve Lawton
c73e74386b feat(vertex): add Google Vertex AI provider for Gemini (OAuth2)
Adds Vertex AI as a first-class provider for Gemini models via Vertex's
OpenAI-compatible endpoint. Vertex authenticates with short-lived OAuth2
access tokens (service-account JSON or ADC), not a static API key — the
missing piece behind the recurring requests (#13484, #12639, #56259).

- agent/vertex_adapter.py: OAuth2 token minting + refresh-on-expiry
  (5-min margin), ADC->service-account fallback, global vs regional
  endpoint URLs. Config precedence: env var > config.yaml > default.
- plugins/model-providers/vertex/: provider profile (auth_type=vertex),
  reuses Gemini's extra_body.google.thinking_config translation.
- runtime_provider: vertex short-circuit BEFORE the credential pool so a
  credentials-file path is never mistaken for a static API key; mints a
  fresh token + computes base_url per resolve.
- run_agent + conversation_loop: _try_refresh_vertex_client_credentials()
  re-mints the token and rebuilds the client on a mid-session 401, so a
  long-lived gateway agent survives token expiry (~1h).
- auxiliary_client: vertex auth_type branch for side-LLM tasks.
- config.yaml: vertex.project_id / vertex.region (non-secret, bridged to
  env); credential path stays in .env (VERTEX_CREDENTIALS_PATH).
- setup wizard + model picker: dedicated _model_flow_vertex; curated
  google/gemini-* model list; --provider choices.
- pricing/metadata: Vertex prices off the gemini docs snapshot; endpoint
  host auto-maps to the vertex provider (no probe spam).
- lazy_deps + pyproject [vertex] extra: google-auth, opt-in only.
- docs: guides/google-vertex.md + providers page; tests for adapter +
  runtime resolution.

Salvages and modernizes #8427 by @slawt onto current main: rewired from
the legacy PROVIDER_REGISTRY path to the provider-profile architecture,
moved non-secret config out of .env into config.yaml, and added the
per-turn 401 token-refresh the original lacked.
2026-07-01 05:25:33 -07:00
kyssta-exe
7eb9716ad7 fix(agent): apply persist override to the DB row only, never the live list (#48677)
The persist user-message override was applied in place to the live messages
list. On the early crash-resilience persist (which runs BEFORE api_messages is
built), that stripped observed group-chat context off the live user message and
silently dropped it when observe_unmentioned_group_messages was enabled.

Fix at the single chokepoint: _flush_messages_to_session_db resolves the
override (idx/content/timestamp) locally and applies it ONLY to the row written
to the DB — the live dict is never mutated, so EVERY persist caller (early
persist, mid tool-loop flush, /resume, /branch) is protected uniformly. This
supersedes the earlier shallow-copy approach, which broke the intrinsic
_DB_PERSISTED_MARKER idempotency (copies never propagated the marker back to
the live dicts → duplicate rows) and closes the sibling class tracked in #56303.

Trailing empty-response scaffolding is still dropped from the live list in
_persist_session (unchanged behavior).

Salvaged from #48817; chokepoint reworked to coexist with the marker-based
dedup (#50372).

Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
2026-07-01 17:28:04 +05:30
arminanton
e2fa509bf3 fix(review): isolate the background-review fork from the canonical session
The forked skill/memory review agent shares the parent's session_id for
prompt-cache warmth. Without isolation it wrote its harness turn ('Review the
conversation above and update the skill library…') plus its curator-mode reply
straight into the user's REAL session in state.db; the next live turn re-read
that injected user message as a standing instruction and the agent 'became' the
curator, refusing the actual task.

Root fix: a _persist_disabled flag on the fork that hard-stops every DB write
and lazy-open path (_flush_messages_to_session_db, _ensure_db_session,
_get_session_db_for_recall) — the review writes only to the skill/memory stores
via its tools. Defense-in-depth: _strip_background_review_harness drops any
stray harness message (and the assistant reply that followed) at load time in
get_messages_as_conversation, so an already-polluted session resumes clean.

Salvaged from #50296.

Co-authored-by: arminanton <29869547+arminanton@users.noreply.github.com>
2026-07-01 16:21:39 +05:30
kshitijk4poor
d3010b74db test(agent): strengthen id-reuse regression + refresh flush docstring (review)
Phase 2c review follow-up on the id()-reuse persistence fix:

- test_recycled_id_in_dedup_set_still_persists_new_message seeded an EMPTY
  dedup set, so it never injected a collision and passed under id-based dedup
  too (couldn't distinguish the designs). Replace with
  test_stale_seed_id_from_prior_flush_cannot_suppress_new_message, which asserts
  the durable invariant: the seed is empty after every flush (mutation-checked:
  removing the post-flush reset now fails BOTH id-reuse tests).
- Refresh the _flush_messages_to_session_db docstring: it still described the
  old per-session identity tracking; document the intrinsic-marker mechanism,
  that _flushed_db_message_ids is now a one-shot seed, and the shared-dict
  mutation safety note.
2026-07-01 16:17:46 +05:30
rrevenanttt
e4c6d1b22b fix(agent): persist messages by intrinsic marker to stop id() reuse data loss
_flush_messages_to_session_db deduped persisted messages with a retained
{id(msg)} set (_flushed_db_message_ids) kept across turns. Once a flushed dict
is dropped from the live list (scaffolding rewind / in-place compaction) and
GC'd, CPython recycles its address onto a new assistant/tool dict whose id()
collides with the stale entry — so the real turn is silently never written to
state.db.

Replace the retained id-set with an intrinsic _DB_PERSISTED_MARKER stamped on
each dict. The id-set is demoted to a one-shot seed (valid only while the
caller's objects are alive) that is translated to markers and cleared after
every flush, so no id() outlives a flush to alias a future message. The marker
is _-prefixed so the wire sanitizers strip it before any request leaves.

Preserves the existing _is_ephemeral_scaffolding skip. Salvaged from #50372.

Co-authored-by: rrevenanttt <290873280+rrevenanttt@users.noreply.github.com>
2026-07-01 16:17:46 +05:30
Tranquil-Flow
122e5bc037 fix(agent): retry 413 after stripping vision payloads (#47339)
When text compression can't reduce a 413 request further, evict base64
image parts from tool messages and retry once instead of dead-ending
with 'Payload too large and cannot compress further.'

A 413 is a request-body byte-size limit, not a token limit. browser_vision
screenshots (2-5MB base64 each) keep the HTTP body oversized even after
aggressive summarization. The strip pass passes remember_model=False so a
413 does not poison _no_list_tool_content_models — that set is for providers
that reject list-type tool content, a distinct failure mode.

Cherry-picked from #47397 by Tranquil-Flow; placed onto main's current
token-aware 413 recovery else branch.
2026-07-01 03:18:41 -07:00
Teknium
913e661a09
fix(cache): stop verification-loop synthetic nudges from persisting (#56194)
verify_on_stop / pre_verify append a synthetic assistant "done" plus a
synthetic user nudge to keep the agent going one more turn before it can
claim completion. Both were flagged (_verification_stop_synthetic on the
nudge only), but the flags were never registered in
_EPHEMERAL_SCAFFOLDING_FLAGS, so the central _is_ephemeral_scaffolding()
filter that guards both persistence sinks (SQLite flush + JSON snapshot)
let them through. The resumed transcript then inherited loop-only
scaffolding, invalidating the prompt-prefix cache on later turns.

- add _verification_stop_synthetic and _pre_verify_synthetic to
  _EPHEMERAL_SCAFFOLDING_FLAGS (the single chokepoint both sinks use)
- flag the blocked attempt assistant message too, not just the nudge, so
  the whole synthetic pair drops together and persistence does not keep a
  premature done with the nudge stripped (assistant to assistant adjacency)

The API-payload leak claimed in the report is already handled: the
chat_completions transport strips every underscore-prefixed message key
before the wire, so the marker never reaches strict providers.

Reported by patppham.
2026-07-01 02:26:06 -07:00
petrichor-op
f2a528fb59 fix(agent): never persist empty-response recovery scaffolding
Ephemeral empty-response/prefill recovery scaffolding (the synthetic
assistant "(empty)" turn, the user nudge, the terminal "(empty)"
sentinel, and the thinking-only prefill placeholder) exists only to
drive the next API retry; the in-memory loop pops it before appending
the real response. The append-only flush did not mirror that, so a
mid-turn persist could commit scaffolding to the SQLite session store
(and JSON log), and a resumed session would replay synthetic
"(empty)"/nudge turns as genuine context — re-poisoning the empty-retry
boundary forever.

Filter ephemeral scaffolding at both durable-write sites
(_flush_messages_to_session_db + _save_session_log), by flag not
position, so buried scaffolding (an answered nudge leaves the synthetic
pair mid-list) is skipped too. Covers all three flags including
_thinking_prefill.

Adapted onto current main's identity-tracking flush.

Cherry-picked from #41281 by petrichor-op.
2026-07-01 01:08:27 -07:00
峯岸 亮
bc6cd46925 fix(agent): restrict todo hydration to paired assistant todo calls
The gateway/API server rebuilds the in-memory TodoStore by replaying
caller-supplied conversation_history. _hydrate_todo_store previously
accepted any role:tool message containing a "todos" array, so a forged
bare tool result could seed arbitrary todo state and re-inflate context
every turn (GHSA-5g4g-6jrg-mw3g).

Restrict hydration to tool results paired with an earlier assistant
todo tool call (matching tool_call_id, function name == todo, no
user/system boundary between). Reuse the existing _get_tool_call_id/
name_static helpers so dict- and object-shaped tool calls both work.
Add a generous MAX_TODO_RESULT_CHARS payload guard to drop absurd
forged results before parsing; item/content caps already exist on main.

Co-authored-by: Hermes Agent <agent@nousresearch.com>
2026-07-01 01:02:17 -07:00
HiddenPuppy
0e4c879a3b fix: keep plain custom GPT-5 relays on chat completions
Generic provider:custom relays were force-routed to the OpenAI Responses
API whenever the model matched gpt-5*, and a stale persisted
model.api_mode=codex_responses survived /reset and upgrades. Some
OpenAI-compatible relays do not implement Responses semantics, which
surfaced as malformed function_call.name replay errors in gateway sessions.

- runtime_provider: route custom-provider api_mode through
  _resolve_plain_custom_api_mode(), which drops a stale codex_responses
  unless the URL is direct OpenAI/xAI
- run_agent: _provider_model_requires_responses_api returns False for
  custom; direct api.openai.com / api.x.ai URLs still upgrade via
  _is_direct_openai_url() / URL detection
- regression coverage for plain relays vs direct OpenAI/xAI URLs

Co-authored-by: HiddenPuppy <HiddenPuppy@users.noreply.github.com>
2026-06-30 15:57:52 -07:00
Erosika
00eefc7f2b style(profile): frame comments around what the code does 2026-06-30 15:30:06 -07:00
Erosika
a6175d1f93 style(profile): trim verbose comments to one or two lines 2026-06-30 15:30:06 -07:00
Erosika
09af0a8c1d fix(profile): propagate profile context across thread/executor boundaries
A bare threading.Thread / ThreadPoolExecutor worker starts with an empty
contextvars.Context, so the context-local profile override
(_HERMES_HOME_OVERRIDE) does not cross the spawn boundary. In single-process
multi-profile runtimes (desktop tui_gateway) the worker then resolves
get_hermes_home() to the launch/default profile, leaking one profile's
reads/writes into another. The fix primitive (tools.thread_context.
propagate_context_to_thread, which copies the parent context) already exists;
the leaking spawns simply did not use it.

- model_tools.py _run_async: wrap the worker-thread loop runner. This is the
  generic sync->async bridge for every async tool, so wrapping it here fixes
  the leak for all async tools at once (verified: an async tool reading
  get_hermes_home() under an override now resolves the active profile).
- run_agent.py bg-review thread: wrap so MEMORY.md / skill review writes land
  in the spawning turn's profile (#54937 path).
- tools/async_delegation.py: wrap both single + batch executor.submit calls so
  detached children resolve the dispatching profile's paths.

Scope: the vision CPU executor is intentionally left unwrapped — it runs pure
in-memory encode/resize and never resolves profile-scoped paths.
2026-06-30 15:30:06 -07:00
NiuNiu Xia
fb07215844 fix(copilot): recognize enterprise subdomains in host checks
The earlier enterprise base URL change (proxy-ep parsing) gave us URLs
like `api.enterprise.githubcopilot.com`, but ~15 host-matching call
sites still hard-coded `api.githubcopilot.com`. Enterprise users would
therefore drop the `Copilot-Integration-Id: vscode-chat` header at
client-build time, and upstream rejected requests with:

    The requested model is not available for integrator "zed"
    (or "copilot-language-server") — verify the correct
    Copilot-Integration-Id header is being sent.

The header was correct in copilot_default_headers(); it just never
made it into default_headers for non-default hostnames because every
detector compared against the exact string "api.githubcopilot.com".

This commit broadens all those checks to "githubcopilot.com" via
base_url_host_matches (which already does proper subdomain matching),
so api.enterprise.githubcopilot.com, api.business.githubcopilot.com,
etc. all share the same headers, vision routing, max_completion_tokens
selection, and reasoning-effort detection as the default endpoint.

Also adds ".githubcopilot.com" to _URL_TO_PROVIDER so context-window
resolution via models.dev works for enterprise base URLs, and tightens
_is_github_copilot_url to use suffix matching instead of strict equality.

Tests:
- New: enterprise Copilot endpoint preserves Copilot-Integration-Id
- New: enterprise endpoint returns max_completion_tokens (not max_tokens)
- Existing 333 base_url / copilot / aux-client / credential-pool tests pass

Parts 5 of #7731.
2026-06-30 03:27:41 -07:00
nightq
fa3ab2ffd0 fix: normalize tool_call_id whitespace in sanitizer
_sanitize_api_messages() compared raw tool_call_id strings without
stripping whitespace. When assistant-side IDs and tool-result IDs
diverged due to surrounding whitespace, valid tool results were treated
as orphaned and replaced with [Result unavailable] stub placeholders.

Strip whitespace in _get_tool_call_id_static() (both call_id/id paths,
dict and object) and at the two result_call_id comparison sites in
sanitize_api_messages(). Adds regression tests for preserved-whitespace
results and orphaned-whitespace removal.

Closes #9999
2026-06-30 01:43:40 -07:00
Rod Boev
6fd701acbe fix(agent): keep cooldown state on the active session (#54465) 2026-06-30 13:36:29 +05:30
teknium1
ea1372d2af fix(security): wire session-id sanitizer into artifact paths + API boundary
Defense-in-depth on top of _safe_session_filename_component (#5958):

Sink (makes the bad write impossible regardless of entry point):
- run_agent._save_session_log: sanitize session_id before building the
  session_{sid}.json snapshot path.
- agent_runtime_helpers.dump_api_request_debug: sanitize before building
  the request_dump_{sid}_{ts}.json path.

Boundary (clean 400 instead of a silently-hashed filename):
- api_server rejects path-traversal-shaped X-Hermes-Session-Id on the
  session-continuation path and the explicit /api/sessions create path,
  reusing gateway.session._is_path_unsafe (mirrors the native gateway's
  entry-boundary guard). Also enforces the session-header length cap on
  the continuation path.

Tests: traversal session_id stays contained at the write site; sanitizer
always yields a traversal-free segment; the API header rejects
../, absolute, and Windows-traversal IDs with 400.
2026-06-29 04:25:45 -07:00
Xowiek
1debd5e8f9 fix(security): add session-id filename sanitizer to prevent path traversal
Session IDs can originate from untrusted input (e.g. the
X-Hermes-Session-Id API header) and are interpolated raw into on-disk
artifact filenames under ~/.hermes/sessions/. A traversal-shaped ID
(../../../../etc/pwned) would let a caller write the session snapshot
or request dump outside the sessions directory.

_safe_session_filename_component() collapses every non [A-Za-z0-9_-]
character to _, caps the length, and appends a short content hash when
sanitization changed the string, always yielding a single traversal-free
path segment.

Closes #5958.
2026-06-29 04:25:45 -07:00
aaronlab
ec148f5d31 fix(agent): guard Anthropic interrupt, cap vision data-URL size
Two independent agent-loop hardening fixes:

- anthropic: when the streaming loop breaks on _interrupt_requested,
  return None instead of calling stream.get_final_message() on the
  partially-drained stream — the SDK may hang draining remaining events
  or return a Message with incomplete tool_use blocks. The outer poll
  loop raises InterruptedError, so the return value is discarded anyway.

- vision: add a 20 MB cap on base64 data-URL payloads before
  base64.b64decode() in _materialize_data_url_for_vision. A 100MB+
  payload creates ~275MB of memory pressure; gateway users sharing the
  process can trivially OOM it. Oversized payloads return ("", None).

The third change from the original PR (streaming tool-name +=  to
assignment dedup) was already landed independently on main.

Co-authored-by: aaronlab <1115117931@qq.com>
2026-06-28 18:53:20 -07:00
xxxigm
093f567f0d fix(agent,cli): surface empty-body API errors and fail oneshot exit code
When an LLM API call returns HTTP 4xx with an empty parsed SDK `body` ({}),
`_summarize_api_error` fell through to a bare `str(error)`, so users saw only
"HTTP 400" with no provider detail (reported on Windows in #36109). The SDK
leaves `body` empty in this case, but the httpx `response` still carries the
payload in `.text`.

- run_agent.py `_summarize_api_error`: when `body` is empty, fall back to
  `response.text` — parse a JSON `error.message`/`message` when present, else
  surface the raw (truncated) body. Platform-agnostic diagnostics.
- hermes_cli/oneshot.py: `hermes -z` now runs via `run_conversation` and returns
  exit code 2 when the run is failed/partial with no usable final response, so
  scripts can detect LLM failures (still 0 when a response — incl. an error
  summary as output — is produced).

Tests: new tests/run_agent/test_summarize_api_error.py (empty-body JSON + raw
text, RED/GREEN verified) + oneshot exit-code/`run_conversation` wiring tests.

NOTE: #36109's original root cause (Windows "all providers return empty 400")
is not reproducible on current main (heavy provider-transport churn since
v0.15.1). This change does not claim to fix that root cause — it makes any
empty-body API error LEGIBLE so a future occurrence shows the real provider
message instead of a bare HTTP 400. Relates to #36109 (does not close it).
2026-06-28 02:05:20 -07:00
kurlyk
def97bcd96 fix: eliminate race condition in OpenAI client replacement
Make check-and-replace atomic in _ensure_primary_openai_client by
keeping both operations under the same lock acquisition. Previously,
the lock was released between detecting a closed client and replacing
it, allowing two threads to simultaneously replace the client.

Fixes #32846

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-28 01:08:04 -07:00
Teknium
d43e0cf304
fix(agent): config-driven intent-ack continuation for all api_modes (#27881) (#53943)
* fix(agent): config-driven intent-ack continuation for all api_modes (#27881)

The agent could end a turn after only stating intent ('I will run a health
check...') without executing the announced tool call, forcing the user to
re-prompt. A continuation guard that catches this and nudges the model to
proceed already existed but was hard-gated to the codex_responses api_mode,
so Gemini/Claude/OpenRouter turns never benefited.

- New agent.intent_ack_continuation config (default 'auto' = codex-only,
  byte-stable for existing conversations). 'true'/model-list opts every
  api_mode in; 'false' disables. Mirrors agent.tool_use_enforcement's shape.
- looks_like_codex_intermediate_ack gains require_workspace (default True).
  The opted-in path drops the codebase/filesystem requirement so general
  autonomous workflows (server ops, deploys, API calls) are caught, not just
  coding tasks. Future-ack + action-verb + short-content + no-prior-tool
  guards still apply; the 2-nudge-per-turn cap is unchanged.
- Resolution centralized in intent_ack_continuation_mode (off/codex_only/all).

* docs(infographic): intent-ack continuation (#27881)
2026-06-27 20:46:00 -07:00
teknium1
f062cf076b fix(agent): also treat provider=ollama as an Ollama GLM backend
Follow-up to the #13971 fix: a genuine native Ollama provider reached
through a reverse proxy carries no ollama/:11434 URL signature, so the
restricted detection would miss it. Add provider=="ollama" as an
explicit True case (idea from #14789, @Tranquil-Flow) and cover both it
and the #13971 LiteLLM-proxy-to-zai false-positive with E2E tests.
2026-06-27 04:03:07 -07:00
YuShu
266521b55f refactor(agent): trim docstring per review feedback
Remove commentary about the previous is_local_endpoint() approach
from _is_ollama_glm_backend() — git history suffices.
2026-06-27 04:03:07 -07:00
YuShu
00a8252b7d fix(agent): scope Ollama/GLM stop-to-length heuristic to Ollama only
The _is_ollama_glm_backend() function was too broad: any local endpoint
running a GLM model was treated as Ollama, triggering the stop->length
misreport heuristic introduced in 8011aa3. This caused false truncation
detection on sglang, vLLM, LM Studio, and other non-Ollama servers that
correctly report finish_reason.

When a GLM model on sglang/vLLM returned finish_reason='stop', the agent
mistakenly reclassified it as 'length' if the response didn't end with
a whitelisted punctuation character (ASCII or CJK). This particularly
affected Chinese-language responses and Markdown-formatted text.

Root cause: the is_local_endpoint() fallback assumed any local GLM
endpoint = Ollama. But many non-Ollama servers also run on localhost.

Fix: remove the is_local_endpoint() catch-all. Only detect Ollama via
its distinctive signatures (port 11434, 'ollama' in URL). All other
local servers are assumed to report finish_reason correctly.

This is the correct tradeoff because:
- False negatives (Ollama at custom port, heuristic not triggered) only
  mean the user sees a truncated response — same as having no heuristic
- False positives (non-Ollama server, heuristic wrongly triggered) inject
  spurious continuation messages into the conversation — strictly worse

Adds two tests:
- sglang GLM response is NOT reclassified as truncated
- Ollama GLM on port 11434 still triggers the heuristic as before

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
2026-06-27 04:03:07 -07:00
DavidMetcalfe
27c486e3b1 feat(agent): apply per-reasoning-model stale-timeout floor in stream + non-stream detectors
Wire get_reasoning_stale_timeout_floor() into both stale detectors so known
reasoning models (Nemotron 3 Ultra, OpenAI o1/o3, Opus 4.x thinking, DeepSeek
R1, Qwen QwQ, Grok reasoning) tolerate multi-minute thinking phases instead of
the upstream gateway idle-killing the socket (BrokenPipeError) before first
token. Applied as max(default, floor) — never overrides explicit user config,
never lowers an existing threshold.

The reasoning_timeouts.py allowlist module already landed on main via #52795,
so this salvage carries only the wiring + tests (the duplicate module and the
stale-base MoA reverts from the original PR branch are dropped).

Salvaged from #52238. Fixes #52217.
2026-06-25 22:12:06 -07:00
Teknium
c6575df927
feat(moa): expose MoA presets as selectable virtual models (#46081)
* feat(moa): expose MoA presets as selectable virtual models

Reconstructed onto current main (PR #46081's base had diverged with no common
ancestor, marking the PR dirty so CI never dispatched). MoA is now a virtual
provider: each named preset is a selectable model under provider 'moa', and the
preset's aggregator is the acting model that answers and calls tools.

Reference models fan out in parallel via a bounded ThreadPoolExecutor (the same
batch pattern delegate_task uses) — all references dispatched at once, collected
when every one finishes, then handed to the aggregator. Output order is
preserved, failures and the MoA-recursion guard stay isolated per reference.

- Removed the old mixture_of_agents model tool and moa toolset.
- Added moa as a virtual provider in the provider/model inventory.
- /moa is shortcut behavior over model selection (default preset / named preset
  / one-shot prompt).
- Dashboard + Desktop manage named presets; presets appear in model pickers.
- Parallel reference fan-out in agent/moa_loop.py with regression test.

* fix(moa): thread moa_config through _run_agent to _run_agent_inner

The reconstructed gateway MoA wiring declared moa_config on _run_agent (the
profile-scoping wrapper) and used it inside _run_agent_inner, but the wrapper
never forwarded it — _run_agent_inner had no such parameter, so the runtime hit
NameError: name 'moa_config' is not defined on the compression-failure session
sync path. Add moa_config to _run_agent_inner's signature and forward it from
both wrapper call sites (multiplex and non-multiplex). Caught by
tests/gateway/test_compression_failure_session_sync.py on CI shard test(4).

* fix(moa): classify moa as a virtual provider in the catalog

The moa virtual provider has no PROVIDER_REGISTRY/ProviderProfile entry, so
provider_catalog() fell through to the default auth_type="api_key" with no
env vars — tripping two catalog invariants:
  - test_provider_catalog: api_key providers must expose a credential env var
  - test_provider_parity: every hermes-model provider must be desktop-configurable

moa already declares auth_type="virtual" in HERMES_OVERLAYS; consult that
overlay as an auth_type fallback so the catalog reports moa as virtual (no real
credential, no network endpoint). Exempt virtual providers from the desktop
parity union check the same way 'custom' is exempt — derived from the catalog,
not a hardcoded slug, so future virtual providers are covered too.
2026-06-25 13:52:06 -07:00
Brooklyn Nicholson
2f1a47b90e feat(agent): require verification before finishing edits
Make verification closure the default coding behavior after landed file edits while keeping bounded retries and config/env switches for users who need to disable it.
2026-06-24 23:02:48 -05:00
Teknium
7130d60861
feat(providers): remove google-gemini-cli + google-antigravity OAuth providers (#50492)
* feat(providers): remove google-gemini-cli + google-antigravity OAuth providers

Google now actively bans accounts for third-party tools that piggyback on
Gemini CLI / Antigravity / Code Assist OAuth, and because abuse prevention
sits at a backend layer the ban can extend to the entire Google account
(Gmail/Drive), with a second violation being permanent.
Ref: https://github.com/google-gemini/gemini-cli/discussions/20632

Removes both OAuth inference providers entirely (modules, provider profiles,
auth/runtime/config/models wiring, the /gquota Code Assist quota command,
the antigravity-cli optional skill, desktop + docs surface in en + zh-Hans).
The API-key 'gemini' provider (GOOGLE_API_KEY/GEMINI_API_KEY against
generativelanguage.googleapis.com) is unaffected and stays fully supported.

* fix(skills): keep the antigravity-cli skill — only the OAuth provider is removed

The antigravity-cli optional skill orchestrates the external `agy` binary as
a coding-agent tool via the terminal tool — it does NOT wrap Hermes inference
through the banned google-antigravity OAuth provider, so it carries none of
the account-ban risk that motivated removing that provider. Restore the skill,
its docs page, the sidebar entry, and the optional-skills catalog row. The
google-antigravity / google-gemini-cli inference providers stay fully removed.
2026-06-21 19:53:27 -07:00
yeyitech
b17180d950 fix(session): finalize owned SQLite session rows on AIAgent.close()
Funnel session finalization through AIAgent.close() — the single terminal
path every agent (CLI, gateway, subagent, cron) funnels through — so finished
agents stop leaving rows with ended_at IS NULL. The biggest leak source was
delegate_task subagent + background-review forks whose close() never ended
their row.

end_session() is first-reason-wins and no-ops on an already-ended row, so a
'compression'/'cron_complete'/'cli_close' reason set by an earlier terminal
path is never clobbered. /resume already calls reopen_session(), so
finalizing-on-close does not break resumability.

Temporary helper agents that rotate/share the session forward (manual
compression, gateway session-hygiene) opt out via _end_session_on_close=False.

Also stop the long-running gateway heartbeat once the executor is done or the
session slot is rebound to a different agent, preventing a stale
'running: delegate_task' bubble from outliving its run.

Closes #12029.
2026-06-21 11:35:09 -07:00
konsisumer
3e354b61db fix(agent): preserve copilot routed headers 2026-06-21 11:29:49 -07:00
Teknium
ea8a8b4af8
feat(delegation): background fan-out — parallel subagents, one consolidated return (#49734)
* feat(delegation): single-task delegate_task always runs in the background

The model no longer decides whether a subagent runs in the background — a
single-task delegate_task from the top-level agent is now always dispatched
async, so the parent turn returns immediately and the subagent's result
re-enters the conversation when it finishes.

- run_agent._dispatch_delegate_task (the live model path) forces
  background=True for top-level single-task calls; the schema-level
  `background` param is ignored.
- A batch (tasks with >1 item) stays synchronous (fan-out can't go async).
- A delegation from an orchestrator subagent (depth > 0) stays synchronous —
  it needs its workers' results within its own turn.
- The function-level default is unchanged, so direct Python callers/tests keep
  the historical synchronous behavior.
- On async-pool capacity rejection, single-task now falls through to a
  synchronous run instead of erroring (the child stays attached for interrupt
  propagation; detach happens only on a successful dispatch).
- Schema `background` param marked deprecated/ignored; tool description
  updated to state the always-background single-task rule.

* feat(delegation): all delegate_task fan-out runs in the background

Extend the always-background behavior to the full fan-out. A batch is now
dispatched as N independent async subagents (one handle each), instead of
running synchronously. Single task and batch both return immediately; each
subagent's result re-enters the conversation as its own message when it
finishes.

- delegate_task: when background is set, loop over ALL built children and
  dispatch each via dispatch_async_delegation; return a combined handle block
  (count + per-task delegation_ids). Children the async pool rejects (at
  capacity) run synchronously inline and are reported alongside the dispatched
  handles, so nothing is silently dropped.
- run_agent._dispatch_delegate_task + registry handler: force background for
  any top-level model delegation (single OR batch); orchestrator subagents
  (depth > 0) still run synchronously since they need workers' results within
  their own turn.
- Removed the v1 'batch async not supported' rejection.
- Tool description updated: BOTH MODES RUN IN THE BACKGROUND.
- Tests updated to assert batch fan-out dispatches each task async (verified
  E2E: 3-task batch -> 3 independent completion-queue events).

* fix(delegation): background fan-out joins and returns one consolidated block

Correct the fan-out semantics: a backgrounded batch is dispatched as ONE
async unit (one handle, one async-pool slot), not N independent dispatches.
The unit runs all children in parallel, waits on every one, and emits a
SINGLE completion event carrying the consolidated per-task results. The chat
is never blocked; when all subagents finish, their full summaries re-enter
the conversation together as one message.

- async_delegation.dispatch_async_delegation_batch + _finalize_batch: a batch
  occupies one slot; its runner returns the combined {results:[...]} dict and
  one event with the full results list is pushed to the completion queue.
- delegate_tool: extract the sync execution+aggregation into
  _execute_and_aggregate(); background dispatches it via the batch unit and
  returns one handle; on pool-capacity rejection it runs the batch inline.
- process_registry._format_async_delegation: render a consolidated multi-task
  block (TASK i/N + per-task summary) when the event carries is_batch/results.
- Tests updated; E2E verified: 3-task batch -> immediate return -> one combined
  completion block with all three summaries.
2026-06-20 11:27:12 -07:00
kshitijk4poor
a7dd98c860 fix(env): guard remaining malformed int/float env var casts with utils helpers
Widen the env_float() guard from #48735 across the whole bug class: a
non-numeric value (e.g. a stale .env "HERMES_API_TIMEOUT=abc" or a typo'd
port) raised an unhandled ValueError and crashed adapter/agent init.

Converts 22 genuinely-unguarded first-party int/float(os.getenv()) sites to
the canonical utils.env_int / utils.env_float helpers (the established house
pattern), instead of duplicating per-module helpers or inline try/except:

- gateway/config.py: WECOM_CALLBACK_PORT, BLUEBUBBLES_WEBHOOK_PORT
- gateway/platforms/email.py: EMAIL_IMAP/SMTP_PORT, EMAIL_POLL_INTERVAL
- gateway/platforms/feishu.py: dedup cache + text/media batch settings
- gateway/platforms/wecom.py, discord/adapter.py: text batch delays
- gateway/platforms/telegram.py: media batch delay, TELEGRAM_WEBHOOK_PORT
- gateway/platforms/whatsapp.py: WHATSAPP_NPM_INSTALL_TIMEOUT
- hermes_cli/auth.py: CODEX/XAI refresh timeouts
- agent/chat_completion_helpers.py: API/stream read/stale timeouts
- run_agent.py, agent/auxiliary_client.py: API + nous timeouts

Sites already guarded by try/except or local helpers are left untouched.
The HERMES_MAX_ITERATIONS sites are already guarded on main via
_current_max_iterations(), so they are not included.
2026-06-20 14:54:36 +05:30
Gille
013f9c8750 fix(memory): log CLI shutdown hook failures
Makes the CLI memory-provider shutdown path observable: log when CLI
cleanup calls memory shutdown (with session id + message count), warn
instead of swallowing CLI memory-shutdown exceptions, warn on
on_session_end failures during agent shutdown, and raise the
MemoryManager provider-hook failure log from debug to warning with a
traceback.

Salvaged from PR #49287 (authored by Gille / @helix4u).
2026-06-19 16:59:43 -07:00
KeyArgo
1e40b21b2e
docs: clean up three stale comments from the #32848 audit (#45638)
* docs: clean up three stale comments from the #32848 audit

- tools/memory_tool.py:20 — 'read' action was intentionally removed
  but the docstring still listed it. Now matches the schema.
- tools/fuzzy_match.py:9 — unicode_normalized was added but the
  chain-count docstring still said '8-strategy'. Now says '9'.
- run_agent.py:1485 — 'See #<TBD>.' placeholder was never filled in.
  Replaced with a backfill note.

Fixes #32848 (parts 3, 4, and 12)

* docs(memory): also remove stray memory(action=read) references in lines 144 and 201

The original #32848 audit fix (in 6fd661d6) only addressed line 20
(the action list in the module docstring), but the action was
referenced in two other places:

- tools/memory_tool.py:144 — in a class docstring, claimed
  'memory(action=read)' was a way to SEE poisoned entries
- tools/memory_tool.py:201 — in a user-facing warning message,
  told the user to 'use memory(action=read) to inspect'

Since the schema on line 683 only allows add/replace/remove, both
references were misleading: the first claimed a way to inspect
poisoned entries that doesn't exist, the second would error out
when the user followed the warning.

This commit removes both references:
- Line 144: '...keep the original text so the user can still SEE
  poisoned entries by inspecting the source files directly, and
  remove them — silently dropping them would hide the attack
  from the user.'
- Line 201: '...use memory(action=remove) to delete the
  original. (drop the read-action reference)'

Followup to the previous commit on this branch.

---------

Co-authored-by: KeyArgo <keyargo@argobox.com>
2026-06-19 16:09:30 -07:00
Gille
a7983d5ad7
fix(dashboard): hide sidecar sessions from history (#49269)
* fix(dashboard): hide sidecar sessions from history

* test(dashboard): allow sidecar source in session payload
2026-06-19 18:06:38 -04:00
tt-a1i
46f9d53468 fix(agent): aggregate anthropic aux calls via stream 2026-06-19 17:32:13 +05:30
Gille
e4452ffb8a fix(agent): summarize structured provider error messages 2026-06-18 21:37:52 -07:00
Reiji Kisaragi
3d21666b2f fix: preserve multimodal user content during persistence
Avoid applying text-only persist_user_message overrides to multimodal current-turn user messages. Early crash-resilience persistence mutates the same messages list later used for the API call, so clobbering list content drops ACP image blocks before model dispatch.\n\nAdd regression coverage for both text override behavior and multimodal preservation.\n\nCloses #44242
2026-06-17 09:49:39 -07:00
Wolfram Ravenwolf
bd7fc8fdcd feat(gateway): inject stable human-readable message timestamps
Consolidates these related Amy fork patches:
- 429830f39 feat(gateway): inject message timestamps into user messages for LLM context
- 3c3d6fac0 fix: handle both ISO string and epoch float timestamps in history replay
- 2874f7725 feat: human-friendly timestamp format with weekday and timezone name
- 3735f4c8b fix: render gateway message timestamps once
2026-06-16 15:49:59 -07:00
teknium
28f92478e3 test(hooks): cover session:compress event; drop dead import
Follow-up to salvaged PR #41624:
- Remove stray urllib.parse import in run_agent.py (cherry-pick cruft, unused)
- Add tests: session:compress emits with correct context, no-callback is
  safe, and a callback exception does not break compression
2026-06-16 11:45:36 -07:00
Wolfram Ravenwolf
e76e7b5073 feat(hooks): session:compress event_callback for MemPalace sync 2026-06-16 11:45:36 -07:00
Wolfram Ravenwolf
4cf9d80fba feat(display): verbose skill change notifications with content previews
When display.memory_notifications is set to 'verbose', skill_manage
notifications now show meaningful change details instead of just the
generic tool message.

Before (verbose mode):
  💾 📝 Patched SKILL.md in skill 'gogcli' (1 replacement).

After (verbose mode):
  💾 📝 Skill 'gogcli' patched: "old pitfall text..." → "new pitfall text..."

Changes:
- skill_manager_tool.py: _patch_skill() now includes old/new string
  previews (truncated to 200 chars) in the result via '_change' key.
  _create_skill() and _edit_skill() include skill description from
  frontmatter for verbose create/edit notifications.
- run_agent.py: Background review notification builder now reads the
  '_change' dict from skill tool results and formats descriptive
  notifications per action type (patch → old→new diff, create/edit →
  description preview). Falls back to generic message when _change
  data is unavailable (backwards compatible).

This is especially useful when subagents patch skills, since neither
the user nor the parent agent can see what the subagent changed.
2026-06-16 05:45:40 -07:00
Teknium
0a8f3e21b8
fix(delegation): forward background flag so delegate_task(background=true) runs async (#46968)
* fix(skills): guard recursive skill delete against tree-escape

Port from Kilo-Org/kilocode#11240. Their issue #11227 lost a user's entire
working directory: a built-in-skill sentinel location resolved to the server
cwd and the skill-removal endpoint ran a recursive delete on it.

Hermes' /skills uninstall path (skills_hub.py) is already hardened, but the
agent-facing skill_manage(action='delete') path did a bare
shutil.rmtree(skill_dir) with no last-line validation. Add _validate_delete_target():
refuse to rmtree a path that (1) isn't strictly inside a known skills root,
(2) is a skills root itself, or (3) is reached via a symlink/junction.

Tests: 4 cases (normal delete works; symlinked dir, skills-root, out-of-tree
all refused). E2E verified with real symlink + file I/O.

* fix(delegation): forward background flag in delegate_task dispatch

delegate_task is an _AGENT_LOOP_TOOLS member, so every surface (CLI,
gateway, desktop/TUI) routes it through AIAgent._dispatch_delegate_task.
That forwarder passed every schema field except background, so
delegate_task(background=true) was silently downgraded to a synchronous
run and returned the sync results payload instead of a delegation_id.

The model sees background in the schema (the call validates), but the
value never reached the function. Add the one missing kwarg so async
background delegation actually engages.
2026-06-15 18:52:02 -07:00
Teknium
49e743985a fix: route minimax m3 reasoning controls through profile
Follow up PR #46609's api.minimax.io reasoning report by moving the behavior out of the broad run_agent host gate and into the MiniMax provider profile. Only MiniMax-M3 on the documented OpenAI-compatible /v1 route gets reasoning_split/thinking/reasoning_effort; Anthropic-format MiniMax and non-M3 models keep their existing wire shapes.

Co-authored-by: goku94123 <gooku94123@gmail.com>
2026-06-15 07:08:43 -07:00