Replace the loopback/PKCE-callback server and manual-paste fallback with
the RFC 8628 device-code flow as the only xAI Grok OAuth login path. The
flow works in headless/SSH/container sessions with no 127.0.0.1 listener,
shrinking the local attack surface.
- Poll the token endpoint with server-provided interval, honoring
slow_down and expires_in; store tokens with auth_mode
oauth_device_code.
- Adaptive proactive refresh skew for short-lived device-code JWTs;
rotated tokens sync back to auth.json, the global root store, and the
credential pool (no refresh-token replay).
- Clear source suppression on successful re-login (CLI + dashboard) and
drop the duplicate dashboard pool entry so exactly one seeded
device_code entry exists.
- Use the shared device_code source name for consistency with the
nous/codex device-code providers.
- Desktop: remove the loopback OAuth flow states and dead type variants;
pkce providers' sign-in URL selection is unchanged.
- Docs (EN + zh-Hans) rewritten for device-code login; drop the deleted
--manual-paste flag from documented commands.
Named providers / custom_providers entries in config.yaml now accept an
extra_headers dict scoped to that endpoint — for reverse proxies, API
gateways, and custom auth schemes (e.g. Cloudflare Access service tokens).
- hermes_cli/config.py: normalize extra_headers on provider entries
(_normalize_custom_provider_entry + providers-dict translation), add
get_custom_provider_extra_headers /
apply_custom_provider_extra_headers_to_client_kwargs helpers keyed on
base_url (case/trailing-slash insensitive, no substring bypass —
mirrors the TLS helpers)
- hermes_cli/runtime_provider.py: surface extra_headers in the resolved
runtime for named custom providers (providers dict, legacy
custom_providers list, and the credential-pool path)
- run_agent.py / agent/agent_init.py: merge per-provider extra_headers
onto the OpenAI client default_headers at construction and on every
_apply_client_headers_for_base_url re-application (credential swaps,
rebuilds), most-specific level wins; OpenAI-wire only (native
Anthropic/Bedrock scoped out)
- agent/auxiliary_client.py: accept model.extra_headers as an alias of
model.default_headers for the global variant
- cli-config.yaml.example: documented commented example
- Header values are treated as secrets and never logged
Salvaged from PR #3526 by @jneeee, reimplemented against current main.
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
The salvaged fix wired per-provider ssl_ca_cert / ssl_verify (and
HERMES_CA_BUNDLE) into the MAIN OpenAI client. This follow-up:
- Auxiliary client parity: process_bootstrap.build_keepalive_http_client
accepts and forwards verify; auxiliary_client._resolve_aux_verify mirrors
the main-client TLS resolution (via load_config_readonly, the read-only
fast path) so compression/vision/web_extract/title-gen/session_search
honor the same per-provider CA. Without this, chat worked against a
private-CA endpoint but every auxiliary call still failed APIConnectionError.
- switch_model now reads custom_providers from live config (load_config_readonly)
instead of the init-time agent._custom_providers snapshot, so ssl_ca_cert /
ssl_verify edits are honored on mid-session model switch — matching the
context-length reload (#15779).
- Drop the dead client-level verify= where a custom httpx transport is used
(httpx ignores it there); verify lives on the transport. Fix docstrings.
Applies to both run_agent._build_keepalive_http_client and process_bootstrap.
- resolve_httpx_verify: add CURL_CA_BUNDLE to the env chain (consistency with
agent/ssl_guard._CA_BUNDLE_ENV_VARS) and emit a loud logger.warning naming
the endpoint whenever ssl_verify:false disables verification.
- get_custom_provider_tls_settings: case-insensitive base_url match (config
dedup already lowercases; scheme/host are case-insensitive) so a mixed-case
entry doesn't silently drop its CA. Exact match preserved — no prefix bypass.
- Demote best-effort except Exception: pass in agent_init/switch_model to
logger.debug(exc_info=True).
- Tests for aux verify forwarding, _resolve_aux_verify, case-insensitive
match, and prefix-bypass rejection.
Wire ssl_ca_cert and ssl_verify through custom_providers config and env
vars into the keepalive httpx client, fixing APIConnectionError against
mkcert/self-signed Ollama proxies behind HTTPS.
The model could pass `toolsets` (top-level and per-task) to delegate_task,
letting it choose which toolsets a subagent got. Toolset selection is a
capability-scoping decision the model should not control; subagents inherit
the parent's enabled toolsets, period.
- Remove `toolsets` from the delegate_task() signature, the registry handler,
the top-level + per-task JSON schema, and the live dispatch path
(run_agent._dispatch_delegate_task — this forwarded it on every model call).
- Single-task and per-task child builds now pass toolsets=None so
_build_child_agent resolves to pure parent inheritance.
- Drop the now-dead _SUBAGENT_TOOLSETS / _TOOLSET_LIST_STR schema-hint block.
- _build_child_agent keeps its internal toolsets param + intersection helpers
(internal API; fed the inherited value only).
- Tests: schema assertions flipped to assertNotIn; added a regression test
proving the dispatch path never forwards a smuggled model `toolsets`.
- Docs: update delegate_task signature refs in the autonomous-ai-agents skill.
Adds Vertex AI as a first-class provider for Gemini models via Vertex's
OpenAI-compatible endpoint. Vertex authenticates with short-lived OAuth2
access tokens (service-account JSON or ADC), not a static API key — the
missing piece behind the recurring requests (#13484, #12639, #56259).
- agent/vertex_adapter.py: OAuth2 token minting + refresh-on-expiry
(5-min margin), ADC->service-account fallback, global vs regional
endpoint URLs. Config precedence: env var > config.yaml > default.
- plugins/model-providers/vertex/: provider profile (auth_type=vertex),
reuses Gemini's extra_body.google.thinking_config translation.
- runtime_provider: vertex short-circuit BEFORE the credential pool so a
credentials-file path is never mistaken for a static API key; mints a
fresh token + computes base_url per resolve.
- run_agent + conversation_loop: _try_refresh_vertex_client_credentials()
re-mints the token and rebuilds the client on a mid-session 401, so a
long-lived gateway agent survives token expiry (~1h).
- auxiliary_client: vertex auth_type branch for side-LLM tasks.
- config.yaml: vertex.project_id / vertex.region (non-secret, bridged to
env); credential path stays in .env (VERTEX_CREDENTIALS_PATH).
- setup wizard + model picker: dedicated _model_flow_vertex; curated
google/gemini-* model list; --provider choices.
- pricing/metadata: Vertex prices off the gemini docs snapshot; endpoint
host auto-maps to the vertex provider (no probe spam).
- lazy_deps + pyproject [vertex] extra: google-auth, opt-in only.
- docs: guides/google-vertex.md + providers page; tests for adapter +
runtime resolution.
Salvages and modernizes #8427 by @slawt onto current main: rewired from
the legacy PROVIDER_REGISTRY path to the provider-profile architecture,
moved non-secret config out of .env into config.yaml, and added the
per-turn 401 token-refresh the original lacked.
The persist user-message override was applied in place to the live messages
list. On the early crash-resilience persist (which runs BEFORE api_messages is
built), that stripped observed group-chat context off the live user message and
silently dropped it when observe_unmentioned_group_messages was enabled.
Fix at the single chokepoint: _flush_messages_to_session_db resolves the
override (idx/content/timestamp) locally and applies it ONLY to the row written
to the DB — the live dict is never mutated, so EVERY persist caller (early
persist, mid tool-loop flush, /resume, /branch) is protected uniformly. This
supersedes the earlier shallow-copy approach, which broke the intrinsic
_DB_PERSISTED_MARKER idempotency (copies never propagated the marker back to
the live dicts → duplicate rows) and closes the sibling class tracked in #56303.
Trailing empty-response scaffolding is still dropped from the live list in
_persist_session (unchanged behavior).
Salvaged from #48817; chokepoint reworked to coexist with the marker-based
dedup (#50372).
Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
The forked skill/memory review agent shares the parent's session_id for
prompt-cache warmth. Without isolation it wrote its harness turn ('Review the
conversation above and update the skill library…') plus its curator-mode reply
straight into the user's REAL session in state.db; the next live turn re-read
that injected user message as a standing instruction and the agent 'became' the
curator, refusing the actual task.
Root fix: a _persist_disabled flag on the fork that hard-stops every DB write
and lazy-open path (_flush_messages_to_session_db, _ensure_db_session,
_get_session_db_for_recall) — the review writes only to the skill/memory stores
via its tools. Defense-in-depth: _strip_background_review_harness drops any
stray harness message (and the assistant reply that followed) at load time in
get_messages_as_conversation, so an already-polluted session resumes clean.
Salvaged from #50296.
Co-authored-by: arminanton <29869547+arminanton@users.noreply.github.com>
Phase 2c review follow-up on the id()-reuse persistence fix:
- test_recycled_id_in_dedup_set_still_persists_new_message seeded an EMPTY
dedup set, so it never injected a collision and passed under id-based dedup
too (couldn't distinguish the designs). Replace with
test_stale_seed_id_from_prior_flush_cannot_suppress_new_message, which asserts
the durable invariant: the seed is empty after every flush (mutation-checked:
removing the post-flush reset now fails BOTH id-reuse tests).
- Refresh the _flush_messages_to_session_db docstring: it still described the
old per-session identity tracking; document the intrinsic-marker mechanism,
that _flushed_db_message_ids is now a one-shot seed, and the shared-dict
mutation safety note.
_flush_messages_to_session_db deduped persisted messages with a retained
{id(msg)} set (_flushed_db_message_ids) kept across turns. Once a flushed dict
is dropped from the live list (scaffolding rewind / in-place compaction) and
GC'd, CPython recycles its address onto a new assistant/tool dict whose id()
collides with the stale entry — so the real turn is silently never written to
state.db.
Replace the retained id-set with an intrinsic _DB_PERSISTED_MARKER stamped on
each dict. The id-set is demoted to a one-shot seed (valid only while the
caller's objects are alive) that is translated to markers and cleared after
every flush, so no id() outlives a flush to alias a future message. The marker
is _-prefixed so the wire sanitizers strip it before any request leaves.
Preserves the existing _is_ephemeral_scaffolding skip. Salvaged from #50372.
Co-authored-by: rrevenanttt <290873280+rrevenanttt@users.noreply.github.com>
When text compression can't reduce a 413 request further, evict base64
image parts from tool messages and retry once instead of dead-ending
with 'Payload too large and cannot compress further.'
A 413 is a request-body byte-size limit, not a token limit. browser_vision
screenshots (2-5MB base64 each) keep the HTTP body oversized even after
aggressive summarization. The strip pass passes remember_model=False so a
413 does not poison _no_list_tool_content_models — that set is for providers
that reject list-type tool content, a distinct failure mode.
Cherry-picked from #47397 by Tranquil-Flow; placed onto main's current
token-aware 413 recovery else branch.
verify_on_stop / pre_verify append a synthetic assistant "done" plus a
synthetic user nudge to keep the agent going one more turn before it can
claim completion. Both were flagged (_verification_stop_synthetic on the
nudge only), but the flags were never registered in
_EPHEMERAL_SCAFFOLDING_FLAGS, so the central _is_ephemeral_scaffolding()
filter that guards both persistence sinks (SQLite flush + JSON snapshot)
let them through. The resumed transcript then inherited loop-only
scaffolding, invalidating the prompt-prefix cache on later turns.
- add _verification_stop_synthetic and _pre_verify_synthetic to
_EPHEMERAL_SCAFFOLDING_FLAGS (the single chokepoint both sinks use)
- flag the blocked attempt assistant message too, not just the nudge, so
the whole synthetic pair drops together and persistence does not keep a
premature done with the nudge stripped (assistant to assistant adjacency)
The API-payload leak claimed in the report is already handled: the
chat_completions transport strips every underscore-prefixed message key
before the wire, so the marker never reaches strict providers.
Reported by patppham.
Ephemeral empty-response/prefill recovery scaffolding (the synthetic
assistant "(empty)" turn, the user nudge, the terminal "(empty)"
sentinel, and the thinking-only prefill placeholder) exists only to
drive the next API retry; the in-memory loop pops it before appending
the real response. The append-only flush did not mirror that, so a
mid-turn persist could commit scaffolding to the SQLite session store
(and JSON log), and a resumed session would replay synthetic
"(empty)"/nudge turns as genuine context — re-poisoning the empty-retry
boundary forever.
Filter ephemeral scaffolding at both durable-write sites
(_flush_messages_to_session_db + _save_session_log), by flag not
position, so buried scaffolding (an answered nudge leaves the synthetic
pair mid-list) is skipped too. Covers all three flags including
_thinking_prefill.
Adapted onto current main's identity-tracking flush.
Cherry-picked from #41281 by petrichor-op.
The gateway/API server rebuilds the in-memory TodoStore by replaying
caller-supplied conversation_history. _hydrate_todo_store previously
accepted any role:tool message containing a "todos" array, so a forged
bare tool result could seed arbitrary todo state and re-inflate context
every turn (GHSA-5g4g-6jrg-mw3g).
Restrict hydration to tool results paired with an earlier assistant
todo tool call (matching tool_call_id, function name == todo, no
user/system boundary between). Reuse the existing _get_tool_call_id/
name_static helpers so dict- and object-shaped tool calls both work.
Add a generous MAX_TODO_RESULT_CHARS payload guard to drop absurd
forged results before parsing; item/content caps already exist on main.
Co-authored-by: Hermes Agent <agent@nousresearch.com>
Generic provider:custom relays were force-routed to the OpenAI Responses
API whenever the model matched gpt-5*, and a stale persisted
model.api_mode=codex_responses survived /reset and upgrades. Some
OpenAI-compatible relays do not implement Responses semantics, which
surfaced as malformed function_call.name replay errors in gateway sessions.
- runtime_provider: route custom-provider api_mode through
_resolve_plain_custom_api_mode(), which drops a stale codex_responses
unless the URL is direct OpenAI/xAI
- run_agent: _provider_model_requires_responses_api returns False for
custom; direct api.openai.com / api.x.ai URLs still upgrade via
_is_direct_openai_url() / URL detection
- regression coverage for plain relays vs direct OpenAI/xAI URLs
Co-authored-by: HiddenPuppy <HiddenPuppy@users.noreply.github.com>
A bare threading.Thread / ThreadPoolExecutor worker starts with an empty
contextvars.Context, so the context-local profile override
(_HERMES_HOME_OVERRIDE) does not cross the spawn boundary. In single-process
multi-profile runtimes (desktop tui_gateway) the worker then resolves
get_hermes_home() to the launch/default profile, leaking one profile's
reads/writes into another. The fix primitive (tools.thread_context.
propagate_context_to_thread, which copies the parent context) already exists;
the leaking spawns simply did not use it.
- model_tools.py _run_async: wrap the worker-thread loop runner. This is the
generic sync->async bridge for every async tool, so wrapping it here fixes
the leak for all async tools at once (verified: an async tool reading
get_hermes_home() under an override now resolves the active profile).
- run_agent.py bg-review thread: wrap so MEMORY.md / skill review writes land
in the spawning turn's profile (#54937 path).
- tools/async_delegation.py: wrap both single + batch executor.submit calls so
detached children resolve the dispatching profile's paths.
Scope: the vision CPU executor is intentionally left unwrapped — it runs pure
in-memory encode/resize and never resolves profile-scoped paths.
The earlier enterprise base URL change (proxy-ep parsing) gave us URLs
like `api.enterprise.githubcopilot.com`, but ~15 host-matching call
sites still hard-coded `api.githubcopilot.com`. Enterprise users would
therefore drop the `Copilot-Integration-Id: vscode-chat` header at
client-build time, and upstream rejected requests with:
The requested model is not available for integrator "zed"
(or "copilot-language-server") — verify the correct
Copilot-Integration-Id header is being sent.
The header was correct in copilot_default_headers(); it just never
made it into default_headers for non-default hostnames because every
detector compared against the exact string "api.githubcopilot.com".
This commit broadens all those checks to "githubcopilot.com" via
base_url_host_matches (which already does proper subdomain matching),
so api.enterprise.githubcopilot.com, api.business.githubcopilot.com,
etc. all share the same headers, vision routing, max_completion_tokens
selection, and reasoning-effort detection as the default endpoint.
Also adds ".githubcopilot.com" to _URL_TO_PROVIDER so context-window
resolution via models.dev works for enterprise base URLs, and tightens
_is_github_copilot_url to use suffix matching instead of strict equality.
Tests:
- New: enterprise Copilot endpoint preserves Copilot-Integration-Id
- New: enterprise endpoint returns max_completion_tokens (not max_tokens)
- Existing 333 base_url / copilot / aux-client / credential-pool tests pass
Parts 5 of #7731.
_sanitize_api_messages() compared raw tool_call_id strings without
stripping whitespace. When assistant-side IDs and tool-result IDs
diverged due to surrounding whitespace, valid tool results were treated
as orphaned and replaced with [Result unavailable] stub placeholders.
Strip whitespace in _get_tool_call_id_static() (both call_id/id paths,
dict and object) and at the two result_call_id comparison sites in
sanitize_api_messages(). Adds regression tests for preserved-whitespace
results and orphaned-whitespace removal.
Closes#9999
Defense-in-depth on top of _safe_session_filename_component (#5958):
Sink (makes the bad write impossible regardless of entry point):
- run_agent._save_session_log: sanitize session_id before building the
session_{sid}.json snapshot path.
- agent_runtime_helpers.dump_api_request_debug: sanitize before building
the request_dump_{sid}_{ts}.json path.
Boundary (clean 400 instead of a silently-hashed filename):
- api_server rejects path-traversal-shaped X-Hermes-Session-Id on the
session-continuation path and the explicit /api/sessions create path,
reusing gateway.session._is_path_unsafe (mirrors the native gateway's
entry-boundary guard). Also enforces the session-header length cap on
the continuation path.
Tests: traversal session_id stays contained at the write site; sanitizer
always yields a traversal-free segment; the API header rejects
../, absolute, and Windows-traversal IDs with 400.
Session IDs can originate from untrusted input (e.g. the
X-Hermes-Session-Id API header) and are interpolated raw into on-disk
artifact filenames under ~/.hermes/sessions/. A traversal-shaped ID
(../../../../etc/pwned) would let a caller write the session snapshot
or request dump outside the sessions directory.
_safe_session_filename_component() collapses every non [A-Za-z0-9_-]
character to _, caps the length, and appends a short content hash when
sanitization changed the string, always yielding a single traversal-free
path segment.
Closes#5958.
Two independent agent-loop hardening fixes:
- anthropic: when the streaming loop breaks on _interrupt_requested,
return None instead of calling stream.get_final_message() on the
partially-drained stream — the SDK may hang draining remaining events
or return a Message with incomplete tool_use blocks. The outer poll
loop raises InterruptedError, so the return value is discarded anyway.
- vision: add a 20 MB cap on base64 data-URL payloads before
base64.b64decode() in _materialize_data_url_for_vision. A 100MB+
payload creates ~275MB of memory pressure; gateway users sharing the
process can trivially OOM it. Oversized payloads return ("", None).
The third change from the original PR (streaming tool-name += to
assignment dedup) was already landed independently on main.
Co-authored-by: aaronlab <1115117931@qq.com>
When an LLM API call returns HTTP 4xx with an empty parsed SDK `body` ({}),
`_summarize_api_error` fell through to a bare `str(error)`, so users saw only
"HTTP 400" with no provider detail (reported on Windows in #36109). The SDK
leaves `body` empty in this case, but the httpx `response` still carries the
payload in `.text`.
- run_agent.py `_summarize_api_error`: when `body` is empty, fall back to
`response.text` — parse a JSON `error.message`/`message` when present, else
surface the raw (truncated) body. Platform-agnostic diagnostics.
- hermes_cli/oneshot.py: `hermes -z` now runs via `run_conversation` and returns
exit code 2 when the run is failed/partial with no usable final response, so
scripts can detect LLM failures (still 0 when a response — incl. an error
summary as output — is produced).
Tests: new tests/run_agent/test_summarize_api_error.py (empty-body JSON + raw
text, RED/GREEN verified) + oneshot exit-code/`run_conversation` wiring tests.
NOTE: #36109's original root cause (Windows "all providers return empty 400")
is not reproducible on current main (heavy provider-transport churn since
v0.15.1). This change does not claim to fix that root cause — it makes any
empty-body API error LEGIBLE so a future occurrence shows the real provider
message instead of a bare HTTP 400. Relates to #36109 (does not close it).
Make check-and-replace atomic in _ensure_primary_openai_client by
keeping both operations under the same lock acquisition. Previously,
the lock was released between detecting a closed client and replacing
it, allowing two threads to simultaneously replace the client.
Fixes#32846
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(agent): config-driven intent-ack continuation for all api_modes (#27881)
The agent could end a turn after only stating intent ('I will run a health
check...') without executing the announced tool call, forcing the user to
re-prompt. A continuation guard that catches this and nudges the model to
proceed already existed but was hard-gated to the codex_responses api_mode,
so Gemini/Claude/OpenRouter turns never benefited.
- New agent.intent_ack_continuation config (default 'auto' = codex-only,
byte-stable for existing conversations). 'true'/model-list opts every
api_mode in; 'false' disables. Mirrors agent.tool_use_enforcement's shape.
- looks_like_codex_intermediate_ack gains require_workspace (default True).
The opted-in path drops the codebase/filesystem requirement so general
autonomous workflows (server ops, deploys, API calls) are caught, not just
coding tasks. Future-ack + action-verb + short-content + no-prior-tool
guards still apply; the 2-nudge-per-turn cap is unchanged.
- Resolution centralized in intent_ack_continuation_mode (off/codex_only/all).
* docs(infographic): intent-ack continuation (#27881)
Follow-up to the #13971 fix: a genuine native Ollama provider reached
through a reverse proxy carries no ollama/:11434 URL signature, so the
restricted detection would miss it. Add provider=="ollama" as an
explicit True case (idea from #14789, @Tranquil-Flow) and cover both it
and the #13971 LiteLLM-proxy-to-zai false-positive with E2E tests.
The _is_ollama_glm_backend() function was too broad: any local endpoint
running a GLM model was treated as Ollama, triggering the stop->length
misreport heuristic introduced in 8011aa3. This caused false truncation
detection on sglang, vLLM, LM Studio, and other non-Ollama servers that
correctly report finish_reason.
When a GLM model on sglang/vLLM returned finish_reason='stop', the agent
mistakenly reclassified it as 'length' if the response didn't end with
a whitelisted punctuation character (ASCII or CJK). This particularly
affected Chinese-language responses and Markdown-formatted text.
Root cause: the is_local_endpoint() fallback assumed any local GLM
endpoint = Ollama. But many non-Ollama servers also run on localhost.
Fix: remove the is_local_endpoint() catch-all. Only detect Ollama via
its distinctive signatures (port 11434, 'ollama' in URL). All other
local servers are assumed to report finish_reason correctly.
This is the correct tradeoff because:
- False negatives (Ollama at custom port, heuristic not triggered) only
mean the user sees a truncated response — same as having no heuristic
- False positives (non-Ollama server, heuristic wrongly triggered) inject
spurious continuation messages into the conversation — strictly worse
Adds two tests:
- sglang GLM response is NOT reclassified as truncated
- Ollama GLM on port 11434 still triggers the heuristic as before
Co-authored-by: Hermes Agent <hermes@nousresearch.com>
Wire get_reasoning_stale_timeout_floor() into both stale detectors so known
reasoning models (Nemotron 3 Ultra, OpenAI o1/o3, Opus 4.x thinking, DeepSeek
R1, Qwen QwQ, Grok reasoning) tolerate multi-minute thinking phases instead of
the upstream gateway idle-killing the socket (BrokenPipeError) before first
token. Applied as max(default, floor) — never overrides explicit user config,
never lowers an existing threshold.
The reasoning_timeouts.py allowlist module already landed on main via #52795,
so this salvage carries only the wiring + tests (the duplicate module and the
stale-base MoA reverts from the original PR branch are dropped).
Salvaged from #52238. Fixes#52217.
* feat(moa): expose MoA presets as selectable virtual models
Reconstructed onto current main (PR #46081's base had diverged with no common
ancestor, marking the PR dirty so CI never dispatched). MoA is now a virtual
provider: each named preset is a selectable model under provider 'moa', and the
preset's aggregator is the acting model that answers and calls tools.
Reference models fan out in parallel via a bounded ThreadPoolExecutor (the same
batch pattern delegate_task uses) — all references dispatched at once, collected
when every one finishes, then handed to the aggregator. Output order is
preserved, failures and the MoA-recursion guard stay isolated per reference.
- Removed the old mixture_of_agents model tool and moa toolset.
- Added moa as a virtual provider in the provider/model inventory.
- /moa is shortcut behavior over model selection (default preset / named preset
/ one-shot prompt).
- Dashboard + Desktop manage named presets; presets appear in model pickers.
- Parallel reference fan-out in agent/moa_loop.py with regression test.
* fix(moa): thread moa_config through _run_agent to _run_agent_inner
The reconstructed gateway MoA wiring declared moa_config on _run_agent (the
profile-scoping wrapper) and used it inside _run_agent_inner, but the wrapper
never forwarded it — _run_agent_inner had no such parameter, so the runtime hit
NameError: name 'moa_config' is not defined on the compression-failure session
sync path. Add moa_config to _run_agent_inner's signature and forward it from
both wrapper call sites (multiplex and non-multiplex). Caught by
tests/gateway/test_compression_failure_session_sync.py on CI shard test(4).
* fix(moa): classify moa as a virtual provider in the catalog
The moa virtual provider has no PROVIDER_REGISTRY/ProviderProfile entry, so
provider_catalog() fell through to the default auth_type="api_key" with no
env vars — tripping two catalog invariants:
- test_provider_catalog: api_key providers must expose a credential env var
- test_provider_parity: every hermes-model provider must be desktop-configurable
moa already declares auth_type="virtual" in HERMES_OVERLAYS; consult that
overlay as an auth_type fallback so the catalog reports moa as virtual (no real
credential, no network endpoint). Exempt virtual providers from the desktop
parity union check the same way 'custom' is exempt — derived from the catalog,
not a hardcoded slug, so future virtual providers are covered too.
Make verification closure the default coding behavior after landed file edits while keeping bounded retries and config/env switches for users who need to disable it.
* feat(providers): remove google-gemini-cli + google-antigravity OAuth providers
Google now actively bans accounts for third-party tools that piggyback on
Gemini CLI / Antigravity / Code Assist OAuth, and because abuse prevention
sits at a backend layer the ban can extend to the entire Google account
(Gmail/Drive), with a second violation being permanent.
Ref: https://github.com/google-gemini/gemini-cli/discussions/20632
Removes both OAuth inference providers entirely (modules, provider profiles,
auth/runtime/config/models wiring, the /gquota Code Assist quota command,
the antigravity-cli optional skill, desktop + docs surface in en + zh-Hans).
The API-key 'gemini' provider (GOOGLE_API_KEY/GEMINI_API_KEY against
generativelanguage.googleapis.com) is unaffected and stays fully supported.
* fix(skills): keep the antigravity-cli skill — only the OAuth provider is removed
The antigravity-cli optional skill orchestrates the external `agy` binary as
a coding-agent tool via the terminal tool — it does NOT wrap Hermes inference
through the banned google-antigravity OAuth provider, so it carries none of
the account-ban risk that motivated removing that provider. Restore the skill,
its docs page, the sidebar entry, and the optional-skills catalog row. The
google-antigravity / google-gemini-cli inference providers stay fully removed.
Funnel session finalization through AIAgent.close() — the single terminal
path every agent (CLI, gateway, subagent, cron) funnels through — so finished
agents stop leaving rows with ended_at IS NULL. The biggest leak source was
delegate_task subagent + background-review forks whose close() never ended
their row.
end_session() is first-reason-wins and no-ops on an already-ended row, so a
'compression'/'cron_complete'/'cli_close' reason set by an earlier terminal
path is never clobbered. /resume already calls reopen_session(), so
finalizing-on-close does not break resumability.
Temporary helper agents that rotate/share the session forward (manual
compression, gateway session-hygiene) opt out via _end_session_on_close=False.
Also stop the long-running gateway heartbeat once the executor is done or the
session slot is rebound to a different agent, preventing a stale
'running: delegate_task' bubble from outliving its run.
Closes#12029.
* feat(delegation): single-task delegate_task always runs in the background
The model no longer decides whether a subagent runs in the background — a
single-task delegate_task from the top-level agent is now always dispatched
async, so the parent turn returns immediately and the subagent's result
re-enters the conversation when it finishes.
- run_agent._dispatch_delegate_task (the live model path) forces
background=True for top-level single-task calls; the schema-level
`background` param is ignored.
- A batch (tasks with >1 item) stays synchronous (fan-out can't go async).
- A delegation from an orchestrator subagent (depth > 0) stays synchronous —
it needs its workers' results within its own turn.
- The function-level default is unchanged, so direct Python callers/tests keep
the historical synchronous behavior.
- On async-pool capacity rejection, single-task now falls through to a
synchronous run instead of erroring (the child stays attached for interrupt
propagation; detach happens only on a successful dispatch).
- Schema `background` param marked deprecated/ignored; tool description
updated to state the always-background single-task rule.
* feat(delegation): all delegate_task fan-out runs in the background
Extend the always-background behavior to the full fan-out. A batch is now
dispatched as N independent async subagents (one handle each), instead of
running synchronously. Single task and batch both return immediately; each
subagent's result re-enters the conversation as its own message when it
finishes.
- delegate_task: when background is set, loop over ALL built children and
dispatch each via dispatch_async_delegation; return a combined handle block
(count + per-task delegation_ids). Children the async pool rejects (at
capacity) run synchronously inline and are reported alongside the dispatched
handles, so nothing is silently dropped.
- run_agent._dispatch_delegate_task + registry handler: force background for
any top-level model delegation (single OR batch); orchestrator subagents
(depth > 0) still run synchronously since they need workers' results within
their own turn.
- Removed the v1 'batch async not supported' rejection.
- Tool description updated: BOTH MODES RUN IN THE BACKGROUND.
- Tests updated to assert batch fan-out dispatches each task async (verified
E2E: 3-task batch -> 3 independent completion-queue events).
* fix(delegation): background fan-out joins and returns one consolidated block
Correct the fan-out semantics: a backgrounded batch is dispatched as ONE
async unit (one handle, one async-pool slot), not N independent dispatches.
The unit runs all children in parallel, waits on every one, and emits a
SINGLE completion event carrying the consolidated per-task results. The chat
is never blocked; when all subagents finish, their full summaries re-enter
the conversation together as one message.
- async_delegation.dispatch_async_delegation_batch + _finalize_batch: a batch
occupies one slot; its runner returns the combined {results:[...]} dict and
one event with the full results list is pushed to the completion queue.
- delegate_tool: extract the sync execution+aggregation into
_execute_and_aggregate(); background dispatches it via the batch unit and
returns one handle; on pool-capacity rejection it runs the batch inline.
- process_registry._format_async_delegation: render a consolidated multi-task
block (TASK i/N + per-task summary) when the event carries is_batch/results.
- Tests updated; E2E verified: 3-task batch -> immediate return -> one combined
completion block with all three summaries.
Widen the env_float() guard from #48735 across the whole bug class: a
non-numeric value (e.g. a stale .env "HERMES_API_TIMEOUT=abc" or a typo'd
port) raised an unhandled ValueError and crashed adapter/agent init.
Converts 22 genuinely-unguarded first-party int/float(os.getenv()) sites to
the canonical utils.env_int / utils.env_float helpers (the established house
pattern), instead of duplicating per-module helpers or inline try/except:
- gateway/config.py: WECOM_CALLBACK_PORT, BLUEBUBBLES_WEBHOOK_PORT
- gateway/platforms/email.py: EMAIL_IMAP/SMTP_PORT, EMAIL_POLL_INTERVAL
- gateway/platforms/feishu.py: dedup cache + text/media batch settings
- gateway/platforms/wecom.py, discord/adapter.py: text batch delays
- gateway/platforms/telegram.py: media batch delay, TELEGRAM_WEBHOOK_PORT
- gateway/platforms/whatsapp.py: WHATSAPP_NPM_INSTALL_TIMEOUT
- hermes_cli/auth.py: CODEX/XAI refresh timeouts
- agent/chat_completion_helpers.py: API/stream read/stale timeouts
- run_agent.py, agent/auxiliary_client.py: API + nous timeouts
Sites already guarded by try/except or local helpers are left untouched.
The HERMES_MAX_ITERATIONS sites are already guarded on main via
_current_max_iterations(), so they are not included.
Makes the CLI memory-provider shutdown path observable: log when CLI
cleanup calls memory shutdown (with session id + message count), warn
instead of swallowing CLI memory-shutdown exceptions, warn on
on_session_end failures during agent shutdown, and raise the
MemoryManager provider-hook failure log from debug to warning with a
traceback.
Salvaged from PR #49287 (authored by Gille / @helix4u).
* docs: clean up three stale comments from the #32848 audit
- tools/memory_tool.py:20 — 'read' action was intentionally removed
but the docstring still listed it. Now matches the schema.
- tools/fuzzy_match.py:9 — unicode_normalized was added but the
chain-count docstring still said '8-strategy'. Now says '9'.
- run_agent.py:1485 — 'See #<TBD>.' placeholder was never filled in.
Replaced with a backfill note.
Fixes#32848 (parts 3, 4, and 12)
* docs(memory): also remove stray memory(action=read) references in lines 144 and 201
The original #32848 audit fix (in 6fd661d6) only addressed line 20
(the action list in the module docstring), but the action was
referenced in two other places:
- tools/memory_tool.py:144 — in a class docstring, claimed
'memory(action=read)' was a way to SEE poisoned entries
- tools/memory_tool.py:201 — in a user-facing warning message,
told the user to 'use memory(action=read) to inspect'
Since the schema on line 683 only allows add/replace/remove, both
references were misleading: the first claimed a way to inspect
poisoned entries that doesn't exist, the second would error out
when the user followed the warning.
This commit removes both references:
- Line 144: '...keep the original text so the user can still SEE
poisoned entries by inspecting the source files directly, and
remove them — silently dropping them would hide the attack
from the user.'
- Line 201: '...use memory(action=remove) to delete the
original. (drop the read-action reference)'
Followup to the previous commit on this branch.
---------
Co-authored-by: KeyArgo <keyargo@argobox.com>
Avoid applying text-only persist_user_message overrides to multimodal current-turn user messages. Early crash-resilience persistence mutates the same messages list later used for the API call, so clobbering list content drops ACP image blocks before model dispatch.\n\nAdd regression coverage for both text override behavior and multimodal preservation.\n\nCloses #44242
Consolidates these related Amy fork patches:
- 429830f39 feat(gateway): inject message timestamps into user messages for LLM context
- 3c3d6fac0 fix: handle both ISO string and epoch float timestamps in history replay
- 2874f7725 feat: human-friendly timestamp format with weekday and timezone name
- 3735f4c8b fix: render gateway message timestamps once
Follow-up to salvaged PR #41624:
- Remove stray urllib.parse import in run_agent.py (cherry-pick cruft, unused)
- Add tests: session:compress emits with correct context, no-callback is
safe, and a callback exception does not break compression
When display.memory_notifications is set to 'verbose', skill_manage
notifications now show meaningful change details instead of just the
generic tool message.
Before (verbose mode):
💾📝 Patched SKILL.md in skill 'gogcli' (1 replacement).
After (verbose mode):
💾📝 Skill 'gogcli' patched: "old pitfall text..." → "new pitfall text..."
Changes:
- skill_manager_tool.py: _patch_skill() now includes old/new string
previews (truncated to 200 chars) in the result via '_change' key.
_create_skill() and _edit_skill() include skill description from
frontmatter for verbose create/edit notifications.
- run_agent.py: Background review notification builder now reads the
'_change' dict from skill tool results and formats descriptive
notifications per action type (patch → old→new diff, create/edit →
description preview). Falls back to generic message when _change
data is unavailable (backwards compatible).
This is especially useful when subagents patch skills, since neither
the user nor the parent agent can see what the subagent changed.
* fix(skills): guard recursive skill delete against tree-escape
Port from Kilo-Org/kilocode#11240. Their issue #11227 lost a user's entire
working directory: a built-in-skill sentinel location resolved to the server
cwd and the skill-removal endpoint ran a recursive delete on it.
Hermes' /skills uninstall path (skills_hub.py) is already hardened, but the
agent-facing skill_manage(action='delete') path did a bare
shutil.rmtree(skill_dir) with no last-line validation. Add _validate_delete_target():
refuse to rmtree a path that (1) isn't strictly inside a known skills root,
(2) is a skills root itself, or (3) is reached via a symlink/junction.
Tests: 4 cases (normal delete works; symlinked dir, skills-root, out-of-tree
all refused). E2E verified with real symlink + file I/O.
* fix(delegation): forward background flag in delegate_task dispatch
delegate_task is an _AGENT_LOOP_TOOLS member, so every surface (CLI,
gateway, desktop/TUI) routes it through AIAgent._dispatch_delegate_task.
That forwarder passed every schema field except background, so
delegate_task(background=true) was silently downgraded to a synchronous
run and returned the sync results payload instead of a delegation_id.
The model sees background in the schema (the call validates), but the
value never reached the function. Add the one missing kwarg so async
background delegation actually engages.
Follow up PR #46609's api.minimax.io reasoning report by moving the behavior out of the broad run_agent host gate and into the MiniMax provider profile. Only MiniMax-M3 on the documented OpenAI-compatible /v1 route gets reasoning_split/thinking/reasoning_effort; Anthropic-format MiniMax and non-M3 models keep their existing wire shapes.
Co-authored-by: goku94123 <gooku94123@gmail.com>