hermes-agent

Author	SHA1	Message	Date
Jaaneek	5ef0b8acb0	feat(auth): make xAI Grok OAuth device-code-only, drop loopback login Replace the loopback/PKCE-callback server and manual-paste fallback with the RFC 8628 device-code flow as the only xAI Grok OAuth login path. The flow works in headless/SSH/container sessions with no 127.0.0.1 listener, shrinking the local attack surface. - Poll the token endpoint with server-provided interval, honoring slow_down and expires_in; store tokens with auth_mode oauth_device_code. - Adaptive proactive refresh skew for short-lived device-code JWTs; rotated tokens sync back to auth.json, the global root store, and the credential pool (no refresh-token replay). - Clear source suppression on successful re-login (CLI + dashboard) and drop the duplicate dashboard pool entry so exactly one seeded device_code entry exists. - Use the shared device_code source name for consistency with the nous/codex device-code providers. - Desktop: remove the loopback OAuth flow states and dead type variants; pkce providers' sign-in URL selection is unchanged. - Docs (EN + zh-Hans) rewritten for device-code login; drop the deleted --manual-paste flag from documented commands.	2026-07-02 13:17:41 -07:00
Jneeee	b98baa3039	feat(config): extra HTTP headers for LLM API calls (#3526 salvage) Named providers / custom_providers entries in config.yaml now accept an extra_headers dict scoped to that endpoint — for reverse proxies, API gateways, and custom auth schemes (e.g. Cloudflare Access service tokens). - hermes_cli/config.py: normalize extra_headers on provider entries (_normalize_custom_provider_entry + providers-dict translation), add get_custom_provider_extra_headers / apply_custom_provider_extra_headers_to_client_kwargs helpers keyed on base_url (case/trailing-slash insensitive, no substring bypass — mirrors the TLS helpers) - hermes_cli/runtime_provider.py: surface extra_headers in the resolved runtime for named custom providers (providers dict, legacy custom_providers list, and the credential-pool path) - run_agent.py / agent/agent_init.py: merge per-provider extra_headers onto the OpenAI client default_headers at construction and on every _apply_client_headers_for_base_url re-application (credential swaps, rebuilds), most-specific level wins; OpenAI-wire only (native Anthropic/Bedrock scoped out) - agent/auxiliary_client.py: accept model.extra_headers as an alias of model.default_headers for the global variant - cli-config.yaml.example: documented commented example - Header values are treated as secrets and never logged Salvaged from PR #3526 by @jneeee, reimplemented against current main. Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>	2026-07-02 05:33:25 -07:00
kshitijk4poor	676236bb1d	fix(agent): honor custom CA certs on aux client + harden TLS resolution The salvaged fix wired per-provider ssl_ca_cert / ssl_verify (and HERMES_CA_BUNDLE) into the MAIN OpenAI client. This follow-up: - Auxiliary client parity: process_bootstrap.build_keepalive_http_client accepts and forwards verify; auxiliary_client._resolve_aux_verify mirrors the main-client TLS resolution (via load_config_readonly, the read-only fast path) so compression/vision/web_extract/title-gen/session_search honor the same per-provider CA. Without this, chat worked against a private-CA endpoint but every auxiliary call still failed APIConnectionError. - switch_model now reads custom_providers from live config (load_config_readonly) instead of the init-time agent._custom_providers snapshot, so ssl_ca_cert / ssl_verify edits are honored on mid-session model switch — matching the context-length reload (#15779). - Drop the dead client-level verify= where a custom httpx transport is used (httpx ignores it there); verify lives on the transport. Fix docstrings. Applies to both run_agent._build_keepalive_http_client and process_bootstrap. - resolve_httpx_verify: add CURL_CA_BUNDLE to the env chain (consistency with agent/ssl_guard._CA_BUNDLE_ENV_VARS) and emit a loud logger.warning naming the endpoint whenever ssl_verify:false disables verification. - get_custom_provider_tls_settings: case-insensitive base_url match (config dedup already lowercases; scheme/host are case-insensitive) so a mixed-case entry doesn't silently drop its CA. Exact match preserved — no prefix bypass. - Demote best-effort except Exception: pass in agent_init/switch_model to logger.debug(exc_info=True). - Tests for aux verify forwarding, _resolve_aux_verify, case-insensitive match, and prefix-bypass rejection.	2026-07-02 04:51:56 +05:30
HexLab98	3a2ba959ce	fix(agent): honor custom CA certs for custom_providers HTTPS endpoints Wire ssl_ca_cert and ssl_verify through custom_providers config and env vars into the keepalive httpx client, fixing APIConnectionError against mkcert/self-signed Ollama proxies behind HTTPS.	2026-07-02 04:51:56 +05:30
Teknium	ba0bc01d1f	feat(delegate): remove model-facing toolsets arg — subagents always inherit parent's (#56386) The model could pass `toolsets` (top-level and per-task) to delegate_task, letting it choose which toolsets a subagent got. Toolset selection is a capability-scoping decision the model should not control; subagents inherit the parent's enabled toolsets, period. - Remove `toolsets` from the delegate_task() signature, the registry handler, the top-level + per-task JSON schema, and the live dispatch path (run_agent._dispatch_delegate_task — this forwarded it on every model call). - Single-task and per-task child builds now pass toolsets=None so _build_child_agent resolves to pure parent inheritance. - Drop the now-dead _SUBAGENT_TOOLSETS / _TOOLSET_LIST_STR schema-hint block. - _build_child_agent keeps its internal toolsets param + intersection helpers (internal API; fed the inherited value only). - Tests: schema assertions flipped to assertNotIn; added a regression test proving the dispatch path never forwards a smuggled model `toolsets`. - Docs: update delegate_task signature refs in the autonomous-ai-agents skill.	2026-07-01 05:35:26 -07:00
Steve Lawton	c73e74386b	feat(vertex): add Google Vertex AI provider for Gemini (OAuth2) Adds Vertex AI as a first-class provider for Gemini models via Vertex's OpenAI-compatible endpoint. Vertex authenticates with short-lived OAuth2 access tokens (service-account JSON or ADC), not a static API key — the missing piece behind the recurring requests (#13484, #12639, #56259). - agent/vertex_adapter.py: OAuth2 token minting + refresh-on-expiry (5-min margin), ADC->service-account fallback, global vs regional endpoint URLs. Config precedence: env var > config.yaml > default. - plugins/model-providers/vertex/: provider profile (auth_type=vertex), reuses Gemini's extra_body.google.thinking_config translation. - runtime_provider: vertex short-circuit BEFORE the credential pool so a credentials-file path is never mistaken for a static API key; mints a fresh token + computes base_url per resolve. - run_agent + conversation_loop: _try_refresh_vertex_client_credentials() re-mints the token and rebuilds the client on a mid-session 401, so a long-lived gateway agent survives token expiry (~1h). - auxiliary_client: vertex auth_type branch for side-LLM tasks. - config.yaml: vertex.project_id / vertex.region (non-secret, bridged to env); credential path stays in .env (VERTEX_CREDENTIALS_PATH). - setup wizard + model picker: dedicated _model_flow_vertex; curated google/gemini-* model list; --provider choices. - pricing/metadata: Vertex prices off the gemini docs snapshot; endpoint host auto-maps to the vertex provider (no probe spam). - lazy_deps + pyproject [vertex] extra: google-auth, opt-in only. - docs: guides/google-vertex.md + providers page; tests for adapter + runtime resolution. Salvages and modernizes #8427 by @slawt onto current main: rewired from the legacy PROVIDER_REGISTRY path to the provider-profile architecture, moved non-secret config out of .env into config.yaml, and added the per-turn 401 token-refresh the original lacked.	2026-07-01 05:25:33 -07:00
kyssta-exe	7eb9716ad7	fix(agent): apply persist override to the DB row only, never the live list (#48677) The persist user-message override was applied in place to the live messages list. On the early crash-resilience persist (which runs BEFORE api_messages is built), that stripped observed group-chat context off the live user message and silently dropped it when observe_unmentioned_group_messages was enabled. Fix at the single chokepoint: _flush_messages_to_session_db resolves the override (idx/content/timestamp) locally and applies it ONLY to the row written to the DB — the live dict is never mutated, so EVERY persist caller (early persist, mid tool-loop flush, /resume, /branch) is protected uniformly. This supersedes the earlier shallow-copy approach, which broke the intrinsic _DB_PERSISTED_MARKER idempotency (copies never propagated the marker back to the live dicts → duplicate rows) and closes the sibling class tracked in #56303. Trailing empty-response scaffolding is still dropped from the live list in _persist_session (unchanged behavior). Salvaged from #48817; chokepoint reworked to coexist with the marker-based dedup (#50372). Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>	2026-07-01 17:28:04 +05:30
arminanton	e2fa509bf3	fix(review): isolate the background-review fork from the canonical session The forked skill/memory review agent shares the parent's session_id for prompt-cache warmth. Without isolation it wrote its harness turn ('Review the conversation above and update the skill library…') plus its curator-mode reply straight into the user's REAL session in state.db; the next live turn re-read that injected user message as a standing instruction and the agent 'became' the curator, refusing the actual task. Root fix: a _persist_disabled flag on the fork that hard-stops every DB write and lazy-open path (_flush_messages_to_session_db, _ensure_db_session, _get_session_db_for_recall) — the review writes only to the skill/memory stores via its tools. Defense-in-depth: _strip_background_review_harness drops any stray harness message (and the assistant reply that followed) at load time in get_messages_as_conversation, so an already-polluted session resumes clean. Salvaged from #50296. Co-authored-by: arminanton <29869547+arminanton@users.noreply.github.com>	2026-07-01 16:21:39 +05:30
kshitijk4poor	d3010b74db	test(agent): strengthen id-reuse regression + refresh flush docstring (review) Phase 2c review follow-up on the id()-reuse persistence fix: - test_recycled_id_in_dedup_set_still_persists_new_message seeded an EMPTY dedup set, so it never injected a collision and passed under id-based dedup too (couldn't distinguish the designs). Replace with test_stale_seed_id_from_prior_flush_cannot_suppress_new_message, which asserts the durable invariant: the seed is empty after every flush (mutation-checked: removing the post-flush reset now fails BOTH id-reuse tests). - Refresh the _flush_messages_to_session_db docstring: it still described the old per-session identity tracking; document the intrinsic-marker mechanism, that _flushed_db_message_ids is now a one-shot seed, and the shared-dict mutation safety note.	2026-07-01 16:17:46 +05:30
rrevenanttt	e4c6d1b22b	fix(agent): persist messages by intrinsic marker to stop id() reuse data loss _flush_messages_to_session_db deduped persisted messages with a retained {id(msg)} set (_flushed_db_message_ids) kept across turns. Once a flushed dict is dropped from the live list (scaffolding rewind / in-place compaction) and GC'd, CPython recycles its address onto a new assistant/tool dict whose id() collides with the stale entry — so the real turn is silently never written to state.db. Replace the retained id-set with an intrinsic _DB_PERSISTED_MARKER stamped on each dict. The id-set is demoted to a one-shot seed (valid only while the caller's objects are alive) that is translated to markers and cleared after every flush, so no id() outlives a flush to alias a future message. The marker is _-prefixed so the wire sanitizers strip it before any request leaves. Preserves the existing _is_ephemeral_scaffolding skip. Salvaged from #50372. Co-authored-by: rrevenanttt <290873280+rrevenanttt@users.noreply.github.com>	2026-07-01 16:17:46 +05:30
Tranquil-Flow	122e5bc037	fix(agent): retry 413 after stripping vision payloads (#47339) When text compression can't reduce a 413 request further, evict base64 image parts from tool messages and retry once instead of dead-ending with 'Payload too large and cannot compress further.' A 413 is a request-body byte-size limit, not a token limit. browser_vision screenshots (2-5MB base64 each) keep the HTTP body oversized even after aggressive summarization. The strip pass passes remember_model=False so a 413 does not poison _no_list_tool_content_models — that set is for providers that reject list-type tool content, a distinct failure mode. Cherry-picked from #47397 by Tranquil-Flow; placed onto main's current token-aware 413 recovery else branch.	2026-07-01 03:18:41 -07:00
Teknium	913e661a09	fix(cache): stop verification-loop synthetic nudges from persisting (#56194) verify_on_stop / pre_verify append a synthetic assistant "done" plus a synthetic user nudge to keep the agent going one more turn before it can claim completion. Both were flagged (_verification_stop_synthetic on the nudge only), but the flags were never registered in _EPHEMERAL_SCAFFOLDING_FLAGS, so the central _is_ephemeral_scaffolding() filter that guards both persistence sinks (SQLite flush + JSON snapshot) let them through. The resumed transcript then inherited loop-only scaffolding, invalidating the prompt-prefix cache on later turns. - add _verification_stop_synthetic and _pre_verify_synthetic to _EPHEMERAL_SCAFFOLDING_FLAGS (the single chokepoint both sinks use) - flag the blocked attempt assistant message too, not just the nudge, so the whole synthetic pair drops together and persistence does not keep a premature done with the nudge stripped (assistant to assistant adjacency) The API-payload leak claimed in the report is already handled: the chat_completions transport strips every underscore-prefixed message key before the wire, so the marker never reaches strict providers. Reported by patppham.	2026-07-01 02:26:06 -07:00
petrichor-op	f2a528fb59	fix(agent): never persist empty-response recovery scaffolding Ephemeral empty-response/prefill recovery scaffolding (the synthetic assistant "(empty)" turn, the user nudge, the terminal "(empty)" sentinel, and the thinking-only prefill placeholder) exists only to drive the next API retry; the in-memory loop pops it before appending the real response. The append-only flush did not mirror that, so a mid-turn persist could commit scaffolding to the SQLite session store (and JSON log), and a resumed session would replay synthetic "(empty)"/nudge turns as genuine context — re-poisoning the empty-retry boundary forever. Filter ephemeral scaffolding at both durable-write sites (_flush_messages_to_session_db + _save_session_log), by flag not position, so buried scaffolding (an answered nudge leaves the synthetic pair mid-list) is skipped too. Covers all three flags including _thinking_prefill. Adapted onto current main's identity-tracking flush. Cherry-picked from #41281 by petrichor-op.	2026-07-01 01:08:27 -07:00
峯岸亮	bc6cd46925	fix(agent): restrict todo hydration to paired assistant todo calls The gateway/API server rebuilds the in-memory TodoStore by replaying caller-supplied conversation_history. _hydrate_todo_store previously accepted any role:tool message containing a "todos" array, so a forged bare tool result could seed arbitrary todo state and re-inflate context every turn (GHSA-5g4g-6jrg-mw3g). Restrict hydration to tool results paired with an earlier assistant todo tool call (matching tool_call_id, function name == todo, no user/system boundary between). Reuse the existing _get_tool_call_id/name_static helpers so dict- and object-shaped tool calls both work. Add a generous MAX_TODO_RESULT_CHARS payload guard to drop absurd forged results before parsing; item/content caps already exist on main. Co-authored-by: Hermes Agent <agent@nousresearch.com>	2026-07-01 01:02:17 -07:00
HiddenPuppy	0e4c879a3b	fix: keep plain custom GPT-5 relays on chat completions Generic provider:custom relays were force-routed to the OpenAI Responses API whenever the model matched gpt-5*, and a stale persisted model.api_mode=codex_responses survived /reset and upgrades. Some OpenAI-compatible relays do not implement Responses semantics, which surfaced as malformed function_call.name replay errors in gateway sessions. - runtime_provider: route custom-provider api_mode through _resolve_plain_custom_api_mode(), which drops a stale codex_responses unless the URL is direct OpenAI/xAI - run_agent: _provider_model_requires_responses_api returns False for custom; direct api.openai.com / api.x.ai URLs still upgrade via _is_direct_openai_url() / URL detection - regression coverage for plain relays vs direct OpenAI/xAI URLs Co-authored-by: HiddenPuppy <HiddenPuppy@users.noreply.github.com>	2026-06-30 15:57:52 -07:00
Erosika	00eefc7f2b	style(profile): frame comments around what the code does	2026-06-30 15:30:06 -07:00
Erosika	a6175d1f93	style(profile): trim verbose comments to one or two lines	2026-06-30 15:30:06 -07:00
Erosika	09af0a8c1d	fix(profile): propagate profile context across thread/executor boundaries A bare threading.Thread / ThreadPoolExecutor worker starts with an empty contextvars.Context, so the context-local profile override (_HERMES_HOME_OVERRIDE) does not cross the spawn boundary. In single-process multi-profile runtimes (desktop tui_gateway) the worker then resolves get_hermes_home() to the launch/default profile, leaking one profile's reads/writes into another. The fix primitive (tools.thread_context.propagate_context_to_thread, which copies the parent context) already exists; the leaking spawns simply did not use it. - model_tools.py _run_async: wrap the worker-thread loop runner. This is the generic sync->async bridge for every async tool, so wrapping it here fixes the leak for all async tools at once (verified: an async tool reading get_hermes_home() under an override now resolves the active profile). - run_agent.py bg-review thread: wrap so MEMORY.md / skill review writes land in the spawning turn's profile (#54937 path). - tools/async_delegation.py: wrap both single + batch executor.submit calls so detached children resolve the dispatching profile's paths. Scope: the vision CPU executor is intentionally left unwrapped — it runs pure in-memory encode/resize and never resolves profile-scoped paths.	2026-06-30 15:30:06 -07:00
NiuNiu Xia	fb07215844	fix(copilot): recognize enterprise subdomains in host checks The earlier enterprise base URL change (proxy-ep parsing) gave us URLs like `api.enterprise.githubcopilot.com`, but ~15 host-matching call sites still hard-coded `api.githubcopilot.com`. Enterprise users would therefore drop the `Copilot-Integration-Id: vscode-chat` header at client-build time, and upstream rejected requests with: The requested model is not available for integrator "zed" (or "copilot-language-server") — verify the correct Copilot-Integration-Id header is being sent. The header was correct in copilot_default_headers(); it just never made it into default_headers for non-default hostnames because every detector compared against the exact string "api.githubcopilot.com". This commit broadens all those checks to "githubcopilot.com" via base_url_host_matches (which already does proper subdomain matching), so api.enterprise.githubcopilot.com, api.business.githubcopilot.com, etc. all share the same headers, vision routing, max_completion_tokens selection, and reasoning-effort detection as the default endpoint. Also adds ".githubcopilot.com" to _URL_TO_PROVIDER so context-window resolution via models.dev works for enterprise base URLs, and tightens _is_github_copilot_url to use suffix matching instead of strict equality. Tests: - New: enterprise Copilot endpoint preserves Copilot-Integration-Id - New: enterprise endpoint returns max_completion_tokens (not max_tokens) - Existing 333 base_url / copilot / aux-client / credential-pool tests pass Part 5 of #7731.	2026-06-30 03:27:41 -07:00
nightq	fa3ab2ffd0	fix: normalize tool_call_id whitespace in sanitizer _sanitize_api_messages() compared raw tool_call_id strings without stripping whitespace. When assistant-side IDs and tool-result IDs diverged due to surrounding whitespace, valid tool results were treated as orphaned and replaced with [Result unavailable] stub placeholders. Strip whitespace in _get_tool_call_id_static() (both call_id/id paths, dict and object) and at the two result_call_id comparison sites in sanitize_api_messages(). Adds regression tests for preserved-whitespace results and orphaned-whitespace removal. Closes #9999	2026-06-30 01:43:40 -07:00
Rod Boev	6fd701acbe	fix(agent): keep cooldown state on the active session (#54465)	2026-06-30 13:36:29 +05:30
teknium1	ea1372d2af	fix(security): wire session-id sanitizer into artifact paths + API boundary Defense-in-depth on top of _safe_session_filename_component (#5958): Sink (makes the bad write impossible regardless of entry point): - run_agent._save_session_log: sanitize session_id before building the session_{sid}.json snapshot path. - agent_runtime_helpers.dump_api_request_debug: sanitize before building the request_dump_{sid}_{ts}.json path. Boundary (clean 400 instead of a silently-hashed filename): - api_server rejects path-traversal-shaped X-Hermes-Session-Id on the session-continuation path and the explicit /api/sessions create path, reusing gateway.session._is_path_unsafe (mirrors the native gateway's entry-boundary guard). Also enforces the session-header length cap on the continuation path. Tests: traversal session_id stays contained at the write site; sanitizer always yields a traversal-free segment; the API header rejects ../, absolute, and Windows-traversal IDs with 400.	2026-06-29 04:25:45 -07:00
Xowiek	1debd5e8f9	fix(security): add session-id filename sanitizer to prevent path traversal Session IDs can originate from untrusted input (e.g. the X-Hermes-Session-Id API header) and are interpolated raw into on-disk artifact filenames under ~/.hermes/sessions/. A traversal-shaped ID (../../../../etc/pwned) would let a caller write the session snapshot or request dump outside the sessions directory. _safe_session_filename_component() collapses every non [A-Za-z0-9_-] character to _, caps the length, and appends a short content hash when sanitization changed the string, always yielding a single traversal-free path segment. Closes #5958.	2026-06-29 04:25:45 -07:00
aaronlab	ec148f5d31	fix(agent): guard Anthropic interrupt, cap vision data-URL size Two independent agent-loop hardening fixes: - anthropic: when the streaming loop breaks on _interrupt_requested, return None instead of calling stream.get_final_message() on the partially-drained stream — the SDK may hang draining remaining events or return a Message with incomplete tool_use blocks. The outer poll loop raises InterruptedError, so the return value is discarded anyway. - vision: add a 20 MB cap on base64 data-URL payloads before base64.b64decode() in _materialize_data_url_for_vision. A 100MB+ payload creates ~275MB of memory pressure; gateway users sharing the process can trivially OOM it. Oversized payloads return ("", None). The third change from the original PR (streaming tool-name += to assignment dedup) was already landed independently on main. Co-authored-by: aaronlab <1115117931@qq.com>	2026-06-28 18:53:20 -07:00
xxxigm	093f567f0d	fix(agent,cli): surface empty-body API errors and fail oneshot exit code When an LLM API call returns HTTP 4xx with an empty parsed SDK `body` ({}), `_summarize_api_error` fell through to a bare `str(error)`, so users saw only "HTTP 400" with no provider detail (reported on Windows in #36109). The SDK leaves `body` empty in this case, but the httpx `response` still carries the payload in `.text`. - run_agent.py `_summarize_api_error`: when `body` is empty, fall back to `response.text` — parse a JSON `error.message`/`message` when present, else surface the raw (truncated) body. Platform-agnostic diagnostics. - hermes_cli/oneshot.py: `hermes -z` now runs via `run_conversation` and returns exit code 2 when the run is failed/partial with no usable final response, so scripts can detect LLM failures (still 0 when a response — incl. an error summary as output — is produced). Tests: new tests/run_agent/test_summarize_api_error.py (empty-body JSON + raw text, RED/GREEN verified) + oneshot exit-code/`run_conversation` wiring tests. NOTE: #36109's original root cause (Windows "all providers return empty 400") is not reproducible on current main (heavy provider-transport churn since v0.15.1). This change does not claim to fix that root cause — it makes any empty-body API error LEGIBLE so a future occurrence shows the real provider message instead of a bare HTTP 400. Relates to #36109 (does not close it).	2026-06-28 02:05:20 -07:00
kurlyk	def97bcd96	fix: eliminate race condition in OpenAI client replacement Make check-and-replace atomic in _ensure_primary_openai_client by keeping both operations under the same lock acquisition. Previously, the lock was released between detecting a closed client and replacing it, allowing two threads to simultaneously replace the client. Fixes #32846 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-28 01:08:04 -07:00
Teknium	d43e0cf304	fix(agent): config-driven intent-ack continuation for all api_modes (#27881) (#53943) * fix(agent): config-driven intent-ack continuation for all api_modes (#27881) The agent could end a turn after only stating intent ('I will run a health check...') without executing the announced tool call, forcing the user to re-prompt. A continuation guard that catches this and nudges the model to proceed already existed but was hard-gated to the codex_responses api_mode, so Gemini/Claude/OpenRouter turns never benefited. - New agent.intent_ack_continuation config (default 'auto' = codex-only, byte-stable for existing conversations). 'true'/model-list opts every api_mode in; 'false' disables. Mirrors agent.tool_use_enforcement's shape. - looks_like_codex_intermediate_ack gains require_workspace (default True). The opted-in path drops the codebase/filesystem requirement so general autonomous workflows (server ops, deploys, API calls) are caught, not just coding tasks. Future-ack + action-verb + short-content + no-prior-tool guards still apply; the 2-nudge-per-turn cap is unchanged. - Resolution centralized in intent_ack_continuation_mode (off/codex_only/all). * docs(infographic): intent-ack continuation (#27881)	2026-06-27 20:46:00 -07:00
teknium1	f062cf076b	fix(agent): also treat provider=ollama as an Ollama GLM backend Follow-up to the #13971 fix: a genuine native Ollama provider reached through a reverse proxy carries no ollama/:11434 URL signature, so the restricted detection would miss it. Add provider=="ollama" as an explicit True case (idea from #14789, @Tranquil-Flow) and cover both it and the #13971 LiteLLM-proxy-to-zai false-positive with E2E tests.	2026-06-27 04:03:07 -07:00
YuShu	266521b55f	refactor(agent): trim docstring per review feedback Remove commentary about the previous is_local_endpoint() approach from _is_ollama_glm_backend() — git history suffices.	2026-06-27 04:03:07 -07:00
YuShu	00a8252b7d	fix(agent): scope Ollama/GLM stop-to-length heuristic to Ollama only The _is_ollama_glm_backend() function was too broad: any local endpoint running a GLM model was treated as Ollama, triggering the stop->length misreport heuristic introduced in `8011aa3`. This caused false truncation detection on sglang, vLLM, LM Studio, and other non-Ollama servers that correctly report finish_reason. When a GLM model on sglang/vLLM returned finish_reason='stop', the agent mistakenly reclassified it as 'length' if the response didn't end with a whitelisted punctuation character (ASCII or CJK). This particularly affected Chinese-language responses and Markdown-formatted text. Root cause: the is_local_endpoint() fallback assumed any local GLM endpoint = Ollama. But many non-Ollama servers also run on localhost. Fix: remove the is_local_endpoint() catch-all. Only detect Ollama via its distinctive signatures (port 11434, 'ollama' in URL). All other local servers are assumed to report finish_reason correctly. This is the correct tradeoff because: - False negatives (Ollama at custom port, heuristic not triggered) only mean the user sees a truncated response — same as having no heuristic - False positives (non-Ollama server, heuristic wrongly triggered) inject spurious continuation messages into the conversation — strictly worse Adds two tests: - sglang GLM response is NOT reclassified as truncated - Ollama GLM on port 11434 still triggers the heuristic as before Co-authored-by: Hermes Agent <hermes@nousresearch.com>	2026-06-27 04:03:07 -07:00
DavidMetcalfe	27c486e3b1	feat(agent): apply per-reasoning-model stale-timeout floor in stream + non-stream detectors Wire get_reasoning_stale_timeout_floor() into both stale detectors so known reasoning models (Nemotron 3 Ultra, OpenAI o1/o3, Opus 4.x thinking, DeepSeek R1, Qwen QwQ, Grok reasoning) tolerate multi-minute thinking phases instead of the upstream gateway idle-killing the socket (BrokenPipeError) before first token. Applied as max(default, floor) — never overrides explicit user config, never lowers an existing threshold. The reasoning_timeouts.py allowlist module already landed on main via #52795, so this salvage carries only the wiring + tests (the duplicate module and the stale-base MoA reverts from the original PR branch are dropped). Salvaged from #52238. Fixes #52217.	2026-06-25 22:12:06 -07:00
Teknium	c6575df927	feat(moa): expose MoA presets as selectable virtual models (#46081) * feat(moa): expose MoA presets as selectable virtual models Reconstructed onto current main (PR #46081's base had diverged with no common ancestor, marking the PR dirty so CI never dispatched). MoA is now a virtual provider: each named preset is a selectable model under provider 'moa', and the preset's aggregator is the acting model that answers and calls tools. Reference models fan out in parallel via a bounded ThreadPoolExecutor (the same batch pattern delegate_task uses) — all references dispatched at once, collected when every one finishes, then handed to the aggregator. Output order is preserved, failures and the MoA-recursion guard stay isolated per reference. - Removed the old mixture_of_agents model tool and moa toolset. - Added moa as a virtual provider in the provider/model inventory. - /moa is shortcut behavior over model selection (default preset / named preset / one-shot prompt). - Dashboard + Desktop manage named presets; presets appear in model pickers. - Parallel reference fan-out in agent/moa_loop.py with regression test. * fix(moa): thread moa_config through _run_agent to _run_agent_inner The reconstructed gateway MoA wiring declared moa_config on _run_agent (the profile-scoping wrapper) and used it inside _run_agent_inner, but the wrapper never forwarded it — _run_agent_inner had no such parameter, so the runtime hit NameError: name 'moa_config' is not defined on the compression-failure session sync path. Add moa_config to _run_agent_inner's signature and forward it from both wrapper call sites (multiplex and non-multiplex). Caught by tests/gateway/test_compression_failure_session_sync.py on CI shard test(4). * fix(moa): classify moa as a virtual provider in the catalog The moa virtual provider has no PROVIDER_REGISTRY/ProviderProfile entry, so provider_catalog() fell through to the default auth_type="api_key" with no env vars — tripping two catalog invariants: - test_provider_catalog: api_key providers must expose a credential env var - test_provider_parity: every hermes-model provider must be desktop-configurable moa already declares auth_type="virtual" in HERMES_OVERLAYS; consult that overlay as an auth_type fallback so the catalog reports moa as virtual (no real credential, no network endpoint). Exempt virtual providers from the desktop parity union check the same way 'custom' is exempt — derived from the catalog, not a hardcoded slug, so future virtual providers are covered too.	2026-06-25 13:52:06 -07:00
Brooklyn Nicholson	2f1a47b90e	feat(agent): require verification before finishing edits Make verification closure the default coding behavior after landed file edits while keeping bounded retries and config/env switches for users who need to disable it.	2026-06-24 23:02:48 -05:00
Teknium	7130d60861	feat(providers): remove google-gemini-cli + google-antigravity OAuth providers (#50492 ) * feat(providers): remove google-gemini-cli + google-antigravity OAuth providers Google now actively bans accounts for third-party tools that piggyback on Gemini CLI / Antigravity / Code Assist OAuth, and because abuse prevention sits at a backend layer the ban can extend to the entire Google account (Gmail/Drive), with a second violation being permanent. Ref: https://github.com/google-gemini/gemini-cli/discussions/20632 Removes both OAuth inference providers entirely (modules, provider profiles, auth/runtime/config/models wiring, the /gquota Code Assist quota command, the antigravity-cli optional skill, desktop + docs surface in en + zh-Hans). The API-key 'gemini' provider (GOOGLE_API_KEY/GEMINI_API_KEY against generativelanguage.googleapis.com) is unaffected and stays fully supported. * fix(skills): keep the antigravity-cli skill — only the OAuth provider is removed The antigravity-cli optional skill orchestrates the external `agy` binary as a coding-agent tool via the terminal tool — it does NOT wrap Hermes inference through the banned google-antigravity OAuth provider, so it carries none of the account-ban risk that motivated removing that provider. Restore the skill, its docs page, the sidebar entry, and the optional-skills catalog row. The google-antigravity / google-gemini-cli inference providers stay fully removed.	2026-06-21 19:53:27 -07:00
yeyitech	b17180d950	fix(session): finalize owned SQLite session rows on AIAgent.close() Funnel session finalization through AIAgent.close() — the single terminal path every agent (CLI, gateway, subagent, cron) funnels through — so finished agents stop leaving rows with ended_at IS NULL. The biggest leak source was delegate_task subagent + background-review forks whose close() never ended their row. end_session() is first-reason-wins and no-ops on an already-ended row, so a 'compression'/'cron_complete'/'cli_close' reason set by an earlier terminal path is never clobbered. /resume already calls reopen_session(), so finalizing-on-close does not break resumability. Temporary helper agents that rotate/share the session forward (manual compression, gateway session-hygiene) opt out via _end_session_on_close=False. Also stop the long-running gateway heartbeat once the executor is done or the session slot is rebound to a different agent, preventing a stale 'running: delegate_task' bubble from outliving its run. Closes #12029.	2026-06-21 11:35:09 -07:00
konsisumer	3e354b61db	fix(agent): preserve copilot routed headers	2026-06-21 11:29:49 -07:00
Teknium	ea8a8b4af8	feat(delegation): background fan-out — parallel subagents, one consolidated return (#49734 ) * feat(delegation): single-task delegate_task always runs in the background The model no longer decides whether a subagent runs in the background — a single-task delegate_task from the top-level agent is now always dispatched async, so the parent turn returns immediately and the subagent's result re-enters the conversation when it finishes. - run_agent._dispatch_delegate_task (the live model path) forces background=True for top-level single-task calls; the schema-level `background` param is ignored. - A batch (tasks with >1 item) stays synchronous (fan-out can't go async). - A delegation from an orchestrator subagent (depth > 0) stays synchronous — it needs its workers' results within its own turn. - The function-level default is unchanged, so direct Python callers/tests keep the historical synchronous behavior. - On async-pool capacity rejection, single-task now falls through to a synchronous run instead of erroring (the child stays attached for interrupt propagation; detach happens only on a successful dispatch). - Schema `background` param marked deprecated/ignored; tool description updated to state the always-background single-task rule. * feat(delegation): all delegate_task fan-out runs in the background Extend the always-background behavior to the full fan-out. A batch is now dispatched as N independent async subagents (one handle each), instead of running synchronously. Single task and batch both return immediately; each subagent's result re-enters the conversation as its own message when it finishes. - delegate_task: when background is set, loop over ALL built children and dispatch each via dispatch_async_delegation; return a combined handle block (count + per-task delegation_ids). Children the async pool rejects (at capacity) run synchronously inline and are reported alongside the dispatched handles, so nothing is silently dropped. 
- run_agent._dispatch_delegate_task + registry handler: force background for any top-level model delegation (single OR batch); orchestrator subagents (depth > 0) still run synchronously since they need workers' results within their own turn. - Removed the v1 'batch async not supported' rejection. - Tool description updated: BOTH MODES RUN IN THE BACKGROUND. - Tests updated to assert batch fan-out dispatches each task async (verified E2E: 3-task batch -> 3 independent completion-queue events). * fix(delegation): background fan-out joins and returns one consolidated block Correct the fan-out semantics: a backgrounded batch is dispatched as ONE async unit (one handle, one async-pool slot), not N independent dispatches. The unit runs all children in parallel, waits on every one, and emits a SINGLE completion event carrying the consolidated per-task results. The chat is never blocked; when all subagents finish, their full summaries re-enter the conversation together as one message. - async_delegation.dispatch_async_delegation_batch + _finalize_batch: a batch occupies one slot; its runner returns the combined {results:[...]} dict and one event with the full results list is pushed to the completion queue. - delegate_tool: extract the sync execution+aggregation into _execute_and_aggregate(); background dispatches it via the batch unit and returns one handle; on pool-capacity rejection it runs the batch inline. - process_registry._format_async_delegation: render a consolidated multi-task block (TASK i/N + per-task summary) when the event carries is_batch/results. - Tests updated; E2E verified: 3-task batch -> immediate return -> one combined completion block with all three summaries.	2026-06-20 11:27:12 -07:00
kshitijk4poor	a7dd98c860	fix(env): guard remaining malformed int/float env var casts with utils helpers Widen the env_float() guard from #48735 across the whole bug class: a non-numeric value (e.g. a stale .env "HERMES_API_TIMEOUT=abc" or a typo'd port) raised an unhandled ValueError and crashed adapter/agent init. Converts 22 genuinely-unguarded first-party int/float(os.getenv()) sites to the canonical utils.env_int / utils.env_float helpers (the established house pattern), instead of duplicating per-module helpers or inline try/except: - gateway/config.py: WECOM_CALLBACK_PORT, BLUEBUBBLES_WEBHOOK_PORT - gateway/platforms/email.py: EMAIL_IMAP/SMTP_PORT, EMAIL_POLL_INTERVAL - gateway/platforms/feishu.py: dedup cache + text/media batch settings - gateway/platforms/wecom.py, discord/adapter.py: text batch delays - gateway/platforms/telegram.py: media batch delay, TELEGRAM_WEBHOOK_PORT - gateway/platforms/whatsapp.py: WHATSAPP_NPM_INSTALL_TIMEOUT - hermes_cli/auth.py: CODEX/XAI refresh timeouts - agent/chat_completion_helpers.py: API/stream read/stale timeouts - run_agent.py, agent/auxiliary_client.py: API + nous timeouts Sites already guarded by try/except or local helpers are left untouched. The HERMES_MAX_ITERATIONS sites are already guarded on main via _current_max_iterations(), so they are not included.	2026-06-20 14:54:36 +05:30
Gille	013f9c8750	fix(memory): log CLI shutdown hook failures Makes the CLI memory-provider shutdown path observable: log when CLI cleanup calls memory shutdown (with session id + message count), warn instead of swallowing CLI memory-shutdown exceptions, warn on on_session_end failures during agent shutdown, and raise the MemoryManager provider-hook failure log from debug to warning with a traceback. Salvaged from PR #49287 (authored by Gille / @helix4u).	2026-06-19 16:59:43 -07:00
KeyArgo	1e40b21b2e	docs: clean up three stale comments from the #32848 audit (#45638 ) * docs: clean up three stale comments from the #32848 audit - tools/memory_tool.py:20 — 'read' action was intentionally removed but the docstring still listed it. Now matches the schema. - tools/fuzzy_match.py:9 — unicode_normalized was added but the chain-count docstring still said '8-strategy'. Now says '9'. - run_agent.py:1485 — 'See #<TBD>.' placeholder was never filled in. Replaced with a backfill note. Fixes #32848 (parts 3, 4, and 12) * docs(memory): also remove stray memory(action=read) references in lines 144 and 201 The original #32848 audit fix (in 6fd661d6) only addressed line 20 (the action list in the module docstring), but the action was referenced in two other places: - tools/memory_tool.py:144 — in a class docstring, claimed 'memory(action=read)' was a way to SEE poisoned entries - tools/memory_tool.py:201 — in a user-facing warning message, told the user to 'use memory(action=read) to inspect' Since the schema on line 683 only allows add/replace/remove, both references were misleading: the first claimed a way to inspect poisoned entries that doesn't exist, the second would error out when the user followed the warning. This commit removes both references: - Line 144: '...keep the original text so the user can still SEE poisoned entries by inspecting the source files directly, and remove them — silently dropping them would hide the attack from the user.' - Line 201: '...use memory(action=remove) to delete the original. (drop the read-action reference)' Followup to the previous commit on this branch. --------- Co-authored-by: KeyArgo <keyargo@argobox.com>	2026-06-19 16:09:30 -07:00
Gille	a7983d5ad7	fix(dashboard): hide sidecar sessions from history (#49269) * fix(dashboard): hide sidecar sessions from history * test(dashboard): allow sidecar source in session payload	2026-06-19 18:06:38 -04:00
tt-a1i	46f9d53468	fix(agent): aggregate anthropic aux calls via stream	2026-06-19 17:32:13 +05:30
Gille	e4452ffb8a	fix(agent): summarize structured provider error messages	2026-06-18 21:37:52 -07:00
Reiji Kisaragi	3d21666b2f	fix: preserve multimodal user content during persistence Avoid applying text-only persist_user_message overrides to multimodal current-turn user messages. Early crash-resilience persistence mutates the same messages list later used for the API call, so clobbering list content drops ACP image blocks before model dispatch. Add regression coverage for both text override behavior and multimodal preservation. Closes #44242	2026-06-17 09:49:39 -07:00
Wolfram Ravenwolf	bd7fc8fdcd	feat(gateway): inject stable human-readable message timestamps Consolidates these related Amy fork patches: - 429830f39 feat(gateway): inject message timestamps into user messages for LLM context - 3c3d6fac0 fix: handle both ISO string and epoch float timestamps in history replay - 2874f7725 feat: human-friendly timestamp format with weekday and timezone name - 3735f4c8b fix: render gateway message timestamps once	2026-06-16 15:49:59 -07:00
teknium	28f92478e3	test(hooks): cover session:compress event; drop dead import Follow-up to salvaged PR #41624: - Remove stray urllib.parse import in run_agent.py (cherry-pick cruft, unused) - Add tests: session:compress emits with correct context, no-callback is safe, and a callback exception does not break compression	2026-06-16 11:45:36 -07:00
Wolfram Ravenwolf	e76e7b5073	feat(hooks): session:compress event_callback for MemPalace sync	2026-06-16 11:45:36 -07:00
Wolfram Ravenwolf	4cf9d80fba	feat(display): verbose skill change notifications with content previews When display.memory_notifications is set to 'verbose', skill_manage notifications now show meaningful change details instead of just the generic tool message. Before (verbose mode): 💾 📝 Patched SKILL.md in skill 'gogcli' (1 replacement). After (verbose mode): 💾 📝 Skill 'gogcli' patched: "old pitfall text..." → "new pitfall text..." Changes: - skill_manager_tool.py: _patch_skill() now includes old/new string previews (truncated to 200 chars) in the result via '_change' key. _create_skill() and _edit_skill() include skill description from frontmatter for verbose create/edit notifications. - run_agent.py: Background review notification builder now reads the '_change' dict from skill tool results and formats descriptive notifications per action type (patch → old→new diff, create/edit → description preview). Falls back to generic message when _change data is unavailable (backwards compatible). This is especially useful when subagents patch skills, since neither the user nor the parent agent can see what the subagent changed.	2026-06-16 05:45:40 -07:00
Teknium	0a8f3e21b8	fix(delegation): forward background flag so delegate_task(background=true) runs async (#46968) * fix(skills): guard recursive skill delete against tree-escape Port from Kilo-Org/kilocode#11240. Their issue #11227 lost a user's entire working directory: a built-in-skill sentinel location resolved to the server cwd and the skill-removal endpoint ran a recursive delete on it. Hermes' /skills uninstall path (skills_hub.py) is already hardened, but the agent-facing skill_manage(action='delete') path did a bare shutil.rmtree(skill_dir) with no last-line validation. Add _validate_delete_target(): refuse to rmtree a path that (1) isn't strictly inside a known skills root, (2) is a skills root itself, or (3) is reached via a symlink/junction. Tests: 4 cases (normal delete works; symlinked dir, skills-root, out-of-tree all refused). E2E verified with real symlink + file I/O. * fix(delegation): forward background flag in delegate_task dispatch delegate_task is an _AGENT_LOOP_TOOLS member, so every surface (CLI, gateway, desktop/TUI) routes it through AIAgent._dispatch_delegate_task. That forwarder passed every schema field except background, so delegate_task(background=true) was silently downgraded to a synchronous run and returned the sync results payload instead of a delegation_id. The model sees background in the schema (the call validates), but the value never reached the function. Add the one missing kwarg so async background delegation actually engages.	2026-06-15 18:52:02 -07:00
Teknium	49e743985a	fix: route minimax m3 reasoning controls through profile Follow up PR #46609's api.minimax.io reasoning report by moving the behavior out of the broad run_agent host gate and into the MiniMax provider profile. Only MiniMax-M3 on the documented OpenAI-compatible /v1 route gets reasoning_split/thinking/reasoning_effort; Anthropic-format MiniMax and non-M3 models keep their existing wire shapes. Co-authored-by: goku94123 <gooku94123@gmail.com>	2026-06-15 07:08:43 -07:00

1057 commits
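
The env-guard commit above (a7dd98c860) describes replacing bare int/float(os.getenv()) casts with shared helpers so that a malformed value such as "HERMES_API_TIMEOUT=abc" no longer raises ValueError at init. The log does not show the real utils.env_int / utils.env_float signatures, so the following is only a minimal sketch of the pattern: the two-argument (name, default) shape, the fall-back-to-default behavior, and the default values used in the usage lines are all assumptions made for illustration.

```python
import os


def env_int(name: str, default: int) -> int:
    """Hypothetical stand-in for a guarded integer env lookup.

    Returns `default` when the variable is unset, blank, or not a valid
    integer, instead of letting int() raise ValueError.
    """
    raw = os.environ.get(name)
    if raw is None or not raw.strip():
        return default
    try:
        return int(raw.strip())
    except ValueError:
        return default


def env_float(name: str, default: float) -> float:
    """Hypothetical stand-in for a guarded float env lookup."""
    raw = os.environ.get(name)
    if raw is None or not raw.strip():
        return default
    try:
        return float(raw.strip())
    except ValueError:
        return default


if __name__ == "__main__":
    # A stale .env entry like the one named in the commit message.
    os.environ["HERMES_API_TIMEOUT"] = "abc"
    # The default here is illustrative, not the project's real default.
    print(env_float("HERMES_API_TIMEOUT", 30.0))
```

Centralizing the guard in one helper pair, rather than adding try/except at each of the 22 call sites, matches the commit's stated preference for the established house pattern over per-module helpers.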