Commit graph

6997 commits

Author SHA1 Message Date
liuhao1024
d3c8a155cb fix(slack): keep blank-line-separated ordered items in one rich_text_list
When a Markdown ordered list has blank lines between items (common in
LLM-authored content), the list run loop breaks on each blank line.
Slack numbers each rich_text_list independently, so N items produce N
lists each starting at 1.

Skip blank lines inside the list run as soft separators instead of
breaking, so ordered items stay in one rich_text_list and Slack renders
the correct numbering.

Fixes #57076
2026-07-03 02:55:22 +05:30
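
The blank-line handling described in d3c8a155cb can be sketched roughly as follows. This is a simplified, line-based illustration: `group_ordered_items` and the regex are hypothetical and are not the adapter's actual code.

```python
import re

# Hypothetical pattern for an ordered-list item such as "1. text" or "2) text".
_ORDERED = re.compile(r"^\s*\d+[.)]\s+(.*)$")


def group_ordered_items(lines):
    """Collect ordered items into runs, treating blank lines inside a run as soft separators."""
    runs, current = [], []
    for i, line in enumerate(lines):
        match = _ORDERED.match(line)
        if match:
            current.append(match.group(1))
            continue
        if not line.strip() and current:
            # Keep the run open only if the next non-blank line is another ordered item.
            following = next((l for l in lines[i + 1:] if l.strip()), None)
            if following is not None and _ORDERED.match(following):
                continue
        if current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs
```

With this grouping, three items separated by blank lines end up in a single run, so a renderer that numbers each run independently would count 1, 2, 3 rather than 1, 1, 1.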
Yingliang Zhang
67472fbaa4 fix(tui_gateway): route setup.runtime_check and setup.status to RPC pool
setup.runtime_check and setup.status are polled by the Desktop frontend on
connect and periodically (use-status-snapshot → evaluateRuntimeReadiness), but
neither was in _LONG_HANDLERS — so dispatch() ran both inline on the WS reader
thread. Under GIL pressure from concurrent agent turns (terminal I/O, large
output, background-process completions) either can block for seconds:

- setup.runtime_check → resolve_runtime_provider() (config read, auth check,
  may probe the provider endpoint)
- setup.status → _has_any_provider_configured() (provider config + credential
  scan)

While either blocks the reader thread the WS read loop can't service later
requests; the frontend RPC timeout fires, the client drops the socket, and the
lost setup.runtime_check response reads as ready=false — a false "needs setup"
/ "Settings failed to load" even though the provider is configured.

Route both to the RPC pool (same precedent as #55545's session.list/pet.info/
process.list). The handlers are read-only and pool writes go through the
lock-guarded write_json, so there's no ordering or safety concern.

Test asserts all 5 frontend-polled RPCs are pool-routed.

Co-authored-by: izumi0uu <izumi0uu@gmail.com>
2026-07-02 15:44:37 -05:00
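
A minimal sketch of the routing idea in 67472fbaa4, assuming a set of method names decides whether a handler runs inline or on a worker pool. `LONG_HANDLERS_SKETCH` and `dispatch_sketch` are hypothetical stand-ins, not the gateway's real `_LONG_HANDLERS` or `dispatch()`.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the set of pool-routed RPC methods.
LONG_HANDLERS_SKETCH = {"setup.runtime_check", "setup.status"}


def dispatch_sketch(method, handler, pool):
    """Run slow handlers on the pool so the reader thread stays free.

    Pool-routed calls return a Future; everything else runs inline
    and returns the handler's result directly.
    """
    if method in LONG_HANDLERS_SKETCH:
        return pool.submit(handler)
    return handler()
```

The point of the split is that a handler which may block for seconds never runs on the thread that services the socket read loop.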
Brooklyn Nicholson
1501a338c3 fix(cli): stop profile-bound backends before deleting so rmtree converges
delete_profile stopped only the process named in gateway.pid, but a Desktop
app spawns a headless `serve`/`dashboard` backend per profile that holds the
profile's SQLite connection open and keeps writing sessions/WAL/sandbox files.
That backend is never in gateway.pid, so a CLI `hermes profile delete` run
while the Desktop app is up left it writing into the tree — rmtree's final
rmdir then failed with ENOTEMPTY (#47368 "Bug 2"), and pre-guard it also
resurrected the directory.

- _profile_bound_backend_pids(): find running Hermes backends bound to this
  profile via a `--profile <name>` selector or a HERMES_HOME env resolving to
  the profile dir. Tightly scoped — current-user only, backend subcommands
  (serve/dashboard/gateway) only so an interactive chat is never killed, and
  never this process or its ancestors.
- _stop_profile_backends(): terminate them (graceful, then force), best-effort
  so it can never make delete worse.
- _rmtree_with_retry(): a few spaced retries absorb the ENOTEMPTY / Windows
  file-lock race from a just-terminated writer's in-flight -wal/-shm/sandbox
  writes instead of failing the whole delete on a race the next attempt wins.

Complements the recreation guard (deleted profiles no longer reappear) and the
Desktop teardown-before-delete flow; this is the CLI-side convergence fix for a
delete run while a Desktop-managed backend is live.

Part of #47368.
2026-07-02 15:31:35 -05:00
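
The retry behaviour described for `_rmtree_with_retry()` in 1501a338c3 might look something like the sketch below. The function name, attempt count, and delay are assumptions made for illustration; only `shutil.rmtree` and the general retry-on-race idea come from the commit message.

```python
import shutil
import time
from pathlib import Path


def rmtree_with_retry_sketch(path, attempts=4, delay=0.2):
    """Remove a directory tree, retrying on OSError (e.g. ENOTEMPTY or a file lock)."""
    target = Path(path)
    for attempt in range(attempts):
        try:
            shutil.rmtree(target)
            return True
        except FileNotFoundError:
            return True  # already gone, nothing left to do
        except OSError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the real error
            time.sleep(delay)  # let a just-terminated writer finish its last writes
    return False
```

Spaced retries only help once the writers have actually been stopped, which is why the commit pairs this with terminating the profile-bound backends first.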
Brooklyn Nicholson
5a6720b884 fix(desktop,tui-gateway,zai): stop thinking-off from reverting to medium
A Z.ai desktop user reported thinking reverting to medium after one turn,
burning ~200% of a week's credits in 4 days despite reasoning_effort: false
in config.yaml. Four compounding bugs:

- _session_info reported reasoning_effort "" for disabled reasoning,
  indistinguishable from unset — the desktop adopted it after the first
  turn, wiping its sticky "thinking off" pick so every later chat
  reverted to the default effort.
- config.set key=reasoning always wrote agent.reasoning_effort to global
  config.yaml, so every desktop model-menu selection (preset.effort ??
  'medium') clobbered the user's configured value. Now session-scoped
  like the messaging gateway's /reasoning, landing on
  create_reasoning_override so lazily-built sessions keep it too.
- YAML `reasoning_effort: false`/`off`/`no` (boolean False) was coerced
  to "" by every loader's `str(x or "")`, silently re-enabling thinking.
  parse_reasoning_effort now treats False/"false"/"disabled" as
  {"enabled": False}; loaders (tui gateway, gateway, cli, cron,
  delegate) pass the raw value through. The desktop config reader also
  crashed on the boolean (false.trim()), aborting voice/STT settings.
- The zai provider profile never sent thinking on the wire, and GLM-4.5+
  defaults to thinking ON server-side — so disabling reasoning was a
  silent no-op on direct Z.ai, the actual token burner. The profile now
  emits extra_body.thinking {"type": "enabled"|"disabled"} for
  thinking-capable GLM models, mirroring the DeepSeek profile.

Also: /new (session reset) now carries reasoning_config across the
rebuild like model_override; config.get reasoning prefers the session's
live value and maps a config False to "none"; Settings shows "Off"
instead of a blank select for hand-written false.
2026-07-02 15:23:47 -05:00
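
The coercion bug in 5a6720b884 is easy to reproduce in isolation: `str(False or "")` evaluates to `""`, which is indistinguishable from an unset value. The sketch below shows one way to keep the boolean meaningful. `parse_effort_sketch` is a hypothetical helper, and the shape of the enabled result (the `effort` key) is an assumption; only the `{"enabled": False}` mapping for False/"false"/"disabled" is taken from the commit message.

```python
def parse_effort_sketch(raw):
    """Map a raw config value to a reasoning setting (hypothetical helper)."""
    # YAML false/off/no load as boolean False: treat that as an explicit "off".
    if raw is False:
        return {"enabled": False}
    if isinstance(raw, str) and raw.strip().lower() in {"false", "disabled"}:
        return {"enabled": False}
    if raw is None or raw == "":
        return None  # unset: the caller applies its own default
    # Assumed shape for an enabled setting; not confirmed by the source.
    return {"enabled": True, "effort": str(raw).strip().lower()}


# The lossy pattern the commit removes: False collapses to an empty string.
lossy = str(False or "")
```

Passing the raw value through to a parser like this, instead of stringifying it in each loader, is what keeps "off" distinct from "not configured".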
teknium1
254328bf56 fix(auth): remove stale loopback_pkce reference in xAI quarantine removal list
The terminal-refresh quarantine filtered in-memory entries on
source == "device_code" but built removed_ids from the deleted
"loopback_pkce" source name, so the revoked device-code entry was
never pruned from the persisted pool in auth.json. Also restores the
_print_loopback_ssh_hint test suite scoped to Spotify (the helper's
remaining caller) instead of deleting it wholesale.
2026-07-02 13:17:41 -07:00
Jaaneek
5ef0b8acb0 feat(auth): make xAI Grok OAuth device-code-only, drop loopback login
Replace the loopback/PKCE-callback server and manual-paste fallback with
the RFC 8628 device-code flow as the only xAI Grok OAuth login path. The
flow works in headless/SSH/container sessions with no 127.0.0.1 listener,
shrinking the local attack surface.

- Poll the token endpoint with server-provided interval, honoring
  slow_down and expires_in; store tokens with auth_mode
  oauth_device_code.
- Adaptive proactive refresh skew for short-lived device-code JWTs;
  rotated tokens sync back to auth.json, the global root store, and the
  credential pool (no refresh-token replay).
- Clear source suppression on successful re-login (CLI + dashboard) and
  drop the duplicate dashboard pool entry so exactly one seeded
  device_code entry exists.
- Use the shared device_code source name for consistency with the
  nous/codex device-code providers.
- Desktop: remove the loopback OAuth flow states and dead type variants;
  pkce providers' sign-in URL selection is unchanged.
- Docs (EN + zh-Hans) rewritten for device-code login; drop the deleted
  --manual-paste flag from documented commands.
2026-07-02 13:17:41 -07:00
LeonSGP43
472d75193f Prevent deleted profile skeleton revival 2026-07-02 15:11:56 -05:00
teknium1
a2d49de801 fix(terminal): also set MSYS2_ARG_CONV_EXCL for MSYS2/Cygwin bash fallback
MSYS_NO_PATHCONV is honored by Git for Windows bash only. _find_bash's
final shutil.which fallback can return MSYS2-proper or Cygwin bash,
which ignore it and honor MSYS2_ARG_CONV_EXCL instead. Set both so argv
path conversion stays disabled regardless of which bash flavor spawns.
Also subsumes the cmd /c mangling in #56147.
2026-07-02 11:48:03 -07:00
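
As a rough illustration of a2d49de801, an environment builder could set both variables so that whichever bash flavor is found leaves argv untouched. `bash_env_sketch` is hypothetical, and the values `"1"` and `"*"` are the conventional settings for these variables rather than something the commit message states.

```python
import os


def bash_env_sketch(base=None):
    """Return a child environment with MSYS argv path conversion disabled."""
    env = dict(os.environ if base is None else base)
    # Honored by Git for Windows bash.
    env.setdefault("MSYS_NO_PATHCONV", "1")
    # Honored by MSYS2 bash; "*" excludes every argument from conversion.
    env.setdefault("MSYS2_ARG_CONV_EXCL", "*")
    return env
```

Using `setdefault` keeps any value the user has already exported, so the builder only supplies defaults.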
xxxigm
51c01062d4 test(terminal): cover MSYS_NO_PATHCONV defaults on Windows env builders 2026-07-02 11:48:03 -07:00
David Zhang
30e947e0a0 feat(gateway): persist per-session /model overrides across gateway restarts
Per-session /model overrides (_session_model_overrides) were in-memory only,
so a gateway restart silently reverted every session to the global default
model. Persist the non-secret parts (model/provider/base_url ONLY — never
api_key) into the session entry in sessions.json and lazily rehydrate them
on first use after a restart, re-resolving credentials through the normal
runtime provider resolution.

- gateway/session.py: SessionEntry.model_override field with
  sanitize_model_override() (allowlist: model/provider/base_url) applied on
  both serialization and deserialization; SessionStore.set_model_override /
  get_model_override accessors. reset_session() already creates a fresh entry,
  so /new keeps its clear-on-reset semantics — a restart cannot resurrect an
  override the user reset away.
- gateway/slash_commands.py: write-through at both /model set sites (text
  command + picker) after storing the in-memory override.
- gateway/run.py: _rehydrate_session_model_override() called from
  _resolve_session_agent_runtime(); in-memory state always wins, credentials
  are re-resolved per provider (credential-less fallback on failure). Session
  expiry finalization also drops the persisted override.
- tests/gateway/test_session_model_override_persistence.py: restart
  round-trip, /new clearing, api_key-never-serialized (including tampered
  sessions.json), rehydration + live-state precedence + credential-failure
  degradation.

Salvaged from #3659 by @Git-on-my-level, narrowed to the restart-persistence
gap confirmed in triage.
2026-07-02 05:51:12 -07:00
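
The allowlist idea behind `sanitize_model_override()` in 30e947e0a0 can be sketched as below. This is an illustrative version under the assumption that overrides are plain dicts of strings; `sanitize_override_sketch` is hypothetical and the real function may differ.

```python
# Only these keys are ever persisted; api_key is deliberately absent.
_ALLOWED_KEYS = ("model", "provider", "base_url")


def sanitize_override_sketch(raw):
    """Keep only allowlisted, non-empty string fields; return None if nothing survives."""
    if not isinstance(raw, dict):
        return None
    clean = {
        key: raw[key]
        for key in _ALLOWED_KEYS
        if isinstance(raw.get(key), str) and raw[key]
    }
    return clean or None
```

Applying the same filter on both write and read means a hand-edited sessions.json cannot smuggle a secret back into the in-memory override.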
Jneeee
b98baa3039 feat(config): extra HTTP headers for LLM API calls (#3526 salvage)
Named providers / custom_providers entries in config.yaml now accept an
extra_headers dict scoped to that endpoint — for reverse proxies, API
gateways, and custom auth schemes (e.g. Cloudflare Access service tokens).

- hermes_cli/config.py: normalize extra_headers on provider entries
  (_normalize_custom_provider_entry + providers-dict translation), add
  get_custom_provider_extra_headers /
  apply_custom_provider_extra_headers_to_client_kwargs helpers keyed on
  base_url (case/trailing-slash insensitive, no substring bypass —
  mirrors the TLS helpers)
- hermes_cli/runtime_provider.py: surface extra_headers in the resolved
  runtime for named custom providers (providers dict, legacy
  custom_providers list, and the credential-pool path)
- run_agent.py / agent/agent_init.py: merge per-provider extra_headers
  onto the OpenAI client default_headers at construction and on every
  _apply_client_headers_for_base_url re-application (credential swaps,
  rebuilds), most-specific level wins; OpenAI-wire only (native
  Anthropic/Bedrock scoped out)
- agent/auxiliary_client.py: accept model.extra_headers as an alias of
  model.default_headers for the global variant
- cli-config.yaml.example: documented commented example
- Header values are treated as secrets and never logged

Salvaged from PR #3526 by @jneeee, reimplemented against current main.

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-07-02 05:33:25 -07:00
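
The base_url keyed lookup described in b98baa3039 might be approximated as follows. The helper names and the provider entry shape are assumptions, and lowercasing the whole URL is a simplification of whatever normalization the real helpers apply.

```python
def _norm_url(url):
    """Normalize for comparison: trim, drop trailing slashes, ignore case."""
    return url.strip().rstrip("/").lower()


def extra_headers_for_sketch(base_url, providers):
    """Return the extra_headers of the provider whose base_url matches exactly."""
    target = _norm_url(base_url)
    for entry in providers:
        # Equality after normalization: a longer URL that merely starts with
        # a configured base_url does not match.
        if _norm_url(entry.get("base_url", "")) == target:
            return dict(entry.get("extra_headers") or {})
    return {}


# Hypothetical provider entry; header names follow Cloudflare Access service tokens.
providers = [
    {
        "base_url": "https://proxy.example.com/v1",
        "extra_headers": {"CF-Access-Client-Id": "<id>", "CF-Access-Client-Secret": "<secret>"},
    }
]
```

Matching on equality rather than a prefix or substring is what prevents a look-alike endpoint from inheriting another provider's headers.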
Mibayy
4a09b692ec feat(api-server): per-client model routing via model_routes (#3176 salvage)
Adds a no-code routing layer to the OpenAI-compatible API server so one
Hermes deployment can map different API clients to different
model/provider backends. Clients pick a backend by sending a configured
alias as the OpenAI 'model' field; unmatched values fall back to the
global model. Configured aliases are listed by GET /v1/models.

Precedence (highest first): session /model override > model_routes
route > global config. Route provider credentials resolve through
_resolve_runtime_agent_kwargs_for_provider (same seam as
channel_overrides); per-route api_key/base_url are upstream provider
credential overrides — never caller auth, never logged.

Salvaged and rebased from PR #3176 by @Mibayy onto current main.
2026-07-02 05:23:28 -07:00
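
The precedence stated in 4a09b692ec (session /model override, then model_routes, then global config) can be expressed as a short resolver. The function and the route dict shape are hypothetical and only illustrate the ordering.

```python
def resolve_model_sketch(requested, session_override, model_routes, global_model):
    """Pick a model using: session override > configured route alias > global default."""
    if session_override:
        return session_override
    route = model_routes.get(requested)
    if route:
        return route["model"]
    # Unmatched aliases fall back to the global model.
    return global_model


# Hypothetical alias table, as it might be loaded from config.
routes = {"fast": {"model": "small-model"}, "deep": {"model": "large-model"}}
```

A client that sends `"fast"` as the OpenAI `model` field would get `small-model` unless its session has an explicit override.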
Mibayy
ce9aa869fc feat(commands): /compact alias + --preview/--dry-run flags for /compress (#3243 salvage)
Salvaged from PR #3243 by @Mibayy, reimplemented against current main
(the original diff targeted a removed gateway/run.py handler).

- /compact is now a first-class alias of /compress (CLI, gateway,
  Telegram/Slack/Discord command lists, autocomplete) — also fixes the
  dangling '/compact' references in gateway error messages
  (gateway/run.py context-exhausted banners).
- --preview / --dry-run: report what WOULD be compressed (message
  counts, token estimate, 'here [N]' boundary) without touching the
  transcript. Flags coexist with the existing 'here [N]' / focus-topic
  args on both the CLI and gateway surfaces via shared pure helpers in
  hermes_cli/partial_compress.py.
- --aggressive (LLM-free hard truncation) is intentionally NOT
  implemented: it would need its own transcript-persistence branch
  outside the guarded _compress_context rotation machinery (#44794
  data-loss class). The flag is recognized and returns an explanatory
  message pointing at '/compress here [N]' and /undo instead of being
  mis-parsed as a focus topic.
- locales: gateway.compress.aggressive_unsupported added to all 16
  catalogs (parity test enforced).
- release.py: AUTHOR_MAP entry for contributor credit.
2026-07-02 05:10:31 -07:00
Morgan K
39bff67957 feat(gateway): add 'log' option to display.tool_progress
Salvage of #3459 by @keslerm, reimplemented against the restructured
progress-callback block in gateway/run.py (resolve_display_setting,
needs_progress_queue, thinking-relay). Duplicate PR #3458 by @dlkakbs was
submitted 4 minutes earlier with the same feature — both credited.

Co-authored-by: Dilee <uzmpsk.dilekakbas@gmail.com>

tool_progress: log keeps the chat silent and appends timestamped tool-call
lines to ~/.hermes/logs/tool_calls.log via a dedicated queue drained by an
async writer (RotatingFileHandler 5MB x 3, RedactingFormatter so secrets
never land on disk). Gateway-only by design; thinking_progress relaying and
the webhook gate are unaffected. /verbose now cycles
off -> new -> all -> verbose -> log.
2026-07-02 05:09:38 -07:00
Mibayy
070ac2a719 fix(status): label provider as custom when config.yaml model.base_url is set
Salvage of the surviving hunk of #3296 by @Mibayy. The PR's gateway
_handle_provider_command hunk targets code removed on main (/provider was
absorbed into /model + /status, which already read model.base_url); the
hermes status mislabel was the remaining live symptom:
_effective_provider_label() only checked the legacy OPENAI_BASE_URL env var,
so a custom endpoint configured canonically in config.yaml still displayed
as OpenRouter.
2026-07-02 04:59:02 -07:00
kshitijk4poor
019950560d refactor(image-gen): reuse shared image sniffer + raster allowlist in codex backend
Replace the plugin-local _IMAGE_MAGIC_MIME table + _sniff_image_mime
body with a delegation to agent.image_routing._sniff_mime_from_bytes,
the canonical magic-byte sniffer already used across the codebase, then
gate its result to the raster formats gpt-image-2's Responses
input_image actually accepts (png/jpeg/gif/webp).

The shared sniffer also recognizes SVG/TIFF/ICO; without the allowlist
those would pass local validation and be rejected server-side with an
opaque HTTP 400. Gating locally fails them cleanly as invalid_image_input.
Adds a regression test for SVG rejection.

Follow-up on top of @CrazyBoyM's #55828.
2026-07-02 17:12:24 +05:30
CrazyBoyM
460235d584 test(image-gen): cap Codex reference inputs 2026-07-02 17:12:24 +05:30
CrazyBoyM
ecffd290a3 feat(image-gen): support Codex image inputs 2026-07-02 17:12:24 +05:30
Evo
a4a562ff0c fix(browser): guard Camofox snapshot/vision/images on private pages
Follow-up to #56874, which added the Camofox private-page SSRF guard
(_camofox_current_page_private_url) but wired it only into the Camofox
eval path (_camofox_eval). The other Camofox content-read tools —
camofox_snapshot, camofox_get_images, and camofox_vision — still read the
current page's accessibility tree / images / screenshot without the
guard, so on a non-local Camofox backend they can return the content of
an intranet or cloud-metadata page (e.g. 169.254.169.254) that the
terminal itself can't reach.

Apply the same guard, gated on _eval_ssrf_guard_active (non-local
backend, not a local sidecar, allow_private_urls unset) and fail-open on
probe failure, matching the eval-path guard and the main-browser
snapshot/vision guards. camofox_back is intentionally not changed: its
target is unknown until navigation completes, and the subsequent content
read is already guarded.

Adds regression tests covering the three read tools blocking on a private
page, the public-page pass-through, and the guard-inactive no-probe path.
2026-07-02 17:07:17 +05:30
HexLab98
ede4d12561 test(codex): cover gateway-scale stale timeout floor and TTFB gate 2026-07-02 17:05:05 +05:30
Teknium
3f2a56d1a4 fix(cli): reliable interrupts, bounded exit, and exit feedback (#57000)
Three CLI reliability fixes:

1. Interrupt reliability: chat() only re-queued the user's interrupt
   message when the turn result carried interrupted=True. When the agent
   thread raced past its last interrupt check (or finished) before the
   interrupt landed, the message was silently dropped — and the stale
   _interrupt_requested flag left on the agent instantly aborted the
   NEXT turn. Un-acknowledged interrupt messages are now re-queued as
   the next turn and the stale flag is cleared (only when the agent
   thread actually exited). The clarify-race path also parks the message
   in _pending_input instead of dropping it.

2. Slow exit (5+ min): stdlib ThreadPoolExecutor workers are non-daemon
   and joined unconditionally by concurrent.futures' atexit hook — even
   after shutdown(wait=False). One wedged tool worker (abandoned after
   interrupt/timeout) held the process open forever. Promoted
   async_delegation's daemon executor to a shared tools/daemon_pool
   module and adopted it in tool_executor (concurrent tool batches),
   memory_manager (background sync), delegate_tool (child timeout wrapper
   + batch fan-out), and skills_hub (source fan-out). Added a 30s exit
   watchdog (HERMES_EXIT_WATCHDOG_S) armed at _run_cleanup start as a
   backstop for wedged cleanup steps.

3. Exit jank: after prompt_toolkit tears down the input/status bars the
   terminal sat silent for the whole cleanup window, looking hung. Print
   'Shutting down… (finalizing session)' immediately at exit start.

E2E: live PTY interrupt of a foreground 'sleep 120' terminal tool now
aborts in ~1s and the typed message runs as the next turn; wedged-worker
+ wedged-cleanup subprocess exits in 5.8s (watchdog) instead of hanging.
2026-07-02 04:20:43 -07:00
Tarun Ravikumar
2068754d6f feat(api-server): inline MEDIA: image tags as base64 data URLs for remote frontends
Salvage of the surviving piece of #2696 by @tarunravi. The PR's other two
changes (tool progress streaming, SSE None-sentinel fix) were independently
superseded on main by the structured hermes.tool.progress SSE events and the
rewritten queue-drain loop.

Remote OpenAI-compatible frontends can't read server-local file paths, so
MEDIA:<path> tags (browser screenshots, generated images) were dead text.
_resolve_media_to_data_urls() now inlines small (<=5MB) local images as
markdown data URLs across all four response surfaces: chat completions
(non-streaming), session chat, session chat stream final event, and the
Responses API. Non-image, missing, or oversized paths pass through
untouched.
2026-07-02 03:23:44 -07:00
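
A simplified sketch of the inlining step in 2068754d6f follows. The tag regex, the markdown alt text, and the function name are assumptions; only the `MEDIA:<path>` form, the 5MB limit, and the pass-through behaviour come from the commit message.

```python
import base64
import mimetypes
import re
from pathlib import Path

_MEDIA_TAG = re.compile(r"MEDIA:(\S+)")  # assumed tag shape
_MAX_BYTES = 5 * 1024 * 1024


def inline_media_sketch(text):
    """Replace MEDIA:<path> tags for small local images with markdown data URLs."""

    def replace(match):
        path = Path(match.group(1))
        mime, _ = mimetypes.guess_type(path.name)
        if not mime or not mime.startswith("image/"):
            return match.group(0)  # non-image: leave untouched
        try:
            if not path.is_file() or path.stat().st_size > _MAX_BYTES:
                return match.group(0)  # missing or oversized: leave untouched
            data = path.read_bytes()
        except OSError:
            return match.group(0)
        encoded = base64.b64encode(data).decode("ascii")
        return f"![image](data:{mime};base64,{encoded})"

    return _MEDIA_TAG.sub(replace, text)
```

Anything the function cannot safely inline is returned verbatim, so a remote frontend never receives a broken or truncated image.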
CharmingGroot
88bd1c01e1 fix(email): harden adapter against malformed IMAP responses
Salvage of #2794 by @CharmingGroot, ported to the relocated
plugins/platforms/email/adapter.py:

- Guard raw_email = msg_data[0][1] against IndexError/TypeError and
  non-bytes payloads. UIDs are added to _seen_uids before fetch, so an
  exception mid-batch permanently skipped every remaining message in
  the batch — now the bad message is logged and skipped instead.
- Message-ID domain generation falls back to 'localhost' when
  EMAIL_ADDRESS lacks '@' (now via a shared _message_id_domain() helper
  covering all 3 send paths; the PR fixed 2 of 3).
2026-07-02 03:12:53 -07:00
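
The two guards described in 88bd1c01e1 could be sketched as below. Both helpers are hypothetical illustrations; the real adapter code and its `_message_id_domain()` helper may be structured differently.

```python
def extract_raw_email_sketch(msg_data):
    """Return the raw message bytes from an IMAP fetch result, or None if malformed."""
    try:
        raw = msg_data[0][1]
    except (IndexError, TypeError):
        return None  # empty, truncated, or non-indexable response
    return raw if isinstance(raw, bytes) else None


def message_id_domain_sketch(address):
    """Use the part after '@' as the Message-ID domain, falling back to 'localhost'."""
    _, sep, domain = address.rpartition("@")
    return domain if sep and domain else "localhost"
```

Returning `None` lets the caller log and skip one bad message while the rest of the batch is still processed.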
crazywriter1
0010c14e66 feat(gateway): per-channel model and system prompt overrides (Fixes #1955)
- ChannelOverride + channel_overrides on PlatformConfig
- Resolve model/runtime: session /model, then channel_overrides, then global
- Thread/parent channel lookup; bridge discord.channel_overrides from YAML
- Drop unrelated test and delegate_tool changes from PR scope
2026-07-02 03:08:11 -07:00
crazywriter1
ebef73f6b8 feat(gateway): per-channel model and system prompt overrides (Fixes #1955)
- config: ChannelOverride + PlatformConfig.channel_overrides

- run: _resolve_model_for_channel, _get_system_prompt_for_channel, channel provider runtime

- tests: channel overrides + config guard for bare runner; conftest asyncio fix; slack/whatsapp warning filters

Made-with: Cursor
2026-07-02 03:08:11 -07:00
Teknium
902b0b70e4 test: env-flag 'on' truthy behavior contract (#2863 follow-up) 2026-07-02 03:00:59 -07:00
VolodymyrBg
ea5d75befd fix(webhook): remove unused payload from delivery state 2026-07-02 03:00:17 -07:00
Teknium
6e369a3762 feat(delegation): unify concurrency caps — deprecate max_async_children (#56955)
delegation.max_concurrent_children is now the single cap for both a
batch's parallelism and concurrent background delegation units.

- _get_max_async_children() delegates to _get_max_concurrent_children();
  a leftover max_async_children key logs a one-time deprecation warning
- config v32→33 migration removes the stale key, folding a raised
  max_async_children into max_concurrent_children (max wins, no lost
  headroom)
- capacity error messages now point at max_concurrent_children
- pool-at-capacity sync fallback now attaches an explanatory note so
  the model/user know why the call blocked instead of dispatching async

Previously users who raised max_concurrent_children (e.g. to 15) still
hit the invisible default-3 async cap: the 4th background delegate_task
silently ran inline, blocking the turn with no signal.
2026-07-02 02:53:39 -07:00
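
The "max wins" folding in the 6e369a3762 migration might be sketched like this. The function name and the dict-based config shape are assumptions, and the behaviour when `max_concurrent_children` is absent is a guess at a reasonable simplification rather than the documented migration.

```python
def migrate_delegation_sketch(delegation):
    """Drop the legacy key, folding its value into max_concurrent_children (max wins)."""
    out = dict(delegation)
    legacy = out.pop("max_async_children", None)
    current = out.get("max_concurrent_children")
    if isinstance(legacy, int):
        # Assumption: with no current value, the legacy value is carried over as-is.
        out["max_concurrent_children"] = legacy if current is None else max(legacy, current)
    return out
```

Taking the larger of the two values means a user who had raised either cap keeps that headroom after the stale key is removed.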
Teknium
14639ded77 fix(terminal): stop stripping CLAUDE_CODE_OAUTH_TOKEN from spawned subprocesses (#56935)
CLAUDE_CODE_OAUTH_TOKEN is set and owned by the user's Claude Code
install (subscription OAuth), not a Hermes-managed inference
credential — Claude subscription auth is not a working Hermes provider
path. Blocklisting it broke agent-spawned claude CLIs: with no token in
the child env, claude fell through to the shared macOS Keychain /
~/.claude/.credentials.json store and, on auth failure, cleared it —
logging the user out of their interactive Claude sessions and the
desktop app.

Exempt it from _HERMES_PROVIDER_ENV_BLOCKLIST (it arrives via the
anthropic registry entry, so discard explicitly with rationale).
ANTHROPIC_API_KEY / ANTHROPIC_TOKEN and every other provider credential
remain stripped, and the GHSA-rhgp-j443-p4rf fail-closed passthrough
guard is unchanged for everything still on the blocklist.

Fixes #55878
2026-07-02 02:13:30 -07:00
kshitijk4poor
b837f07dcd fix(agent): route restore custom-pool match through canonical helper
Follow-up on the salvaged #56392 guard. The cherry-picked change matched
custom:<name> pool entries against the primary by raw base_url string
equality, which (a) can't disambiguate two named custom providers sharing
one gateway base_url and (b) left a latent bare-"custom" entry bypass.

Route the match through get_custom_provider_pool_key(rt[base_url]) compared
against the entry's custom:<name> key, mirroring the sibling guard in
recover_with_credential_pool. Use CUSTOM_POOL_PREFIX instead of the literal.

Add regression tests for the custom same-endpoint (swap) and cross-endpoint
(skip) branches, plus the plain-provider fallback-pool case from #56885.
2026-07-02 13:41:53 +05:30
openhands
820a052575 fix(agent): keep primary runtime restore on matching credential pool (#56374) 2026-07-02 13:41:53 +05:30
Teknium
fb403a3a73 fix(auxiliary): retry transient blips harder + isolate client cache per model (#56889)
Two related hardening fixes for auxiliary calls (which include MoA reference
advisors — a pinned-model path where provider fallback is not a meaningful
recovery):

1. Transient-transport retries: the same-provider retry on a connection reset /
   timeout / 5xx / 408 was a single attempt, then fallback. For a pinned aux
   call a second blip silently loses the call (root of the run2 double-advisor
   'Connection error' collapse — a genuine upstream blip). Now retries N times
   with exponential backoff, N = auxiliary.transient_retries (default 2 -> 3
   total attempts, clamped [0,6]). Compression-on-timeout fast-fail carve-out
   preserved.

2. Per-model client-cache isolation: _client_cache_key excluded the model, so
   two concurrent auxiliary calls to the same provider/base_url/key but
   different models (e.g. an opus + gpt-5.5 MoA fan-out) shared one cache entry
   and could race each other's client lifecycle. Model now participates in the
   key -> distinct clients, no cross-call races. Same-model reuse unchanged.

- agent/auxiliary_client.py: _transient_retry_count() + backoff loop; model in
  _client_cache_key and both call sites.
- hermes_cli/config.py: auxiliary.transient_retries default (2).
- tests: new retry/isolation tests; updated 2 stale-expectation tests to the
  corrected behavior (per-model resolve; N-retry escalation).

Backoff base is overridable (_TRANSIENT_RETRY_BACKOFF_BASE) so tests don't sleep.
2026-07-02 01:09:37 -07:00
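
The retry policy described in fb403a3a73 (N retries with exponential backoff, N clamped to [0, 6], default 2) can be illustrated with the sketch below. The function names, the 0.5s base, and the use of `ConnectionError` as the only transient error are assumptions; the injectable `sleep` mirrors the commit's note that the backoff base is overridable so tests do not sleep.

```python
import time


def clamp_retries_sketch(value, default=2):
    """Coerce a configured retry count to an int in [0, 6]."""
    try:
        n = int(value)
    except (TypeError, ValueError):
        n = default
    return max(0, min(6, n))


def call_with_retries_sketch(fn, retries=2, backoff_base=0.5, sleep=time.sleep):
    """Call fn, retrying transient failures with exponential backoff."""
    retries = clamp_retries_sketch(retries)
    for attempt in range(retries + 1):  # retries=2 means 3 total attempts
        try:
            return fn()
        except ConnectionError:
            if attempt == retries:
                raise  # retries exhausted: let the caller escalate
            sleep(backoff_base * (2 ** attempt))
```

With the default of 2 retries, a call that fails twice and then succeeds waits for `backoff_base` and then twice that before the third attempt.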
Nick Mason
80733413f9 fix(tools): don't drop a toolset from platform inference when a tool is registered into it
_get_platform_tools reverse-maps a platform composite to configurable
toolsets with an all-tools subset test. Because get_toolset() merges
registry-registered tools into a toolset, a tool added to a toolset
(delegate_cli -> delegation; desktop-only read_terminal -> terminal) that the
static composite never listed made the subset test fail, silently dropping the
entire toolset on api_server and other inference-based platforms. Compare the
toolset's static membership at all three reverse-map sites.

Fixes #49622.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-02 13:25:25 +05:30
Nick Mason
5317993a6d fix(tools): expose static (pre-registry-merge) toolset view for platform inference
Adds include_registry=True kwarg to resolve_toolset/get_toolset. When False,
returns only the static TOOLSETS view with no registry-merged tools — the
composite-authored membership platform reverse-mapping must compare against.
Default True preserves all existing behavior; this is the enabling half of
the api_server toolset-drop fix (#49622).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-02 13:25:25 +05:30
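
The toolset-drop bug fixed for #49622 comes down to which membership the subset test uses. The sketch below reproduces it with plain sets; the function and the example toolset contents are hypothetical, apart from the `read_terminal` and `delegate_cli` tool names mentioned in the commit messages.

```python
def infer_toolsets_sketch(composite_tools, toolsets):
    """Return the toolsets whose every tool appears in the platform composite."""
    composite = set(composite_tools)
    return [name for name, tools in toolsets.items() if set(tools) <= composite]


# What the static composite lists (assumed contents for illustration).
composite = ["terminal", "delegate_task"]

# Static view: only composite-authored membership.
static_view = {"terminal": ["terminal"], "delegation": ["delegate_task"]}

# Merged view: registry-registered tools were added to each toolset.
merged_view = {
    "terminal": ["terminal", "read_terminal"],
    "delegation": ["delegate_task", "delegate_cli"],
}
```

Against the merged view, every toolset fails the subset test and is dropped; against the static view, both are inferred as intended.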
Ray
6a58badfdc fix(browser): guard Camofox eval private pages
Extends the browser private-network eval guard to the Camofox backend.
On main, _browser_eval() returned early in Camofox mode before running the
shared private-URL literal pre-scan and before re-checking the page URL
after eval, leaving Camofox as a sibling backend that could execute
browser_console(expression=...) against private/internal targets.

- move the eval private-URL literal pre-scan before the Camofox early return
- add a Camofox current-page private-URL probe via the evaluate endpoint
- withhold Camofox eval results when the page is now private/internal

Follow-up to browser private-network hardening in #56173, #56526, #56664.

Salvage of #56764 by @rayjun (rayoo), cherry-picked to preserve authorship.
2026-07-02 13:10:30 +05:30
kshitijk4poor
f2b8a5d541 test(gateway): assert _record_gateway_session_peer fires only on the persisted split
The fake _SessionStore tracked peer_records but no test read it, leaving
#55300's peer-record behavior unasserted. Add a positive assertion on the
persist path and negative (== []) assertions on the two stale/moved-binding
skip paths, so the peer-record side effect is bound.

Mutation-verified: removing the production _record_gateway_session_peer call
makes the positive assertion fail.

Co-authored-by: João Vitor Cunha <jvsantos.cunha@gmail.com>
2026-07-02 12:49:42 +05:30
kshitijk4poor
ed6f80a20c test(gateway): align fake SessionStore with _record_gateway_session_peer
The #55300 peer-recording call now fires on the failed-turn compression
split path; the fake _SessionStore in test_compression_failure_session_sync
(carried in with #55721's test changes) lacked that method. Add a
call-tracking no-op so the combined salvage's tests pass.

Co-authored-by: João Vitor Cunha <jvsantos.cunha@gmail.com>
2026-07-02 12:49:42 +05:30
r266-tech
2a04137322 fix(gateway): preserve platform + gateway_session_key on /compress temp agent
Manual /compress built a temporary AIAgent without the originating
platform / stable gateway session key, so an external context engine
ingested the retained transcript tail as source=cli during /compress
and again as the real platform on resume (duplicate cli,telegram rows).
Pass platform=_platform_config_key(source.platform) + the in-scope
gateway_session_key, mirroring the normal gateway turn. Assigned into
runtime_kwargs (single-valued, authoritative) so they neither collide
into a duplicate-kwarg TypeError nor lose to a stale resolver value.

Fixes #50422.
2026-07-02 12:49:42 +05:30
Jake Present
00ec3b1884 fix(gateway): ignore stale compression session splits 2026-07-02 12:49:42 +05:30
João Vitor Cunha
d5b4879d4a fix(gateway): preserve peer routing across compression recovery 2026-07-02 12:49:42 +05:30
Teknium
543d305bbb feat(moa): add reference_max_tokens to cap advisor output and cut turn latency (#56756)
MoA per-turn latency is dominated by advisor GENERATION: turn wall time
correlates ~0.88 with output tokens and ~-0.03 with input tokens (measured over
52 turns). Each turn waits for the slowest advisor to finish writing, and
advisors were uncapped — writing multi-thousand-token essays the aggregator
only needs the gist of.

Add an opt-in per-preset reference_max_tokens knob (mirrors reference_temperature)
that caps ADVISOR output only; the acting aggregator is never capped. Default
None = uncapped, so existing presets are byte-for-byte unchanged (no regression).
Wired through both MoA execution paths (MoAChatCompletions.create and
aggregate_moa_context).

E2E: same task, closed preset uncapped vs reference_max_tokens=600 -> 59s to 33s
(~44% faster), final answer identical/correct.

- hermes_cli/moa_config.py: _coerce_int_or_none helper + reference_max_tokens
  in _normalize_preset/_default_preset/flattened view
- agent/moa_loop.py: read preset.reference_max_tokens, pass to reference fan-out
- agent/conversation_loop.py: pass reference_max_tokens on the per-turn path
- tests + docs
2026-07-02 00:16:35 -07:00
Ben Barclay
9be39de0f2 fix(auth): make HERMES_PORTAL_BASE_URL/NOUS_PORTAL_BASE_URL bypass the Portal host allowlist (#56864)
Ben caught that the initial approach (widening _NOUS_PORTAL_ALLOWED_HOSTS to
include the staging host) was the wrong fix -- env vars are supposed to
override the allowlist, mirroring how NOUS_INFERENCE_BASE_URL already
bypasses _ALLOWED_NOUS_INFERENCE_HOSTS via _nous_inference_env_override().

The actual bug: both resolve_nous_access_token and
resolve_nous_runtime_credentials read
`_optional_base_url(state.get("portal_base_url")) or os.getenv(...) or ...`
-- a plain `or` chain where the STORED state value wins first (short-circuits
before the env vars are even read), and then whichever value won gets run
through the same _NOUS_PORTAL_ALLOWED_HOSTS gate regardless of its source.
So a hosted agent stamped with HERMES_PORTAL_BASE_URL=<staging> in its env
AND a staging portal_base_url already persisted to auth.json would still
get silently rewritten to prod on every refresh, because the env var never
even got a chance to be consulted.

Revert the previous _NOUS_PORTAL_ALLOWED_HOSTS widening entirely --
staying prod-only preserves the allowlist's actual job (rejecting an
untrusted network-provided portal_base_url persisted to auth.json by a
compromised Portal response).

Add _nous_portal_env_override() (mirrors _nous_inference_env_override())
and restructure both call sites so the env override is checked FIRST and,
when set, wins outright and skips the allowlist gate entirely -- the
allowlist only ever runs against the fallback (stored-state-or-default)
path now.

Rewrote tests/hermes_cli/test_nous_portal_staging_allowlist.py to test the
actual fix: the helper function, and an end-to-end
resolve_nous_access_token proof that the env override wins even when state
ALSO has the staging host stored (the exact incident shape), that it wins
over a stored PROD host too, and that the allowlist's heal-to-prod
behaviour for an untrusted stored value is preserved when no override is
set.
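
The resolution order described above can be sketched as follows. This is an
illustration under stated assumptions, not the code in this commit: the
function names, the placeholder hosts, and the precedence between the two
env vars are all hypothetical.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlparse

# Hypothetical placeholders; the real default URL and allowlist differ.
PROD_PORTAL_URL = "https://portal.example.com"
ALLOWED_HOSTS = {"portal.example.com"}
# Assumption: HERMES_* is consulted before NOUS_*.
ENV_VARS = ("HERMES_PORTAL_BASE_URL", "NOUS_PORTAL_BASE_URL")


def portal_env_override(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-empty env override, or None."""
    for name in ENV_VARS:
        value = (env.get(name) or "").strip()
        if value:
            return value.rstrip("/")
    return None


def resolve_portal_base_url(state: dict, env: Mapping[str, str] = os.environ) -> str:
    override = portal_env_override(env)
    if override:
        # Operator-set value: wins outright and skips the allowlist.
        return override
    # Fallback path: stored state is untrusted, so it must pass the allowlist.
    stored = (state.get("portal_base_url") or "").strip().rstrip("/")
    if stored and urlparse(stored).hostname in ALLOWED_HOSTS:
        return stored
    return PROD_PORTAL_URL  # heal-to-prod for missing or untrusted values
```

The key difference from a plain `or` chain is that the source of the value,
not just the value itself, decides whether the allowlist gate runs.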
2026-07-02 06:52:46 +00:00
kshitij
88d1d6206f
fix(streaming): handle completed responses with empty/None choices (#55933) (#56713)
* fix(streaming): handle completed responses with empty/None choices

The streaming fallback guard added in #55932 recognized a completed
response object only when its `choices` was a non-empty list. But an
adapter can return a completed response whose `choices` is `None` or an
empty list (an error / content-filter / terminal frame) — still a whole,
non-iterable response, not a token stream. Those shapes fell through to
`for chunk in stream` and crashed with

    'types.SimpleNamespace' object is not iterable

which is exactly issue #55933 (MoA `openai-codex` aggregator on
TUI/Desktop, where a stream consumer forces the streaming path).

Broaden the guard to discriminate on the PRESENCE of a `choices`
attribute (a genuine provider Stream object exposes none), disable
streaming for the session, and return the completed object so the outer
loop's normal invalid-response validation handles empty/None choices via
its retry path instead of iterating.
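
A minimal sketch of the presence-based discrimination, assuming a hypothetical
helper name; the real guard also disables streaming for the session, which is
omitted here.

```python
from types import SimpleNamespace
from typing import Any, Tuple

_MISSING = object()


def split_stream_or_completed(result: Any) -> Tuple[str, Any]:
    """Classify an adapter result as ('completed', obj) or ('stream', iterator)."""
    # Presence of the attribute is the discriminator, not its truthiness:
    # choices=None and choices=[] are still whole, non-iterable responses.
    if getattr(result, "choices", _MISSING) is not _MISSING:
        return "completed", result
    # No `choices` attribute: treat it as a chunk stream and iterate it.
    return "stream", iter(result)


# Both of these previously fell through to `for chunk in stream`.
for shape in (SimpleNamespace(choices=None), SimpleNamespace(choices=[])):
    kind, _ = split_stream_or_completed(shape)
    assert kind == "completed"
```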

Based on the diagnosis in #56525 by @spiky02plateau (that PR normalized
the MoA aggregator return with a one-shot chunk iterator; the common
text/tool-call crash was already fixed at this seam by #55932, so this
extends the existing guard to cover only the remaining empty/None-choices
gap).

Fixes #55933

* refactor(streaming): simplify empty-choices guard body and parametrize tests

Post-review cleanup (no behavior change):
- Inline the single-use `response_choices` local and drop the redundant
  `if first_choice is not None else None` guard (getattr(None, ...) already
  returns the default safely).
- Collapse the two near-identical empty/None-choices regression tests into
  one `@pytest.mark.parametrize` case.

Mutation-verified: reverting the guard to the old non-empty-list condition
still makes both parametrized cases fail with the historical
'types.SimpleNamespace' object is not iterable.

---------

Co-authored-by: spiky02plateau <155588579+spiky02plateau@users.noreply.github.com>
2026-07-02 06:36:20 +05:30
kshitijk4poor
76be770091 test(moa): assert aux cap against model resolver, not frozen literal
Follow-up to the salvaged fix: the regression test asserted a frozen
max_tokens == 128_000 literal, coupling it to the Opus-4-8 model table.
Assert against _get_anthropic_max_output("claude-opus-4-8") plus > 2000
instead, so the test survives model-table churn while still catching a
regression to the old `or 2000` fallback.
2026-07-02 06:31:18 +05:30
helix4u
7951250947 fix(moa): lift hidden Anthropic aux output cap 2026-07-02 06:31:18 +05:30
kshitij
4d5d9fffd0
Merge pull request #56582 from srojk34/fix/vertex-credentials-env-leak
security(terminal): strip VERTEX_CREDENTIALS_PATH/GOOGLE_APPLICATION_CREDENTIALS from subprocess env
2026-07-02 06:08:55 +05:30
srojk34
7f64cce96d security(vertex): route credential/project/region resolution through the profile secret scope
agent/vertex_adapter.py resolved VERTEX_CREDENTIALS_PATH,
GOOGLE_APPLICATION_CREDENTIALS, VERTEX_PROJECT_ID, and VERTEX_REGION via raw
os.environ.get() instead of the profile-scoped get_secret() every other
credential lookup in hermes_cli/runtime_provider.py uses. In a multiplex
gateway serving several profiles from one process, os.environ still holds
whichever profile's .env python-dotenv loaded at boot — so a raw read here
let one profile's turn silently mint a Vertex OAuth2 token from, and get
billed against, a different profile's GCP service account. No error, no
fail-closed guard: the multiplex UnscopedSecretError protection was bypassed
entirely because these reads never went through get_secret().

- _resolve_credentials_path/_resolve_project_override/_resolve_region now
  call agent.secret_scope.get_secret(), matching the _getenv() pattern
  already used for every other provider's credentials.
- get_vertex_credentials()'s ADC fallback (google.auth.default()) reads
  GOOGLE_APPLICATION_CREDENTIALS from os.environ internally, bypassing
  get_secret() entirely — closed with a narrow guard: when multiplexing is
  active and this profile's scope has no Vertex credentials of its own, but
  os.environ still carries a value (left by a different profile's boot-time
  dotenv load), refuse ADC rather than silently authenticate as a stranger.
- Zero behavior change for single-profile installs: get_secret() falls
  through to os.environ transparently whenever multiplexing is off.

Same bug class as the already-fixed _HERMES_OAUTH_FILE/_AUTH_JSON_PATH/
HOOKS_DIR cross-profile leaks, now closed for Vertex's OAuth2 credential
path.
2026-07-02 06:07:56 +05:30
kshitij
2f7c51a3e2
Merge pull request #56605 from simpolism/codex/discord-inline-bot-mentions
fix(discord): ignore reply-ping-only mentions for bot-authored messages
2026-07-02 05:23:44 +05:30
dsad
830860306d Guard browser CDP on private pages 2026-07-02 05:23:23 +05:30
kshitijk4poor
676236bb1d fix(agent): honor custom CA certs on aux client + harden TLS resolution
The salvaged fix wired per-provider ssl_ca_cert / ssl_verify (and
HERMES_CA_BUNDLE) into the MAIN OpenAI client. This follow-up:

- Auxiliary client parity: process_bootstrap.build_keepalive_http_client
  accepts and forwards verify; auxiliary_client._resolve_aux_verify mirrors
  the main-client TLS resolution (via load_config_readonly, the read-only
  fast path) so compression/vision/web_extract/title-gen/session_search
  honor the same per-provider CA. Without this, chat worked against a
  private-CA endpoint but every auxiliary call still failed with APIConnectionError.
- switch_model now reads custom_providers from live config (load_config_readonly)
  instead of the init-time agent._custom_providers snapshot, so ssl_ca_cert /
  ssl_verify edits are honored on mid-session model switch — matching the
  context-length reload (#15779).
- Drop the dead client-level verify= where a custom httpx transport is used
  (httpx ignores it there); verify lives on the transport. Fix docstrings.
  Applies to both run_agent._build_keepalive_http_client and process_bootstrap.
- resolve_httpx_verify: add CURL_CA_BUNDLE to the env chain (consistency with
  agent/ssl_guard._CA_BUNDLE_ENV_VARS) and emit a loud logger.warning naming
  the endpoint whenever ssl_verify:false disables verification.
- get_custom_provider_tls_settings: case-insensitive base_url match (config
  dedup already lowercases; scheme/host are case-insensitive) so a mixed-case
  entry doesn't silently drop its CA. Exact match preserved — no prefix bypass.
- Demote best-effort `except Exception: pass` in agent_init/switch_model to
  logger.debug(exc_info=True).
- Tests for aux verify forwarding, _resolve_aux_verify, case-insensitive
  match, and prefix-bypass rejection.
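
A minimal sketch of the case-insensitive exact match on base_url, with a
hypothetical function name and entry shape; stripping a trailing slash before
comparing is an assumption, not something this commit states.

```python
from typing import List, Optional


def _norm(url: str) -> str:
    # Lowercase to match config dedup; trailing-slash handling is assumed.
    return url.strip().rstrip("/").lower()


def find_tls_settings(base_url: str, custom_providers: List[dict]) -> Optional[dict]:
    """Return TLS settings for the provider whose base_url matches exactly."""
    target = _norm(base_url)
    for entry in custom_providers:
        # Equality, not startswith: a longer or shorter URL must not inherit
        # another endpoint's CA or its ssl_verify:false.
        if _norm(entry.get("base_url", "")) == target:
            return {
                "ssl_ca_cert": entry.get("ssl_ca_cert"),
                "ssl_verify": entry.get("ssl_verify", True),
            }
    return None
```

Keeping the comparison an equality check is what rejects the prefix-bypass
case: a URL that merely starts with a configured base_url gets no settings.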
2026-07-02 04:51:56 +05:30