- ChannelOverride + channel_overrides on PlatformConfig
- Resolve model/runtime: session /model, then channel_overrides, then global
- Thread/parent channel lookup; bridge discord.channel_overrides from YAML
- Drop unrelated test and delegate_tool changes from PR scope
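The resolution order listed above can be sketched roughly as follows. This is an illustration only, not the shipped code: `resolve_model`, its parameters, and the dict shape of an override are hypothetical, and checking the thread's own id before its parent's is an assumption.

```python
from typing import Optional


def resolve_model(
    session_model: Optional[str],
    channel_overrides: dict,
    channel_id: str,
    parent_channel_id: Optional[str],
    global_model: str,
) -> str:
    """Pick a model: session /model first, then channel override, then global."""
    if session_model:
        return session_model
    # Assumption: a thread falls back to its parent channel's override.
    for cid in (channel_id, parent_channel_id):
        if cid is None:
            continue
        override = channel_overrides.get(cid)
        if override and override.get("model"):
            return override["model"]
    return global_model
```

A thread with no override of its own and a parent override of `m-parent` would resolve to `m-parent` under this sketch.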
Salvage of #2863 by @aydnOktay, reimplemented against current main using the
existing utils.env_var_enabled / TRUTHY_STRINGS helper instead of per-site
tuple edits. Covers the 7 gateway/config.py env-flag sites that still rejected
'on' (WHATSAPP_ENABLED, SIGNAL_IGNORE_STORIES, MATRIX_ENCRYPTION,
API_SERVER_ENABLED, WEBHOOK_ENABLED, MSGRAPH_WEBHOOK_ENABLED,
BLUEBUBBLES_SEND_READ_RECEIPTS) plus HERMES_DESKTOP gating in
read_terminal/close_terminal. The PR's approval.py HERMES_YOLO_MODE portion is
already on main via is_truthy_value.
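For readers unfamiliar with the pattern, a shared truthy-string helper might look like the sketch below. The names `_TRUTHY` and `flag_enabled` are hypothetical stand-ins; the real utils.env_var_enabled signature and the exact TRUTHY_STRINGS contents are not shown in this message and may differ.

```python
import os

# Assumed set of accepted spellings; the real TRUTHY_STRINGS may differ.
_TRUTHY = frozenset({"1", "true", "yes", "on"})


def flag_enabled(name: str, default: bool = False) -> bool:
    """Return True when the env var holds a recognized truthy string."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY
```

Centralizing the comparison means a new accepted spelling such as 'on' is added once rather than at each call site.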
delegation.max_concurrent_children is now the single cap for both a
batch's parallelism and concurrent background delegation units.
- _get_max_async_children() delegates to _get_max_concurrent_children();
a leftover max_async_children key logs a one-time deprecation warning
- config v32→33 migration removes the stale key, folding a raised
max_async_children into max_concurrent_children (max wins, no lost
headroom)
- capacity error messages now point at max_concurrent_children
- pool-at-capacity sync fallback now attaches an explanatory note so
the model/user know why the call blocked instead of dispatching async
Previously users who raised max_concurrent_children (e.g. to 15) still
hit the invisible default-3 async cap: the 4th background delegate_task
silently ran inline, blocking the turn with no signal.
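The "max wins" fold in the migration can be sketched as below. This is a minimal illustration, assuming the config is a plain nested dict; `fold_async_cap` is a hypothetical name, and the real migration also bumps the config version, which is omitted here.

```python
def fold_async_cap(config: dict) -> dict:
    """Remove max_async_children, keeping the larger of the two caps."""
    delegation = dict(config.get("delegation", {}))
    stale = delegation.pop("max_async_children", None)
    if isinstance(stale, int) and not isinstance(stale, bool):
        current = delegation.get("max_concurrent_children")
        # Max wins, so a previously raised async cap loses no headroom.
        delegation["max_concurrent_children"] = (
            stale if current is None else max(current, stale)
        )
    migrated = dict(config)
    migrated["delegation"] = delegation
    return migrated
```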
CLAUDE_CODE_OAUTH_TOKEN is set and owned by the user's Claude Code
install (subscription OAuth), not a Hermes-managed inference
credential — Claude subscription auth is not a working Hermes provider
path. Blocklisting it broke agent-spawned claude CLIs: with no token in
the child env, claude fell through to the shared macOS Keychain /
~/.claude/.credentials.json store and, on auth failure, cleared it —
logging the user out of their interactive Claude sessions and the
desktop app.
Exempt it from _HERMES_PROVIDER_ENV_BLOCKLIST (it arrives via the
anthropic registry entry, so discard explicitly with rationale).
ANTHROPIC_API_KEY / ANTHROPIC_TOKEN and every other provider credential
remain stripped, and the GHSA-rhgp-j443-p4rf fail-closed passthrough
guard is unchanged for everything still on the blocklist.
Fixes #55878
Follow-up to the #39227 salvage: config refreshes fire mid-session too
(gateway events, settings saves), so applying terminal.cwd
unconditionally would yank the workspace out from under an attached
session. Gate the override on activeSessionIdRef like the sibling
reasoning/tier settings, keep branch refresh on the live cwd, and add
coverage for the active-session path. Also lint-polish the new test
file (typed config mock, prettier formatting).
Follow-up on the salvaged #56392 guard. The cherry-picked change matched
custom:<name> pool entries against the primary by raw base_url string
equality, which (a) can't disambiguate two named custom providers sharing
one gateway base_url and (b) left a latent bare-"custom" entry bypass.
Route the match through get_custom_provider_pool_key(rt[base_url]) compared
against the entry's custom:<name> key, mirroring the sibling guard in
recover_with_credential_pool. Use CUSTOM_POOL_PREFIX instead of the literal.
Add regression tests for the custom same-endpoint (swap) and cross-endpoint
(skip) branches, plus the plain-provider fallback-pool case from #56885.
Two related hardening fixes for auxiliary calls (which include MoA reference
advisors — a pinned-model path where provider fallback is not a meaningful
recovery):
1. Transient-transport retries: the same-provider retry on a connection reset /
timeout / 5xx / 408 was a single attempt, then fallback. For a pinned aux
call a second blip silently loses the call (root of the run2 double-advisor
'Connection error' collapse — a genuine upstream blip). Now retries N times
with exponential backoff, N = auxiliary.transient_retries (default 2 -> 3
total attempts, clamped [0,6]). Compression-on-timeout fast-fail carve-out
preserved.
2. Per-model client-cache isolation: _client_cache_key excluded the model, so
two concurrent auxiliary calls to the same provider/base_url/key but
different models (e.g. an opus + gpt-5.5 MoA fan-out) shared one cache entry
and could race each other's client lifecycle. Model now participates in the
key -> distinct clients, no cross-call races. Same-model reuse unchanged.
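The per-model isolation in item 2 can be illustrated with a toy cache. This is a sketch under assumptions: `client_cache_key` and `get_client` are simplified, hypothetical versions, and the real _client_cache_key may include further fields.

```python
from typing import Callable, Optional

_cache: dict = {}


def client_cache_key(provider: str, base_url: str, api_key: str,
                     model: Optional[str]) -> tuple:
    # Including the model keeps concurrent calls to different models apart.
    return (provider, base_url, api_key, model or "")


def get_client(provider: str, base_url: str, api_key: str,
               model: Optional[str], factory: Callable[[], object]) -> object:
    key = client_cache_key(provider, base_url, api_key, model)
    if key not in _cache:
        _cache[key] = factory()
    return _cache[key]
```

Two calls with the same provider, base_url, and key but different models now build two clients, while a repeated call for the same model reuses its entry.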
- agent/auxiliary_client.py: _transient_retry_count() + backoff loop; model in
_client_cache_key and both call sites.
- hermes_cli/config.py: auxiliary.transient_retries default (2).
- tests: new retry/isolation tests; updated 2 stale-expectation tests to the
corrected behavior (per-model resolve; N-retry escalation).
Backoff base is overridable (_TRANSIENT_RETRY_BACKOFF_BASE) so tests don't sleep.
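The retry loop from item 1 can be sketched as follows. This is not the code in agent/auxiliary_client.py: the function names, the exception classes treated as transient, and the injectable `sleep` parameter are assumptions chosen to keep the example self-contained.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

_MAX_RETRIES = 6  # upper clamp, per the description above


def clamp_retries(value: object, default: int = 2) -> int:
    """Coerce a configured retry count into the range [0, 6]."""
    try:
        n = int(value)  # type: ignore[arg-type]
    except (TypeError, ValueError):
        return default
    return max(0, min(_MAX_RETRIES, n))


def _is_transient(exc: Exception) -> bool:
    # Assumption: stand-ins for connection reset / timeout style errors.
    return isinstance(exc, (ConnectionError, TimeoutError))


def call_with_transient_retries(
    fn: Callable[[], T],
    retries: int,
    backoff_base: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn; on a transient error retry up to `retries` times."""
    attempt = 0
    while True:
        try:
            return fn()
        except Exception as exc:
            if attempt >= retries or not _is_transient(exc):
                raise
            # Exponential backoff: base, 2*base, 4*base, ...
            sleep(backoff_base * (2 ** attempt))
            attempt += 1
```

With retries=2 a call is attempted at most three times; passing a recording function as `sleep` lets a test run without waiting.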
Attribution audit gate: the salvaged contributor commits carry
kiljadn@gmail.com (Nick Mason / @designnotdrum). Add the mapping so
contributor_audit.py resolves the author on this PR.
Follow-up on the #56480 salvage: the include_registry=False docstring said
None is returned only for registry/MCP-only toolsets; it also applies to
registry-derived aliases, which have no static TOOLSETS counterpart.
_get_platform_tools reverse-maps a platform composite to configurable
toolsets with an all-tools subset test. Because get_toolset() merges
registry-registered tools into a toolset, a tool added to a toolset
(delegate_cli -> delegation; desktop-only read_terminal -> terminal) that the
static composite never listed made the subset test fail, silently dropping the
entire toolset on api_server and other inference-based platforms. Compare the
toolset's static membership at all three reverse-map sites.
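A toy model of the failure and the fix, for illustration only. The tables, the `run_command` and `web_search` tool names, and both function names are hypothetical; only the subset-test idea is taken from the description above.

```python
# Static, composite-authored membership (hypothetical contents).
STATIC_TOOLSETS = {"terminal": {"run_command", "close_terminal"}}
# Tools merged in later by the registry (read_terminal is desktop-only).
REGISTRY_EXTRAS = {"terminal": {"read_terminal"}}


def toolset_members(name: str, include_registry: bool = True) -> set:
    tools = set(STATIC_TOOLSETS.get(name, ()))
    if include_registry:
        tools |= REGISTRY_EXTRAS.get(name, set())
    return tools


def toolsets_in_composite(composite_tools: set) -> list:
    """Reverse-map a composite to toolsets using static membership only."""
    return sorted(
        name
        for name in STATIC_TOOLSETS
        if toolset_members(name, include_registry=False) <= composite_tools
    )
```

Comparing the merged view would fail the subset test here, because the composite never listed `read_terminal`; the static view keeps the `terminal` toolset.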
Fixes #49622.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds include_registry=True kwarg to resolve_toolset/get_toolset. When False,
returns only the static TOOLSETS view with no registry-merged tools — the
composite-authored membership platform reverse-mapping must compare against.
Default True preserves all existing behavior; this is the enabling half of
the api_server toolset-drop fix (#49622).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Extends the browser private-network eval guard to the Camofox backend.
On main, _browser_eval() returned early in Camofox mode before running the
shared private-URL literal pre-scan and before re-checking the page URL
after eval, leaving Camofox as a sibling backend that could execute
browser_console(expression=...) against private/internal targets.
- move the eval private-URL literal pre-scan before the Camofox early return
- add a Camofox current-page private-URL probe via the evaluate endpoint
- withhold Camofox eval results when the page is now private/internal
Follow-up to browser private-network hardening in #56173, #56526, #56664.
Salvage of #56764 by @rayjun (rayoo), cherry-picked to preserve authorship.
The fake _SessionStore tracked peer_records but no test read it, leaving
#55300's peer-record behavior unasserted. Add a positive assertion on the
persist path and negative (== []) assertions on the two stale/moved-binding
skip paths, so the peer-record side effect is bound.
Mutation-verified: removing the production _record_gateway_session_peer call
makes the positive assertion fail.
Co-authored-by: João Vitor Cunha <jvsantos.cunha@gmail.com>
The #55300 peer-recording call now fires on the failed-turn compression
split path; the fake _SessionStore in test_compression_failure_session_sync
(carried in with #55721's test changes) lacked that method. Add a
call-tracking no-op so the combined salvage's tests pass.
Co-authored-by: João Vitor Cunha <jvsantos.cunha@gmail.com>
Manual /compress built a temporary AIAgent without the originating
platform / stable gateway session key, so an external context engine
ingested the retained transcript tail as source=cli during /compress
and again as the real platform on resume (duplicate cli,telegram rows).
Pass platform=_platform_config_key(source.platform) + the in-scope
gateway_session_key, mirroring the normal gateway turn. Assigned into
runtime_kwargs (single-valued, authoritative) so they neither collide
into a duplicate-kwarg TypeError nor lose to a stale resolver value.
Fixes #50422.
Map the two contributor emails whose commits are cherry-picked into the
compression-routing-integrity salvage so scripts/contributor_audit.py
attributes them at release time:
- jvsantos.cunha@gmail.com -> plcunha (PR #55300)
- jakepresent1@gmail.com -> jakepresent (PR #55721)
r266-tech (PR #50517) is already mapped.
MoA per-turn latency is dominated by advisor GENERATION: turn wall time
correlates ~0.88 with output tokens and ~-0.03 with input tokens (measured over
52 turns). Each turn waits for the slowest advisor to finish writing, and
advisors were uncapped — writing multi-thousand-token essays the aggregator
only needs the gist of.
Add an opt-in per-preset reference_max_tokens knob (mirrors reference_temperature)
that caps ADVISOR output only; the acting aggregator is never capped. Default
None = uncapped, so existing presets are byte-for-byte unchanged (no regression).
Wired through both MoA execution paths (MoAChatCompletions.create and
aggregate_moa_context).
E2E: same task, closed preset uncapped vs reference_max_tokens=600 -> 59s to 33s
(~44% faster), final answer identical/correct.
- hermes_cli/moa_config.py: _coerce_int_or_none helper + reference_max_tokens
in _normalize_preset/_default_preset/flattened view
- agent/moa_loop.py: read preset.reference_max_tokens, pass to reference fan-out
- agent/conversation_loop.py: pass reference_max_tokens on the per-turn path
- tests + docs
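A rough sketch of how an opt-in cap like this can be coerced and applied. The helper bodies below are assumptions: treating non-positive or unparseable values as "uncapped" is a guess at reasonable behavior, not a statement of what _coerce_int_or_none does.

```python
from typing import Any, Optional


def coerce_int_or_none(value: Any) -> Optional[int]:
    """Return a positive int, or None when the value is unset or unusable."""
    if value is None or isinstance(value, bool):
        return None
    try:
        n = int(value)
    except (TypeError, ValueError):
        return None
    return n if n > 0 else None  # assumption: non-positive means uncapped


def reference_request_kwargs(preset: dict) -> dict:
    """Build extra request kwargs for advisor (reference) calls only."""
    kwargs: dict = {}
    cap = coerce_int_or_none(preset.get("reference_max_tokens"))
    if cap is not None:
        kwargs["max_tokens"] = cap
    return kwargs
```

A preset without the key yields no extra kwargs, which is why existing presets behave as before.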
Ben caught that the initial approach (widening _NOUS_PORTAL_ALLOWED_HOSTS to
include the staging host) was the wrong fix -- env vars are supposed to
override the allowlist, mirroring how NOUS_INFERENCE_BASE_URL already
bypasses _ALLOWED_NOUS_INFERENCE_HOSTS via _nous_inference_env_override().
The actual bug: both resolve_nous_access_token and
resolve_nous_runtime_credentials read
`_optional_base_url(state.get("portal_base_url")) or os.getenv(...) or ...`
-- a plain `or` chain where the STORED state value wins first (short-circuits
before the env vars are even read), and then whichever value won gets run
through the same _NOUS_PORTAL_ALLOWED_HOSTS gate regardless of its source.
So a hosted agent stamped with HERMES_PORTAL_BASE_URL=<staging> in its env
AND a staging portal_base_url already persisted to auth.json would still
get silently rewritten to prod on every refresh, because the env var never
even got a chance to be consulted.
Revert the previous _NOUS_PORTAL_ALLOWED_HOSTS widening entirely --
staying prod-only preserves the allowlist's actual job (rejecting an
untrusted network-provided portal_base_url persisted to auth.json by a
compromised Portal response).
Add _nous_portal_env_override() (mirrors _nous_inference_env_override())
and restructure both call sites so the env override is checked FIRST and,
when set, wins outright and skips the allowlist gate entirely -- the
allowlist only ever runs against the fallback (stored-state-or-default)
path now.
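The restructured precedence can be sketched as below. Hostnames are placeholders, and `portal_env_override` / `resolve_portal_base_url` are simplified, hypothetical stand-ins for the real helpers; only the ordering (env override first, allowlist on the fallback path only) comes from the description above.

```python
from typing import Mapping, Optional
from urllib.parse import urlparse

PROD_PORTAL = "https://portal.example.com"  # placeholder URL
ALLOWED_HOSTS = frozenset({"portal.example.com"})  # placeholder allowlist


def portal_env_override(env: Mapping[str, str]) -> Optional[str]:
    value = (env.get("HERMES_PORTAL_BASE_URL") or "").strip()
    return value or None


def resolve_portal_base_url(state: Mapping[str, object],
                            env: Mapping[str, str]) -> str:
    override = portal_env_override(env)
    if override:
        # Operator-set value wins outright and skips the allowlist.
        return override
    stored = state.get("portal_base_url")
    if isinstance(stored, str) and urlparse(stored).hostname in ALLOWED_HOSTS:
        return stored
    # Untrusted or missing stored value heals to prod.
    return PROD_PORTAL
```

Under the old `or` chain the stored value would have been returned before the environment was ever read.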
Rewrote tests/hermes_cli/test_nous_portal_staging_allowlist.py to test the
actual fix: the helper function, and an end-to-end
resolve_nous_access_token proof that the env override wins even when state
ALSO has the staging host stored (the exact incident shape), that it wins
over a stored PROD host too, and that the allowlist's heal-to-prod
behaviour for an untrusted stored value is preserved when no override is
set.
* fix(streaming): handle completed responses with empty/None choices
The streaming fallback guard added in #55932 recognized a completed
response object only when its `choices` was a non-empty list. But an
adapter can return a completed response whose `choices` is `None` or an
empty list (an error / content-filter / terminal frame) — still a whole,
non-iterable response, not a token stream. Those shapes fell through to
`for chunk in stream` and crashed with
'types.SimpleNamespace' object is not iterable
which is exactly issue #55933 (MoA `openai-codex` aggregator on
TUI/Desktop, where a stream consumer forces the streaming path).
Broaden the guard to discriminate on the PRESENCE of a `choices`
attribute (a genuine provider Stream object exposes none), disable
streaming for the session, and return the completed object so the outer
loop's normal invalid-response validation handles empty/None choices via
its retry path instead of iterating.
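The broadened discrimination can be illustrated with a small sketch. `split_stream_or_completed` is a hypothetical helper, not the guard's real shape; it shows only the attribute-presence test.

```python
from types import SimpleNamespace
from typing import Any, Optional, Tuple


def split_stream_or_completed(result: Any) -> Tuple[Optional[Any], Optional[Any]]:
    """Return (completed_response, stream); exactly one is non-None."""
    # Presence of the attribute, not its value, marks a completed response,
    # so choices=None and choices=[] are both caught here.
    if hasattr(result, "choices"):
        return result, None
    return None, result


completed, stream = split_stream_or_completed(SimpleNamespace(choices=None))
```

Here `completed` is the namespace object and `stream` is None, so the caller never reaches `for chunk in stream`.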
Based on the diagnosis in #56525 by @spiky02plateau (that PR normalized
the MoA aggregator return with a one-shot chunk iterator; the common
text/tool-call crash was already fixed at this seam by #55932, so this
extends the existing guard to cover only the remaining empty/None-choices
gap).
Fixes #55933
* refactor(streaming): simplify empty-choices guard body and parametrize tests
Post-review cleanup (no behavior change):
- Inline the single-use `response_choices` local and drop the redundant
`if first_choice is not None else None` guard (getattr(None, ...) already
returns the default safely).
- Collapse the two near-identical empty/None-choices regression tests into
one `@pytest.mark.parametrize` case.
Mutation-verified: reverting the guard to the old non-empty-list condition
still makes both parametrized cases fail with the historical
'types.SimpleNamespace' object is not iterable.
---------
Co-authored-by: spiky02plateau <155588579+spiky02plateau@users.noreply.github.com>
Follow-up to the salvaged fix: the regression test asserted a frozen
max_tokens == 128_000 literal, coupling it to the Opus-4-8 model table.
Assert against _get_anthropic_max_output("claude-opus-4-8") plus > 2000
instead, so the test survives model-table churn while still catching a
regression to the old `or 2000` fallback.
agent/vertex_adapter.py resolved VERTEX_CREDENTIALS_PATH,
GOOGLE_APPLICATION_CREDENTIALS, VERTEX_PROJECT_ID, and VERTEX_REGION via raw
os.environ.get() instead of the profile-scoped get_secret() every other
credential lookup in hermes_cli/runtime_provider.py uses. In a multiplex
gateway serving several profiles from one process, os.environ still holds
whichever profile's .env python-dotenv loaded at boot — so a raw read here
let one profile's turn silently mint a Vertex OAuth2 token from, and get
billed against, a different profile's GCP service account. No error, no
fail-closed guard: the multiplex UnscopedSecretError protection was bypassed
entirely because these reads never went through get_secret().
- _resolve_credentials_path/_resolve_project_override/_resolve_region now
call agent.secret_scope.get_secret(), matching the _getenv() pattern
already used for every other provider's credentials.
- get_vertex_credentials()'s ADC fallback (google.auth.default()) reads
GOOGLE_APPLICATION_CREDENTIALS from os.environ internally, bypassing
get_secret() entirely — closed with a narrow guard: when multiplexing is
active and this profile's scope has no Vertex credentials of its own, but
os.environ still carries a value (left by a different profile's boot-time
dotenv load), refuse ADC rather than silently authenticate as a stranger.
- Zero behavior change for single-profile installs: get_secret() falls
through to os.environ transparently whenever multiplexing is off.
Same bug class as the already-fixed _HERMES_OAUTH_FILE/_AUTH_JSON_PATH/
HOOKS_DIR cross-profile leaks, now closed for Vertex's OAuth2 credential
path.
The salvaged fix wired per-provider ssl_ca_cert / ssl_verify (and
HERMES_CA_BUNDLE) into the MAIN OpenAI client. This follow-up:
- Auxiliary client parity: process_bootstrap.build_keepalive_http_client
accepts and forwards verify; auxiliary_client._resolve_aux_verify mirrors
the main-client TLS resolution (via load_config_readonly, the read-only
fast path) so compression/vision/web_extract/title-gen/session_search
honor the same per-provider CA. Without this, chat worked against a
private-CA endpoint but every auxiliary call still failed with
APIConnectionError.
- switch_model now reads custom_providers from live config (load_config_readonly)
instead of the init-time agent._custom_providers snapshot, so ssl_ca_cert /
ssl_verify edits are honored on mid-session model switch — matching the
context-length reload (#15779).
- Drop the dead client-level verify= where a custom httpx transport is used
(httpx ignores it there); verify lives on the transport. Fix docstrings.
Applies to both run_agent._build_keepalive_http_client and process_bootstrap.
- resolve_httpx_verify: add CURL_CA_BUNDLE to the env chain (consistency with
agent/ssl_guard._CA_BUNDLE_ENV_VARS) and emit a loud logger.warning naming
the endpoint whenever ssl_verify:false disables verification.
- get_custom_provider_tls_settings: case-insensitive base_url match (config
dedup already lowercases; scheme/host are case-insensitive) so a mixed-case
entry doesn't silently drop its CA. Exact match preserved — no prefix bypass.
- Demote best-effort except Exception: pass in agent_init/switch_model to
logger.debug(exc_info=True).
- Tests for aux verify forwarding, _resolve_aux_verify, case-insensitive
match, and prefix-bypass rejection.
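The case-insensitive exact match can be sketched as below. `find_tls_settings` and the entry layout are hypothetical; the point is that lowercasing both sides keeps the comparison exact, so a longer URL that merely starts with a configured base_url does not inherit its CA.

```python
from typing import Optional


def find_tls_settings(base_url: str, custom_providers: list) -> Optional[dict]:
    """Return TLS settings for the entry whose base_url matches exactly."""
    wanted = base_url.lower()
    for entry in custom_providers:
        candidate = str(entry.get("base_url", "")).lower()
        # Equality, never startswith: no prefix bypass.
        if candidate and candidate == wanted:
            return {
                "ssl_ca_cert": entry.get("ssl_ca_cert"),
                "ssl_verify": entry.get("ssl_verify", True),
            }
    return None
```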
Wire ssl_ca_cert and ssl_verify through custom_providers config and env
vars into the keepalive httpx client, fixing APIConnectionError against
mkcert/self-signed Ollama proxies behind HTTPS.
Desktop chat bubbles render plain text, but a worker-routed command that
builds its own Rich Console (e.g. /journey) picks up truecolor from the
gateway's inherited COLORTERM and leaks raw escapes into the bubble. Strip
ANSI at the single worker-return choke point so every command renders cleanly.
The TUI opens /journey as an overlay, so it never travels this path.
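Stripping at a single choke point can be as small as the sketch below. `strip_ansi` is a hypothetical name, and the pattern covers only CSI sequences (colors, cursor moves), which is an assumption about what needs removing; OSC sequences such as hyperlinks would need an additional pattern.

```python
import re

# ESC [ <parameter bytes> <intermediate bytes> <final byte>
_ANSI_CSI_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi(text: str) -> str:
    """Remove CSI escape sequences, leaving the visible text."""
    return _ANSI_CSI_RE.sub("", text)
```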
In the interactive CLI, /journey dispatched straight to `args.func(args)`,
letting Rich write ANSI to stdout — which patch_stdout's StdoutProxy passes
through as literal `?[38;2;…m` garbage. Route the read-only views (default +
`list`) through a captured, force-color Console and re-emit via `_cprint`
(prompt_toolkit's ANSI parser), matching the `ChatConsole` idiom.
`delete`/`edit` stay on real stdio since they prompt / open `$EDITOR`.
parse_frontmatter's malformed-YAML fallback stores every value as a string,
so a skill's `metadata` can be a str. `_category`/`_related` chained
`.get("metadata", {}).get("hermes", {})` and blew up with `'str' object has
no attribute 'get'`, taking down `build_learning_graph()` (and thus /journey
and `hermes journey`) whenever any installed skill had bad frontmatter.
Extract a `_hermes_meta()` helper that returns the nested dict only when it
really is one. Fixes the whole class, not just the two call sites.
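A defensive accessor of this kind might look like the sketch below. The body is an assumption (in particular, returning an empty dict for every malformed shape); only the idea of type-checking each level comes from the description above.

```python
from typing import Any


def hermes_meta(frontmatter: Any) -> dict:
    """Return frontmatter['metadata']['hermes'] only if every level is a dict."""
    if not isinstance(frontmatter, dict):
        return {}
    metadata = frontmatter.get("metadata")
    if not isinstance(metadata, dict):
        return {}  # e.g. the malformed-YAML fallback stored a str
    hermes = metadata.get("hermes")
    return hermes if isinstance(hermes, dict) else {}
```

Callers can then use `hermes_meta(fm).get("category")` without caring whether the frontmatter parsed cleanly.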
- claude-fable-5 placed above claude-opus-4.8 in both curated lists
- claude-sonnet-5 replaces claude-sonnet-4.6
- sakana/fugu-ultra added near the bottom (before routers/free tier)
- regenerated website/static/api/model-catalog.json via scripts/build_model_catalog.py (live-pulled by CLI, published on merge — no release needed)
Two Hermes bots sharing a channel could volley replies at each other
indefinitely. Root cause: Discord reply-pings (allowed_mentions
replied_user=true) add the replied-to bot to message.mentions without a
literal <@bot> token in the body, so the existing bot-admission gate
treated a reply chip as an explicit @mention and re-triggered the peer.
Adds opt-in discord.bots_require_inline_mention (default false; env
DISCORD_BOTS_REQUIRE_INLINE_MENTION). When enabled, bot-authored
messages must carry a raw inline <@id>/<@!id> mention in the content;
reply-ping-only mentions no longer admit the message. Human messages and
all existing defaults are unchanged.
The new _self_is_raw_mentioned helper deliberately ignores the resolved
message.mentions list (which reply-ping populates) and checks only the
raw content token via the shared _raw_mentioned_user_ids primitive.
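The difference between a resolved mention and a raw inline mention can be sketched as follows. The function names and the simplified admission rule are hypothetical; the `<@id>` / `<@!id>` token shapes are the ones named above.

```python
import re

_MENTION_RE = re.compile(r"<@!?(\d+)>")


def raw_mentioned_user_ids(content: str) -> set:
    """User ids that appear as literal mention tokens in the message body."""
    return {int(m) for m in _MENTION_RE.findall(content or "")}


def admit_bot_message(content: str, self_id: int,
                      resolved_mention_ids: set,
                      require_inline: bool) -> bool:
    if require_inline:
        # Ignore the resolved list, which a reply-ping also populates.
        return self_id in raw_mentioned_user_ids(content)
    return self_id in resolved_mention_ids
```

A reply-ping with body "thanks!" puts the peer in the resolved list but carries no token, so it is admitted only when the option is off.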
Vertex AI authenticates via OAuth2 (service-account JSON path / ADC), not
PROVIDER_REGISTRY, and VERTEX_CREDENTIALS_PATH is declared with
password=False (it's a path, not a bare key) under category="provider" —
a category the registry-derived blocklist loop never checks. Both it and
GOOGLE_APPLICATION_CREDENTIALS (the ADC fallback the adapter also reads)
fell through every existing blocklist source and leaked the on-disk
location of a GCP service-account key into every spawned subprocess
(terminal, codex/copilot app-server, browser workers) — the same leak
class already closed for every other provider's credentials in #53503.
Follow-up review fixes on the salvage of #54872 (original author 张满良/@zmlgit):
1. [HIGH] Adapter selection now goes through the shared
_authorization_adapter chokepoint (gateway/authz_mixin.py) instead of a
local inline lookup that fell back to the DEFAULT profile's same-platform
adapter when the owning profile had a registry entry but no adapter for
that platform. That fallback re-introduced the exact cross-profile
mis-delivery ([230002] Bot can NOT be out of the chat) this change exists
to fix. Adds a mutation-verified guard test
(test_notifier_owning_profile_adapter_no_default_fallback).
2. [HIGH→documented] The creator-wake SessionSource cannot faithfully
reconstruct a DM/thread creator's session key because chat_type is neither
persisted on the subscription nor carried on the session-context bridge.
Documented the limitation inline; behavior degrades to a fresh group
session (never an exception). The end-to-end fix (stamp + persist
chat_type) is a scoped follow-up, not bundled into this salvage.
3. [MED] Documented that archived/unblocked are intentionally claimed (cursor
hygiene) but silent, and excluded from wake kinds.
4. [MED] Wake-injection failure now logs at WARNING with exc_info=True (the
cursor has already advanced, so a broken wake must not be a silent no-op).
Addresses @tonydwb's review on PR #54872 (12:05 UTC, 2026-06-29):
> the hardcoded Chinese text in the wake messages (lines 118-128 of
> the diff) should be replaced with English or internationalized.
> The rest of the codebase uses English for user-facing messages,
> and hardcoded Chinese will confuse non-Chinese users. Consider
> using a constants dict or the existing i18n infrastructure.
Used the existing i18n infrastructure (agent/i18n.py::t()) — the same
surface gateway/run.py and slash_commands.py already use for static
user-facing strings.
## Changes
- gateway/kanban_watchers.py: import `t` from agent.i18n; replace the
hardcoded Chinese strings in the synthetic wake-up message with
t("gateway.kanban.wake.*") lookups. Behavior unchanged for zh users
(zh catalog preserves the original Chinese phrasing).
- locales/en.yaml: new `gateway.kanban.wake.*` baseline keys (English):
completed / gave_up / crashed / timed_out / blocked / status_default
/ status_joiner / message (with {task_id} {status} {title}
{assignee} {board} placeholders).
- locales/zh.yaml: Chinese translation of the new keys, preserving the
exact wording the original code used (so existing zh users see no
visible change).
- locales/{zh-hant,ja,de,es,fr,tr,uk,af,ko,it,ga,pt,ru,hu}.yaml: added
the same key set with English fallback values. The i18n invariant
test (tests/agent/test_i18n.py::test_catalog_keys_match_english)
requires every catalog to carry the same key set as en.yaml; native
translations can land incrementally without breaking users (the
loader falls back to en.yaml per-key when a translation is missing,
but the key must still exist).
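Per-key fallback to the English catalog can be illustrated with a toy lookup. This is not agent/i18n.py: the in-memory `CATALOGS` table, the `lookup` name, and the `xx` language code are hypothetical, and the two sample strings are the ones quoted in the verification notes.

```python
CATALOGS = {
    "en": {"gateway.kanban.wake.completed": "completed"},
    "zh": {"gateway.kanban.wake.completed": "已完成"},
    "xx": {},  # hypothetical catalog with no translation for the key
}


def lookup(key: str, lang: str = "en", **params: str) -> str:
    """Return the localized string, falling back to English per key."""
    template = CATALOGS.get(lang, {}).get(key) or CATALOGS["en"].get(key, key)
    # Placeholders such as {task_id} are filled only when params are given.
    return template.format(**params) if params else template
```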
## Verification
- scripts/run_tests.sh tests/agent/test_i18n.py
tests/gateway/test_kanban_watchers_mixin.py
tests/gateway/test_kanban_notifier.py
tests/gateway/test_kanban_notifier_watcher_dispatch_gate.py
→ 60 passed, 0 failed (i18n catalog parity + placeholders parity +
existing kanban notifier behavior).
- Manual: with HERMES_LANGUAGE=en, t("gateway.kanban.wake.completed")
returns "completed"; with HERMES_LANGUAGE=zh, returns "已完成";
with HERMES_LANGUAGE=ja (translation pending), falls back to
"completed" per-key.
Three connected changes that fix kanban notifications in multiplex_profile
gateways and enable event-driven agent collaboration:
1. Session profile propagation
- Add HERMES_SESSION_PROFILE ContextVar (session_context.py)
- Gateway stamps source.profile at dispatch time (run.py)
- _maybe_auto_subscribe reads profile from ContextVar instead of
os.environ which is unset in the gateway main process (kanban_tools.py)
2. Notifier profile-aware routing (kanban_watchers.py)
- Adapter selection: prefer _profile_adapters[sub.notifier_profile]
so each profile's bot delivers its own task notifications
- Relax profile skip-filter: process cross-profile subscriptions when
the gateway has an adapter for the owning profile
- Extend TERMINAL_KINDS with status/archived/unblocked
3. Creator agent wakeup on terminal events (kanban_watchers.py)
- After delivering completed/blocked/gave_up/crashed/timed_out
notifications, inject a synthetic MessageEvent into the creator's
session via adapter.handle_message to trigger their agent loop
- SessionSource built from subscription metadata — no session_store
lookup needed