The /codex-runtime slash command short-circuits with "openai_runtime
already set" when invoked with the same value as the current config,
and crucially skips the entire migration block below. The check
conflates two things: (a) "the config value is correct" and (b) "the
world state (managed block in ~/.codex/config.toml, hermes-tools MCP
callback, plugin discovery) is converged".
Common footgun this exposes: a user who pre-sets
`model.openai_runtime: codex_app_server` directly in config.yaml
(reasonable thing to do) and then runs /codex-runtime codex_app_server
to trigger migration sees "already set" and silently gets no migration.
~/.codex/config.toml never receives the managed block, the hermes-tools
MCP callback never registers, and codex falls through to its default
runtime instead of the app-server one. The setup looks successful but
is only partially functional.
The migration is idempotent by design (it replaces its own managed
block in place between MIGRATION_MARKER and MIGRATION_END_MARKER), so
re-running it is safe and cheap. Fix the short-circuit to fall through
to migration when re-applying codex_app_server while skipping the
config persist (no value-level change needed). The disable case
(re-applying "auto") still short-circuits because disabling doesn't
touch ~/.codex/config.toml at all.
The user-visible message changes to "openai_runtime already set to
codex_app_server — re-applying migration" so re-runs surface what
happened.
Regression test (test_reapply_codex_app_server_runs_migration) asserts:
- migrate() was called when re-applying
- persist_callback was NOT called (no config write on no-op transitions)
- migration output (MCP servers, sandbox default) surfaces in the
user-visible message
- requires_new_session is True so callers know to /reset
Verified RED→GREEN: the test fails on origin/main with
"migration must run on reapply, not just first enable" and passes with
this fix. Full test_codex_runtime_switch.py suite: 31 passed.
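The control flow described above can be sketched roughly as follows. This is an illustration only, not the actual handler: `switch_runtime`, its return dict, and the callback signatures are hypothetical, the message wording is approximate, and treating a change to "auto" as requiring a new session is an assumption.

```python
from typing import Callable

APP_SERVER = "codex_app_server"


def switch_runtime(current: str, requested: str,
                   persist: Callable[[str], None],
                   migrate: Callable[[], str]) -> dict:
    """Hypothetical sketch: re-applying codex_app_server re-runs migration."""
    changed = requested != current
    if not changed and requested != APP_SERVER:
        # Re-applying "auto": disabling never touches ~/.codex/config.toml,
        # so the short-circuit is still correct here.
        return {"message": f"openai_runtime already set to {requested}",
                "requires_new_session": False}
    if changed:
        # Only write config when the value actually changes.
        persist(requested)
    if requested != APP_SERVER:
        return {"message": f"openai_runtime set to {requested}",
                "requires_new_session": True}
    # The migration is idempotent, so it runs on first enable and on re-apply.
    details = migrate()
    status = "set to" if changed else "already set to"
    suffix = "" if changed else " (re-applying migration)"
    return {"message": f"openai_runtime {status} {APP_SERVER}{suffix}\n{details}",
            "requires_new_session": True}
```

The key point is that the "value unchanged" check decides only whether to persist, not whether to converge world state.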
Two independent MoA auxiliary-call fixes:
#53866 — auxiliary.moa_reference.timeout and auxiliary.moa_aggregator.timeout
were 600s while moa_agent was 120s. Raise both to 900s so a genuinely long
reference/aggregator turn (mixed providers, deep reasoning, long tool chains)
has headroom instead of being cut mid-generation.
#53735 — _CodexCompletionsAdapter (the Codex/Responses auxiliary path used by
the MoA acting-aggregator, compression, web_extract, session_search, etc.)
never set prompt_cache_key, so it stayed cache-cold while the MAIN Responses
transport (agent/transports/codex.py) was warm. Derive the same
content-addressed key via the shared _content_cache_key(instructions, tools)
helper and set it on the aux Responses request, with the same host guards the
main transport uses (xAI carries the key in extra_body; GitHub/Copilot opts out
of cache-key routing).
Tests: 5 new prompt_cache_key cases (set+prefixed, stable across identical
prefix, differs on different instructions, skipped for xai/github hosts).
tests/agent/test_auxiliary_client.py 279 pass; tests/hermes_cli/test_config.py
130 pass.
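A minimal sketch of what a content-addressed cache key looks like, assuming a SHA-256 over a canonical JSON encoding. The real `_content_cache_key` helper may hash differently; the `hermes-` prefix, the 32-character truncation, the host suffixes, and `with_cache_key` are all hypothetical names chosen for illustration.

```python
import hashlib
import json

# Hypothetical host suffixes standing in for the xAI and GitHub/Copilot guards.
SKIP_HOST_SUFFIXES = (".x.ai", ".githubcopilot.com")


def content_cache_key(instructions: str, tools: list, prefix: str = "hermes-") -> str:
    """Derive a stable key from the prompt prefix (instructions + tools)."""
    payload = json.dumps({"instructions": instructions, "tools": tools},
                         sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return prefix + digest[:32]


def with_cache_key(request: dict, host: str, instructions: str, tools: list) -> dict:
    """Return a copy of the request, adding prompt_cache_key unless the host opts out."""
    out = dict(request)
    if not host.lower().endswith(SKIP_HOST_SUFFIXES):
        out["prompt_cache_key"] = content_cache_key(instructions, tools)
    return out
```

Because the key depends only on content, the auxiliary path and the main transport arrive at the same key for the same prefix without sharing any state.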
The provider-parity contract (tests/hermes_cli/test_provider_parity.py)
requires every hermes model provider to be configurable in the desktop
Providers tabs. Vertex authenticates via OAuth2 (service-account JSON /
ADC) and has no api_key_env_vars, so — like bedrock's aws_sdk — it needs
its credential env var tagged to the provider card explicitly. Tag
VERTEX_CREDENTIALS_PATH to the vertex card in _catalog_provider_env_metadata().
Adds Vertex AI as a first-class provider for Gemini models via Vertex's
OpenAI-compatible endpoint. Vertex authenticates with short-lived OAuth2
access tokens (service-account JSON or ADC), not a static API key — the
missing piece behind the recurring requests (#13484, #12639, #56259).
- agent/vertex_adapter.py: OAuth2 token minting + refresh-on-expiry
(5-min margin), ADC->service-account fallback, global vs regional
endpoint URLs. Config precedence: env var > config.yaml > default.
- plugins/model-providers/vertex/: provider profile (auth_type=vertex),
reuses Gemini's extra_body.google.thinking_config translation.
- runtime_provider: vertex short-circuit BEFORE the credential pool so a
credentials-file path is never mistaken for a static API key; mints a
fresh token + computes base_url per resolve.
- run_agent + conversation_loop: _try_refresh_vertex_client_credentials()
re-mints the token and rebuilds the client on a mid-session 401, so a
long-lived gateway agent survives token expiry (~1h).
- auxiliary_client: vertex auth_type branch for side-LLM tasks.
- config.yaml: vertex.project_id / vertex.region (non-secret, bridged to
env); credential path stays in .env (VERTEX_CREDENTIALS_PATH).
- setup wizard + model picker: dedicated _model_flow_vertex; curated
google/gemini-* model list; --provider choices.
- pricing/metadata: Vertex prices off the gemini docs snapshot; endpoint
host auto-maps to the vertex provider (no probe spam).
- lazy_deps + pyproject [vertex] extra: google-auth, opt-in only.
- docs: guides/google-vertex.md + providers page; tests for adapter +
runtime resolution.
Salvages and modernizes #8427 by @slawt onto current main: rewired from
the legacy PROVIDER_REGISTRY path to the provider-profile architecture,
moved non-secret config out of .env into config.yaml, and added the
per-turn 401 token-refresh the original lacked.
- Track auth store source path on Nous state reads and write rotated
OAuth refresh tokens back to the same store, preventing stale-token
replays when Hermes falls back to a global/root auth.json.
- Skip Nous fallback entries locally when no access/refresh token is
present, suppressing repeated failed resolution attempts within a
session.
- Sync session model metadata after fallback switches so the gateway
DB reflects the backend that actually served the latest turn.
Add policy gates and output redaction for browser/CDP surfaces, strengthen session ownership tracking, and block credential-like query parameters before third-party browser/web backends receive URLs.
Inspired by the agbrowse review: keep local browser magic-link flows possible while preventing cloud reader/browser escalation from receiving opaque token, code, signature, or key query parameters.
The salvaged PR guarded only resolve_nous_access_token; the primary
resolve_nous_runtime_credentials path also POSTs the refresh token to
portal_base_url on refresh with no allowlist check. Mirror the guard
there so a poisoned host can't receive the bearer, and drop the stray
duplicated allowlist comment. Adds a sibling-site regression test.
generate_systemd_unit() and generate_launchd_plist() used
Path(shutil.which('node')).resolve().parent to find the node bin dir.
When ~/.local/bin/node is a symlink into a specific profile's node
install (e.g. ~/.hermes/profiles/<p>/node/bin/node), .resolve() chases
it and bakes that one profile's path into EVERY profile's service
definition.
This breaks profile isolation and makes systemd_unit_is_current()
perpetually False: each gateway rewrites its unit + daemon-reload on
every boot, destabilizing multi-profile setups into a ~5-minute restart
loop (observed NRestarts ~1600 across two gateways).
Fix: use Path(resolved_node).parent — the directory where node is found
on PATH — instead of chasing the symlink to its resolved target. This
keeps generated service definitions profile-agnostic.
Affects both the systemd (Linux) and launchd (macOS) unit generators.
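The difference between the two expressions can be reproduced in a temporary directory. The sketch below is illustrative only: the directory layout and helper names are made up, and it assumes a POSIX system where creating symlinks is permitted.

```python
import os
import tempfile
from pathlib import Path


def bin_dir_chasing_symlink(node_path: str) -> Path:
    # Old behaviour: follows the symlink to its final target.
    return Path(node_path).resolve().parent


def bin_dir_on_path(node_path: str) -> Path:
    # New behaviour: the directory where node was found on PATH.
    return Path(node_path).parent


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp).resolve()
        profile_bin = root / "profiles" / "p1" / "node" / "bin"
        shared_bin = root / "local" / "bin"
        profile_bin.mkdir(parents=True)
        shared_bin.mkdir(parents=True)
        (profile_bin / "node").write_text("#!/bin/sh\n")
        # shared_bin/node is a symlink into one specific profile's install.
        os.symlink(profile_bin / "node", shared_bin / "node")
        found = str(shared_bin / "node")
        return (bin_dir_chasing_symlink(found) == profile_bin,
                bin_dir_on_path(found) == shared_bin)
```

Both comparisons hold: the old expression lands in the profile-specific directory, while the new one stays in the shared directory.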
compress_context() and /new already flush un-persisted messages before
calling end_session() (fixed in #47202), but /resume and /branch still
call end_session() directly. When a turn is interrupted mid-flight and
the user immediately runs /resume or /branch, messages generated during
that turn have not yet been written to state.db and are silently lost on
session rotation.
Add the same best-effort _flush_messages_to_session_db() call before
end_session() in both _handle_resume_command and _handle_branch_command,
mirroring the pattern established in cli.py:new_session().
Regression tests verify the flush is called when an agent is present.
Reworks @valenteff's #53277 fix per review (Teknium's 3 findings):
- Route refresh_launchd_plist_if_needed's bootstrap through the existing
_launchctl_bootstrap() EIO-recovery helper (canonical since #56256),
wrapped in a wall-clock retry loop, instead of an ad-hoc 5x2s loop.
- Window sized to agent.restart_drain_timeout (default 180s), not a fixed
~10s: the failure happens while the old gateway is still draining (finding 1).
- Retry on subprocess.TimeoutExpired too, not just CalledProcessError — a
bootstrap timeout after bootout otherwise escapes and leaves the service
unloaded (finding 2).
- Confirm success with launchctl list, not a bare bootstrap exit 0 (finding 3);
mirror verify+drain-window in the detached-helper bash path.
- Shared helpers _launchd_reload_log_path / _append_launchd_reload_log /
_launchctl_label_registered / _retry_launchctl_bootstrap_until_registered.
3 new tests cover retry-until-listed, TimeoutExpired-retried, deadline-exhaust.
E2E: real reload log + mocked launchctl — retries CalledProcessError+TimeoutExpired,
verifies via launchctl list, logs failures.
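The retry contract from the findings above (wall-clock deadline, both exception types retried, success confirmed by registration rather than exit code) might look like this in outline. The function name, parameters, and injected callables are hypothetical; the real helpers shell out to launchctl.

```python
import subprocess
import time
from typing import Callable


def retry_until_registered(bootstrap: Callable[[], None],
                           is_registered: Callable[[], bool],
                           timeout_s: float,
                           interval_s: float = 2.0,
                           clock: Callable[[], float] = time.monotonic,
                           sleep: Callable[[float], None] = time.sleep) -> bool:
    """Retry bootstrap until the label is registered or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        try:
            bootstrap()
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
            # Both failure modes are retried; a timeout must not escape
            # and leave the service unloaded.
            pass
        # Success is confirmed by registration, not by a bare exit 0.
        if is_registered():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Injecting `clock` and `sleep` keeps the deadline logic testable without waiting in real time.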
refresh_launchd_plist_if_needed ran `launchctl bootout` then
`launchctl bootstrap` with errors silenced (`2>/dev/null` in the
detached helper, `check=False` in the direct subprocess path).
Under high load or a launchd race, the bootout succeeds — removing
the service from launchd — but the follow-up bootstrap fails
silently. The service stays unregistered; KeepAlive can't revive
a service launchd no longer knows about, so the gateway stays dark
until a manual `launchctl bootstrap`.
Observed incident (2026-06-26): `/restart` in chat triggered a
planned drain; during the drain a separate call re-triggered the
plist refresh, which bootout'd the live service. Under loadavg
9.48 the bootstrap failed silently — 2h35min offline until manual
recovery.
Fix: retry the bootstrap up to 5 times with 2s back-off, verify
with `launchctl list <label>` afterwards, and log failures to
~/.hermes/logs/launchd-reload.log so the health watchdog can
detect a persistent orphan. Mirrors the contract across both
the detached helper (refresh inside gateway tree) and the direct
subprocess path (refresh from external CLI).
Existing tests pass:
- test_refresh_defers_reload_when_running_inside_gateway_tree
- test_refresh_uses_direct_reload_when_not_inside_gateway_tree
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Follow-up on the salvaged #47491 commits:
- Register _plugin_api_runtime_gate BEFORE the auth middlewares so it
executes AFTER them, and add an explicit auth check: unauthenticated
requests to /api/plugins/<name>/ fall through to auth's 401 instead of
this gate's 404. Prevents the gate from becoming a plugin-name oracle
(an unauthenticated caller could otherwise fingerprint installed/enabled
plugins by status code). Keeps test_non_kanban_plugin_route_requires_auth
green.
- Enable the 'example' user plugin in the _install_example_plugin test
fixture so the auth / static-asset-allowlist tests still reach the real
serving paths now that user plugins are gated on plugins.enabled.
- Mark the runtime-gate unit-test scopes as authenticated so they exercise
the enabled/disabled policy under the new auth-first ordering.
Address two residual bypasses identified in review:
1. Add _plugin_api_runtime_gate middleware that checks plugins.enabled/
plugins.disabled on every request to /api/plugins/{name}/... routes.
Previously, disabling a plugin at runtime had no effect on its already-
mounted API routes until a restart.
2. Extend serve_plugin_asset to check plugins.disabled for bundled plugins.
Previously, only user plugins were gated — a bundled plugin in
plugins.disabled would still serve assets from the unauthenticated
/dashboard-plugins/{name}/... endpoint.
Both fixes ensure the enabled/disabled policy is evaluated live at request
time, not just at startup.
Adds regression tests covering:
- Middleware blocks disabled user plugin API routes (404)
- Middleware blocks user plugin removed from enabled set (404)
- Middleware passes enabled user plugin API routes
- Middleware blocks disabled bundled plugin API routes (404)
- Bundled plugin assets return 404 when disabled
- Bundled plugin assets served normally when not disabled
- User plugin asset gating still works correctly
User-installed dashboard plugins had their assets served and Python
backend code imported without checking the plugins.enabled allowlist.
This meant a plugin installed in the plugins directory but not enabled
could still execute code at dashboard startup and serve arbitrary files.
Changes:
- get_dashboard_plugins API: filter out user plugins not in enabled set
- serve_plugin_asset: reject requests for disabled/non-enabled user plugins
- _mount_plugin_api_routes: skip Python import for non-enabled user plugins
- Bundled plugins still load by default but respect explicit disables
Fixes #46435
On macOS, `launchctl bootstrap` of a label still registered in the domain
fails with 5: Input/output error (EIO). That is the *already loaded* case — a
stale registration from an interrupted restart or a bootout that didn't settle
— recoverable by booting the leftover out and bootstrapping again, and distinct
from the domain being genuinely unmanageable.
launchd_install and launchd_start (both bootstrap paths) treated exit 5 as
'launchd cannot manage this macOS version' and silently degraded to a detached
process, losing auto-start at login and crash-restart. Centralize bootstrap in
_launchctl_bootstrap(), which on EIO boots the stale label out and retries once;
only if the retry also fails does the error propagate so callers apply their
existing _launchctl_domain_unsupported fallback for a genuinely broken domain.
launchd_restart already boots out before bootstrapping (its drained job is
almost always still registered, so a plain bootstrap would hit EIO on the common
path), so it keeps its explicit pre-bootout rather than routing through the
bootstrap-first helper. Corrected the stale exit-5 comment that claimed it
always meant an unmanageable domain.
Adds TestLaunchctlBootstrapEioRetry covering clean bootstrap (no bootout),
EIO -> bootout -> retry success, persistent EIO re-raise, and non-EIO re-raise
without a spurious bootout.
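The recovery policy can be summarised in a few lines. This is a sketch under stated assumptions: the callables stand in for the real launchctl invocations, and `bootstrap_with_eio_recovery` is a hypothetical name, not the signature of `_launchctl_bootstrap()`.

```python
import subprocess
from typing import Callable

EIO_EXIT = 5  # launchctl's "5: Input/output error"


def bootstrap_with_eio_recovery(bootstrap: Callable[[], None],
                                bootout: Callable[[], None]) -> None:
    """Bootstrap once; on EIO, boot the stale label out and retry once."""
    try:
        bootstrap()
        return
    except subprocess.CalledProcessError as exc:
        if exc.returncode != EIO_EXIT:
            # Not the already-loaded case: propagate without a bootout.
            raise
    bootout()
    # A second failure propagates so callers can apply their own fallback.
    bootstrap()
```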
Completes the #30719 restart-loop defenses. Defenses 1-2 (the
_HERMES_GATEWAY guard on `hermes gateway stop|restart` + terminal_tool,
and the cron-creation lifecycle filter) already landed on main, but two
gaps remained:
- The agent's `cronjob` model tool calls cron.jobs.create_job directly,
bypassing the hermes_cli.cron.cron_create CLI filter, so lifecycle
commands scheduled via the model tool were only blocked at execution
time (terminal_tool), not at creation. Moved the filter to a shared
cron/lifecycle_guard.py enforced at create_job — the single chokepoint
every job-creation path hits (CLI + model tool). Re-exported
_contains_gateway_lifecycle_command from hermes_cli.cron so
terminal_tool's import keeps working.
- No breaker for the auto-resume loop itself. Defenses 1-2 cover the
cron/CLI/terminal paths, but any other SIGTERM source (e.g. a raw
terminal("launchctl kickstart ai.hermes.gateway")) still triggers the
boot->auto-resume->re-run cycle. Added gateway/restart_loop_guard.py:
counts restart-interrupted boots in a rolling window (config
gateway.restart_loop_guard, default 3 boots / 60s) and skips
auto-resume for that boot once tripped. The gateway still comes up and
serves real inbound messages; it just stops replaying the session that
keeps killing it, putting a human back in the loop.
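The rolling-window breaker in the second item can be sketched as below. The class shape is hypothetical, and two details are assumptions: the sketch trips when the count reaches the limit, and it keeps timestamps in memory, whereas a real guard has to persist them across process restarts.

```python
import time
from collections import deque
from typing import Optional


class RestartLoopGuard:
    """Counts restart-interrupted boots inside a rolling time window."""

    def __init__(self, max_boots: int = 3, window_s: float = 60.0) -> None:
        self.max_boots = max_boots
        self.window_s = window_s
        self._boots = deque()

    def record_boot(self, now: Optional[float] = None) -> bool:
        """Record one boot; return True if auto-resume should be skipped."""
        now = time.time() if now is None else now
        self._boots.append(now)
        # Drop boots that have fallen out of the window.
        while self._boots and now - self._boots[0] > self.window_s:
            self._boots.popleft()
        return len(self._boots) >= self.max_boots
```

Only auto-resume is skipped when the guard trips; the gateway itself still starts and serves inbound messages.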
Also tightened the lifecycle regex over main's version: dropped
`hermes gateway start` (benign), required the gateway identifier on the
launchctl/systemctl branches (so `launchctl unload
ai.hermes.update-checker.plist` and `systemctl restart
hermes-meta.service` no longer false-positive), added the inverse
pkill token order, and fixed the binary-script bypass (decode with
errors='replace' instead of swallowing UnicodeDecodeError). The
create_job guard resolves relative script paths under HERMES_HOME/scripts
the same way the scheduler does, so a bare script name is scanned as the
file that actually runs.
Design and much of defense-2 originate from PR #33395 (@kshitijk4poor),
which itself salvaged #30728 (@SimoKiihamaki). Rebuilt against current
main since defenses 1-2 had already landed under different names.
Closes #30719.
Co-authored-by: SimoKiihamaki <simo.kiihamaki@gmail.com>
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
The credential-pool Codex refresh path synced tokens from auth.json and
then POSTed the refresh_token to OpenAI's token endpoint without holding
the cross-process auth-store lock across the whole read->POST->write-back
sequence. Because Codex refresh tokens are single-use, two concurrent
Hermes processes could both adopt the same on-disk token and both POST
it; the loser got refresh_token_reused / invalid_grant.
Wrap the Codex OAuth branch of _refresh_entry in the existing shared
_auth_store_lock (reentrant, cross-process flock) using the same
extended-timeout pattern resolve_codex_runtime_credentials() already
uses. A waiting process now blocks on the lock and, once inside, the
in-lock re-sync picks up the rotated token the winner persisted and
skips its own POST. Also send User-Agent: hermes-cli/<version> on the
refresh request.
Credit @cooper-oai (#34820) for identifying the concurrent-refresh
reuse race; this ships the narrow lock-serialization fix without the
separate Codex auth-store partition.
`_detect_api_mode_for_url` previously returned `None` for the bare
`api.anthropic.com` host, causing every URL-fallback path
(custom_providers, direct-alias, the api-key fallback inside
`resolve_runtime_provider`) to default to `chat_completions` for
native Anthropic — which routes requests to the OpenAI-compat
`/chat/completions` shim instead of the native `/v1/messages`
endpoint.
Pro/Max OAuth subscriptions are only billed against the native
Messages API; the shim bills against a separate "extra usage" pool
that is empty by default, so a freshly authorized Pro/Max credential
400s with "You're out of extra usage" the moment it's used — even
on an account that has consumed nothing for the current cycle.
Brings the helper in line with `hermes_cli.providers.determine_api_mode`
which already mapped `api.anthropic.com` to `anthropic_messages`.
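In outline, the added mapping is a host check like the one below. This is a simplified sketch covering only the Anthropic branch: the function name mirrors the helper but the body is hypothetical, and the real helper recognises other hosts that are omitted here.

```python
from typing import Optional
from urllib.parse import urlparse


def detect_api_mode_for_url(url: str) -> Optional[str]:
    """Return an api_mode for hosts that need one, else None (caller's default)."""
    host = (urlparse(url).hostname or "").lower()
    if host == "api.anthropic.com":
        # Native Messages API, not the OpenAI-compatible shim.
        return "anthropic_messages"
    return None
```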
Upstream #52270 added `_nous_inference_env_override()` but wired it into
only `resolve_nous_runtime_credentials`. Three sibling resolution paths
still ignored the override, so a self-hosted Nous inference endpoint set
via `NOUS_INFERENCE_BASE_URL` was silently dropped whenever credentials
arrived through any of them:
- the credential-pool path (`_resolve_runtime_from_pool_entry`)
- the explicit-provider path (`_resolve_explicit_runtime`)
- the auxiliary side-LLM client (`_pool_runtime_base_url`)
Route all three through the same auth-layer reader so every
`NOUS_INFERENCE_BASE_URL` read shares one normalization path
(trailing-slash stripping, blank -> empty) and the documented
trusted-bypass intent stays in one place. The override is live-only: it
wins for the base URL returned this run but is never persisted to
auth.json or the credential pool, so an ephemeral dev/staging value
cannot poison durable auth state.
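A single normalising reader of this kind might look like the sketch below. The function names are hypothetical and the body is an assumption based on the behaviour described (trailing-slash stripping, blank treated as unset); it is not the actual `_nous_inference_env_override()`.

```python
import os
from typing import Mapping, Optional


def nous_inference_env_override(environ: Optional[Mapping[str, str]] = None) -> str:
    """Read NOUS_INFERENCE_BASE_URL once, normalised; empty string means unset."""
    env = os.environ if environ is None else environ
    return env.get("NOUS_INFERENCE_BASE_URL", "").strip().rstrip("/")


def effective_base_url(stored_url: str,
                       environ: Optional[Mapping[str, str]] = None) -> str:
    """The override wins for this run only; the stored value is never rewritten."""
    return nous_inference_env_override(environ) or stored_url
```

Every resolution path calling the same reader is what keeps the normalisation and the live-only guarantee in one place.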
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
## What does this PR do?
A single, perfectly valid `.env` line was being silently corrupted on read
and write. When a secret's value happened to contain a known Hermes env var
name followed by `=` — for example a webhook or proxy base URL carrying a
query parameter like `OPENAI_BASE_URL=https://proxy.example.com/v1?TAVILY_API_KEY=sk-...`
— `_sanitize_env_lines()` treated the embedded `KEY=` as a second entry. It
truncated the real secret at the inner match and fabricated a bogus second
variable. A related path silently dropped any text before the first matched
key. Because this runs on every `load_env()`, `save_env_value()`,
`remove_env_value()` and `sanitize_env_file()`, the damage was written back to
`~/.hermes/.env` and re-applied on every read — persistent loss/corruption of
the canonical secrets store.
The concatenation splitter now only acts when the line actually begins with a
known `KEY=` (so leading text is never dropped) and when every value that
precedes a boundary is a plain token. If a preceding value looks structured —
a URL/query string (`://`, `?`, `&`) or contains whitespace — the embedded
`KEY=` is understood to be part of that value, and the line is kept verbatim.
Genuine concatenations of plain-token secrets still split as before.
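The two rules above (anchor to the line start, refuse to split after a structured value) can be illustrated with a simplified splitter. This is a sketch, not the code in `hermes_cli/config.py`: `split_env_line` and its `known_keys` parameter are hypothetical, and the real sanitizer handles more cases.

```python
import re


def looks_like_structured_value(value: str) -> bool:
    """True for URL/query-like values or values containing whitespace."""
    return ("://" in value or "?" in value or "&" in value
            or any(ch.isspace() for ch in value))


def split_env_line(line: str, known_keys: set) -> list:
    """Split concatenated KEY=value entries, or return the line verbatim."""
    if not known_keys:
        return [line]
    alternation = "|".join(re.escape(k) for k in sorted(known_keys, key=len, reverse=True))
    starts = [m.start() for m in re.finditer(f"(?:{alternation})=", line)]
    if not starts or starts[0] != 0:
        # Line does not begin with a known KEY=: never drop leading text.
        return [line]
    pieces = [line[a:b] for a, b in zip(starts, starts[1:] + [len(line)])]
    for piece in pieces[:-1]:
        if looks_like_structured_value(piece.split("=", 1)[1]):
            # The embedded KEY= belongs to this value; keep the line intact.
            return [line]
    return pieces
```

With the example from the description, the value preceding the embedded `TAVILY_API_KEY=` contains `://` and `?`, so the line comes back unchanged.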
## Related Issue
N/A
## Type of Change
- [x] 🐛 Bug fix (non-breaking change that fixes an issue)
## Changes Made
- `hermes_cli/config.py`: added `_looks_like_structured_value()` helper and
reworked the split logic in `_sanitize_env_lines()` to anchor splits to the
line start and skip splitting when a preceding value looks like a URL/query
string or holds whitespace.
- `tests/hermes_cli/test_config.py`: added two regression tests — a value that
embeds a known `KEY=` is preserved verbatim, and leading text before the
first key is not dropped.
## How to Test
1. Run the sanitizer tests: `pytest tests/hermes_cli/test_config.py -k anitize -q`.
2. Confirm the new cases reproduce the bug on the old code and pass on the new:
`OPENAI_BASE_URL=https://proxy.example.com/v1?TAVILY_API_KEY=sk-embedded`
is returned unchanged instead of being split into a truncated value plus a
fabricated `TAVILY_API_KEY` entry.
3. Run the full file: `pytest tests/hermes_cli/test_config.py -q` (97 passed).
## Checklist
### Code
- [x] I've read the Contributing Guide
- [x] My commit messages follow Conventional Commits (`fix(scope):`, `feat(scope):`, etc.)
- [x] I searched for existing PRs to make sure this isn't a duplicate
- [x] My PR contains **only** changes related to this fix/feature (no unrelated commits)
- [x] I've run `pytest tests/ -q` and all tests pass
- [x] I've added tests for my changes (required for bug fixes, strongly encouraged for features)
- [x] I've tested on my platform: macOS 15 (Darwin 25.5)
### Documentation & Housekeeping
- [x] I've updated relevant documentation (README, `docs/`, docstrings) — or N/A
- [x] I've updated `cli-config.yaml.example` if I added/changed config keys — or N/A
- [x] I've updated `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows — or N/A
- [x] I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
- [x] I've updated tool descriptions/schemas if I changed tool behavior — or N/A
hermes doctor's final 'configure missing API keys' summary counted every
toolset with unmet key requirements, including default-off and explicitly
disabled ones. Filter the summary to toolsets actually enabled for the CLI
platform, with a graceful fallback to prior behavior when config resolution
fails.
Fixes #11336
Two live cron bugs, both surfaced by @banditburai in #35616 (whose larger
watchdog/supervisor work is already superseded by the CronScheduler provider
refactor on main):
- #32896: `cron list` crashed on a present-but-null `deliver` field —
`job.get("deliver", ["local"])` returns None for an explicit null, which
then hit `", ".join(None)`. Coalesce with `or ["local"]` (same pitfall
the sibling `repeat` line already guards against).
- #33465: cron jobs 401'd on Bitwarden/BSM-backed secrets. The per-run env
reload used a bare `load_dotenv(override=True)`, which re-applied only the
.env placeholder — startup had already recorded this HERMES_HOME in
env_loader._APPLIED_HOMES, so the external-secret re-pull no-oped. Route the
reload through load_hermes_dotenv() and call reset_secret_source_cache()
first to force the re-pull (Bitwarden's 300s value-cache keeps it off the
network; override honours secrets.bitwarden.override_existing, mirroring
startup).
Tests: null-deliver regression guard in test_cron.py; reset-before-reload
ordering guard in test_scheduler.py. Migrated 31 scheduler-reload test seams
from patching dotenv.load_dotenv to the new load_hermes_dotenv /
reset_secret_source_cache seam.
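The null-`deliver` pitfall from #32896 is plain `dict.get` behaviour and is easy to reproduce. The helper name below is hypothetical; only the `or ["local"]` coalescing comes from the fix.

```python
def format_deliver(job: dict) -> str:
    """Render the deliver targets, treating a missing or null field as local."""
    # dict.get's default applies only when the key is absent, so an explicit
    # null (None) must be coalesced separately.
    targets = job.get("deliver") or ["local"]
    return ", ".join(targets)


def format_deliver_buggy(job: dict) -> str:
    # Returns None for {"deliver": None}, and str.join(None) raises TypeError.
    return ", ".join(job.get("deliver", ["local"]))
```

Note that `or` also replaces an empty list with the default, which is the intended reading here.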
`hermes debug share` printed a privacy notice and then uploaded the
report to a public paste service in the same breath — the user never got
to say yes or no. Add a consent gate: an interactive [y/N] prompt, a
--yes/-y flag to skip it, and a hard refusal (exit 1) in non-interactive
contexts (no TTY on stdin) so debug data can't be exposed silently in
scripts/CI.
- New _confirm_upload() helper gates the actual upload after the notice.
- Applied to BOTH upload paths: the public paste.rs path and the --nous
Nous-S3 path (the latter is a sibling site the original PR missed).
- The /debug slash command passes yes=True (typing /debug is itself the
consent action, and input() would hang inside prompt_toolkit).
- Rewrote the privacy notice for accuracy: secrets (API keys/tokens/
passwords) ARE force-redacted before upload; PII (display name,
platform user ID, verbatim message content, filesystem paths) is NOT,
and that URL is public.
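A consent gate with these three behaviours could be sketched as follows. The signature, prompt text, and injected `stdin`/`prompt` parameters are assumptions for illustration, not the actual `_confirm_upload()`.

```python
import sys
from typing import Callable


def confirm_upload(yes: bool = False, stdin=None,
                   prompt: Callable[[str], str] = input) -> bool:
    """Return True only on explicit consent; refuse outright without a TTY."""
    if yes:
        # --yes/-y, or a caller such as /debug where the command is the consent.
        return True
    stream = sys.stdin if stdin is None else stdin
    if stream is None or not stream.isatty():
        # Non-interactive context: never upload silently.
        raise SystemExit(1)
    answer = prompt("Upload debug report? [y/N] ").strip().lower()
    # Anything other than an explicit yes counts as no.
    return answer in ("y", "yes")
```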
Fixes #22016.
Co-authored-by: liuhao1024 <liuhao1024@users.noreply.github.com>
Adds moa.save_traces (default off). When on, every MoA turn that runs the
reference fan-out appends one JSON line to
<hermes_home>/moa-traces/<session_id>.jsonl capturing the TRUE FULL turn:
each reference model's exact input messages (system advisory prompt + full
advisory view, not the truncated display preview) + full output + usage +
per-advisor cost, and the aggregator's exact input (including the injected
reference-context guidance block) + output. Lets MoA runs be audited and
improved offline — what every model saw, said, and cost.
- agent/moa_trace.py: config-gated JSONL writer, profile-aware path via
get_hermes_home(), best-effort (never breaks a turn), moa.trace_dir override.
- agent/moa_loop.py: _RefAccounting now carries full input/output/model/
provider/temperature; create() stashes the full turn on a cache MISS
(once per turn, never on the cache-HIT repeat iterations); non-streaming
aggregator output captured inline, streaming marked + pointed at the
session assistant message. consume_and_save_trace(session_id) flushes it.
- agent/conversation_loop.py: flushes the trace with the live session_id
right after MoA usage consumption. No-op for non-MoA clients.
- hermes_cli/config.py: moa.save_traces + moa.trace_dir defaults.
Traces are a side channel — NOT the messages table, never in replay, safe
to delete. Off by default; only overhead when off is one config read on a
MoA cache-MISS turn.
Tests: full-trace-when-enabled (per-ref input+output+cost, aggregator
input-with-guidance + output), nothing-when-disabled. Live E2E through
run_conversation confirmed the loop wiring writes the file.
The --nous flag was only wired into the argparse `hermes debug share`
subcommand. The /debug slash command (classic CLI + TUI, both via
process_command -> _handle_debug_command) built a hardcoded args
namespace with no `nous` attribute, so it always took the default
paste.rs path.
Pass cmd_original through to _handle_debug_command and parse an optional
destination word:
  /debug        -> public paste (default, unchanged)
  /debug nous   -> Nous-internal S3
  /debug local  -> stdout, no upload
local wins over nous (never touches the network); unknown words fall
back to the default. Add args_hint="[nous|local]" so help/autocomplete
surface it. New TestDebugSlashCommand covers the parsing + dispatch.
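The precedence rules are small enough to show directly. In this sketch the function name and the returned labels are hypothetical, and case-insensitive matching is an assumption.

```python
def parse_debug_destination(cmd_original: str) -> str:
    """Map a /debug command line to 'paste', 'nous', or 'local'."""
    words = {w.lower() for w in cmd_original.strip().split()[1:]}
    if "local" in words:
        # local wins over nous: it never touches the network.
        return "local"
    if "nous" in words:
        return "nous"
    # No word, or an unknown word: the unchanged default.
    return "paste"
```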
NAS PR #349 (merged) ships a stateless presigned-PUT endpoint: the only
route is POST /api/diagnostics/upload-url, and the object's existence in S3
is the only state. There is no /api/diagnostics/confirm route — confirming
live against the merged preview returns 404.
The client's confirm_upload() therefore fired a guaranteed-404 request on
every --nous upload (harmless, since errors were swallowed, but dead).
Remove it and simplify share_to_nous() to the 2-step mint + PUT flow that
matches the shipped contract. Drop the corresponding TestConfirmUpload class
and confirm assertions; add a test that the share succeeds even when the
response carries no id (we no longer depend on it).
The separately-flagged cross-repo requirement from #349's review --
sizeBytes is now REQUIRED and signed into the presigned URL's ContentLength
-- was already satisfied: share_to_nous() sends len(bundle) as sizeBytes and
urllib sets a matching Content-Length on the PUT. Verified against the live
merged preview (missing sizeBytes -> 400 invalid_body; present -> 503 dark).
Tested: pytest tests/hermes_cli/test_diagnostics_upload.py tests/hermes_cli/test_debug.py -> 95 passed.
`hermes debug share --nous` uploads the (force-redacted) debug bundle to
Nous-internal S3 storage via a presigned URL minted by the Nous account
service, instead of a public paste. The bundle is private — viewable only
by Nous staff / allowlisted mods through a Google-OAuth-gated viewer — and
auto-deletes after 14 days. The paste.rs path is unchanged and remains the
default.
- hermes_cli/diagnostics_upload.py (new): stdlib-urllib NAS client —
request_upload_url(), put_bundle(), confirm_upload() (best-effort),
share_to_nous() orchestrator. Base URL via HERMES_DIAGNOSTICS_BASE_URL
(default https://portal.nousresearch.com).
- hermes_cli/debug.py: extract collect_share_bundle() from build_debug_share()
so the Nous path reuses the exact same redaction/collection (paste.rs
behaviour unchanged); add build_nous_bundle() producing the gzipped
{"format":"hermes-debug-share/1","redacted":...,"files":...} envelope the
discord-support viewer parses; add the --nous run path with a privacy
notice and a clean fallback (suggest --local) on failure.
- hermes_cli/main.py: add the --nous flag + help/epilog entry on
`debug share`.
- tests: test_diagnostics_upload.py (new) mocks urllib; test_debug.py adds
bundle/Nous coverage. 97 passing.
Generic provider:custom relays were force-routed to the OpenAI Responses
API whenever the model matched gpt-5*, and a stale persisted
model.api_mode=codex_responses survived /reset and upgrades. Some
OpenAI-compatible relays do not implement Responses semantics, which
surfaced as malformed function_call.name replay errors in gateway sessions.
- runtime_provider: route custom-provider api_mode through
_resolve_plain_custom_api_mode(), which drops a stale codex_responses
unless the URL is direct OpenAI/xAI
- run_agent: _provider_model_requires_responses_api returns False for
custom; direct api.openai.com / api.x.ai URLs still upgrade via
_is_direct_openai_url() / URL detection
- regression coverage for plain relays vs direct OpenAI/xAI URLs
Co-authored-by: HiddenPuppy <HiddenPuppy@users.noreply.github.com>
Adds gateway.platform_connect_timeout (default 30s) to DEFAULT_CONFIG and
bridges it to the internal HERMES_GATEWAY_PLATFORM_CONNECT_TIMEOUT env var
at gateway startup, following the existing gateway_timeout config->env
pattern. The env var remains the manual-override escape hatch and wins if
set explicitly; otherwise config.yaml supplies the value. This closes the
issue's documentation/config-surface request (#19776 suggestion 2) on top
of the adapter ready-wait fix, so users no longer need an undocumented env
var to raise the Discord connect timeout.
Refs #19776
A single 'hermes update' / 'hermes -p' could rewrite a hand-curated config.yaml
into a near-full DEFAULT_CONFIG dump (the 'you blow up my profile config on one
tweak' reports). Root cause: migrate_config() had ~16 independent save_config()
call sites, each author deciding ad hoc whether to materialise a value, and many
persisted pure schema defaults with strip_defaults=False. Defaults already merge
transparently at read time via load_config(), so writing them is pure bloat that
also shadows future default changes (see save_config's docstring).
Architectural fix (not a per-site patch): introduce a single _persist_migration()
chokepoint that enforces one invariant — a migration may persist only values that
DIFFER from the current schema default, plus explicit removals/renames of user
data; pure defaults are never written. Every migration write (all 17 sites incl.
the version-bump finalizer) now routes through it. The invariant is mechanically
correct for all cases and verified empirically:
- pure-default seeds (timezone='', curator/auxiliary.curator blocks, interim
flag, curator.consolidate=False, empty plugins.enabled) are stripped → merged
in at read time;
- non-default values (write_approval=True, model_catalog.ttl_hours=1) preserved
via explicit-raw-path preservation;
- behaviour flips (agent.verify_on_stop=False, schema default still 'auto')
preserved because False != 'auto';
- data transforms (custom_providers->providers, stt.model relocation,
write_mode->write_approval, compression.summary_* removal, MCP-disable)
persist their removals/renames.
An explicitly user-set non-default value (e.g. matrix.require_mention: false) is
preserved across the bump.
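The core of the invariant (persist only what differs from the schema default) can be illustrated with a small recursive filter. This is a sketch under assumptions: `strip_defaults` here is a hypothetical helper, it does not model the explicit-raw-path preservation or the removal/rename handling mentioned above, and the default values in the usage note are invented for illustration.

```python
def strip_defaults(config: dict, defaults: dict) -> dict:
    """Return only the entries of config that differ from defaults."""
    lean = {}
    for key, value in config.items():
        if key not in defaults:
            lean[key] = value  # unknown to the schema: keep user data
            continue
        default = defaults[key]
        if isinstance(value, dict) and isinstance(default, dict):
            nested = strip_defaults(value, default)
            if nested:  # drop sections that contain only defaults
                lean[key] = nested
        elif type(value) is not type(default) or value != default:
            # The type check keeps False distinct from 0 and from 'auto'.
            lean[key] = value
    return lean
```

For example, with an assumed default of `model_catalog.ttl_hours = 24`, a config holding `timezone: ''`, `agent.verify_on_stop: False`, and `model_catalog.ttl_hours: 1` keeps the last two and drops the empty timezone.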
Guard tests lock the architecture: an AST check asserts migrate_config() makes no
direct save_config() call (all writes go through _persist_migration), and a
full-range v1->latest test asserts a lean config is never dumped. Two existing
change-detector tests that froze the on-disk representation of default-valued
keys are rewritten to assert the effective value via load_config() (behaviour
contract, not snapshot).
Validation: lean v1->latest migration drops from ~567 bytes to ~196 bytes;
148 config+setup and 196 profile/curator/migrate tests pass on scripts/run_tests.sh.
exact_moa_preset_name matched any bare model name equal to a preset key,
regardless of the preset's enabled flag. On the no-explicit-provider switch
path (PATH B in model_switch.py), a plain /model switch whose name collided
with a preset key (e.g. "default") silently pivoted the session onto the MoA
virtual provider — even when the user had set enabled: false to opt out
(issue #55187). The LLM driving a routine model switch could land on a broken
moa provider with empty default_preset / unconfigured aggregator credentials.
Gate the implicit bare-name match on the per-preset enabled flag. Explicit
selection via --provider moa / the model picker uses PATH A and does not go
through exact_moa_preset_name, so a disabled preset stays reachable when the
user explicitly asks for it.
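The gate can be sketched as below. The presets mapping shape and the choice to treat a missing `enabled` key as enabled are assumptions made for illustration; the real function's signature may differ.

```python
def exact_moa_preset_name(model_name: str, presets: dict):
    """Return the preset key for a bare model name, but only if it is enabled."""
    preset = presets.get(model_name)
    if preset is None:
        return None  # no preset with that name
    if not preset.get("enabled", True):
        return None  # user opted out: do not pivot onto the MoA provider
    return model_name
```

A plain `/model default` with `presets = {"default": {"enabled": False}}` now resolves to `None` on the implicit path, so the switch continues as an ordinary model change.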
Builds on memosr's sink-level opt-in gate (#29249). Enabling a
non-bundled plugin now surfaces the privileged allow_tool_override
decision at `hermes plugins enable` time instead of leaving the
operator to discover the config key after a runtime rejection.
- `hermes plugins enable <name>` prompts for non-bundled plugins:
'Allow this plugin to replace built-in tools?' Default is deny
(blank Enter / non-interactive stdin / EOF all fail closed).
- --allow-tool-override / --no-allow-tool-override flags for
non-interactive and scripted use (and a future desktop checkbox).
- Bundled plugins are trusted: never prompted, no entry written.
- Writes plugins.entries.<key>.allow_tool_override, the same key the
sink gate reads (manifest.key == discovery key), so consent and
enforcement compose end to end.
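The fail-closed consent flow above could look roughly like this. Both function names and the `[y/N]` suffix are hypothetical; `flag` stands for the parsed --allow-tool-override / --no-allow-tool-override value, with `None` meaning neither flag was given.

```python
def confirm_tool_override(read_line=input) -> bool:
    """Ask the operator; anything other than an explicit yes is a deny."""
    try:
        answer = read_line("Allow this plugin to replace built-in tools? [y/N] ")
    except (EOFError, OSError):
        return False  # non-interactive stdin or EOF fails closed
    return answer.strip().lower() in {"y", "yes"}


def resolve_consent(flag, bundled: bool, read_line=input):
    """Return True/False to write to config, or None when no entry is written."""
    if bundled:
        return None  # bundled plugins are trusted: never prompted
    if flag is not None:
        return flag  # scripted use: the flag decides without prompting
    return confirm_tool_override(read_line)
```

Injecting `read_line` keeps the prompt testable without a terminal.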
egilewski found the prior sink gate was transient: it only applied while
PluginManager executed register(ctx). A plugin could defer a direct
registry.register(..., override=True) to a post-load callback/thread, after
the scope was cleared, and still replace a built-in.
Make authorization durable by binding it to where the handler is DEFINED
(handler.__globals__['__name__']) rather than to call timing. At load, each
plugin's module namespace is mapped to its allow_tool_override opt-in in a
table that is never cleared. The sink resolves the handler's owning plugin
module and rejects an override from any plugin namespace without opt-in,
regardless of when or on which thread the call happens. Plugin namespaces
with no recorded policy are treated as not-opted-in (fail-closed). Built-in
and MCP handlers live outside the plugin namespace and are unaffected.
Adds a regression test for the delayed/post-load direct-registry override.
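A minimal sketch of definition-site authorization, under stated assumptions: the `hermes_plugins.` namespace prefix, the helper names, and the walk up the dotted module path (so submodules inherit their plugin's policy) are all hypothetical. The key idea taken from the description is reading `handler.__globals__['__name__']` and failing closed when no policy is recorded.

```python
# Plugin module namespace -> operator opt-in. Filled at load, never cleared.
_OVERRIDE_POLICY = {}
PLUGIN_NAMESPACE = "hermes_plugins."  # assumed prefix for plugin modules


def record_policy(module_name: str, allowed: bool) -> None:
    _OVERRIDE_POLICY[module_name] = allowed


def override_allowed(handler) -> bool:
    """Decide by where the handler was defined, not by when it is registered."""
    owner = getattr(handler, "__globals__", {}).get("__name__", "")
    if not owner.startswith(PLUGIN_NAMESPACE):
        return True  # built-in and MCP handlers live outside the plugin namespace
    parts = owner.split(".")
    while parts:
        candidate = ".".join(parts)
        if candidate in _OVERRIDE_POLICY:
            return _OVERRIDE_POLICY[candidate]
        parts.pop()  # try the parent package next
    return False  # plugin namespace with no recorded policy: fail closed
```

Because the lookup depends only on the function object, a deferred call from a callback or another thread gets the same answer as a call made during load.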
The opt-in gate lived only in PluginContext.register_tool, so a plugin
could bypass it by importing tools.registry and calling
registry.register(..., override=True) directly. Enforce the same gate at
the sink: during plugin load, the registry rejects an override from a
plugin without operator opt-in regardless of the path taken. Built-in and
MCP registrations (no active plugin scope) are unaffected.
Adds a regression test covering the direct-registry bypass.
The tool_override flag landed in v0.14.0 (#26759) so plugins can replace
a built-in tool with their own implementation. It works as advertised
but there is no trust gate, so any enabled third-party plugin can
silently override any built-in like shell_exec, write_file, or web_fetch
and exfiltrate everything the agent invokes through it. The only trace
is a DEBUG-level log line.
Compare with ctx.llm (#23194) which does gate the equivalent privilege
escalation: overriding the provider requires
plugins.entries.<id>.llm.allow_provider_override: true in config.yaml.
The policy shape exists, it just was not extended to tool overrides.
Fix:
* Add PluginToolOverrideError(PermissionError) for the gate failure.
* register_tool() now checks _tool_override_allowed(name) when
override=True. Bundled plugins (manifest.source == 'bundled') are
trusted by default. Every other source requires
plugins.entries.<plugin_id>.allow_tool_override: true in config.yaml.
* fail-closed: if config.yaml cannot be loaded for any reason,
_tool_override_allowed returns False. Same posture as
MSGraphWebhookAdapter.connect() in #22353.
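The gate described above can be sketched as follows. The exception name and config path come from the description; the free-function signatures, the dict-based registry, and the injected `load_config` callable are simplifications assumed for illustration.

```python
class PluginToolOverrideError(PermissionError):
    """Raised when a plugin tries to override a tool without operator opt-in."""


def tool_override_allowed(plugin_id: str, source: str, load_config) -> bool:
    if source == "bundled":
        return True  # bundled plugins are trusted by default
    try:
        entry = load_config()["plugins"]["entries"][plugin_id]
        return entry["allow_tool_override"] is True
    except Exception:
        return False  # unreadable or incomplete config: fail closed


def register_tool(registry: dict, name: str, handler, *, plugin_id: str,
                  source: str, load_config, override: bool = False) -> None:
    # The gate is consulted only when the plugin asks to override.
    if override and not tool_override_allowed(plugin_id, source, load_config):
        raise PluginToolOverrideError(
            f"plugin '{plugin_id}' may not override '{name}'; set "
            f"plugins.entries.{plugin_id}.allow_tool_override: true in config.yaml"
        )
    registry[name] = handler
```

Subclassing `PermissionError` lets existing `except PermissionError` handlers catch the new failure without changes.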
Backwards compatibility:
* Bundled plugins: no change (source == 'bundled' short-circuits the
gate).
* Third-party plugins not using override: no change (gate is only
consulted when override=True).
* Third-party plugins using override: registration fails until the
operator opts in. The error message includes the exact config path
to add, so the fix is one config edit away for legitimate use cases.
Same migration path users went through for allow_provider_override
after #23194 landed.
Regression tests:
* tests/hermes_cli/test_plugins.py::test_register_tool_override_replaces_existing
and ::test_register_tool_override_on_new_name_is_noop_path were
written before the gate existed. Updated their test configs to
include allow_tool_override: true under
plugins.entries.<plugin_id>, mirroring how a legitimate operator
would now grant the privilege.
* New regression test ::test_register_tool_override_blocked_without_operator_opt_in
exercises both the PluginManager-catches-error path (built-in tool is
preserved, attacker plugin is skipped) and the direct-call path
(PluginToolOverrideError is raised with a message that names the
config key to set). Verified the test fails without this fix and
passes with it.
* All 73 tests in test_plugins.py continue to pass.
Folds @trevorgordon981's #50590 into difujia's #15139:
- exchange_copilot_token now prefers the authoritative endpoints.api from
the token-exchange response, falling back to the proxy-ep-derived host
- resolve_api_key_provider_credentials gains a copilot branch that resolves
the account-specific base URL and a non-empty last-resort guard, so chat
inference never wedges on an empty base URL (#50252)
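The resolution order above (authoritative `endpoints.api`, then the proxy-ep-derived host, then a non-empty last resort) can be illustrated with a small helper. The function name, argument shape, and the caller-supplied `last_resort` value are assumptions; only the precedence is taken from the description.

```python
def resolve_copilot_base_url(exchange_response: dict, derived_from_proxy_ep, last_resort: str) -> str:
    """Pick the first non-empty candidate in priority order."""
    endpoints = exchange_response.get("endpoints")
    authoritative = endpoints.get("api") if isinstance(endpoints, dict) else None
    for candidate in (authoritative, derived_from_proxy_ep, last_resort):
        if isinstance(candidate, str) and candidate.strip():
            return candidate.strip().rstrip("/")
    # Guard: never hand an empty base URL to the inference client.
    raise ValueError("no usable Copilot base URL")
```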
Co-authored-by: Trevor Gordon <trevorbgordon@gmail.com>
Two changes that complete the Copilot auth story (#7731 parts 3 and 4):
1. Switch OAuth client ID from opencode (Ov23li8tweQw6odWQebz) to VS Code
(Iv1.b507a08c87ecfe98). The old ID produces gho_* tokens that return
404 on /copilot_internal/v2/token, making token exchange non-functional.
The new ID produces ghu_* tokens that support exchange.
2. Derive enterprise API base URL from the proxy-ep field in the exchanged
token. Enterprise accounts get tokens containing e.g.
"proxy-ep=proxy.enterprise.githubcopilot.com" which is converted to
"https://api.enterprise.githubcopilot.com" and stored in the credential
pool. Individual accounts (no proxy-ep) continue using the default URL.
The COPILOT_API_BASE_URL env var remains as a user escape hatch.
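A sketch of the proxy-ep derivation from item 2, assuming the exchanged token is a semicolon-separated list of `key=value` fields. That token layout and the function name are assumptions; the `proxy.` to `api.` rewrite follows the example above.

```python
def derive_api_base_url(token: str):
    """Return the enterprise API base URL encoded in the token, or None."""
    for field in token.split(";"):
        key, sep, value = field.partition("=")
        if sep and key.strip() == "proxy-ep":
            host = value.strip()
            if not host.startswith("proxy."):
                return None  # unexpected shape: let the caller use the default
            return "https://api." + host[len("proxy."):]
    return None  # individual accounts: no proxy-ep, use the default URL
```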
Tested on both Individual and Enterprise Copilot accounts:
- Individual: device flow works, exchange succeeds, base_url=None (default)
- Enterprise: device flow works, exchange succeeds, 39 models returned
including claude-opus-4.6-1m (936K), enterprise base URL derived
Parts 3 and 4 of #7731.
Three targeted fixes for Desktop GUI WebSocket stability when agent
turns starve the uvicorn event loop of CPU (GIL contention):
1. Loosen ws_ping_timeout for loopback binds (QW-1)
- Loopback (Desktop): ping 30s interval / 60s timeout
- Non-loopback (Cloudflare Tunnel): unchanged 20/20
- A GIL-heavy agent turn can stall the event loop past 20s;
uvicorn's keepalive ping runs on that same starved loop, so a
20s timeout kills an otherwise-healthy local connection over a
recoverable stall. 60s rides out the stall without affecting
half-open detection on public binds.
2. Coalesce streaming token frames in WSTransport (CF-2)
- Buffer high-frequency delta frames (message.delta, reasoning.delta,
thinking.delta) and flush as a batch every ~33ms (~30fps)
- Non-streaming frames (RPC responses, control/tool/completion events)
flush pending tokens first — wire ordering preserved
- Thread-safe via threading.Lock; worker threads return immediately
instead of blocking on per-token loop wakeups
- Reduces event-loop wakeup churn by orders of magnitude during model
streaming, directly cutting GIL pressure
3. Loop heartbeat watchdog (CF-1)
- Self-rearming call_later tick (2s) measures drift between expected
and actual fire time using loop.time() (monotonic)
- Logs 'event loop stalled Ns (GIL pressure suspected)' when drift >5s
- Turns mysterious WS drops into diagnosable log entries
- Uses call_later chain (not a task) — dies with the loop, nothing
to cancel on shutdown
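The coalescing in item 2 can be sketched as below. This is a simplified illustration, not the WSTransport code: the class name, the `send_batch` callable, and the `tool.start` frame type in the usage note are hypothetical, and the periodic ~33ms timer is left to the caller, who is expected to invoke `flush()`.

```python
import threading

STREAMING_TYPES = {"message.delta", "reasoning.delta", "thinking.delta"}


class FrameCoalescer:
    def __init__(self, send_batch):
        self._send_batch = send_batch  # callable taking a list of frames
        self._lock = threading.Lock()
        self._pending = []

    def submit(self, frame: dict) -> None:
        """Buffer streaming deltas; anything else flushes the buffer first."""
        with self._lock:
            if frame.get("type") in STREAMING_TYPES:
                self._pending.append(frame)
                return  # worker thread returns without waking the loop
            batch = self._pending + [frame]
            self._pending = []
            # Sending under the lock keeps wire order across threads.
            self._send_batch(batch)

    def flush(self) -> None:
        """Meant to be called by a periodic timer."""
        with self._lock:
            batch, self._pending = self._pending, []
            if batch:
                self._send_batch(batch)
```

Submitting two `message.delta` frames and then a `tool.start` frame produces a single batch of three frames in their original order.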
Root cause: uvicorn's ws keepalive ping (20/20s) runs on the same
starved event loop as agent turns. Under GIL pressure from heavy agent
turns or delegation, the loop can't service the ping within 20s, so
the websockets protocol declares the connection dead. Reconnects fail
with ready_send_failed because the old process's loop is still wedged.
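The heartbeat watchdog from item 3 could be sketched like this. The class name and the injectable `report` callable are assumptions; the 2s tick, the 5s threshold, the `call_later` chain, and the log wording follow the description.

```python
import asyncio


class LoopWatchdog:
    def __init__(self, loop, interval=2.0, threshold=5.0, report=print):
        self._loop = loop
        self._interval = interval
        self._threshold = threshold
        self._report = report
        self._expected = 0.0

    def start(self) -> None:
        self._arm()

    def _arm(self) -> None:
        # loop.time() is monotonic, so wall-clock jumps do not count as drift.
        self._expected = self._loop.time() + self._interval
        self._loop.call_later(self._interval, self._tick)

    def _tick(self) -> None:
        drift = self._loop.time() - self._expected
        if drift > self._threshold:
            self._report(f"event loop stalled {drift:.1f}s (GIL pressure suspected)")
        self._arm()  # self-rearming chain; it dies with the loop
```

Usage would be `LoopWatchdog(asyncio.get_running_loop()).start()` from inside the running server.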
None of these fixes touch the model-facing message array, prompt
caching, message role alternation, or the wire protocol — they are
strictly display-transport improvements plus a config tweak and a
diagnostic log.
Tests: 762 passed, 17 skipped (0 failures) across test_tui_gateway_ws,
test_tui_gateway_server, test_web_server, and tui_gateway/ suites.
Batch delegation returned each subagent's full final_response verbatim
into the parent's context. A fan-out of N children could dump 60k+ tokens
at once, blowing the parent's context window and — on rate-limited
providers — triggering a compression/429 death spiral (429 misread as
context-too-large -> window step-down -> retry loop -> conversation dies).
Cap each summary against the parent's *remaining* context headroom split
across the batch (not a magic char count). When trimming, mirror the
web_extract convention: spill the full text to cache/delegation (mounted
into remote backends via credential_files._CACHE_DIRS) and return a
head+tail window (75/25, line-snapped) plus a footer with the exact
read_file offset to page the omitted middle. Both the subagent's opening
AND its closing (outcomes / files-changed / issues, which live at the end)
survive in-context, and nothing is lost — the parent can read_file the
full version on any backend.
delegation.max_summary_chars (default 24000) is a static ceiling layered
on top as belt-and-suspenders for models that ignore 'be concise'; 0
disables it. Child prompt tightened to lead with outcomes / bullets.
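A rough sketch of the budget split and the head+tail window. Several details are assumptions for illustration: budgets are measured in characters, the footer wording is invented, and both function names are hypothetical. The 75/25 split, line snapping, and the static ceiling with 0 meaning disabled follow the description.

```python
def per_child_budget(remaining_chars: int, batch_size: int, static_ceiling: int = 24000) -> int:
    """Split the parent's remaining headroom across the batch, then apply the ceiling."""
    share = remaining_chars // max(batch_size, 1)
    return min(share, static_ceiling) if static_ceiling > 0 else share


def head_tail_window(text: str, budget: int, spill_path: str) -> str:
    """Keep about 75% of the budget from the start and 25% from the end."""
    if len(text) <= budget:
        return text
    head_end = (budget * 3) // 4
    tail_start = len(text) - (budget - head_end)
    newline = text.rfind("\n", 0, head_end)
    if newline != -1:
        head_end = newline + 1  # snap the head back to a complete line
    newline = text.find("\n", tail_start)
    if newline != -1:
        tail_start = newline + 1  # snap the tail forward to a line start
    first_omitted_line = text.count("\n", 0, head_end) + 1
    footer = (f"[{tail_start - head_end} chars omitted; full text in {spill_path}, "
              f"read_file from line {first_omitted_line}]\n")
    return text[:head_end] + footer + text[tail_start:]
```

The caller is expected to write the full text to `spill_path` before returning the trimmed window, so the footer always points at something readable.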
Co-authored-by: rc-int <rcint@klaith.com>
`_resolve_api_key_provider_secret` resolved API keys via `get_env_value`,
which returns the `os.environ` value first and only falls back to
`~/.hermes/.env`. After a user rotates a key in `.env`, a stale value still
exported in the parent shell (Codex CLI, test runner, login profile) shadows
the fresh key on every request, producing persistent 401s.
The credential-pool seeding path was already fixed to prefer `.env`
(#18254/#18755), but the live request-time resolution path was not — so the
pool re-seeded with the fresh key while `_resolve_api_key_provider_secret`
kept returning the stale shell export. This closes that remaining path.
- config: add `get_env_value_prefer_dotenv()` — checks `~/.hermes/.env`
first, then `os.environ`. Distinct from `get_env_value()` (unchanged,
os.environ-first) so only Hermes-managed credential resolution flips
precedence; the generic helper's many callers are unaffected.
- auth: `_resolve_api_key_provider_secret` resolves through the new helper.
- tests: regression coverage for both the pool-seeding path and the
auth resolution path (a rotated `.env` key must beat a stale shell export).
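The flipped precedence can be sketched as follows. This is an illustration only: the real helper presumably reads `~/.hermes/.env` itself, whereas this version takes the file contents as a parameter to stay self-contained, and `parse_dotenv` is a deliberately simplified hypothetical parser (no `export` prefix or multi-line values).

```python
import os


def parse_dotenv(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_env_value_prefer_dotenv(name: str, dotenv_text: str, environ=None):
    """Check the .env contents first, then fall back to the process environment."""
    environ = os.environ if environ is None else environ
    return parse_dotenv(dotenv_text).get(name) or environ.get(name)
```

With a rotated key in `.env` and a stale export in the shell, the `.env` value is returned; keys absent from `.env` still resolve from the environment.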
Closes #20591.
Co-authored-by: 0xDevNinja <manmit0x@gmail.com>
Terminal rendition of the desktop Star Map / Memory Graph: learned skills
and memories on a timeline, shared by `hermes journey` and the TUI
`/journey` overlay via one size-aware Python renderer
(agent/learning_graph_render.py).
- TUI overlay mirrors /agents: static chart overview + selectable slice
list → slice detail → single skill/memory body, with the shared
inverse-row selection treatment and a pinned footer.
- Reuse primitives: extract OverlayScrollbar into its own module (now
shared with agentsOverlay), scroll the item body via ScrollBox, and
unify both lists through one table-driven ListRow.
- No animation/playback in the TUI — pure data; the renderer's reveal
scrubber stays available in the CLI (`--play`, `--reveal`).
MCP tools were connected and enabled but never surfaced in the agent's
session toolset on the desktop app and dashboard WebUI (#51587).
Background MCP discovery threads have two independent owners, one per
surface: tui_gateway.entry (stdio 'hermes --tui') and hermes_cli.mcp_startup
(desktop app + dashboard WS sidecar via tui_gateway/ws.py, and 'hermes
dashboard'). The late-refresh scheduler gates on
tui_gateway.entry.mcp_discovery_in_flight(), which read ONLY the entry
thread global. On the desktop/dashboard surfaces that global is None, so a
server slower than the bounded build-time wait never triggered a late
refresh and its tools stayed invisible for the whole session.
Make mcp_discovery_in_flight() / join_mcp_discovery() consult BOTH thread
owners. Adds the matching in-flight/join helpers to hermes_cli.mcp_startup
and has tui_gateway.entry delegate to them as a second owner.