Commit graph

3247 commits

Author SHA1 Message Date
Teknium
19d4174454
feat(gateway): add /sessions search <query> (#57685)
Gateway users can now search resumable sessions from messaging surfaces:
/sessions search <query> (alias: find) matches titles and session ids —
including every title/id in a row's forward compression chain, so a
compressed-away title still surfaces its live tip — plus a
punctuation-normalized variant so 'an94' matches 'AN-94'.

Implemented by generalizing the existing id_query chain-filter in
SessionDB.list_sessions_rich into a combined SQL-level filter (search
stays ORDER BY last-active + LIMIT at SQL level), threading a
search_query through the shared query_session_listing helper, and
teaching parse_session_listing_args to split off a search query.

Search results pass through the existing _resume_row_visible guard
unchanged: origin scoping, admin-only 'all', and the fail-closed
legacy-row posture from the July 1 hardening are preserved exactly.
Over-fetch (50) before the visibility cut so origin-invisible matches
can't starve the page.

Salvages the feature direction of PR #57595 by @GodsBoy with a minimal
implementation that keeps the resume authorization surface untouched.
2026-07-03 13:44:00 -07:00
Brooklyn Nicholson
914d19b3a9 fix(desktop,gateway,mcp): post-merge — CI contract, review corrections, hub search
Post-merge follow-ups + several review rounds + a hub-search rework, folded together.

Merge-scuff restores (a stale-base refactor had reverted two live-on-main fixes):
- gateway: SessionStore compression-tip healing + its regression test.
- desktop: messaging session/transcript polling in desktop-controller
  (MESSAGING_POLL / ACTIVE_MESSAGING_SESSION_POLL, refreshMessagingSessions,
  refreshActiveMessagingTranscript, the richer sameCronSignature) so inbound
  platform traffic updates live again instead of freezing until manual refresh.

Profile-switch isolation (epoch/close/guard on every profile-scoped async):
- Hub store clears + in-flight runHubAction bails (and swallows the post-switch
  404 instead of a phantom toast); hub preview/scan/search/sources profile-scoped.
- MCP: probe/auth epoch guards, dirty-draft reset, sidebar mutations blocked
  until config resettles AND every persist re-checks the epoch post-await;
  profilePending clears on config settle incl. error; logs re-key on profile.
- Model settings reload on switch and epoch-guard setModelAssignment /
  saveMoaModels / API-key activation.
- Config draft resets + cancels its autosave on switch; skill editor/archive and
  star-map node dialogs close on switch; openSkillEditor / star-map openEdit
  discard stale fetches; tool-usage analytics loads are profile-guarded/keyed.

Correctness + UX:
- Unique per-skill action names for hub install AND uninstall; hub/​catalog rows
  flip only on a clean exit_code; catalog install polls the background bootstrap
  to completion, reconciles the mcp.json draft (no dropped server), and fails
  loudly on non-zero exit; MCP catalog query keyed by profile.
- /test reports needs-auth for anonymous auth:oauth servers; /auth snapshots +
  restores tokens on a failed re-auth and clears the full 300s callback window.
- config-settings shows a retry on load failure; CodeEditor/JsonDocumentEditor
  go read-only while saving so edits typed mid-save aren't dropped.
- Deep-link highlighter deletes its param only after a successful scroll.
- Restored the PageSearchShell trailing slot → Artifacts refresh button/spinner.
- /settings?tab=mcp redirect keeps server=.

Progressive hub search: fan out one query per backend-searchable source
(index-covered API sources stay unsearchable → no ~70-call GitHub re-hammer),
merge/dedupe by trust as each lands, per-source spinner overlaid on the dimmed
chip — results stream in without blocking on the slowest, no layout shift.

test(web): /api/skills list carries usage + provenance (CI contract).
2026-07-03 15:22:43 -05:00
Brooklyn Nicholson
65415b1a12 Merge remote-tracking branch 'origin/main' into bb/skills-renovate 2026-07-03 13:59:26 -05:00
kshitijk4poor
a6b9597d5f perf(console): cache CLI-surface summaries + bound console worker pool
Addresses two non-blocking review notes on the Hermes Console PR:

- console_engine: the four _*_summaries helpers import a subcommand module
  and build a throwaway argparse tree purely to extract help summaries. The
  dashboard opens a fresh HermesConsoleEngine per /api/console connection, so
  every reconnect re-imported + re-parsed the whole CLI surface. The surface
  is process-static, so memoize with functools.lru_cache — callers only read
  the returned map.

- web_server: console commands run in a worker thread via asyncio.to_thread.
  On a 60s timeout asyncio.wait_for cancels the awaitable, but Python threads
  aren't preemptible, so a stuck worker keeps running and would leak into the
  shared default thread pool. Route console execution through a small
  dedicated bounded ThreadPoolExecutor (max_workers=4) so a leaked worker is
  capped and concurrent console execution is bounded regardless of reconnects.

Follow-up on top of @shannonsands' NS-574 Hermes Console.
2026-07-03 20:18:00 +05:30
Shannon Sands
1e7111d25d Use shared ANSI stripping in Hermes Console 2026-07-03 20:18:00 +05:30
Shannon Sands
f7d90edd8b Add dashboard Hermes console UI 2026-07-03 20:18:00 +05:30
Shannon Sands
4493bba901 Add dashboard Hermes console websocket 2026-07-03 20:18:00 +05:30
Shannon Sands
dcbce869ae Add safe Hermes console REPL 2026-07-03 20:18:00 +05:30
Teknium
87ae4ae94b
fix(update): harden #57659 follow-ups — task restore on failure, --force-venv split, trampoline detection, managed-install health (#57680)
Five follow-ups to #57659 from post-merge review:

1. install.ps1: gateway scheduled-task re-enable now runs in a finally
   (a thrown Remove-Item/uv venv failure previously stranded the user's
   gateway autostart disabled), and tasks that were already disabled
   before the install are no longer blindly re-enabled.
2. The venv-python holder guard is no longer bypassed by plain --force
   (which the desktop bootstrap passes on every update while its lock
   probe only checks hermes.exe/app.asar). New explicit --force-venv is
   the escape hatch; --force keeps bypassing only the hermes.exe shim
   guard.
3. _detect_venv_python_processes now also catches uv/base-interpreter
   trampolines whose exe is outside the venv, via cmdline (venv path or
   '-m hermes_cli.main' tied to this install root) and cwd.
4. Missing venv python is now UNHEALTHY on managed installs
   (.hermes-bootstrap-complete / .update-incomplete markers) so the
   repair lane runs instead of 'Already up to date!'; the repair branch
   recreates the venv first when it's gone entirely. Dev checkouts keep
   reporting healthy.
5. install.ps1 comment no longer claims a Startup-folder disarm the
   code doesn't perform (logon-only, not a mid-install respawner).
2026-07-03 04:08:37 -07:00
LeonSGP43
f6a3d2e900 fix(model): preserve named custom provider slug 2026-07-03 03:33:06 -07:00
teknium1
7485fe0605 fix(dashboard): make .env sensitive-file guard case-insensitive
Follow-up to #57507: .ENV / .Env.local on case-insensitive filesystem
mounts slipped past the guard. Lowercase the name before matching and
add a regression test. Addresses egilewski's open review note.
2026-07-03 03:27:47 -07:00
liuhao1024
1bcc52c14e fix(dashboard): use pattern match for .env sensitive file guard
Replace the exact-filename frozenset with _is_sensitive_filename()
that matches .env plus any .env.<suffix> variant.  This covers
shorthand suffixes like .env.prod that the previous enumeration
missed.

Add test_sensitive_env_suffix_variants_blocked regression test
covering .env.prod, .env.dev, .env.staging.local, and .env.ci.

Addresses review feedback from egilewski on PR #57507.
2026-07-03 03:27:47 -07:00
liuhao1024
bc55c201c7 fix(dashboard): block .env files from managed-files API
The dashboard Files tab could list, read, and download .env files
containing API keys when running with a bind-mounted Hermes home
directory (e.g. docker run -v ~/.hermes:/opt/data).

Add _SENSITIVE_FILENAMES frozenset and filter these from
list_managed_files(), read_managed_file(), and download_managed_file().
Return 403 for direct read/download attempts on sensitive files.

Fixes #57505
2026-07-03 03:27:47 -07:00
Teknium
b14d75f8af
fix(update): prevent and self-heal half-updated venvs on Windows (#57659)
Root-causes the July 2026 Windows incident chain (locked _brotlicffi.pyd /
_sodium.pyd during install, then 'No module named annotated_doc' with
'hermes update' insisting 'Already up to date!'):

- hermes update: probe venv core imports even when the checkout is current;
  a half-updated venv (dep sync killed mid-flight by a locked .pyd) is now
  detected and repaired instead of being reported as up to date
- hermes update (Windows): after pausing gateways, refuse to mutate the venv
  while other processes run from the venv interpreter (the Desktop backend
  runs as python.exe so the hermes.exe shim guard never saw it); --force
  keeps the old behavior
- install.ps1 venv stage: disarm gateway autostart Scheduled Tasks before
  the kill sweep (they respawn the gateway inside the kill->delete window),
  make the sweep a bounded loop requiring 3 clean passes, and rename-then-
  delete the old venv (a rename succeeds even with mapped DLLs) with stale-
  dir cleanup on the next run
- desktop updater: 'venv shim still locked after 15s' now ABORTS the update
  hand-off (restarting our backend, surfacing the holder to the user)
  instead of 'proceeding anyway (force)' into guaranteed venv corruption;
  the unlock wait also re-kills respawned backends each poll tick
2026-07-03 03:24:08 -07:00
Brooklyn Nicholson
16aa09aca5 feat(mcp): first-class MCP tab — catalog, GUI auth/probe/logs, per-tool gating
A Cursor-style MCP manager inside Capabilities, plus the backend it needs.

- Server list with brand/favicon avatars + live status dot and a capability
  summary (N tools, M prompts, K resources); Servers | Catalog views.
- Catalog: one-click install of Nous-approved servers with required-env prompts.
- GUI OAuth: Authenticate opens the system browser from the TTY-less backend and
  verifies a token actually lands; header/API-key servers are never pushed down
  OAuth; a dirty mcp.json can't drop a freshly-persisted auth field.
- Full-width mcp.json editor (ecosystem document format) + pinned stdio/agent
  LogTail; probes cached 5m and keyed by (profile, config) so revisiting never
  respawns the fleet or shows a stale probe.
- Whole-map persistence (PUT /api/mcp/servers) so deletes/toggles actually stick
  (the generic /api/config deep-merge could not remove keys).
- perf: MCP probe/auth no longer hold the global skills lock, so a slow stdio
  spawn can't stall every other request into a 15s timeout.
- per-tool include/exclude gating (lib/mcp-tool-filter) mirroring the CLI loader.
2026-07-03 05:08:28 -05:00
teknium1
eb99f82ce4 fix(browser): surface launch diagnostics when debug browser never opens the CDP port
Follow-up to the salvaged early-exit retry fix (#35617): the debug-browser
launch path was fire-and-forget (stderr to DEVNULL, no logging), so every
platform failure — Windows singleton forward to an existing instance, bad
profile dir, missing shared libraries, policy blocks — collapsed into the
same unactionable 'port 9222 isn't responding yet' message and debug
reports contained nothing.

- launch_chrome_debug() returns a structured ChromeDebugLaunch with
  per-candidate attempts (state, exit code, stderr tail)
- browser stderr is captured to <hermes_home>/chrome-debug/launch-stderr.log
- clean exit (code 0) without the port opening is detected as Chromium's
  single-instance forward and produces a targeted user hint to close all
  running instances of that browser
- crash exits surface the stderr tail (e.g. missing libnspr4.so)
- every spawn/exit is logged to agent.log so hermes debug share captures it
- CLI (/browser connect) and TUI/desktop (browser.manage) both print the hint
2026-07-03 01:05:22 -07:00
LeonSGP43
c74f093523 fix(browser): retry next candidate when debug launch exits early 2026-07-03 01:05:22 -07:00
Teknium
c7103c637c
feat(desktop): CLI/dashboard parity — skills hub, MCP test/toggle/catalog, maintenance ops, log filters (#57441)
* feat(desktop): CLI/dashboard parity — skills hub browser, MCP test/toggle/catalog, maintenance ops, log filters

Brings desktop GUI to parity with hermes skills/mcp/doctor/backup/debug-share/
curator/memory CLI commands and the dashboard's System + Skills-hub pages:

- Skills page: new Browse Hub tab (search official/GitHub/community sources,
  preview SKILL.md, security scan verdicts, install/update with live action log)
- MCP settings: connection test (tool listing), per-server enable/disable
  toggle, and a Catalog tab installing Nous-approved MCP servers with env prompts
- Command Center: new Maintenance section (doctor, security audit, backup,
  debug share links, curator status/pause/run, memory file status + reset)
- Command Center system logs: file (agent/errors/gateway/desktop), level, and
  substring filters instead of a fixed agent.log tail
- hermes.ts API client + types for all the above; en/zh locale strings (ja and
  zh-hant inherit via defineLocale)

* feat(desktop): backend model catalogs in toolset config — hermes tools parity

Completes the `hermes tools` parity gap: after picking an image/video
generation backend the CLI runs a model picker (e.g. FAL's multi-model
catalog with speed/strengths/price); the desktop toolset drawer now has the
same flow as a radio-card list.

- web_server: GET /api/tools/toolsets/{name}/models (catalog + current +
  default for the active or named provider row) and PUT .../model
  (validated write to image_gen.model / video_gen.model), reusing the CLI's
  plugin catalog helpers so GUI and `hermes tools` stay in lockstep
- desktop: ModelCatalogPicker in ToolsetConfigPanel — per-model cards with
  speed/strengths/price, in-use + default badges, disabled until the
  backend is the active one; provider selection now mirrors is_active
  locally so the catalog unlocks without a refetch
- tests: 3 backend endpoint tests (catalog shape invariants, persist +
  validation), 2 component tests, 2 API-contract tests; en/zh strings
2026-07-03 01:02:47 -07:00
Teknium
9e044cf795
feat(moa): per-preset fanout cadence — user_turn runs advisors once per user turn (#57591)
New preset key 'fanout': 'per_iteration' (default, unchanged behavior)
re-runs the reference fan-out whenever the advisory view changes — every
tool iteration. 'user_turn' runs the advisors ONCE per user turn and lets
the aggregator act alone for the rest of the tool loop — the original MoA
shape (upfront multi-model synthesis, then a single acting model), and the
obvious lever on MoA's wall/cost multiplier (advisor generation dominates
per-turn latency).

Implementation reuses the existing turn-scoped reference cache: in
user_turn mode the cache signature hashes only the prefix up to the LAST
user message, so mid-turn advisory-view growth doesn't change the key and
iteration 2+ is a cache HIT (advice reused, zero advisor spend, no
re-trace). A new user message changes the prefix and re-triggers the
fan-out. Unknown fanout values normalize to per_iteration.
2026-07-03 01:02:44 -07:00
Teknium
6eb39c2bbe
fix(opencode-go): heal stripped /v1 base_url so non-minimax models stop 404ing (#57585)
OpenCode Go serves minimax/qwen via Anthropic Messages (base URL without
/v1 — the SDK appends /v1/messages) and glm/kimi/deepseek/mimo via OpenAI
chat completions (base URL WITH /v1). The runtime stripped /v1 for
anthropic-routed models, and the TUI/desktop + gateway persisted that
stripped URL to model.base_url. Every later chat_completions model then
POSTed to https://opencode.ai/zen/go/chat/completions — a 404 (the
marketing site). Result: only minimax worked; glm/deepseek/kimi all 404ed.

- New normalize_opencode_base_url(): symmetric /v1 normalization —
  strip for anthropic_messages, re-append for chat_completions /
  codex_responses on opencode.ai hosts (heals persisted stripped URLs;
  custom proxy overrides untouched)
- Applied at all three former one-way strip sites (resolve_runtime_provider
  x2, switch_model)
- opencode_model_api_mode: all Qwen models on Go AND Zen now route via
  /v1/messages per current published endpoint tables (previously only
  qwen3.7-max on Go — qwen3.6-plus etc. would 404 the same way)
- Catalog refresh: Go gains deepseek-v4-pro/flash, glm-5.2,
  kimi-k2.7-code, minimax-m3, qwen3.7-plus; Zen gains glm-5.2,
  kimi-k2.7-code, minimax-m3, qwen3.7-plus

Reported by IndieSuperhuman on X: opencode-go 404s for any model other
than minimax.
2026-07-03 00:46:45 -07:00
Teknium
372f8195c7
fix(moa): default temperatures to unset — provider default, like single-model agents (#57440)
A single-model Hermes agent never sends temperature; the provider default
applies. MoA hardcoded reference_temperature=0.6 / aggregator_temperature=0.4,
and the coercion float(preset.get(key, 0.6) or 0.6) made unset IMPOSSIBLE to
express: absent, null, empty, and even an explicit 0 all collapsed to the
baked-in default. Every MoA advisor and aggregator therefore ran at 0.6/0.4
while the same model running solo used the provider default — silently
skewing solo-vs-MoA comparisons and overriding provider-tuned defaults.

- moa_config normalization: temperatures coerce to None when absent/blank/
  invalid (new _coerce_float_or_none); explicit values incl. 0 honored.
- moa_loop: _preset_temperature() resolves preset values; None flows to
  call_llm, which already omits the parameter when None (same contract as
  max_tokens). Aggregator still inherits the acting agent's own configured
  temperature when the preset doesn't pin one.
- conversation_loop (context-mode MoA): same resolution, no more hardcoded
  0.6/0.4 at the call site.
- DEFAULT_CONFIG preset + web_server payload models + docs updated: unset
  is the default, pinning stays available.
2026-07-03 00:22:49 -07:00
Brooklyn Nicholson
89acc19606 fix(dump): flag API keys visible only to the shell, not the managed backend
hermes debug share reads os.getenv — the invoking terminal's environment — but
launchd/systemd and the desktop-spawned `serve` backend load credentials from
~/.hermes/.env, not the login shell. A key exported in the shell but absent
from .env is invisible to the backend, yet the dump printed a bare "set",
sending support down a phantom "the key is configured" path.

This was the actual trap behind a "Desktop has no web_search / no tools"
report: FIRECRAWL_API_KEY was a shell export (so `debug share` in a terminal
read "firecrawl set") but not in .env, so the launchd backend's
check_web_api_key returned False and web_search was gated off — which a
contributor then misdiagnosed as a missing `desktop` platform registration.

The dump now annotates any key set in-process but missing from ~/.hermes/.env
with "(shell only — not in .env; managed/desktop backend may not see it)" so
the mismatch is obvious instead of hidden behind "set".
2026-07-02 19:52:18 -05:00
kshitijk4poor
ed4123792c refactor(providers): dedupe extra_headers normalizer + key picker groups by headers
Follow-up to @helix4u's #57336 salvage. Two review findings:

- W1: model-picker grouped custom-provider rows by
  (api_url, credential, api_mode) but NOT extra_headers. Entries sharing a
  URL+credential+api_mode yet declaring different headers (e.g. per-tenant
  routing behind one proxy) collapsed into one row and probed /models with
  whichever header set was seen first (order-dependent). Fold a canonical
  header identity into group_key so distinct header-authed endpoints stay
  separate; drops the now-dead first-non-empty merge branch.
- W2: the extra_headers stringify+None-filter comprehension existed in 5
  copies (config.py x2, runtime_provider.py, model_switch.py, models.py).
  Extract one shared hermes_cli.config.normalize_extra_headers primitive;
  all sites now call it.

Tests: +normalize_extra_headers unit tests, +regression test proving two
same-endpoint entries with different headers stay distinct and each probes
with its own headers. 223 targeted tests pass; ruff clean.
2026-07-03 04:23:15 +05:30
helix4u
ab40e952f3 fix(providers): pass extra headers to model discovery 2026-07-03 04:23:15 +05:30
kshitijk4poor
e73adb5043 fix(dashboard): disable ws keepalive ping on loopback to survive event-loop stalls
Desktop/dashboard WebSocket connections drop during long agent operations
(delegate_task subagents, large model outputs) when the uvicorn event loop is
GIL-starved for minutes. Root cause: uvicorn's ws keepalive ping runs on the
SAME event loop as agent turns. A single synchronous GIL-holding call on a
worker thread (a regex/scrub over a large output, or a long subagent turn)
freezes the loop, so it cannot process the incoming pong within ws_ping_timeout
and uvicorn closes an otherwise-healthy connection (#53773: 'event loop stalled
226.3s'; #48445/#50005). Loosening the timeout only raises the threshold — a
multi-minute stall sails past any finite window.

The keepalive ping exists to detect half-open connections (reverse-proxy 524,
dropped tunnels), which cannot happen on loopback: there is no network or proxy
in the path, and a dead local client tears the socket down with a real FIN/RST
that starlette surfaces as WebSocketDisconnect regardless of the ping. So on
loopback the ping provides ~no liveness value while actively killing
recoverable stalls — disable it entirely (ws_ping_interval/timeout=None).

Non-loopback (public) binds sit behind a Cloudflare Tunnel where half-open IS a
real failure mode, so the ping stays at 20/20 to detect it.

Empirically verified (real uvicorn + websockets peer): with ws_ping=None the
server never closes a silent peer during an 8s window; with the pre-fix 2s/2s
window uvicorn closes it. A genuinely-dead client still fires the
WebSocketDisconnect reap path regardless of the ping.

Note: this fixes the local Desktop case (the OP's scenario). A remote Desktop
over an authenticated public dashboard route (McCalebTheSecond's comment) keeps
the ping and needs the deeper GIL-hotspot fix — tracked separately.

Closes #53773
2026-07-03 03:33:22 +05:30
kchantharuan
048270fa06 fix: refresh NVIDIA featured models 2026-07-03 03:00:30 +05:30
Brooklyn Nicholson
1501a338c3 fix(cli): stop profile-bound backends before deleting so rmtree converges
delete_profile stopped only the process named in gateway.pid, but a Desktop
app spawns a headless `serve`/`dashboard` backend per profile that holds the
profile's SQLite connection open and keeps writing sessions/WAL/sandbox files.
That backend is never in gateway.pid, so a CLI `hermes profile delete` run
while the Desktop app is up left it writing into the tree — rmtree's final
rmdir then failed with ENOTEMPTY (#47368 "Bug 2"), and pre-guard it also
resurrected the directory.

- _profile_bound_backend_pids(): find running Hermes backends bound to this
  profile via a `--profile <name>` selector or a HERMES_HOME env resolving to
  the profile dir. Tightly scoped — current-user only, backend subcommands
  (serve/dashboard/gateway) only so an interactive chat is never killed, and
  never this process or its ancestors.
- _stop_profile_backends(): terminate them (graceful, then force), best-effort
  so it can never make delete worse.
- _rmtree_with_retry(): a few spaced retries absorb the ENOTEMPTY / Windows
  file-lock race from a just-terminated writer's in-flight -wal/-shm/sandbox
  writes instead of failing the whole delete on a race the next attempt wins.

Complements the recreation guard (deleted profiles no longer reappear) and the
Desktop teardown-before-delete flow; this is the CLI-side convergence fix for a
delete run while a Desktop-managed backend is live.

Part of #47368.
2026-07-02 15:31:35 -05:00
Jaaneek
5ef0b8acb0 feat(auth): make xAI Grok OAuth device-code-only, drop loopback login
Replace the loopback/PKCE-callback server and manual-paste fallback with
the RFC 8628 device-code flow as the only xAI Grok OAuth login path. The
flow works in headless/SSH/container sessions with no 127.0.0.1 listener,
shrinking the local attack surface.

- Poll the token endpoint with server-provided interval, honoring
  slow_down and expires_in; store tokens with auth_mode
  oauth_device_code.
- Adaptive proactive refresh skew for short-lived device-code JWTs;
  rotated tokens sync back to auth.json, the global root store, and the
  credential pool (no refresh-token replay).
- Clear source suppression on successful re-login (CLI + dashboard) and
  drop the duplicate dashboard pool entry so exactly one seeded
  device_code entry exists.
- Use the shared device_code source name for consistency with the
  nous/codex device-code providers.
- Desktop: remove the loopback OAuth flow states and dead type variants;
  pkce providers' sign-in URL selection is unchanged.
- Docs (EN + zh-Hans) rewritten for device-code login; drop the deleted
  --manual-paste flag from documented commands.
2026-07-02 13:17:41 -07:00
LeonSGP43
472d75193f Prevent deleted profile skeleton revival 2026-07-02 15:11:56 -05:00
Jneeee
b98baa3039 feat(config): extra HTTP headers for LLM API calls (#3526 salvage)
Named providers / custom_providers entries in config.yaml now accept an
extra_headers dict scoped to that endpoint — for reverse proxies, API
gateways, and custom auth schemes (e.g. Cloudflare Access service tokens).

- hermes_cli/config.py: normalize extra_headers on provider entries
  (_normalize_custom_provider_entry + providers-dict translation), add
  get_custom_provider_extra_headers /
  apply_custom_provider_extra_headers_to_client_kwargs helpers keyed on
  base_url (case/trailing-slash insensitive, no substring bypass —
  mirrors the TLS helpers)
- hermes_cli/runtime_provider.py: surface extra_headers in the resolved
  runtime for named custom providers (providers dict, legacy
  custom_providers list, and the credential-pool path)
- run_agent.py / agent/agent_init.py: merge per-provider extra_headers
  onto the OpenAI client default_headers at construction and on every
  _apply_client_headers_for_base_url re-application (credential swaps,
  rebuilds), most-specific level wins; OpenAI-wire only (native
  Anthropic/Bedrock scoped out)
- agent/auxiliary_client.py: accept model.extra_headers as an alias of
  model.default_headers for the global variant
- cli-config.yaml.example: documented commented example
- Header values are treated as secrets and never logged

Salvaged from PR #3526 by @jneeee, reimplemented against current main.

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-07-02 05:33:25 -07:00
Mibayy
ce9aa869fc feat(commands): /compact alias + --preview/--dry-run flags for /compress (#3243 salvage)
Salvaged from PR #3243 by @Mibayy, reimplemented against current main
(the original diff targeted a removed gateway/run.py handler).

- /compact is now a first-class alias of /compress (CLI, gateway,
  Telegram/Slack/Discord command lists, autocomplete) — also fixes the
  dangling '/compact' references in gateway error messages
  (gateway/run.py context-exhausted banners).
- --preview / --dry-run: report what WOULD be compressed (message
  counts, token estimate, 'here [N]' boundary) without touching the
  transcript. Flags coexist with the existing 'here [N]' / focus-topic
  args on both the CLI and gateway surfaces via shared pure helpers in
  hermes_cli/partial_compress.py.
- --aggressive (LLM-free hard truncation) is intentionally NOT
  implemented: it would need its own transcript-persistence branch
  outside the guarded _compress_context rotation machinery (#44794
  data-loss class). The flag is recognized and returns an explanatory
  message pointing at '/compress here [N]' and /undo instead of being
  mis-parsed as a focus topic.
- locales: gateway.compress.aggressive_unsupported added to all 16
  catalogs (parity test enforced).
- release.py: AUTHOR_MAP entry for contributor credit.
2026-07-02 05:10:31 -07:00
Morgan K
39bff67957 feat(gateway): add 'log' option to display.tool_progress
Salvage of #3459 by @keslerm, reimplemented against the restructured
progress-callback block in gateway/run.py (resolve_display_setting,
needs_progress_queue, thinking-relay). Duplicate PR #3458 by @dlkakbs was
submitted 4 minutes earlier with the same feature — both credited.

Co-authored-by: Dilee <uzmpsk.dilekakbas@gmail.com>

tool_progress: log keeps the chat silent and appends timestamped tool-call
lines to ~/.hermes/logs/tool_calls.log via a dedicated queue drained by an
async writer (RotatingFileHandler 5MB x 3, RedactingFormatter so secrets
never land on disk). Gateway-only by design; thinking_progress relaying and
the webhook gate are unaffected. /verbose now cycles
off -> new -> all -> verbose -> log.
2026-07-02 05:09:38 -07:00
Mibayy
070ac2a719 fix(status): label provider as custom when config.yaml model.base_url is set
Salvage of the surviving hunk of #3296 by @Mibayy. The PR's gateway
_handle_provider_command hunk targets code removed on main (/provider was
absorbed into /model + /status, which already read model.base_url); the
hermes status mislabel was the remaining live symptom:
_effective_provider_label() only checked the legacy OPENAI_BASE_URL env var,
so a custom endpoint configured canonically in config.yaml still displayed
as OpenRouter.
2026-07-02 04:59:02 -07:00
Teknium
6e369a3762
feat(delegation): unify concurrency caps — deprecate max_async_children (#56955)
delegation.max_concurrent_children is now the single cap for both a
batch's parallelism and concurrent background delegation units.

- _get_max_async_children() delegates to _get_max_concurrent_children();
  a leftover max_async_children key logs a one-time deprecation warning
- config v32→33 migration removes the stale key, folding a raised
  max_async_children into max_concurrent_children (max wins, no lost
  headroom)
- capacity error messages now point at max_concurrent_children
- pool-at-capacity sync fallback now attaches an explanatory note so
  the model/user know why the call blocked instead of dispatching async

Previously users who raised max_concurrent_children (e.g. to 15) still
hit the invisible default-3 async cap: the 4th background delegate_task
silently ran inline, blocking the turn with no signal.
2026-07-02 02:53:39 -07:00
Teknium
fb403a3a73
fix(auxiliary): retry transient blips harder + isolate client cache per model (#56889)
Two related hardening fixes for auxiliary calls (which include MoA reference
advisors — a pinned-model path where provider fallback is not a meaningful
recovery):

1. Transient-transport retries: the same-provider retry on a connection reset /
   timeout / 5xx / 408 was a single attempt, then fallback. For a pinned aux
   call a second blip silently loses the call (root of the run2 double-advisor
   'Connection error' collapse — a genuine upstream blip). Now retries N times
   with exponential backoff, N = auxiliary.transient_retries (default 2 -> 3
   total attempts, clamped [0,6]). Compression-on-timeout fast-fail carve-out
   preserved.

2. Per-model client-cache isolation: _client_cache_key excluded the model, so
   two concurrent auxiliary calls to the same provider/base_url/key but
   different models (e.g. an opus + gpt-5.5 MoA fan-out) shared one cache entry
   and could race each other's client lifecycle. Model now participates in the
   key -> distinct clients, no cross-call races. Same-model reuse unchanged.

- agent/auxiliary_client.py: _transient_retry_count() + backoff loop; model in
  _client_cache_key and both call sites.
- hermes_cli/config.py: auxiliary.transient_retries default (2).
- tests: new retry/isolation tests; updated 2 stale-expectation tests to the
  corrected behavior (per-model resolve; N-retry escalation).

Backoff base is overridable (_TRANSIENT_RETRY_BACKOFF_BASE) so tests don't sleep.
2026-07-02 01:09:37 -07:00
Nick Mason
80733413f9 fix(tools): don't drop a toolset from platform inference when a tool is registered into it
_get_platform_tools reverse-maps a platform composite to configurable
toolsets with an all-tools subset test. Because get_toolset() merges
registry-registered tools into a toolset, a tool added to a toolset
(delegate_cli -> delegation; desktop-only read_terminal -> terminal) that the
static composite never listed made the subset test fail, silently dropping the
entire toolset on api_server and other inference-based platforms. Compare the
toolset's static membership at all three reverse-map sites.

Fixes #49622.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-02 13:25:25 +05:30
Teknium
543d305bbb
feat(moa): add reference_max_tokens to cap advisor output and cut turn latency (#56756)
MoA per-turn latency is dominated by advisor GENERATION: turn wall time
correlates ~0.88 with output tokens and ~-0.03 with input tokens (measured over
52 turns). Each turn waits for the slowest advisor to finish writing, and
advisors were uncapped — writing multi-thousand-token essays the aggregator
only needs the gist of.

Add an opt-in per-preset reference_max_tokens knob (mirrors reference_temperature)
that caps ADVISOR output only; the acting aggregator is never capped. Default
None = uncapped, so existing presets are byte-for-byte unchanged (no regression).
Wired through both MoA execution paths (MoAChatCompletions.create and
aggregate_moa_context).

E2E: same task, closed preset uncapped vs reference_max_tokens=600 -> 59s to 33s
(~44% faster), final answer identical/correct.

- hermes_cli/moa_config.py: _coerce_int_or_none helper + reference_max_tokens
  in _normalize_preset/_default_preset/flattened view
- agent/moa_loop.py: read preset.reference_max_tokens, pass to reference fan-out
- agent/conversation_loop.py: pass reference_max_tokens on the per-turn path
- tests + docs
2026-07-02 00:16:35 -07:00
Ben Barclay
9be39de0f2
fix(auth): make HERMES_PORTAL_BASE_URL/NOUS_PORTAL_BASE_URL bypass the Portal host allowlist (#56864)
Ben caught that the initial approach (widening _NOUS_PORTAL_ALLOWED_HOSTS to
include the staging host) was the wrong fix -- env vars are supposed to
override the allowlist, mirroring how NOUS_INFERENCE_BASE_URL already
bypasses _ALLOWED_NOUS_INFERENCE_HOSTS via _nous_inference_env_override().

The actual bug: both resolve_nous_access_token and
resolve_nous_runtime_credentials read
`_optional_base_url(state.get("portal_base_url")) or os.getenv(...) or ...`
-- a plain `or` chain where the STORED state value wins first (short-circuits
before the env vars are even read), and then whichever value won gets run
through the same _NOUS_PORTAL_ALLOWED_HOSTS gate regardless of its source.
So a hosted agent stamped with HERMES_PORTAL_BASE_URL=<staging> in its env
AND a staging portal_base_url already persisted to auth.json would still
get silently rewritten to prod on every refresh, because the env var never
even got a chance to be consulted.

Revert the previous _NOUS_PORTAL_ALLOWED_HOSTS widening entirely --
staying prod-only preserves the allowlist's actual job (rejecting an
untrusted network-provided portal_base_url persisted to auth.json by a
compromised Portal response).

Add _nous_portal_env_override() (mirrors _nous_inference_env_override())
and restructure both call sites so the env override is checked FIRST and,
when set, wins outright and skips the allowlist gate entirely -- the
allowlist only ever runs against the fallback (stored-state-or-default)
path now.

Rewrote tests/hermes_cli/test_nous_portal_staging_allowlist.py to test the
actual fix: the helper function, and an end-to-end
resolve_nous_access_token proof that the env override wins even when state
ALSO has the staging host stored (the exact incident shape), that it wins
over a stored PROD host too, and that the allowlist's heal-to-prod
behaviour for an untrusted stored value is preserved when no override is
set.
2026-07-02 06:52:46 +00:00
kshitij
2f7c51a3e2
Merge pull request #56605 from simpolism/codex/discord-inline-bot-mentions
fix(discord): ignore reply-ping-only mentions for bot-authored messages
2026-07-02 05:23:44 +05:30
kshitijk4poor
676236bb1d fix(agent): honor custom CA certs on aux client + harden TLS resolution
The salvaged fix wired per-provider ssl_ca_cert / ssl_verify (and
HERMES_CA_BUNDLE) into the MAIN OpenAI client. This follow-up:

- Auxiliary client parity: process_bootstrap.build_keepalive_http_client
  accepts and forwards verify; auxiliary_client._resolve_aux_verify mirrors
  the main-client TLS resolution (via load_config_readonly, the read-only
  fast path) so compression/vision/web_extract/title-gen/session_search
  honor the same per-provider CA. Without this, chat worked against a
  private-CA endpoint but every auxiliary call still failed APIConnectionError.
- switch_model now reads custom_providers from live config (load_config_readonly)
  instead of the init-time agent._custom_providers snapshot, so ssl_ca_cert /
  ssl_verify edits are honored on mid-session model switch — matching the
  context-length reload (#15779).
- Drop the dead client-level verify= where a custom httpx transport is used
  (httpx ignores it there); verify lives on the transport. Fix docstrings.
  Applies to both run_agent._build_keepalive_http_client and process_bootstrap.
- resolve_httpx_verify: add CURL_CA_BUNDLE to the env chain (consistency with
  agent/ssl_guard._CA_BUNDLE_ENV_VARS) and emit a loud logger.warning naming
  the endpoint whenever ssl_verify:false disables verification.
- get_custom_provider_tls_settings: case-insensitive base_url match (config
  dedup already lowercases; scheme/host are case-insensitive) so a mixed-case
  entry doesn't silently drop its CA. Exact match preserved — no prefix bypass.
- Demote best-effort except Exception: pass in agent_init/switch_model to
  logger.debug(exc_info=True).
- Tests for aux verify forwarding, _resolve_aux_verify, case-insensitive
  match, and prefix-bypass rejection.
2026-07-02 04:51:56 +05:30
HexLab98
3a2ba959ce fix(agent): honor custom CA certs for custom_providers HTTPS endpoints
Wire ssl_ca_cert and ssl_verify through custom_providers config and env
vars into the keepalive httpx client, fixing APIConnectionError against
mkcert/self-signed Ollama proxies behind HTTPS.
2026-07-02 04:51:56 +05:30
Brooklyn Nicholson
428b9a0c42 fix(cli): render /journey color instead of leaking raw ANSI
In the interactive CLI, /journey dispatched straight to `args.func(args)`,
letting Rich write ANSI to stdout — which patch_stdout's StdoutProxy passes
through as literal `?[38;2;…m` garbage. Route the read-only views (default +
`list`) through a captured, force-color Console and re-emit via `_cprint`
(prompt_toolkit's ANSI parser), matching the `ChatConsole` idiom.
`delete`/`edit` stay on real stdio since they prompt / open `$EDITOR`.
2026-07-01 16:25:48 -05:00
Teknium
76a468e513
feat(models): add claude-fable-5, claude-sonnet-5, fugu-ultra to curated OpenRouter + Nous lists (#56617)
- claude-fable-5 placed above claude-opus-4.8 in both curated lists
- claude-sonnet-5 replaces claude-sonnet-4.6
- sakana/fugu-ultra added near the bottom (before routers/free tier)
- regenerated website/static/api/model-catalog.json via scripts/build_model_catalog.py (live-pulled by CLI, published on merge — no release needed)
2026-07-01 13:21:42 -07:00
Teknium
7c1a029553
chore: release v0.18.0 (2026.7.1) (#56611) 2026-07-01 13:07:40 -07:00
snav
e9bceb5ae0 fix(discord): ignore reply-ping-only mentions for bot-authored messages
Two Hermes bots sharing a channel could volley replies at each other
indefinitely. Root cause: Discord reply-pings (allowed_mentions
replied_user=true) add the replied-to bot to message.mentions without a
literal <@bot> token in the body, so the existing bot-admission gate
treated a reply chip as an explicit @mention and re-triggered the peer.

Adds opt-in discord.bots_require_inline_mention (default false; env
DISCORD_BOTS_REQUIRE_INLINE_MENTION). When enabled, bot-authored
messages must carry a raw inline <@id>/<@!id> mention in the content;
reply-ping-only mentions no longer admit the message. Human messages and
all existing defaults are unchanged.

The new _self_is_raw_mentioned helper deliberately ignores the resolved
message.mentions list (which reply-ping populates) and checks only the
raw content token via the shared _raw_mentioned_user_ids primitive.
2026-07-01 15:38:34 -04:00
kshitijk4poor
7322da487f refactor(codex-runtime): tidy reapply-migration control flow
Self-review follow-up (hermes-pr-review Phase 2, non-blocking clarity findings).

- Collapse the reapplying_enable predicate to a single chained comparison
  (new_value == current == "codex_app_server") instead of a two-clause AND
  that re-tested new_value == current.
- Dedent the msg_lines list literals (drop trailing single-element commas).

No behavior change: reapply still falls through to the idempotent migrate()
while skipping set_runtime/persist (prompt cache preserved), and the auto-disable
early-return is unchanged. 31/31 tests green.
2026-07-01 23:51:54 +05:30
snav
35eb93c8df fix(codex-runtime): re-running /codex-runtime codex_app_server when already enabled now triggers migration
The /codex-runtime slash command short-circuits with "openai_runtime
already set" when invoked with the same value as the current config,
and crucially skips the entire migration block below. The check
conflates two things: (a) "the config value is correct" and (b) "the
world state (managed block in ~/.codex/config.toml, hermes-tools MCP
callback, plugin discovery) is converged".

Common footgun this exposes: a user who pre-sets
`model.openai_runtime: codex_app_server` directly in config.yaml
(reasonable thing to do) and then runs /codex-runtime codex_app_server
to trigger migration sees "already set" and silently gets no migration.
~/.codex/config.toml never receives the managed block, the hermes-tools
MCP callback never registers, and codex falls through to its default
runtime instead of the app-server one — visibly successful but
functionally partial setup.

The migration is idempotent by design (it replaces its own managed
block in place between MIGRATION_MARKER and MIGRATION_END_MARKER), so
re-running it is safe and cheap. Fix the short-circuit to fall through
to migration when re-applying codex_app_server while skipping the
config persist (no value-level change needed). The disable case
(re-applying "auto") still short-circuits because disabling doesn't
touch ~/.codex/config.toml at all.

The user-visible message changes to "openai_runtime already set to
codex_app_server — re-applying migration" so re-runs surface what
happened.

Regression test (test_reapply_codex_app_server_runs_migration) asserts:
- migrate() was called when re-applying
- persist_callback was NOT called (no config write on no-op transitions)
- migration output (MCP servers, sandbox default) surfaces in the
  user-visible message
- requires_new_session is True so callers know to /reset

Verified RED→GREEN: the test fails on origin/main with
"migration must run on reapply, not just first enable" and passes with
this fix. Full test_codex_runtime_switch.py suite: 31 passed.
2026-07-01 23:51:54 +05:30
Teknium
eae3700b16
fix(moa): raise aux timeouts to 900s and give the Codex aux path a stable prompt_cache_key (#56395)
Two independent MoA auxiliary-call fixes:

#53866 — auxiliary.moa_reference.timeout and auxiliary.moa_aggregator.timeout
were 600s while moa_agent was 120s. Raise both to 900s so a genuinely long
reference/aggregator turn (mixed providers, deep reasoning, long tool chains)
has headroom instead of being cut mid-generation.

#53735 — _CodexCompletionsAdapter (the Codex/Responses auxiliary path used by
the MoA acting-aggregator, compression, web_extract, session_search, etc.)
never set prompt_cache_key, so it stayed cache-cold while the MAIN Responses
transport (agent/transports/codex.py) was warm. Derive the same
content-addressed key via the shared _content_cache_key(instructions, tools)
helper and set it on the aux Responses request, with the same host guards the
main transport uses (xAI carries the key in extra_body; GitHub/Copilot opts out
of cache-key routing).

Tests: 5 new prompt_cache_key cases (set+prefixed, stable across identical
prefix, differs on different instructions, skipped for xai/github hosts).
tests/agent/test_auxiliary_client.py 279 pass; tests/hermes_cli/test_config.py
130 pass.
2026-07-01 06:02:40 -07:00
teknium1
3f6c6bd29e fix(vertex): surface Vertex on the desktop Keys tab for provider parity
The provider-parity contract (tests/hermes_cli/test_provider_parity.py)
requires every hermes model provider to be configurable in the desktop
Providers tabs. Vertex authenticates via OAuth2 (service-account JSON /
ADC) and has no api_key_env_vars, so — like bedrock's aws_sdk — it needs
its credential env var tagged to the provider card explicitly. Tag
VERTEX_CREDENTIALS_PATH to the vertex card in _catalog_provider_env_metadata().
2026-07-01 05:25:33 -07:00
Steve Lawton
c73e74386b feat(vertex): add Google Vertex AI provider for Gemini (OAuth2)
Adds Vertex AI as a first-class provider for Gemini models via Vertex's
OpenAI-compatible endpoint. Vertex authenticates with short-lived OAuth2
access tokens (service-account JSON or ADC), not a static API key — the
missing piece behind the recurring requests (#13484, #12639, #56259).

- agent/vertex_adapter.py: OAuth2 token minting + refresh-on-expiry
  (5-min margin), ADC->service-account fallback, global vs regional
  endpoint URLs. Config precedence: env var > config.yaml > default.
- plugins/model-providers/vertex/: provider profile (auth_type=vertex),
  reuses Gemini's extra_body.google.thinking_config translation.
- runtime_provider: vertex short-circuit BEFORE the credential pool so a
  credentials-file path is never mistaken for a static API key; mints a
  fresh token + computes base_url per resolve.
- run_agent + conversation_loop: _try_refresh_vertex_client_credentials()
  re-mints the token and rebuilds the client on a mid-session 401, so a
  long-lived gateway agent survives token expiry (~1h).
- auxiliary_client: vertex auth_type branch for side-LLM tasks.
- config.yaml: vertex.project_id / vertex.region (non-secret, bridged to
  env); credential path stays in .env (VERTEX_CREDENTIALS_PATH).
- setup wizard + model picker: dedicated _model_flow_vertex; curated
  google/gemini-* model list; --provider choices.
- pricing/metadata: Vertex prices off the gemini docs snapshot; endpoint
  host auto-maps to the vertex provider (no probe spam).
- lazy_deps + pyproject [vertex] extra: google-auth, opt-in only.
- docs: guides/google-vertex.md + providers page; tests for adapter +
  runtime resolution.

Salvages and modernizes #8427 by @slawt onto current main: rewired from
the legacy PROVIDER_REGISTRY path to the provider-profile architecture,
moved non-secret config out of .env into config.yaml, and added the
per-turn 401 token-refresh the original lacked.
2026-07-01 05:25:33 -07:00