Commit graph

1270 commits

Author SHA1 Message Date
Teknium
87ae4ae94b
fix(update): harden #57659 follow-ups — task restore on failure, --force-venv split, trampoline detection, managed-install health (#57680)
Five follow-ups to #57659 from post-merge review:

1. install.ps1: gateway scheduled-task re-enable now runs in a finally
   (a thrown Remove-Item/uv venv failure previously stranded the user's
   gateway autostart disabled), and tasks that were already disabled
   before the install are no longer blindly re-enabled.
2. The venv-python holder guard is no longer bypassed by plain --force
   (which the desktop bootstrap passes on every update while its lock
   probe only checks hermes.exe/app.asar). New explicit --force-venv is
   the escape hatch; --force keeps bypassing only the hermes.exe shim
   guard.
3. _detect_venv_python_processes now also catches uv/base-interpreter
   trampolines whose exe is outside the venv, via cmdline (venv path or
   '-m hermes_cli.main' tied to this install root) and cwd.
4. Missing venv python is now UNHEALTHY on managed installs
   (.hermes-bootstrap-complete / .update-incomplete markers) so the
   repair lane runs instead of 'Already up to date!'; the repair branch
   recreates the venv first when it's gone entirely. Dev checkouts keep
   reporting healthy.
5. install.ps1 comment no longer claims a Startup-folder disarm the
   code doesn't perform (logon-only, not a mid-install respawner).
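Item 1 can be illustrated with a small sketch. The real change lives in install.ps1 (PowerShell); this Python version only mirrors the control flow, and `run_install` and the `tasks` mapping are hypothetical names used for illustration.

```python
from typing import Callable, Dict


def run_install(tasks: Dict[str, bool], install: Callable[[], None]) -> None:
    # tasks maps a scheduled-task name to its enabled flag (hypothetical model).
    previously_enabled = [name for name, enabled in tasks.items() if enabled]
    for name in previously_enabled:
        tasks[name] = False  # disable during the install
    try:
        install()
    finally:
        # Runs even when install() raises; tasks that were already disabled
        # before the install are left disabled.
        for name in previously_enabled:
            tasks[name] = True
```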
2026-07-03 04:08:37 -07:00
Teknium
372f8195c7
fix(moa): default temperatures to unset — provider default, like single-model agents (#57440)
A single-model Hermes agent never sends temperature; the provider default
applies. MoA hardcoded reference_temperature=0.6 / aggregator_temperature=0.4,
and the coercion float(preset.get(key, 0.6) or 0.6) made unset IMPOSSIBLE to
express: absent, null, empty, and even an explicit 0 all collapsed to the
baked-in default. Every MoA advisor and aggregator therefore ran at 0.6/0.4
while the same model running solo used the provider default — silently
skewing solo-vs-MoA comparisons and overriding provider-tuned defaults.

- moa_config normalization: temperatures coerce to None when absent/blank/
  invalid (new _coerce_float_or_none); explicit values incl. 0 honored.
- moa_loop: _preset_temperature() resolves preset values; None flows to
  call_llm, which already omits the parameter when None (same contract as
  max_tokens). Aggregator still inherits the acting agent's own configured
  temperature when the preset doesn't pin one.
- conversation_loop (context-mode MoA): same resolution, no more hardcoded
  0.6/0.4 at the call site.
- DEFAULT_CONFIG preset + web_server payload models + docs updated: unset
  is the default, pinning stays available.
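A minimal sketch of why the old coercion could not express "unset", next to a None-preserving variant. The second helper borrows its name from `_coerce_float_or_none` above, but its body is an assumption, not the shipped code; treating booleans as invalid is also an assumption.

```python
from typing import Any, Optional


def old_coerce(preset: dict, key: str, default: float = 0.6) -> float:
    # Old pattern quoted in the message: `or` treats None, "", and 0 as falsy,
    # so all of them collapse to the baked-in default.
    return float(preset.get(key, default) or default)


def coerce_float_or_none(value: Any) -> Optional[float]:
    # Hypothetical body: absent/blank/invalid -> None, explicit numbers kept.
    if value is None or isinstance(value, bool):
        return None
    if isinstance(value, str) and not value.strip():
        return None
    try:
        return float(value)
    except (TypeError, ValueError):
        return None
```

With the old form an explicit 0 becomes 0.6; with the None-preserving form it stays 0.0 and a missing value stays None, so the caller can omit the parameter.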
2026-07-03 00:22:49 -07:00
Victor Kyriazakos
accd672054 fix(slack): MPIMs (group DMs) obey shared-surface mention gating + reaction guard
Group DMs (MPIMs) were classified as DMs and thereby exempted from every
operator control that shared surfaces are supposed to honor: allowed_channels,
require_mention, strict_mention, free_response_channels, and the reaction
guard. Symptom: the bot added 👀 reactions to unmentioned MPIM
messages and still invoked the agent (which then returned NO_REPLY) instead of
the gateway dropping the event before model execution. Removing an MPIM from
allowed_channels did not disable it.

Root cause is the DM classification at adapter.py:
    is_dm = channel_type in {"im", "mpim"}
used for BOTH routing exemptions and reaction gating. An MPIM is a shared
surface (multiple humans can see and trigger the bot), not a private 1:1 DM,
so it must be gated like a channel.

This behavior was introduced/reinforced by a trail of Slack group-DM PRs:
- #4633  fix(slack): treat group DMs (mpim) like DMs + reaction guard
- #54632 fix(slack): subscribe to message.mpim + mpim scopes so group DMs work
- #54663 fix(slack): group DMs work OOTB + reinstall nudge
#54632/#54663 correctly made MPIM messages *reachable*; #4633 over-reached by
giving them the DM mention/reaction *exemptions*. This corrects only that
over-reach.

Fix (minimal): introduce `is_one_to_one_dm = channel_type == "im"` and key the
two EXEMPTION sites off it instead of `is_dm`:
- mention/allowlist gating block (`if not is_one_to_one_dm and bot_uid:`)
- reaction guard (`(is_one_to_one_dm or is_mentioned)`)
`is_dm` is intentionally retained for session/thread scoping and chat_type
labeling, where treating an MPIM as a persistent multi-party conversation is
correct — only the mention/reaction exemptions were wrong.

Docs: slack.md now distinguishes 1:1 DMs (mention-exempt) from group DMs
(shared surface; obey require_mention/strict_mention/allowed_channels/
free_response_channels; reactions only when @mentioned).

Tests: +7 in test_slack_mention.py (MPIM unmentioned dropped under
require_mention and strict_mention; MPIM mentioned processed; MPIM off
allowed_channels dropped; MPIM in free_response opted in; 1:1 IM still exempt;
reaction guard drops unmentioned MPIM). Updated _would_process to model the
is_one_to_one_dm gating + strict_mention. 72 passed.
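The gating split can be sketched roughly as follows. This is a simplified model, not the adapter code: `should_process` and `should_react` are hypothetical names, and allowed_channels, strict_mention, and free_response_channels are left out.

```python
def should_process(channel_type: str, is_mentioned: bool, require_mention: bool = True) -> bool:
    # Only a true 1:1 DM is exempt from mention gating; an MPIM is a shared surface.
    is_one_to_one_dm = channel_type == "im"
    if is_one_to_one_dm:
        return True
    return is_mentioned or not require_mention


def should_react(channel_type: str, is_mentioned: bool) -> bool:
    # Reaction guard from the message: (is_one_to_one_dm or is_mentioned).
    return channel_type == "im" or is_mentioned
```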
2026-07-03 12:34:53 +05:30
Jaaneek
5ef0b8acb0 feat(auth): make xAI Grok OAuth device-code-only, drop loopback login
Replace the loopback/PKCE-callback server and manual-paste fallback with
the RFC 8628 device-code flow as the only xAI Grok OAuth login path. The
flow works in headless/SSH/container sessions with no 127.0.0.1 listener,
shrinking the local attack surface.

- Poll the token endpoint with server-provided interval, honoring
  slow_down and expires_in; store tokens with auth_mode
  oauth_device_code.
- Adaptive proactive refresh skew for short-lived device-code JWTs;
  rotated tokens sync back to auth.json, the global root store, and the
  credential pool (no refresh-token replay).
- Clear source suppression on successful re-login (CLI + dashboard) and
  drop the duplicate dashboard pool entry so exactly one seeded
  device_code entry exists.
- Use the shared device_code source name for consistency with the
  nous/codex device-code providers.
- Desktop: remove the loopback OAuth flow states and dead type variants;
  pkce providers' sign-in URL selection is unchanged.
- Docs (EN + zh-Hans) rewritten for device-code login; drop the deleted
  --manual-paste flag from documented commands.
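A rough sketch of the RFC 8628 polling loop described in the first bullet. `poll_for_token` and its injected `request_token`, `sleep`, and `clock` callables are assumptions for illustration; the real client and endpoint handling are not shown in the message.

```python
import time
from typing import Callable, Dict


def poll_for_token(
    request_token: Callable[[], Dict],
    interval: float,
    expires_in: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    deadline = clock() + expires_in
    while clock() < deadline:
        sleep(interval)  # wait the server-provided interval before each poll
        response = request_token()
        error = response.get("error")
        if error is None:
            return response  # success: the token payload
        if error == "slow_down":
            interval += 5  # RFC 8628 section 3.5: add 5 seconds on slow_down
        elif error != "authorization_pending":
            raise RuntimeError(f"device login failed: {error}")
    raise TimeoutError("device code expired before the user approved it")
```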
2026-07-02 13:17:41 -07:00
CrazyBoyM
ecffd290a3 feat(image-gen): support Codex image inputs
2026-07-02 17:12:24 +05:30
Teknium
543d305bbb
feat(moa): add reference_max_tokens to cap advisor output and cut turn latency (#56756)
MoA per-turn latency is dominated by advisor GENERATION: turn wall time
correlates ~0.88 with output tokens and ~-0.03 with input tokens (measured over
52 turns). Each turn waits for the slowest advisor to finish writing, and
advisors were uncapped — writing multi-thousand-token essays the aggregator
only needs the gist of.

Add an opt-in per-preset reference_max_tokens knob (mirrors reference_temperature)
that caps ADVISOR output only; the acting aggregator is never capped. Default
None = uncapped, so existing presets are byte-for-byte unchanged (no regression).
Wired through both MoA execution paths (MoAChatCompletions.create and
aggregate_moa_context).

E2E: same task, closed preset uncapped vs reference_max_tokens=600 -> 59s to 33s
(~44% faster), final answer identical/correct.

- hermes_cli/moa_config.py: _coerce_int_or_none helper + reference_max_tokens
  in _normalize_preset/_default_preset/flattened view
- agent/moa_loop.py: read preset.reference_max_tokens, pass to reference fan-out
- agent/conversation_loop.py: pass reference_max_tokens on the per-turn path
- tests + docs
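As a sketch of the "None = uncapped" contract and the quoted speedup arithmetic: both function names below are hypothetical, and the kwargs shape is an assumption about how an advisor call might be built.

```python
from typing import Any, Dict, Optional


def advisor_call_kwargs(model: str, reference_max_tokens: Optional[int]) -> Dict[str, Any]:
    kwargs: Dict[str, Any] = {"model": model}
    # None means uncapped: the parameter is left out entirely.
    if reference_max_tokens is not None:
        kwargs["max_tokens"] = reference_max_tokens
    return kwargs


def percent_faster(before_s: float, after_s: float) -> float:
    # Relative reduction in wall time, in percent.
    return (before_s - after_s) / before_s * 100
```

For the E2E numbers above, (59 - 33) / 59 is about 0.44, which matches the stated ~44%.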
2026-07-02 00:16:35 -07:00
Teknium
76a468e513
feat(models): add claude-fable-5, claude-sonnet-5, fugu-ultra to curated OpenRouter + Nous lists (#56617)
- claude-fable-5 placed above claude-opus-4.8 in both curated lists
- claude-sonnet-5 replaces claude-sonnet-4.6
- sakana/fugu-ultra added near the bottom (before routers/free tier)
- regenerated website/static/api/model-catalog.json via scripts/build_model_catalog.py (live-pulled by CLI, published on merge — no release needed)
2026-07-01 13:21:42 -07:00
Teknium
ba0bc01d1f
feat(delegate): remove model-facing toolsets arg — subagents always inherit parent's (#56386)
The model could pass `toolsets` (top-level and per-task) to delegate_task,
letting it choose which toolsets a subagent got. Toolset selection is a
capability-scoping decision the model should not control; subagents inherit
the parent's enabled toolsets, period.

- Remove `toolsets` from the delegate_task() signature, the registry handler,
  the top-level + per-task JSON schema, and the live dispatch path
  (run_agent._dispatch_delegate_task — this forwarded it on every model call).
- Single-task and per-task child builds now pass toolsets=None so
  _build_child_agent resolves to pure parent inheritance.
- Drop the now-dead _SUBAGENT_TOOLSETS / _TOOLSET_LIST_STR schema-hint block.
- _build_child_agent keeps its internal toolsets param + intersection helpers
  (internal API; fed the inherited value only).
- Tests: schema assertions flipped to assertNotIn; added a regression test
  proving the dispatch path never forwards a smuggled model `toolsets`.
- Docs: update delegate_task signature refs in the autonomous-ai-agents skill.
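The regression described in the Tests bullet can be pictured with a toy dispatch function. This is not `run_agent._dispatch_delegate_task`; `dispatch_delegate_task` here and its return shape are hypothetical.

```python
from typing import Any, Dict, List


def dispatch_delegate_task(model_args: Dict[str, Any], parent_toolsets: List[str]) -> Dict[str, Any]:
    # Drop anything the model put under "toolsets", top-level and per-task.
    clean = {k: v for k, v in model_args.items() if k != "toolsets"}
    if "tasks" in clean:
        clean["tasks"] = [
            {k: v for k, v in task.items() if k != "toolsets"} for task in clean["tasks"]
        ]
    # The child always inherits the parent's enabled toolsets.
    return {"args": clean, "child_toolsets": list(parent_toolsets)}
```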
2026-07-01 05:35:26 -07:00
Steve Lawton
c73e74386b feat(vertex): add Google Vertex AI provider for Gemini (OAuth2)
Adds Vertex AI as a first-class provider for Gemini models via Vertex's
OpenAI-compatible endpoint. Vertex authenticates with short-lived OAuth2
access tokens (service-account JSON or ADC), not a static API key — the
missing piece behind the recurring requests (#13484, #12639, #56259).

- agent/vertex_adapter.py: OAuth2 token minting + refresh-on-expiry
  (5-min margin), ADC->service-account fallback, global vs regional
  endpoint URLs. Config precedence: env var > config.yaml > default.
- plugins/model-providers/vertex/: provider profile (auth_type=vertex),
  reuses Gemini's extra_body.google.thinking_config translation.
- runtime_provider: vertex short-circuit BEFORE the credential pool so a
  credentials-file path is never mistaken for a static API key; mints a
  fresh token + computes base_url per resolve.
- run_agent + conversation_loop: _try_refresh_vertex_client_credentials()
  re-mints the token and rebuilds the client on a mid-session 401, so a
  long-lived gateway agent survives token expiry (~1h).
- auxiliary_client: vertex auth_type branch for side-LLM tasks.
- config.yaml: vertex.project_id / vertex.region (non-secret, bridged to
  env); credential path stays in .env (VERTEX_CREDENTIALS_PATH).
- setup wizard + model picker: dedicated _model_flow_vertex; curated
  google/gemini-* model list; --provider choices.
- pricing/metadata: Vertex prices off the gemini docs snapshot; endpoint
  host auto-maps to the vertex provider (no probe spam).
- lazy_deps + pyproject [vertex] extra: google-auth, opt-in only.
- docs: guides/google-vertex.md + providers page; tests for adapter +
  runtime resolution.

Salvages and modernizes #8427 by @slawt onto current main: rewired from
the legacy PROVIDER_REGISTRY path to the provider-profile architecture,
moved non-secret config out of .env into config.yaml, and added the
per-turn 401 token-refresh the original lacked.
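The refresh-on-expiry rule with a 5-minute margin might look roughly like this. `TokenCache` and the injected `mint` callable are hypothetical; the real adapter mints tokens through google-auth, which is not modeled here.

```python
import time
from typing import Callable, Optional, Tuple

REFRESH_MARGIN_S = 5 * 60  # refresh when fewer than 5 minutes remain


class TokenCache:
    def __init__(self, mint: Callable[[], Tuple[str, float]],
                 clock: Callable[[], float] = time.time):
        self._mint = mint  # returns (token, absolute expiry timestamp)
        self._clock = clock
        self._token: Optional[str] = None
        self._expiry = 0.0

    def get(self) -> str:
        # Mint on first use, or once the token is inside the refresh margin.
        if self._token is None or self._clock() >= self._expiry - REFRESH_MARGIN_S:
            self._token, self._expiry = self._mint()
        return self._token
```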
2026-07-01 05:25:33 -07:00
Brett
9f03095044 fix(telegram): cap initialize() with per-attempt timeout so unreachable fallback IPs can't hang startup
Wrap each Telegram initialize() attempt in asyncio.wait_for(HERMES_TELEGRAM_INIT_TIMEOUT,
default 30s). When api.telegram.org and all fallback IPs are unreachable, the connect
chain has no outer bound, so a single initialize() blocks for minutes and the
retry-on-exception loop never fires — the gateway appears to hang after the banner.
The timeout guarantees each attempt is bounded, then retries with backoff, then fails
with an actionable error. Also adds WARNING-level progress logs before DoH discovery
and each connect attempt (visible at default log level).

Salvaged onto plugins/platforms/telegram/adapter.py (Telegram moved from
gateway/platforms/ since the PR was opened). Adds env var to docs + AUTHOR_MAP.
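A minimal sketch of the per-attempt bound, assuming an async `initialize` callable. The function name, retry count, and backoff schedule are illustrative assumptions; only `asyncio.wait_for` and the 30s default come from the message.

```python
import asyncio
from typing import Awaitable, Callable


async def initialize_with_timeout(
    initialize: Callable[[], Awaitable[None]],
    timeout: float = 30.0,
    attempts: int = 3,
    backoff: float = 1.0,
) -> None:
    for attempt in range(1, attempts + 1):
        try:
            # Bound each attempt so an unreachable host cannot hang startup.
            await asyncio.wait_for(initialize(), timeout=timeout)
            return
        except (asyncio.TimeoutError, OSError):
            if attempt == attempts:
                raise RuntimeError(
                    f"Telegram unreachable after {attempts} attempts"
                ) from None
            await asyncio.sleep(backoff * attempt)  # simple linear backoff
```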

Co-authored-by: Hermes Agent <127238744+teknium1@users.noreply.github.com>
2026-07-01 05:07:10 -07:00
Ben
751a300fca docs(cron): scope in_channel to channels; document DM continuation knob
Live DM testing showed a reply to a DM cron brief did NOT continue the job.
Root cause: for a 1:1 DM the governing knob is dm_top_level_threads_as_sessions
(default True), NOT reply_in_thread / cron_continuable_surface. Under the
default, each top-level DM keys to a per-message session (…:dm:<chat>:<ts>),
so a reply mints a new ts and can never converge with the flat …:dm:<chat>
session the cron seed creates.

A 1:1 DM has no thread-vs-timeline split, so "in_channel" has no coherent
meaning for a DM — cron_continuable_surface is a channel concept and is a
no-op for DMs. DM continuation is governed entirely by
dm_top_level_threads_as_sessions:
  - false → all top-level DMs share …:dm:<chat> → seed + reply converge → works
  - true (default) → per-message sessions → no continuation (cron or interactive)

Option A (chosen): document the requirement; no code change (the flat-DM seed
from the prior commit already lands correctly when the knob is false). Adds a
":::note 1:1 DMs" admonition to cron.md + the zh-Hans mirror.

Verification (real inbound handler, not a hard-coded assumption — the mistake
that made the earlier DM E2E falsely pass): tests/manual/cron_inchannel_dm_e2e.py
drives the REAL _handle_slack_message for a top-level DM under both knob values
and asserts false→converges (…:dm:D_TESTDM == seed), true→diverges
(…:dm:D_TESTDM:<ts>). See decisions.md D9.
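The convergence rule can be sketched as a key function. The `prefix` value is a placeholder for the part of the key the message elides with "…", and `dm_session_key` is a hypothetical name.

```python
def dm_session_key(chat_id: str, ts: str, top_level_threads_as_sessions: bool = True,
                   prefix: str = "PREFIX") -> str:
    # Flat key shared by every top-level DM in the chat.
    base = f"{prefix}:dm:{chat_id}"
    # Under the default knob value, each top-level message gets its own key.
    return f"{base}:{ts}" if top_level_threads_as_sessions else base
```

With the knob set to false, a cron seed and a later reply produce the same key; with the default they differ because each reply carries a new ts.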
2026-07-01 03:16:13 -07:00
Ben
4b4349eb9a feat(cron/slack): flat in-channel continuable cron delivery surface
Add a per-platform `cron_continuable_surface` extra key
(`thread` default | `in_channel`) so a continuable cron job can deliver
FLAT into a Slack channel — no dedicated thread — and still be
replied-to. In `in_channel` mode the scheduler skips the thread-open
branch (leaves `thread_id=None`); the shipped origin-mirror then seeds
the `(slack, chat_id, None)` shared-channel session — the same bucket
`reply_in_thread: false` routes inbound channel replies to — so a plain
channel reply continues the job in context.

Design: specs/cron-inchannel-continuable (D1–D7, F5). Model B
(shared-channel session), NOT anchoring to the delivery `ts` — on Slack
replying to a specific message IS threading, so a `ts` anchor would only
relocate the thread, never deliver true threadless continuable.

- gateway/platforms/base.py: `supports_inchannel_continuable` capability
  flag (default False → unsupported platforms fail SAFE to `thread`).
- plugins/platforms/slack/adapter.py: flag=True; `_cron_continuable_surface()`
  resolver (coerces to the two-value enum); `_warn_if_inchannel_without_flat_reply`
  connect-time warning (D5: warn, not hard-require — the misconfig fails safe).
- gateway/config.py: shared-key bridge line (top-level OR nested config).
- cron/scheduler.py: read the key generically from platform config, gate
  the `in_channel` branch on the adapter capability flag, skip thread-open.
  No new seed function (reuses the existing mirror — G6).

Pairing (docs): `in_channel` + `reply_in_thread: false` +
`require_mention: false` (or a free-response channel). Missing
`reply_in_thread: false` fails safe to a threaded continuation.

Gateway-side config flag — `/restart` to apply; NO Slack app reinstall.

Tests (from inside the worktree, PYTHONPATH=$PWD):
- +6 cron scheduler tests (in_channel skips thread-open; seeds flat
  channel session with thread_id=None; thread-mode regression;
  fail-safe on unsupported platform; value coercion). Prove-fail:
  removing the `and not in_channel_surface` guard turns the two
  load-bearing tests RED; restore → GREEN.
- +10 slack resolver/capability/warning tests; +2 config-bridge tests.
- tests/manual/cron_inchannel_e2e.py: offline E2E driving BOTH real
  legs (delivery seed + inbound reply keying) → both converge on
  (slack, C, None).
- No regressions: test_slack.py 216 passed alone; broader sweep green
  (4 pre-existing cross-file-ordering failures reproduce identically on
  pristine origin/main).

Docs: cron.md + slack.md + zh-Hans mirrors of both.
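A sketch of the two-value coercion and the capability gate, under the assumption that anything other than a supported `in_channel` falls back to `thread`. Both function names are hypothetical stand-ins for the resolver and scheduler branch.

```python
from typing import Any, Callable, Optional


def resolve_surface(raw: Any, adapter_supports_inchannel: bool) -> str:
    value = str(raw).strip().lower() if raw is not None else ""
    if value == "in_channel" and adapter_supports_inchannel:
        return "in_channel"
    return "thread"  # default, and the fail-safe for unsupported platforms


def thread_id_for_delivery(surface: str, open_thread: Callable[[], str]) -> Optional[str]:
    # in_channel skips the thread-open branch and leaves thread_id as None.
    return None if surface == "in_channel" else open_thread()
```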
2026-07-01 03:16:13 -07:00
Teknium
01e681aa48
docs: unify /new and /reset rows in gateway slash-commands table (#56235)
The messaging gateway table still listed /new ("Start a new
conversation") and /reset ("Reset conversation history") as two
separate commands with divergent descriptions. /reset is an alias
of /new (see COMMAND_REGISTRY in hermes_cli/commands.py) — same
handler, fresh session ID + history. Collapse them into one row
matching the registry wording and the CLI table already on line 39.

Closes #42829.
2026-07-01 02:39:39 -07:00
Teknium
12556a9a77
chore(scripts): drop Open WebUI local bootstrap script (#56178)
Remove scripts/setup_open_webui.sh and its 'one-command local bootstrap'
doc sections (EN + zh-Hans). The script pip-installed the third-party Open
WebUI frontend into ~/.local and managed a launchd/systemd user service —
a maintenance liability for downstream software we don't own, and the source
of the LAN first-admin signup footgun in #36121.

The Open WebUI *integration* via the OpenAI-compatible API server is
unaffected: the Docker/Docker-Compose setup, multi-user profile guide, and
troubleshooting in open-webui.md stay, and Open WebUI remains a listed
supported frontend. Only the install-and-service bootstrapper is gone.
2026-07-01 01:30:40 -07:00
Teknium
8d78be5460
revert: back out prompt_caching.enabled toggle (#56105) for re-evaluation (#56126)
* Revert "fix(caching): honor prompt_caching.enabled across model switch + fallback"

This reverts commit 36f9f50145.

* Revert "fix: allow disabling prompt caching"

This reverts commit c1c1a12fe6.
2026-07-01 00:20:32 -07:00
teknium1
36f9f50145 fix(caching): honor prompt_caching.enabled across model switch + fallback
@janrenz's PR #35862 added prompt_caching.enabled=false at init only. But
_anthropic_prompt_cache_policy re-derives _use_prompt_caching on every /model
switch (agent_runtime_helpers) and fallback-model swap (chat_completion_helpers),
which re-enabled markers and re-broke the strict proxy the toggle was meant to fix.

Move the kill switch into anthropic_prompt_cache_policy so it returns (False, False)
on every path. Drop the now-redundant init-time override (kept @janrenz's isinstance
hardening on the cache_ttl read). Add policy-level tests + docs for the toggle.

Follow-up to salvaged PR #35862.
2026-07-01 00:10:42 -07:00
Ben
7c7b489813 feat(slack): render markdown tables as native Block Kit table blocks
Replace the interim monospace table fallback with Slack's native `table`
block (rows of rich_text cells). Addresses the core ask in #18918.

- _table_block(): builds type:"table" with rich_text cells, so inline
  formatting (bold, links, code) renders inside cells.
- Column alignment parsed from the markdown separator row (:---, :-:, --:)
  into column_settings (left = default/null-skip, center/right emitted).
- Escaped pipes (\\|) are not treated as column separators.
- Respects Slack's table limits (100 rows / 20 cols / 10k aggregate chars);
  oversized or unparseable tables gracefully fall back to aligned monospace
  (rich_text_preformatted), so a big table never breaks the message.

Docs (EN + zh-Hans) updated to describe native tables + the fallback.
Tests: native table shape, alignment->column_settings, inline-formatted
cells, oversized/too-wide monospace fallback, escaped-pipe cell. Prove-
failed against a stubbed _table_block (native-table tests fail, fallback
tests stay green). All existing Slack tests still pass.
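The separator-row parsing and escaped-pipe handling might be sketched like this. It is an illustration of the rules listed above, not the `_table_block` implementation; `split_row` and `column_alignments` are hypothetical helpers.

```python
import re
from typing import List

_UNESCAPED_PIPE = re.compile(r"(?<!\\)\|")


def split_row(row: str) -> List[str]:
    # Trim the outer pipes, split on pipes not preceded by a backslash,
    # then unescape the remaining \| sequences.
    inner = row.strip()
    if inner.startswith("|"):
        inner = inner[1:]
    if inner.endswith("|") and not inner.endswith("\\|"):
        inner = inner[:-1]
    return [cell.strip().replace("\\|", "|") for cell in _UNESCAPED_PIPE.split(inner)]


def column_alignments(separator_row: str) -> List[str]:
    aligns = []
    for cell in split_row(separator_row):
        left, right = cell.startswith(":"), cell.endswith(":")
        aligns.append("center" if left and right else "right" if right else "left")
    return aligns
```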
2026-07-01 00:10:12 -07:00
Ben
b080b93ad8 feat(slack): opt-in Block Kit rendering for agent messages
Add platforms.slack.extra.rich_blocks (default off). When enabled, the
final agent message is sent as Slack Block Kit blocks — section headers,
dividers, and true nested lists via rich_text — instead of flat mrkdwn.

- New plugins/platforms/slack/block_kit.py: pure markdown->blocks renderer
  (headers, dividers, nested ordered/bullet lists, blockquotes, fenced code;
  pipe-tables as aligned monospace since Block Kit has no robust table block).
  Enforces Slack's 50-block / 3000-char section limits and returns None to
  fall back to plain text on empty/oversized/unexpected input. Never raises.
- adapter.send(): render blocks on the single-chunk primary message; a
  text= fallback is ALWAYS sent alongside (notifications/accessibility).
- adapter.edit_message(): blocks only on finalize=True, so intermediate
  streaming edits stay plain mrkdwn (no per-flush block re-derivation).
- Docs (EN + zh-Hans) + config example. Send-side only: no app reinstall.

Tests: pure-renderer unit suite + adapter integration suite (blocks present
when on, plain text when off, text fallback always set, finalize gating,
multi-chunk fallback). Prove-failed against a stubbed renderer.
2026-07-01 00:10:12 -07:00
Teknium
97e0bbef53
feat(lsp): add PowerShellEditorServices language server (#55930)
Registers PowerShell (.ps1/.psm1/.psd1) in the LSP server registry,
spawning PowerShellEditorServices over stdio via a pwsh/powershell
host. PSES ships as a GitHub release zip (no npm/go/pip recipe), so it
sits in the manual install tier alongside rust-analyzer and clangd.

The spawn builder resolves the module bundle from (in order) the
lsp.servers.powershell.command override, init bundlePath, the
PSES_BUNDLE_PATH env var, or <HERMES_HOME>/lsp/PowerShellEditorServices,
then launches Start-EditorServices.ps1 -Stdio with a non-interactive,
no-profile host. hermes lsp status/list report it as manual-only until
pwsh is present.

Docs and tests included.
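The resolution order reads as a first-non-empty lookup. The sketch below assumes each source yields a plain string path; `resolve_bundle_path` and its parameters are hypothetical, and how the command override is actually interpreted is not stated in the message.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_bundle_path(command_override: Optional[str], init_bundle_path: Optional[str],
                        env: Mapping[str, str], hermes_home: Path) -> str:
    candidates = [
        command_override,                # lsp.servers.powershell.command override
        init_bundle_path,                # init bundlePath
        env.get("PSES_BUNDLE_PATH"),     # environment variable
        str(hermes_home / "lsp" / "PowerShellEditorServices"),  # default location
    ]
    # The last candidate is always non-empty, so next() cannot fail.
    return next(c for c in candidates if c)
```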
2026-06-30 16:22:18 -07:00
kshitijk4poor
7b12753948 feat(gateway): expose platform_connect_timeout in config.yaml
Adds gateway.platform_connect_timeout (default 30s) to DEFAULT_CONFIG and
bridges it to the internal HERMES_GATEWAY_PLATFORM_CONNECT_TIMEOUT env var
at gateway startup, following the existing gateway_timeout config->env
pattern. The env var remains the manual-override escape hatch and wins if
set explicitly; otherwise config.yaml supplies the value. This closes the
issue's documentation/config-surface request (#19776 suggestion 2) on top
of the adapter ready-wait fix, so users no longer need an undocumented env
var to raise the Discord connect timeout.
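A minimal sketch of the config-to-env bridge with the "explicit env var wins" rule. `bridge_connect_timeout` is a hypothetical name, and the nested config layout is an assumption based on the gateway.platform_connect_timeout key.

```python
from typing import Any, Dict, MutableMapping

ENV_KEY = "HERMES_GATEWAY_PLATFORM_CONNECT_TIMEOUT"


def bridge_connect_timeout(config: Dict[str, Any], env: MutableMapping[str, str],
                           default: int = 30) -> None:
    if ENV_KEY in env:
        return  # an explicitly set env var is left untouched
    value = config.get("gateway", {}).get("platform_connect_timeout", default)
    env[ENV_KEY] = str(value)
```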

Refs #19776
2026-06-30 15:03:25 -07:00
Teknium
643b0dc678
fix(cron): raise default pre-run script timeout from 120s to 1h (#55489)
Cron pre-run scripts were capped at 120s by default, which surprised
users running long data-collection scripts on crons (the whole point of
crons being to offload long work). Raise _DEFAULT_SCRIPT_TIMEOUT to 3600s
(1 hour).

This bounds the script only — skill/agent jobs already run on a separate
inactivity budget (HERMES_CRON_TIMEOUT, default 600s idle, 0=unlimited),
not a wall-clock cap. Scripts dispatch to a persistent thread pool and do
not hold the tick lock, so a long script doesn't starve other due jobs.

Docs clarified to make the script-vs-agent timeout distinction explicit.

env/config overrides (HERMES_CRON_SCRIPT_TIMEOUT,
cron.script_timeout_seconds) unchanged and still take precedence.
2026-06-30 01:00:39 -07:00
Brooklyn Nicholson
a10113658b feat(agent): add pre_verify hook and verify-on-stop coding guidance
Add a `pre_verify` user/plugin/shell hook fired once per turn when the agent
edited code and is about to finish, after the existing verify-on-stop guard. A
hook can keep the agent going one more turn (run a check, defer it, tidy the
diff) by returning {"action":"continue","message":...} (the Claude-Code Stop
shape {"decision":"block","reason":...} is accepted too). Hooks receive coding,
attempt, final_response, and sorted changed_paths so they can self-scope and
self-throttle; the path is bounded by agent.max_verify_nudges and preserves
message-role alternation.

Hermes still ships its default coding guidance (agent.verify_guidance, on by
default), but it now rides the evidence-based verify-on-stop missing-evidence
nudge instead of a separate default pre_verify continuation, so it costs no
extra model turn of its own. Guidance reuses the shared utils.is_truthy_value
parser rather than a local copy.
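The two accepted hook return shapes could be normalized along these lines. `continuation_message` is a hypothetical helper; only the two dict shapes come from the message.

```python
from typing import Optional


def continuation_message(result: object) -> Optional[str]:
    # Returns the nudge text when the hook asks for one more turn, else None.
    if not isinstance(result, dict):
        return None
    if result.get("action") == "continue":
        return str(result.get("message", ""))
    if result.get("decision") == "block":  # Claude-Code Stop shape
        return str(result.get("reason", ""))
    return None
```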
2026-06-30 00:59:29 -05:00
Ben Barclay
05ac16778b feat(gateway): per-platform typing_indicator toggle
Add a generic per-platform PlatformConfig.typing_indicator flag (default
True) that gates the _keep_typing refresh loop in
_process_message_background. When false, the loop is never spawned, so no
typing/"is thinking…" status is shown on that platform — message delivery
is otherwise unchanged.

Mirrors the gateway_restart_notification contract exactly: dataclass field
+ to_dict/from_dict (with extra-fallback resolution) + shared-key bridge in
load_gateway_config, so 'slack: typing_indicator: false' under platforms
works without a separate block. Generic by design — the same key works for
every platform (Slack 'is thinking…', Telegram/Discord/Signal typing).

Motivated by users who find Slack's assistant 'is thinking…' status noisy
(it also briefly disables the compose box, via the Assistant API).
2026-06-29 21:12:57 -07:00
Teknium
d4c14011eb
feat(claude-design): add surface-first conditioning + slop diagnostic (#55399)
Port the two genuinely-novel ideas from Command Code's /design skill into
our existing claude-design skill (skill-only, zero model-tool footprint):

- Surface-First: commit to one of 7 surface archetypes (Monitor/Operate/
  Compare/Configure/Decide/Explore/Command) before any visual tokens. Most
  AI design slop is compositional, not cosmetic — conditioning generation on
  a surface choice collapses entropy the way a CoT step does. Workflow step 3.
- Slop Diagnostic: the ~10 tells that account for ~90% of the 'this is AI'
  signal, as a score-out-of-10 self-audit. Diagnose-then-treat: the report is
  context not a to-do list; repair only what fired, matched to the tell
  (re-layout vs recolor vs de-decorate). Workflow step 7 (Verify).

Did NOT clone /design's 16-mode CLI, proprietary reference corpus, or make it
a core tool. Docs page regenerated via generate-skill-docs.py.
2026-06-29 21:12:29 -07:00
Teknium
c6c1fd8b6b
docs: create dev venv outside the source tree (root-cause fix for #7779) (#54862)
A manually-installed venv inside the cloned repo can be destroyed by the
agent running a relative-path command against its own checkout (rm -rf venv,
uv venv venv, etc.), silently wiping the running runtime mid-session. Moving
the canonical manual-install venv to ~/.hermes/venvs/hermes-dev means no
relative path from the agent's workspace resolves to its own runtime, making
the bug class impossible without any command-detection code.

Closes the root cause of #7779. The managed install.sh layout is unchanged.
2026-06-29 10:00:37 -07:00
teknium1
75317d82d0 fix(vision): narrow the fan-out cap to the CPU encode burst only
The original cap held a process-global slot across the WHOLE vision
analysis (image load + encode + LLM call) with a default of min(CPUs, 4).
That serialized legitimate multi-image workflows — "compare these 6
screenshots", "read this 10-page scan", "analyze every frame" — behind a
4-wide gate, and on the native fast path it even throttled calls that make
no LLM request at all. Excess calls queued (blocking acquire, nothing
dropped), but the latency hit on real fan-out was the wrong tradeoff.

The incident was CPU exhaustion, not call count: concurrent base64/resize
bursts saturated every core and left none to service the shared event loop
serving /api/status. So cap ONLY that:

- A dedicated, bounded ThreadPoolExecutor (_vision_cpu_executor) runs the
  encode/resize/dimension-check off the caller's loop, sized to the host's
  usable core count with NO fixed ceiling — the cap tracks the actual
  exhausted resource (cores), not a magic number. Excess encodes queue on
  the executor; cores stay free for the loop.
- The LLM call is deliberately OUTSIDE the executor, so multi-image
  workflows keep full request concurrency.
- Override via auxiliary.vision.max_concurrency / HERMES_VISION_MAX_CONCURRENCY
  (honored verbatim, including above core count); sub-1 ignored.
- _vision_concurrency_slot() is now a no-op shim for back-compat.

Tests assert: resolver defaults to host cores with no ceiling; env/config
override (incl. above cores); sub-1 rejection; the executor is dedicated and
core-sized; encode runs on a vision-encode thread; and crucially that encode
bursts are bounded to the cap while the analyses themselves stay fully
concurrent (calls_peak > cap).
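The split between a bounded encode pool and an unbounded LLM call can be sketched as below. This assumes `os.cpu_count()` as the core count (the message says "usable" cores, which may differ), and `analyze`, `_encode`, and `call_llm` are illustrative names.

```python
import asyncio
import base64
import os
from concurrent.futures import ThreadPoolExecutor

# Dedicated pool: at most one encode per core runs at a time; the rest queue.
_encode_pool = ThreadPoolExecutor(
    max_workers=os.cpu_count() or 1, thread_name_prefix="vision-encode"
)


def _encode(image: bytes) -> str:
    return base64.b64encode(image).decode("ascii")


async def analyze(image: bytes, call_llm) -> str:
    loop = asyncio.get_running_loop()
    # The CPU-bound encode is bounded by the pool size...
    encoded = await loop.run_in_executor(_encode_pool, _encode, image)
    # ...while the LLM call runs outside it, so requests stay fully concurrent.
    return await call_llm(encoded)
```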
2026-06-29 01:27:10 -07:00
Ben Barclay
eddfecd2ce fix(vision): cap vision_analyze fan-out concurrency process-wide
A single agent turn can fan out N vision_analyze calls at once — the
classic trigger is "analyze every frame of this video", where ffmpeg
explodes a clip into dozens of frames and the model calls vision_analyze
on each. Every call does a CPU-heavy base64-encode/resize burst AND holds
a long-lived LLM stream open. The tool executor runs concurrent tool calls
on a per-session ThreadPoolExecutor (_MAX_TOOL_WORKERS=8), and multiple
agent sessions share one process (the dashboard runs the agent in-process),
so there was no global ceiling. In prod (June 2026) a video-frame fan-out
pinned a worker thread at ~100% CPU and starved the shared asyncio event
loop that also serves the dashboard's /api/status liveness probe, flapping
the instance to UNHEALTHY even though nothing had crashed.

Add a process-global threading.BoundedSemaphore that bounds how many vision
analyses run concurrently across the whole process, held across the entire
analysis (image load + encode + LLM call) in the single _handle_vision_analyze
chokepoint (covers both the native fast path and the legacy aux-LLM path).

It is a threading semaphore, NOT asyncio: each vision call is dispatched
through model_tools._run_async on a per-thread event loop, so an asyncio
primitive bound to one loop cannot coordinate across them. The acquire is
offloaded via run_in_executor so waiting for a slot never blocks the calling
loop.

Default: min(host CPUs, 4), floored at 1 — respect the host's concurrency,
or lower. Override via auxiliary.vision.max_concurrency (config.yaml) or
HERMES_VISION_MAX_CONCURRENCY (env). Values < 1 are ignored so the cap can
never be disabled into an unbounded fan-out.

Tests: bounded-fan-out regression guard + a control proving it would fail
without the cap; resolver tests for host-cpu default, ceiling clamp, low-cpu
host, env override, and sub-1 rejection. Pre-existing handler tests updated
for the now-async _handle_vision_analyze. Verified via the real
registry.dispatch -> _run_async per-thread-loop path (16 concurrent calls,
peak bounded to cap).
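The default and override rules in this commit can be sketched as a resolver. Checking the env value before the config value is an assumption (the message does not state the order), and `resolve_vision_cap` is a hypothetical name.

```python
import threading
from typing import Optional


def resolve_vision_cap(env_value: Optional[str], config_value: Optional[int],
                       cpus: Optional[int]) -> int:
    for raw in (env_value, config_value):
        try:
            value = int(raw)
        except (TypeError, ValueError):
            continue  # unset or unparseable: try the next source
        if value >= 1:  # sub-1 overrides are ignored
            return value
    return max(1, min(cpus or 1, 4))  # min(host CPUs, 4), floored at 1


# A process-wide semaphore sized by the resolver, as the commit describes.
vision_slots = threading.BoundedSemaphore(resolve_vision_cap(None, None, 8))
```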
2026-06-29 01:27:10 -07:00
teknium1
34e616e778 feat(slack): nudge stale installs to add mpim scopes; mark message.mpim required
Follow-up to the group-DM manifest fix. The manifest change only helps
NEW installs; existing apps keep their old (mpim-less) scopes until the
admin reinstalls. Since a missing message.mpim event delivers nothing
(no runtime API error to catch), detect stale installs at connect time
from the auth.test x-oauth-scopes header and log an actionable reinstall
nudge when im:history is granted but mpim:history is not. Also promote
message.mpim from Recommended to Required in the docs event tables so the
default setup path can't drop it.
2026-06-29 01:02:53 -07:00
Ben
4125cc3b7c fix(slack): subscribe to message.mpim + mpim scopes so group DMs work
Group DMs (multi-person DMs, channel_type=mpim) were never delivered to
the Slack bot. The adapter already classifies mpim as a DM and replies
ambiently (adapter.py:2526, is_dm = channel_type in {im, mpim}), but the
generated app manifest only subscribed to message.im / im:history — the
1:1 DM pair. Without the message.mpim event subscription Slack drops
group-DM messages before the adapter ever sees them, so 1:1 DMs worked
while group-DM ambient mode was dead.

Add message.mpim to bot_events and mpim:history (the scope that event
requires per Slack docs) + mpim:read (mirrors im:read for the
conversations.info classification call) to bot_scopes. Update the
SLACK_BOT_TOKEN / SLACK_APP_TOKEN setup-help strings and the Slack docs
(EN + zh-Hans: scope table, event table, troubleshooting) so existing
installs are told to add the new scopes and reinstall.

Reported by an enterprise customer. Note: this is a manifest/scope
change, so it only takes effect after the app is reinstalled and the
new scopes are accepted.

Tests: assert message.mpim + mpim:history + mpim:read are in the
manifest (with and without assistant mode); both fail on current main
and pass with this change.
2026-06-29 01:02:53 -07:00
Ben Barclay
e1f4098b9f
docs(cron): document explicit per-channel delivery targets for all platforms (#54630)
The cron delivery table only showed Discord/Telegram with explicit
target syntax and described Slack and every other platform as
home-channel-only. In fact the generic platform:<target> routing in
_resolve_single_delivery_target resolves explicit targets for every
platform: Slack (#channel / channel ID / channel:thread_ts), Matrix
(room/user IDs), Feishu (chat:thread), WhatsApp (JID / E.164), Signal
(group / E.164), SMS, Email, and Weixin all have dedicated explicit-
target branches in _parse_target_ref; the remaining platforms accept a
generic platform:<chat_id> passthrough.

Update the Delivery Model table (en + zh-Hans) to show the real
per-platform syntax, document #channel name resolution via the channel
directory, and note the Slack thread_ts nuance. Docs-only.
2026-06-29 15:23:16 +10:00
Brooklyn Nicholson
e684b808ad fix(desktop): route old runtimes through dashboard when serve is absent
`hermes serve` is newer than the desktop binary's release cadence, so a new
app launched against an un-upgraded managed install / PATH `hermes` would
crash on an unknown subcommand and brick the user mid-upgrade. Detect whether
the resolved runtime registers `serve` (fast source read of its dashboard.py,
with a one-time CLI probe fallback) and rewrite the backend argv to the legacy
`dashboard --no-open` only when it does not. Happy path (current runtimes)
pays nothing and still spawns `serve`.

- electron/backend-command.cjs: pure serve/dashboard argv helpers + serve-
  source detection (unit-tested in backend-command.test.cjs)
- main.cjs: backendSupportsServe() cache + getBackendArgsForRuntime() guard at
  both backend spawn sites; expose `root` from the Windows venv unwrap so the
  fast source check covers Windows too
- docs: note the backward-compat fallback in README, desktop.md, AGENTS.md
2026-06-28 22:10:42 -05:00
Brooklyn Nicholson
dff491a2b9 feat(cli): add headless hermes serve backend; desktop no longer launches dashboard
The desktop app spawned `hermes dashboard --no-open` as its backend, which
made the dashboard look like a desktop prerequisite. Add a dedicated headless
`hermes serve` command that boots the same gateway (shared cmd_dashboard /
start_server) but never opens a browser, and point the desktop backend spawn
exclusively at it. dashboard and serve are now independent surfaces — neither
launches the other.

- subcommands/dashboard.py: factor shared server args; add `serve` parser
  (always headless; accepts legacy --no-open as a no-op)
- main.py: register serve in _BUILTIN_SUBCOMMANDS + coalesce set + gui-log
  detection; extend stale-backend reaper patterns to match `serve`
- desktop electron: spawn `serve`, rename dashboardArgs -> backendArgs,
  update comments + windows-child-process test assertions
- docs: desktop README, desktop.md (incl. remote-backend), AGENTS.md, and
  cli-commands.md now describe `hermes serve` as the desktop/headless backend
2026-06-28 22:04:22 -05:00
Brooklyn Nicholson
f019a999d8 docs: clarify desktop is self-contained, not dependent on the dashboard
The desktop app spawns a headless `hermes dashboard --no-open` backend and
talks to it through the shared @hermes/shared WebSocket client — it never
runs or requires the browser dashboard UI. Spell this out in the desktop
README, the desktop docs page, and AGENTS.md so "dashboard" stops reading
as a desktop prerequisite.
2026-06-28 21:50:33 -05:00
Teknium
b31b0b9d95
docs: reconcile docs with code across last 3 releases (#54254)
Audited the last 3 releases (v2026.5.28..main) against the docs site and
fixed code-vs-docs drift:

- slash-commands: add /moa, /prompt, /pet, /hatch, /timestamps
- cli-commands: add hermes pets / project / desktop / whatsapp-cloud +
  dashboard register; correct --insecure (now a deprecated no-op);
  add gateway migrate-legacy + enroll --wake-url + dashboard --skip-build
- environment-variables: document the remaining ~48 env vars (SimpleX,
  Photon, Teams adapter, per-platform *_ALLOW_ALL_USERS, home-channel vars,
  IRC, Brave/Krea/Notion/Linear/Airtable/Tenor keys, QQ_SANDBOX) — full
  OPTIONAL_ENV_VARS (265) now covered
- configuration: document tool_loop_guardrails, goals, prompt_caching,
  network, onboarding, dashboard config blocks
- toolsets/tools-reference + tools.md: add coding/project toolsets and
  read_terminal/project_* tools; remove the stale messaging toolset and
  send_message agent tool (removed in #47856); drop stale RL-training prose
- messaging: new IRC channel page (adapter shipped without docs) + index
  row + sidebar + env vars
- pets: document the /hatch AI generation pipeline + Nous/OpenRouter image
  backend
- web-dashboard: document the bearer-token / TokenPrincipal service auth path
- purge agent-callable send_message references across guides/features and
  the research-paper-writing skill (tool removed in #47856)

Verified: docusaurus build succeeds; all authored internal links resolve.
2026-06-28 12:47:50 -07:00
Christian Persico
135f235165 docs: fix incorrect web search instructions 2026-06-28 02:54:27 -07:00
Teknium
de6e9ac760
docs(discord): document bot-to-bot comms as unsupported (#32791) (#54063)
* docs(discord): document bot-to-bot comms as unsupported (#32791)

Multi-profile bot-to-bot conversation is not a supported topology.
DISCORD_ALLOW_BOTS=none (the default) blocks all bot-originated
messages. Setting it to mentions or all across multiple Hermes profiles
so they reply to each other produces an ack-loop, because Discord's
reply auto-mention satisfies the mention gate on every turn. Document
the safe default and the loop hazard so operators don't wire it up.

* docs(discord): infographic for bot-to-bot unsupported stance (#32791)
2026-06-28 01:15:34 -07:00
Teknium
1b70a91844
docs: third-party-product plugins ship standalone, not into core tree (#54001)
* docs: third-party-product plugins ship standalone, not into core tree

Generalizes the closed-set memory-provider policy to any plugin that
integrates someone else's product/project (observability backends,
vendor SaaS, analytics dashboards, paid-service tie-ins). These create
an open-ended maintenance burden on us for backends we don't own, so
they ship as standalone plugin repos installed into ~/.hermes/plugins/
and are promoted in #plugins-skills-and-skins — not merged into core.

- AGENTS.md: new 'what we don't want' bullet + generalized policy note
  beside the memory-provider closed-set rule
- CONTRIBUTING.md: new 'Third-Party Product Integrations' section
- build-a-hermes-plugin.md: caution callout at the top of the guide

It's a coupling decision, not a quality bar — a plugin can clear review
and still be a close.

* docs: add infographic for standalone-plugin policy
2026-06-27 22:23:50 -07:00
teknium1
a1ac6baac4 fix(gateway): make bg-process reset TTL configurable + surface session-scoped processes
Follow-up to the cherry-picked #29212 (#29177):

- Promote the 24h stale-process threshold to config.yaml
  (session_reset.bg_process_max_age_hours) instead of a hardcoded
  constant. 0 disables the cutoff (legacy: any live process blocks reset).
  Wired through GatewayConfig.default_reset_policy in gateway/run.py.
- Bug 2: process(action=list) now resolves the gateway session_key from
  the contextvar and surfaces session-scoped background processes (a
  forgotten preview server under a different task), flagged
  session_scoped — so the agent/user can discover and kill the blocker.
  Previously the task-scoped list returned [] and the blocker was invisible.
- Tests: config round-trip for the new field, cross-task list visibility.
- Docs: messaging session-reset section.
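
The threshold semantics above can be sketched like this. The process record shape (`alive`, `age_hours`) and the function name are hypothetical; only the 24h default and the "0 disables the cutoff" rule come from the message.

```python
def processes_blocking_reset(processes, max_age_hours=24.0):
    """Return the live background processes that still block a session reset.

    A value of 0 (or below) disables the cutoff, restoring the legacy
    behavior where any live process blocks the reset.
    """
    live = [p for p in processes if p["alive"]]
    if max_age_hours <= 0:
        return live
    # Processes older than the threshold are treated as stale and ignored.
    return [p for p in live if p["age_hours"] <= max_age_hours]
```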
2026-06-27 20:45:43 -07:00
xxxigm
6f1a176b33 fix(gateway/discord): REST liveness probe to detect zombie clients (#26656)
The Discord adapter could enter a silent zombie state after a network
outage / proxy stall: the process is alive, _client looks open, but the
underlying socket is dead. discord.py's WebSocket reconnect never sees a
RST through a wedged proxy/NAT, so client.start() spins forever without
exiting — which means the bot-task done callback (which only fires on
task completion) never trips either. The bot stays "offline" in Discord
until a manual `hermes gateway restart`. Reported offline for 13-17h.

Adds an out-of-band REST liveness probe in DiscordAdapter. Every
`discord.liveness_interval_seconds` (default 60s) the adapter issues a
cheap fetch_user(bot_id) — the same REST path as message delivery, so it
fails when the proxy/NAT is wedged. After
`discord.liveness_failure_threshold` consecutive failures (default 3) the
probe closes the wedged client and surfaces a retryable fatal error,
which trips the gateway's existing _platform_reconnect_watcher and
rebuilds the adapter. Operators disable it by setting either knob to 0.
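
The probe loop can be sketched as below, under stated assumptions: `probe` stands in for the cheap REST call, `on_dead` for "close the client and raise the retryable fatal error", and the per-probe `timeout` is an addition of this sketch rather than something the message describes.

```python
import asyncio


async def liveness_loop(probe, on_dead, interval=60.0, threshold=3, timeout=10.0):
    """Call on_dead() after `threshold` consecutive probe failures.

    Setting either interval or threshold to 0 disables the probe.
    """
    if interval <= 0 or threshold <= 0:
        return
    failures = 0
    while True:
        await asyncio.sleep(interval)
        try:
            # A wedged proxy may never answer, so bound each probe.
            await asyncio.wait_for(probe(), timeout)
            failures = 0  # any success resets the streak
        except Exception:
            failures += 1
            if failures >= threshold:
                await on_dead()
                return
```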

Config lives in config.yaml (discord.liveness_*) per the .env-is-secrets
policy; _apply_yaml_config bridges it to internal env vars the adapter
reads, matching the existing HERMES_DISCORD_TEXT_BATCH_* pattern.

Co-authored-by: Hermes Agent <agent@nousresearch.com>
2026-06-27 19:30:32 -07:00
Teknium
6717cfc805
docs(gateway): warn against custom ExecStopPost kill drop-in (restart loop) (#53903)
A user-added systemd drop-in like ExecStopPost=/bin/kill -9 $MAINPID fires
on every stop, including clean restarts — it SIGKILLs the freshly spawned
gateway before it stabilizes and Restart=always respawns it, producing an
infinite restart loop (issue #23272). The unit Hermes installs already shuts
down cleanly via KillMode=mixed + KillSignal=SIGTERM with Restart=always +
RestartForceExitStatus, so no extra kill is needed. Document this as a danger
callout in the gateway service-management section.
2026-06-27 19:04:29 -07:00
Teknium
789f8b7dc2
docs(webhook): clarify authenticated != trusted-content trust model (#53562)
HMAC validation authenticates the webhook sender, not the business
fields inside the payload (PR titles, commit messages, issue bodies),
which are authored by untrusted third parties. Expand the prompt-
injection section to make the trust boundary explicit: the agent's
capability surface, not the input channel. Document the hardening
levers (sandbox the runtime, scope the toolset, keep approvals on,
template narrowly) instead of pretending to sanitize untrusted text.
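
To make the boundary concrete, here is a generic HMAC-SHA256 check, assuming a hex-encoded signature. The function name and signature format are illustrative and not taken from the Hermes webhook code.

```python
import hashlib
import hmac


def is_authentic(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Verify that the sender knew the shared secret for this exact body."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison on bytes avoids timing leaks.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())
```

A payload whose PR title reads "ignore previous instructions" passes this check just as readily as a benign one. The signature proves who sent the bytes, not who authored the text inside them.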

Refs #8820.
2026-06-27 03:43:33 -07:00
teknium1
50f6855217 feat(moa): make /moa one-shot only; route preset switching through the model picker
/moa no longer does a sticky model switch. It now always runs a single
prompt through the default MoA preset and restores the prior model
afterward; the whole argument is the prompt (no preset-name matching).
To switch to a MoA preset for the session, select it from the model
picker, where presets already surface under a virtual Mixture of Agents
provider on every model-selection surface.

Also fixes #53444: the TUI one-shot only set session[model_override],
which the already-built cached agent ignored, so MoA silently never ran
and the turn used the original model. The TUI now does a real in-place
agent.switch_model() via _apply_model_switch() when a live agent exists
(with a proper restore after the turn), and falls back to a model_override
for lazy/unbuilt sessions.
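
The switch-and-restore shape can be sketched as follows. `StubAgent`, `run_one_shot`, and the session dict layout are hypothetical stand-ins. Only `switch_model()` and `model_override` are named in the message, and clearing the override after the turn is an assumption of this sketch.

```python
class StubAgent:
    """Hypothetical stand-in for a live, already-built agent."""

    def __init__(self, model):
        self.model = model

    def switch_model(self, model):
        self.model = model


def run_one_shot(session, prompt, moa_model, run_turn):
    """Run a single turn on moa_model, then restore the prior model."""
    agent = session.get("agent")
    if agent is None:
        # Lazy/unbuilt session: an override is enough, since the agent
        # will be built with it.
        session["model_override"] = moa_model
        try:
            return run_turn(session, prompt)
        finally:
            session.pop("model_override", None)
    previous = agent.model
    agent.switch_model(moa_model)  # real in-place switch on the cached agent
    try:
        return run_turn(session, prompt)
    finally:
        agent.switch_model(previous)  # restore even if the turn raises
```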

Removes the redundant sticky-switch branch from the CLI, gateway, and TUI
/moa handlers; updates the command description, usage string, and docs.
2026-06-27 03:09:09 -07:00
Mahesh Sanikommu
1b75b3fd90 feat(memory): add Supermemory setup connection summary
Add post_setup() and get_status_config() to the Supermemory memory
provider so `hermes memory setup` and `hermes memory status` print a
one-line connection summary (container, profile fact count,
auto_recall/auto_capture). Point API-key onboarding at the Hermes
connect URL (app.supermemory.ai/integrations?connect=hermes).

Salvage of #52988. Two fixes folded in:

- Test isolation: the new probe/status tests mocked _SupermemoryClient
  but not the __import__("supermemory") guard inside
  _probe_supermemory_connection, so they passed only where the optional
  supermemory package was installed and failed on a clean checkout / CI
  (the PR shipped with red CI). Added _stub_supermemory_importable()
  mirroring the existing test_is_available_false_when_import_missing
  pattern; the suite now passes with supermemory absent.

- post_setup: `if api_key and api_key not in os.environ` checked whether
  the key's *value* named an env var (always false in practice). Fixed to
  compare the value: `os.environ.get("SUPERMEMORY_API_KEY") != api_key`.
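
The difference between the two checks is easy to reproduce in isolation. The helper names below are hypothetical; the two expressions are the ones quoted above.

```python
import os


def old_check(api_key, environ=os.environ):
    # Membership on a mapping tests variable *names*, so this asks whether
    # a variable literally named like the key's value exists.
    return bool(api_key and api_key not in environ)


def new_check(api_key, environ=os.environ):
    # Compare against the value stored under SUPERMEMORY_API_KEY instead.
    return bool(api_key) and environ.get("SUPERMEMORY_API_KEY") != api_key
```

With the key already exported, `old_check` still reports a mismatch while `new_check` does not.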

Verified: 38/38 in test_supermemory_provider.py and the full
tests/plugins/memory/ suite green with supermemory not installed.

Closes #52988
2026-06-27 15:07:34 +05:30
kshitijk4poor
cdb1dfbc49 fix: use os.pathsep, add tests, update tips for multi-root support
- Use os.pathsep instead of literal ':' so Windows paths (C:\dir) and
  the Windows separator ';' work correctly.
- Add 9 tests covering multi-root behavior: writes inside first/second
  root, writes outside all roots, trailing/leading/double separators,
  all-separators edge case, static deny priority, duplicate dedup.
- Update hermes_cli/tips.py tip string to mention multiple paths.
- Update docs to mention os.pathsep / ; on Windows.
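
A rough sketch of the multi-root parsing these tests exercise. `parse_safe_roots` and `is_inside_roots` are hypothetical names, and the static deny list that takes priority in the real check is not modeled here.

```python
import os


def parse_safe_roots(raw, sep=os.pathsep):
    """Split a HERMES_WRITE_SAFE_ROOT-style value into unique absolute roots.

    Empty segments (leading, trailing, or doubled separators) are skipped,
    so a value made only of separators yields no roots.
    """
    roots = []
    for part in (raw or "").split(sep):
        part = part.strip()
        if not part:
            continue
        root = os.path.abspath(part)
        if root not in roots:  # drop duplicates, keep first-seen order
            roots.append(root)
    return roots


def is_inside_roots(path, roots):
    """Return True when path is equal to or below any of the roots."""
    target = os.path.abspath(path)
    for root in roots:
        try:
            if os.path.commonpath([target, root]) == root:
                return True
        except ValueError:
            # Raised on Windows when the paths are on different drives.
            continue
    return False
```

Splitting on `os.pathsep` rather than a literal `:` keeps drive-letter paths such as `C:\dir` intact on Windows, where the separator is `;`.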

Follow-up for salvaged PR #49557.
2026-06-27 04:01:12 +05:30
Zheng Tao
d15cc9bc83 docs: update HERMES_WRITE_SAFE_ROOT docs with multi-path format
Add a note about support for multiple colon-separated directories.
2026-06-27 04:01:12 +05:30
Teknium
9b2af36d5a
docs(moa): document prompt-caching behavior for references and aggregator (#53218)
* docs(moa): document prompt-caching behavior for references and aggregator

* docs(moa): clarify references preserve cache, only aggregator trades reuse

* docs(moa): correct caching prose — tail-append preserves aggregator cache too
2026-06-26 12:58:05 -07:00
ethernet
ba7026c376 feat(docs): clarify termux/nix as t2 platforms 2026-06-26 11:37:56 -07:00
ethernet
772cf847b0 feat(docs): clarify platform support 2026-06-26 11:37:56 -07:00
Teknium
2d3071f9d4
docs(moa): clarify MoA presets are selectable on every surface (CLI, hermes model, Dashboard, Desktop, TUI) (#53211) 2026-06-26 11:16:14 -07:00
Teknium
9dd56f0dfb
docs(moa): add HermesBench results to Mixture of Agents page (#53206) 2026-06-26 11:05:07 -07:00