Commit graph

14043 commits

Author SHA1 Message Date
teknium1
49a87bcd1e chore(release): map SahilRakhaiya05 contributor email for #44073 salvage 2026-07-01 03:56:28 -07:00
SahilRakhaiya05
bb304b4914 fix(gateway): fail-closed external-surface defaults + profile-aware multiplex authz
Aligns runtime behaviour with SECURITY.md 2.6: externally reachable
messaging adapters must fail closed unless access is explicitly
configured. Closes the confirmed multiplex authorization bypass a
secondary profile's open dm/group policy no longer inherits the default
profile's allowlist trust.

- Own-policy adapters (WhatsApp, WeCom, Weixin, QQBot, Yuanbao) default
  dm_policy/group_policy to pairing/allowlist instead of open; open now
  requires an explicit GATEWAY_ALLOW_ALL_USERS or per-platform allow-all.
- Startup guard (_own_policy_open_startup_violation) refuses to boot when
  an enabled adapter is open without the allow-all opt-in; the guard now
  runs for every secondary profile in multiplex mode too.
- Profile-aware own-policy authorization: _authorization_adapter /
  _adapter_for_source resolve the live adapter via SessionSource.profile,
  so _is_user_authorized and the ingress/pairing/busy/queue paths read the
  originating profile's adapter policy, not the default profile's.
- Fail-closed intake for Email, Feishu P2P, and Discord (blank-principal
  denial, empty-allowlist deny, missing-interaction.user deny).

Salvaged from #44073 (external-surface hardening), split into a focused
gateway-authz PR per maintainer request. Follow-up fix by Hermes Agent:
the Discord slash-auth channel bypass now matches DISCORD_ALLOWED_CHANNELS
by the same name-inclusive keys (id + name + #name + parent) the on_message
scope gate uses, so a name-form channel allowlist authorizes slash
interactions consistently (was id-only, breaking #name matching).

Co-authored-by: Hermes Agent <agent@nousresearch.com>
2026-07-01 03:56:28 -07:00
srojk34
8e94e8f882 fix(discord): tag unverified channel-context senders like Slack threads
Discord's _fetch_channel_context backfills recent channel/thread activity
(from any member who can post there, not just the allowlisted user) into
the agent's context with no sender-trust distinction. Slack's equivalent
_fetch_thread_context was fixed to prefix non-allowlisted senders with
[unverified] and add LLM guidance not to act on their content, mitigating
indirect prompt injection from third parties in shared channels/threads.
Port the same mechanism to Discord using the already-wired
_is_sender_authorized/set_authorization_check plumbing.
2026-07-01 16:25:16 +05:30
kshitijk4poor
23518a5e02 test(review): add integration guards for the two isolation wirings (review)
Phase 2c mutation-check found the salvaged tests covered only the pure helpers
(_is_background_review_harness_message / _strip_background_review_harness) — the
two integration WIRINGS had zero coverage: removing the _persist_disabled guard
in _flush_messages_to_session_db, or the _strip call in
get_messages_as_conversation, left all 13 tests green.

Add:
- TestPersistDisabledHardStop: a _persist_disabled agent's flush writes nothing
  to a live SessionDB (guards the run_agent hard-stop).
- TestGetMessagesAsConversationStripsHarness: a session with stray harness rows
  resumes clean end-to-end through get_messages_as_conversation (guards the
  hermes_state load-time wiring).
Mutation-checked: each new test fails when its wiring is reverted.
2026-07-01 16:21:39 +05:30
arminanton
e2fa509bf3 fix(review): isolate the background-review fork from the canonical session
The forked skill/memory review agent shares the parent's session_id for
prompt-cache warmth. Without isolation it wrote its harness turn ('Review the
conversation above and update the skill library…') plus its curator-mode reply
straight into the user's REAL session in state.db; the next live turn re-read
that injected user message as a standing instruction and the agent 'became' the
curator, refusing the actual task.

Root fix: a _persist_disabled flag on the fork that hard-stops every DB write
and lazy-open path (_flush_messages_to_session_db, _ensure_db_session,
_get_session_db_for_recall) — the review writes only to the skill/memory stores
via its tools. Defense-in-depth: _strip_background_review_harness drops any
stray harness message (and the assistant reply that followed) at load time in
get_messages_as_conversation, so an already-polluted session resumes clean.

Salvaged from #50296.

Co-authored-by: arminanton <29869547+arminanton@users.noreply.github.com>
2026-07-01 16:21:39 +05:30
Swissly
242c9639a8 fix(cron): prevent multi-target delivery loop crash on per-target failure
The standalone thread-pool fallback in _deliver_result() runs inside the
`except RuntimeError:` block (taken when asyncio.run() sees a running loop).
When future.result() raised there (SMTP ConnectionError, timeout, etc.), the
exception was NOT caught by the sibling `except Exception:` — it escaped
_deliver_result() and crashed the whole delivery loop, silently skipping every
remaining target. Multi-target delivery (e.g. deliver: 'email:a,email:b') is a
documented feature, so this broke a promised contract.

Wrap the fallback in its own try/except so a per-target failure is logged with
exc_info and the loop continues to the next target.

Fixes #47163
2026-07-01 03:48:37 -07:00
kshitijk4poor
d3010b74db test(agent): strengthen id-reuse regression + refresh flush docstring (review)
Phase 2c review follow-up on the id()-reuse persistence fix:

- test_recycled_id_in_dedup_set_still_persists_new_message seeded an EMPTY
  dedup set, so it never injected a collision and passed under id-based dedup
  too (couldn't distinguish the designs). Replace with
  test_stale_seed_id_from_prior_flush_cannot_suppress_new_message, which asserts
  the durable invariant: the seed is empty after every flush (mutation-checked:
  removing the post-flush reset now fails BOTH id-reuse tests).
- Refresh the _flush_messages_to_session_db docstring: it still described the
  old per-session identity tracking; document the intrinsic-marker mechanism,
  that _flushed_db_message_ids is now a one-shot seed, and the shared-dict
  mutation safety note.
2026-07-01 16:17:46 +05:30
rrevenanttt
e4c6d1b22b fix(agent): persist messages by intrinsic marker to stop id() reuse data loss
_flush_messages_to_session_db deduped persisted messages with a retained
{id(msg)} set (_flushed_db_message_ids) kept across turns. Once a flushed dict
is dropped from the live list (scaffolding rewind / in-place compaction) and
GC'd, CPython recycles its address onto a new assistant/tool dict whose id()
collides with the stale entry — so the real turn is silently never written to
state.db.

Replace the retained id-set with an intrinsic _DB_PERSISTED_MARKER stamped on
each dict. The id-set is demoted to a one-shot seed (valid only while the
caller's objects are alive) that is translated to markers and cleared after
every flush, so no id() outlives a flush to alias a future message. The marker
is _-prefixed so the wire sanitizers strip it before any request leaves.

Preserves the existing _is_ephemeral_scaffolding skip. Salvaged from #50372.

Co-authored-by: rrevenanttt <290873280+rrevenanttt@users.noreply.github.com>
2026-07-01 16:17:46 +05:30
kshitij
1d6645b17f
Merge pull request #56296 from kshitijk4poor/fix/gateway-force-exit-pidlock-release
fix(gateway): release PID file + runtime lock in the force-exit backstop
2026-07-01 16:14:26 +05:30
kshitijk4poor
b7adad1a72 test(error-classifier): parametrize 5xx overflow test over 500/502/503/529
Review nit (helix4u): the fix covers 500/502/503/529 but the positive tests
only asserted 500 and 503. Parametrize over all four so 502/529 are covered
too; keep the plain-5xx negatives.
2026-07-01 16:14:16 +05:30
pefontana
a04b7024ff fix(error-classifier): route 5xx context-overflow into compression
Local inference servers (llama.cpp/llama-server, vLLM/Ollama behind a
Cloudflare/Tailscale hop) report context overflow with HTTP 500/502/503/529
instead of 400/413. _classify_by_status returned server_error/overloaded and
retried blindly, then dropped the turn with no compaction. Route explicit
_CONTEXT_OVERFLOW_PATTERNS matches on those 5xx codes to context_overflow
(should_compress=True); plain 500 stays server_error, plain 503 overloaded.
2026-07-01 16:14:16 +05:30
Teknium
74809b4e94
fix(cli): reap dead-locked worktrees so .worktrees/ can't grow unbounded (#56288)
hermes -w locks each worktree (reason 'hermes pid=<pid>'). git worktree
remove --force (single -f) refuses a locked tree, so a crashed session's
lock was never released and its worktree accumulated forever — a real
contributor to .worktrees/ bloat.

_prune_stale_worktrees now classifies each lock via _worktree_lock_is_live:
a live-owner pid is skipped at any age; a dead-owner (or foreign) lock is
unlocked first so the aggressive age-based cleanup can actually reap it.
The >72h reap tier is kept (that cleanup is intentional) but now guarded so
dirty/unpushed work is preserved, and branch deletion is gated on
git worktree remove succeeding. New fail-safe helpers _worktree_is_dirty
and _worktree_lock_is_live (pid liveness via gateway.status._pid_exists,
Windows-safe).
2026-07-01 03:43:20 -07:00
teknium1
5c2dccd06f chore(release): map kangsoo-bit author for PR #47508 salvage 2026-07-01 03:42:32 -07:00
kangsoo-bit
7a2369718a fix(telegram): keep polling alive during transient bootstrap outages
A transient Bot API network error during gateway bootstrap (deleteWebhook
or the initial start_polling) currently raises out of connect() and marks
the Telegram adapter fatal, restart-looping the whole gateway even though
the right behavior is to degrade the Telegram channel and let the existing
reconnect ladder recover in the background.

- _delete_webhook_best_effort(): swallow only transient network errors and
  continue to polling; non-network errors (e.g. auth failures) still raise.
- _start_polling_resilient(): on a transient conflict/network error at
  bootstrap, schedule background recovery and return degraded instead of
  raising; non-transient errors still propagate.
- Track the polling error-callback recovery tasks in _background_tasks so
  they can't be garbage-collected mid-flight.
- Add a second Telegram Bot API seed fallback IP (149.154.166.110).

Reconnect keeps its existing 10-retry -> supervisor-restart semantics; this
change only fixes the bootstrap raise, it does not alter the retry ladder.
2026-07-01 03:42:32 -07:00
teknium1
9dd6451c80 chore(release): add WXBR to AUTHOR_MAP for #46183 salvage 2026-07-01 03:34:49 -07:00
WXBR
59e7e9d007 fix(agent): persist recovered final responses
Close a recovery/fallback final_response with an assistant transcript entry before session persistence so durable history cannot end at a tool/user message after the caller receives a final answer.

Adds a regression for a tool-tail transcript with a non-empty final_response. Related to #46071 / #46053, but covers the adjacent case where the assistant message was never appended before persistence.
2026-07-01 03:34:49 -07:00
kshitijk4poor
df27267ed7 fix(gateway): release PID file + runtime lock in the force-exit backstop
Follow-up to #54111. That PR routed the early SystemExit exit paths
(clean-fatal-config #51228, startup-aborted-before-running) through
_exit_after_graceful_shutdown / os._exit. Those paths raise right after
runner.start() without going through _stop_impl, so they relied on atexit
to release the PID file + runtime lock — and os._exit bypasses atexit,
leaking both.

Release them explicitly in the backstop (the single guaranteed cleanup
chokepoint). Both calls are idempotent: no-op on the normal _stop_impl
path, actual cleanup on the early-exit paths. Corrects the now-inaccurate
docstring claim that teardown always ran first. Adds a guard test plus the
missing str-code->1 coverage.

E2E: real PID file written + lock acquired, _exit_after_graceful_shutdown(78)
exits code 78 AND removes the PID file (leak confirmed closed).
2026-07-01 15:59:37 +05:30
YLChen-007
e23f723389 fix: make streaming reasoning-tag filter case-insensitive
The streaming think-tag suppressors in cli.py (_stream_delta) and
gateway/stream_consumer.py (_filter_and_accumulate) matched tag names
with case-sensitive str.find(), so only the exact-case literals in the
tag tuples were caught. Mixed-case variants a model may emit — <Think>,
<ThInK>, <REASONING>, <Thought> — slipped through and leaked raw
reasoning into the user-visible stream.

Match against a lowercased view of the buffer with lowercased tag names
at all three sites (open-tag boundary search, partial-tag hold-back,
close-tag search) in both paths. Only KNOWN tag names are matched — no
substring matching — and the block-boundary gating that protects prose
mentions of <think> is preserved.

- 6 parametrized case-insensitive regression tests in each of
  tests/gateway/test_stream_consumer.py and
  tests/cli/test_stream_delta_think_tag.py.

Salvaged from PR #27289 by @YLChen-007.
2026-07-01 03:25:02 -07:00
pprism13
f049227f31 fix(state): order conversation replay by id, not timestamp
get_messages_as_conversation ordered rows by (timestamp, id). append_message
stamps each row with time.time(), which is not monotonic — on WSL2, after an
NTP step, or when a VM/laptop resumes from sleep the clock can jump backwards
mid-conversation. A later row then carries an earlier timestamp than its
predecessor, so ORDER BY timestamp sorts an assistant tool_calls row after its
tool response, orphaning the tool call and triggering an HTTP 400 on the next
completion. Order by the AUTOINCREMENT id (true insertion order) instead.

This is the sibling path to c03acca50, which already fixed get_messages but
missed get_messages_as_conversation.

Salvaged from #50356.

Co-authored-by: pprism13 <290877921+pprism13@users.noreply.github.com>
2026-07-01 15:52:37 +05:30
kshitijk4poor
cde3ca4ebf fix(gateway): widen force-exit to SystemExit paths + os._exit regression tests (#53107)
Builds on the salvaged force-exit fix:
- Route the start_gateway() SystemExit paths (clean-fatal-config #51228,
  planned-restart, service-restart) through the same os._exit backstop. Those
  paths previously fell through to normal interpreter finalization, leaving
  them vulnerable to the SAME wedged-non-daemon-thread hang the boolean-return
  paths now avoid. main() catches SystemExit and converts its code (None->0,
  int->code, str->1) to os._exit. Every exit path is now wedge-proof.
- Document in the helper why bypassing atexit is safe (remove_pid_file +
  release_gateway_runtime_lock are performed explicitly in start_gateway
  teardown) and why logging is not flushed (synchronous RotatingFileHandlers).
- Tests: assert termination via os._exit not SystemExit (adapted from
  @AgenticSpark's PR #53122, a duplicate of #53121), plus SystemExit(78) is
  routed through os._exit(78) and SystemExit(None) maps to os._exit(0).
2026-07-01 15:51:57 +05:30
teknium1
1c350728ec chore(release): map Lazymonter into AUTHOR_MAP for PR #42914 salvage 2026-07-01 03:21:20 -07:00
HiaHia
8feeb0ccb8 fix(gateway): retry launchd bootstrap after bootout on EIO for install/start
On macOS, `launchctl bootstrap` of a label still registered in the domain
fails with 5: Input/output error (EIO). That is the *already loaded* case — a
stale registration from an interrupted restart or a bootout that didn't settle
— recoverable by booting the leftover out and bootstrapping again, and distinct
from the domain being genuinely unmanageable.

launchd_install and launchd_start (both bootstrap paths) treated exit 5 as
'launchd cannot manage this macOS version' and silently degraded to a detached
process, losing auto-start at login and crash-restart. Centralize bootstrap in
_launchctl_bootstrap(), which on EIO boots the stale label out and retries once;
only if the retry also fails does the error propagate so callers apply their
existing _launchctl_domain_unsupported fallback for a genuinely broken domain.

launchd_restart already boots out before bootstrapping (its drained job is
almost always still registered, so a plain bootstrap would hit EIO on the common
path), so it keeps its explicit pre-bootout rather than routing through the
bootstrap-first helper. Corrected the stale exit-5 comment that claimed it
always meant an unmanageable domain.

Adds TestLaunchctlBootstrapEioRetry covering clean bootstrap (no bootout),
EIO -> bootout -> retry success, persistent EIO re-raise, and non-EIO re-raise
without a spurious bootout.
2026-07-01 03:21:20 -07:00
teknium1
69f08c2eb5 fix(telegram): guard _post_connect_task access for object.__new__ test pattern
disconnect() reads self._post_connect_task, but several tests build a bare
TelegramAdapter via object.__new__() without calling __init__ (which sets the
attr). Use getattr(..., None) so disconnect() works on those instances too
(pitfall #17).
2026-07-01 03:18:57 -07:00
LeonSGP43
3362bdb4e5 fix(telegram): defer post-connect housekeeping off the connect path
Command-menu registration (set_my_commands), the status-indicator, and
DM-topic setup make Bot API calls that can stall for certain bot tokens.
They ran inside connect() before/after _mark_connected() but still within
the coroutine the gateway wraps in a connect timeout, so one slow call blew
the whole connect and the adapter never came up — even though polling/webhook
was already live (getMe works via curl). Fixes #46298.

- mark connected as soon as polling/webhook startup succeeds
- move command-menu, status-indicator, and DM-topic setup into a cancellable
  background housekeeping task (_run_post_connect_housekeeping)
- cancel that task during disconnect so it can't fire into a torn-down client
- harden scope-name lookup with getattr fallback

Salvaged onto the relocated plugin adapter (plugins/platforms/telegram/
adapter.py) since the original PR #46404 targeted the pre-migration
gateway/platforms/telegram.py path.

Co-authored-by: Hermes Agent <teknium@nousresearch.com>
2026-07-01 03:18:57 -07:00
Tranquil-Flow
122e5bc037 fix(agent): retry 413 after stripping vision payloads (#47339)
When text compression can't reduce a 413 request further, evict base64
image parts from tool messages and retry once instead of dead-ending
with 'Payload too large and cannot compress further.'

A 413 is a request-body byte-size limit, not a token limit. browser_vision
screenshots (2-5MB base64 each) keep the HTTP body oversized even after
aggressive summarization. The strip pass passes remember_model=False so a
413 does not poison _no_list_tool_content_models — that set is for providers
that reject list-type tool content, a distinct failure mode.

Cherry-picked from #47397 by Tranquil-Flow; placed onto main's current
token-aware 413 recovery else branch.
2026-07-01 03:18:41 -07:00
Teknium
2b8adb8683 chore(release): map tgmerritt author for PR #43553 salvage 2026-07-01 03:17:48 -07:00
Tyler Merritt
320c587256 fix(context): parse vLLM's token-based output-cap error format
vLLM (and other OpenAI-compatible servers) report context overflow with
both the window and the prompt in tokens:

  "This model's maximum context length is 131072 tokens. However, you
   requested 65536 output tokens and your prompt contains at least 65537
   input tokens, for a total of at least 131073 tokens."

parse_available_output_tokens_from_error() already classified this as an
output-cap error (the "requested N output tokens" gate), but none of the
extraction patterns matched the "prompt contains [at least] N input
tokens" phrasing, so it returned None. The recovery path then
misclassified the failure as prompt-too-long and looped through
compression — which frees little while each retry keeps requesting the
same oversized max_tokens — terminating in "cannot compress further"
even though simply lowering the output cap would have succeeded.

Add an extraction branch for the token-based phrasing: available output
= window - reported input. When the input alone is at or over the
window it still returns None, so the caller correctly falls through to
compression.

Relates to #43547.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-01 03:17:48 -07:00
annguyenNous
a1f62f4777 fix(gateway): freshness-gate resume_pending against per-message zombies
A crash-interrupted session marked resume_pending is returned by
get_or_create_session so its transcript reloads intact. The idle/daily
reset policy (#54442) keys on updated_at, which is bumped to now on every
message — so a zombie session that keeps receiving messages never trips
it and resumes stale context forever (context bleed reported on Telegram
and Feishu).

Gate the resume_pending branch on last_resume_marked_at (set once at
resume-mark, never bumped per-message) against the auto-continue freshness
window. If resume has been pending past the window, fall through to
auto-reset with reason "resume_pending_expired". A window <= 0 disables
the gate (opt-out for the pre-fix always-fresh behaviour).

Also hoist auto_continue_freshness_window() into gateway/session.py as the
single source of truth; gateway/run._auto_continue_freshness_window() now
delegates to it (keeps the existing import/patch surface).

Fixes #46934

Co-authored-by: Hermes Agent <noreply@nousresearch.com>
2026-07-01 03:17:20 -07:00
teknium1
ac3f4aed96 docs(cron): correct stale 'no new seed code' comments for in_channel
The in_channel surface DOES add a seed: _seed_cron_channel_session CREATES
the flat (platform, chat_id, None) session and mirrors the brief into it,
because mirror_to_session only APPENDS to an existing session and the flat
channel row is otherwise absent for a chat_postMessage delivery. Correct
the scheduler thread-skip comment and the test class docstring, which still
described the earlier 'let the existing mirror seed it' design.
2026-07-01 03:16:13 -07:00
Ben
751a300fca docs(cron): scope in_channel to channels; document DM continuation knob
Live DM testing showed a reply to a DM cron brief did NOT continue the job.
Root cause: for a 1:1 DM the governing knob is dm_top_level_threads_as_sessions
(default True), NOT reply_in_thread / cron_continuable_surface. Under the
default, each top-level DM keys to a per-message session (…:dm:<chat>:<ts>),
so a reply mints a new ts and can never converge with the flat …:dm:<chat>
session the cron seed creates.

A 1:1 DM has no thread-vs-timeline split, so "in_channel" has no coherent
meaning for a DM — cron_continuable_surface is a channel concept and is a
no-op for DMs. DM continuation is governed entirely by
dm_top_level_threads_as_sessions:
  - false → all top-level DMs share …:dm:<chat> → seed + reply converge → works
  - true (default) → per-message sessions → no continuation (cron or interactive)

Option A (chosen): document the requirement; no code change (the flat-DM seed
from the prior commit already lands correctly when the knob is false). Adds a
":::note 1:1 DMs" admonition to cron.md + the zh-Hans mirror.

Verification (real inbound handler, not a hard-coded assumption — the mistake
that made the earlier DM E2E falsely pass): tests/manual/cron_inchannel_dm_e2e.py
drives the REAL _handle_slack_message for a top-level DM under both knob values
and asserts false→converges (…:dm:D_TESTDM == seed), true→diverges
(…:dm:D_TESTDM:<ts>). See decisions.md D9.
2026-07-01 03:16:13 -07:00
Ben
2c84fb42b0 fix(cron/slack): CREATE the flat session for in_channel (mirror only appends)
Live testing exposed a real bug: an in_channel continuable cron delivered
flat to the channel () but the reply did NOT continue the job — the bot
had no brief in context and confabulated the answer.

Root cause: mirror_to_session only APPENDS to a session that already
exists (_find_session_id → no-op when none matches); it never CREATEs one.
A flat (slack, chat_id, None) row is only created when a human posts a
top-level message the bot processes — a cron chat_postMessage delivery
never goes through the inbound handler, so the row is absent and the brief
is silently dropped. The prior impl relied on the bare mirror (F5/OQ-1
concluded "deletion only" — wrong).

Fix: _seed_cron_channel_session mirrors _seed_cron_thread_session —
get_or_create_session FIRST (chat_type = "dm" if is_dm else "group",
thread_id=None), keyed to the ORIGIN USER'S id, then mirror. The channel
session key embeds user_id (…:group:<chat>:<user>), so a system:cron id
would key the seed away from the reply; the origin user's id makes seed
key == inbound reply key. DM key ignores user_id but needs chat_type=dm
to match the prefix. Wired into the in_channel branch after delivery;
suppresses the generic mirror to avoid double-write.

DM validated (per request): the seeded key equals the inbound DM reply key
for a 1:1 DM; continuation works there too.

Tests:
- Rewrote the in_channel tests to use a real _session_store and the origin
  user_id; assert get_or_create_session is called with the flat, correctly-
  keyed source. Prove-fail: (a) reverting the create step and (b) seeding
  with system:cron each turn a targeted test RED; restore → GREEN.
- +2 direct _seed_cron_channel_session unit tests asserting the KEY-MATCH
  invariant (seed key == inbound reply key) via build_session_key, for both
  channel and DM.
- Rewrote tests/manual/cron_inchannel_e2e.py to drive a REAL SessionStore +
  real mirror_to_session + real _find_session_id + real build_session_key
  (no session-layer mocks — the old mocked E2E is exactly why the bug
  shipped). Asserts the brief lands in the transcript and the reply resolves
  to the same session, for BOTH channel and 1:1 DM.

Full relevant sweep: 283 passed.
2026-07-01 03:16:13 -07:00
Ben
4b4349eb9a feat(cron/slack): flat in-channel continuable cron delivery surface
Add a per-platform `cron_continuable_surface` extra key
(`thread` default | `in_channel`) so a continuable cron job can deliver
FLAT into a Slack channel — no dedicated thread — and still be
replied-to. In `in_channel` mode the scheduler skips the thread-open
branch (leaves `thread_id=None`); the shipped origin-mirror then seeds
the `(slack, chat_id, None)` shared-channel session — the same bucket
`reply_in_thread: false` routes inbound channel replies to — so a plain
channel reply continues the job in context.

Design: specs/cron-inchannel-continuable (D1–D7, F5). Model B
(shared-channel session), NOT anchoring to the delivery `ts` — on Slack
replying to a specific message IS threading, so a `ts` anchor would only
relocate the thread, never deliver true threadless continuable.

- gateway/platforms/base.py: `supports_inchannel_continuable` capability
  flag (default False → unsupported platforms fail SAFE to `thread`).
- plugins/platforms/slack/adapter.py: flag=True; `_cron_continuable_surface()`
  resolver (coerces to the two-value enum); `_warn_if_inchannel_without_flat_reply`
  connect-time warning (D5: warn, not hard-require — the misconfig fails safe).
- gateway/config.py: shared-key bridge line (top-level OR nested config).
- cron/scheduler.py: read the key generically from platform config, gate
  the `in_channel` branch on the adapter capability flag, skip thread-open.
  No new seed function (reuses the existing mirror — G6).

Pairing (docs): `in_channel` + `reply_in_thread: false` +
`require_mention: false` (or a free-response channel). Missing
`reply_in_thread: false` fails safe to a threaded continuation.

Gateway-side config flag — `/restart` to apply; NO Slack app reinstall.

Tests (from inside the worktree, PYTHONPATH=$PWD):
- +6 cron scheduler tests (in_channel skips thread-open; seeds flat
  channel session with thread_id=None; thread-mode regression;
  fail-safe on unsupported platform; value coercion). Prove-fail:
  removing the `and not in_channel_surface` guard turns the two
  load-bearing tests RED; restore → GREEN.
- +10 slack resolver/capability/warning tests; +2 config-bridge tests.
- tests/manual/cron_inchannel_e2e.py: offline E2E driving BOTH real
  legs (delivery seed + inbound reply keying) → both converge on
  (slack, C, None).
- No regressions: test_slack.py 216 passed alone; broader sweep green
  (4 pre-existing cross-file-ordering failures reproduce identically on
  pristine origin/main).

Docs: cron.md + slack.md + zh-Hans mirrors of both.
2026-07-01 03:16:13 -07:00
kshitijk4poor
daf4f1a7a9 fix(tools): close the same session leak on the hermes_subprocess_env spawn surface (review)
Review of the #50531 salvage found the cross-session HERMES_SESSION_* leak also
survives on the non-terminal spawn helper hermes_subprocess_env (added by #56202
after #50531 was written), which does os.environ.copy() without the guard. Of
its six callers, five re-bind the session identity explicitly (slash_worker/ACP
via --session-key argv) and are safe by accident; but tui_gateway cli.exec
(server.py) spawns a fresh CLI with NO --session-key under the engaged TUI host,
so it inherits a possibly-foreign HERMES_SESSION_* from the last-writer-wins
global and would stamp Kanban rows / telemetry with another session's id.

Route hermes_subprocess_env through the same _inject_session_context_env
chokepoint, restoring the single-uniform-policy-across-every-spawn-surface
invariant the codebase already claims for the internal-secret filter. Safe for
all six callers: bound ContextVars win (re-binders unaffected), _UNSET strips
(closes cli.exec). Adds 3 guard tests; mutation-checked.
2026-07-01 15:42:19 +05:30
PolyphonyRequiem
cc395e8050 fix(gateway): close cross-session HERMES_SESSION_* leak into subprocess env
Session vars (HERMES_SESSION_*) have a process-global os.environ mirror written
last-writer-wins as a CLI/cron fallback and never cleared. Under a concurrent
multi-session host (messaging gateway, ACP adapter, API server, TUI) that global
belongs to whichever turn wrote it last. A subprocess spawned from a task whose
session ContextVar is _UNSET (a sibling task that never bound, or one that
inherited another session's context) inherited the FOREIGN global and acted on
another session's identity.

Add a session_context_engaged() latch (set once any host calls set_session_vars)
and route both terminal spawn paths through a single _inject_session_context_env
chokepoint: once engaged, a bound ContextVar (incl. "") is authoritative and an
_UNSET var is STRIPPED rather than inheriting the possibly-foreign global. Pure
single-process CLI/one-shot (never engaged) keeps the inherited fallback.

Salvaged from #50531 (supersedes #49922). local.py hunk re-applied by intent
onto the current hermes_subprocess_env refactor.

Co-authored-by: PolyphonyRequiem <3107779+PolyphonyRequiem@users.noreply.github.com>
2026-07-01 15:42:19 +05:30
kshitijk4poor
e3819a4143 test(anthropic): add adjacency behavior test for #52145 + fix vacuous refresh-UA test (review)
Review follow-up on the anthropic_adapter batch salvage:

1. #52145 shipped no behavior test for the adjacency rewrite. Add
   test_strips_tool_use_when_result_not_immediately_adjacent (a tool_use whose
   result appears later but NOT in the immediately-following user message must
   be stripped — the exact case the old global id-match got wrong) plus an
   adjacent-pair control. Mutation-checked: reverting to a global match fails
   the non-adjacent test.

2. test_token_refresh_ua_prefix was vacuous — it bound to _refresh_oauth_token
   (a wrapper with no urllib.request.Request), so its assert never ran and it
   did NOT guard the real refresh UA site. Retarget it at
   refresh_anthropic_oauth_pure (:1048) with the header-scoped check. Mutation-
   checked: reverting :1048 to claude-cli/ now fails it.
2026-07-01 15:42:15 +05:30
kshitijk4poor
5efbd7cb05 test(anthropic): scope OAuth-UA source check to header lines, not any mention
The salvaged test_token_exchange_ua_prefix did a naive whole-function substring
check for 'claude-cli/', which false-positives on an explanatory comment that
references the old (blocked) UA. Scope it to actual User-Agent header lines —
mirroring the sibling test_no_claude_cli_in_source — so a comment documenting
why claude-cli/ is avoided doesn't trip it. Mutation-checked: an actual
claude-cli/ UA header still fails the test.
2026-07-01 15:42:15 +05:30
DhivinX
49e129e495 fix(anthropic): use claude-code/ UA prefix for OAuth to avoid 404 (#48534)
Anthropic's OAuth endpoints 404 for the claude-cli/ User-Agent prefix. Switch
all three OAuth UA sites (build_anthropic_client, refresh_anthropic_oauth_pure,
run_hermes_oauth_login_pure) to the claude-code/ prefix Anthropic expects.

Salvaged from #51948.

Co-authored-by: DhivinX <20087092+DhivinX@users.noreply.github.com>
2026-07-01 15:42:15 +05:30
fsaad1984
5881791adc fix(adapter): enforce tool_use/tool_result adjacency in _strip_orphaned_tool_blocks
_strip_orphaned_tool_blocks collected tool_result ids across ALL user messages
and kept any assistant tool_use whose id appeared anywhere, rather than
requiring the result to be in the immediately-following user message. A stale
match elsewhere in the transcript could keep a genuinely-orphaned tool_use,
which Anthropic rejects. Rewrite to adjacency-checked two-pass logic so a
tool_use is kept only when its result immediately follows.

Salvaged from #52145.

Co-authored-by: fsaad1984 <38867992+fsaad1984@users.noreply.github.com>
2026-07-01 15:42:15 +05:30
kshitijk4poor
ede5c09f3b docs(disk-cleanup): clarify cron output-root protection is exact-match
Review follow-up: the _is_protected_cron_path docstring listed output/ next
to jobs.json/.tick.lock as 'the directory itself', which is slightly
ambiguous. Spell out that the match is EXACT-path only and must not be
'simplified' into a blanket cron/output/* guard (children stay cleanable) —
prevents a future editor from re-introducing the wholesale-delete bug this
fix closes.
2026-07-01 15:42:04 +05:30
martinramos002-bot
d173e8c3a7 fix: protect cron output root from cleanup
Only classify files below cron/output/ as disposable cron output.
The cron/output directory itself is a durable container for retained
job history and should not be tracked or deleted wholesale.

Add regression coverage for both category detection and cleanup of a
stale tracked entry pointing at the output root.
2026-07-01 15:42:04 +05:30
kshitijk4poor
7f71a48a3a fix(cron): release TERMINAL_CWD lock even when run_job body raises
Rework follow-up on the per-job TERMINAL_CWD readers-writer lock.

The lock was acquired BEFORE the try: whose finally: is the only release
site, with the env-override statements (os.environ[TERMINAL_CWD] = workdir;
logger.info) sitting in the unprotected window between acquire and try. Any
exception there — a raising log handler, an os.environ error, a thread
interrupt — propagated out of run_job WITHOUT running the finally, leaking
the lock. A leaked writer permanently deadlocks the whole scheduler (every
future cron job blocks on acquire_*); a leaked reader blocks all writers.

- Snapshot _prior_terminal_cwd before the acquire (so the finally can always
  restore env even if the body raises before the override).
- Open the try: immediately after acquire and move the env-override lines
  inside it, so the existing finally always releases the lock.
- Add a mutation-verified regression test: a workdir job whose in-window
  logger.info raises must still release the writer lock (a subsequent
  acquire_write must not block).
2026-07-01 15:39:48 +05:30
entropy-0x
abc349bd79 fix(cron): isolate per-job TERMINAL_CWD from concurrent cron jobs
A cron job with a per-job `workdir` overrides the process-global
`os.environ["TERMINAL_CWD"]` for the entire duration of its agent run and
restores it afterwards. The scheduler dispatches workdir jobs on a
single-thread sequential pool and workdir-less jobs on a separate parallel
pool, and the in-code comments claimed this made the override safe.

That only prevents two workdir jobs from overlapping each other. The two
pools run concurrently in the same process and share `os.environ`, so while
a workdir job has `TERMINAL_CWD` pointed at its project directory, any
workdir-less job firing in the same window reads that same global through the
terminal, file, and code-exec tools and runs its commands in the wrong
directory. The corruption window spans the whole workdir-job run, and a file
write or delete can land in another job's tree.

This serializes the override with a writer-preferring readers-writer lock.
Workdir jobs acquire it as writers (exclusive for their whole run); workdir-
less jobs acquire it as readers, so they still run in parallel with each
other but never alongside a workdir job's override. The guarantee is based on
run overlap rather than tick boundaries, so it also holds when a workdir job
spans ticks.

## What does this PR do?

Fixes a directory-isolation bug in the cron scheduler: a workdir cron job's
process-global `TERMINAL_CWD` override could be observed by a concurrently
running workdir-less cron job, causing that job's shell/file/code-exec
commands to execute in the wrong directory.

## Related Issue

N/A

## Type of Change

- [x] 🐛 Bug fix (non-breaking change that fixes an issue)
- [ ]  New feature (non-breaking change that adds functionality)
- [ ] 🔒 Security fix
- [ ] 📝 Documentation update
- [ ]  Tests (adding or improving test coverage)
- [ ] ♻️ Refactor (no behavior change)
- [ ] 🎯 New skill (bundled or hub)

## Changes Made

- `cron/scheduler.py`: add `_ReadWriteLock` (writer-preferring) and the
  module-global `_terminal_cwd_lock`.
- `cron/scheduler.py`: in `run_job`, acquire the lock as a writer for workdir
  jobs and as a reader for workdir-less jobs, spanning the `TERMINAL_CWD`
  override and its restore in the `finally` block.
- `cron/scheduler.py`: correct the stale comments in `run_job` and `tick` that
  claimed the sequential pool alone made the override safe.
- `tests/cron/test_terminal_cwd_lock.py`: new tests for reader concurrency,
  writer exclusion, and the no-cross-observation regression.

## How to Test

1. `python -m pytest tests/cron/test_terminal_cwd_lock.py -q` — the regression
   test `test_reader_never_observes_writer_override` fails without the lock and
   passes with it.
2. `python -m pytest tests/cron/test_cron_workdir.py tests/cron/test_parallel_pool.py -q`
   — confirms the existing `TERMINAL_CWD` set/restore and pool behaviour are
   unchanged.

## Checklist

### Code

- [x] I've read the Contributing Guide
- [x] My commit messages follow Conventional Commits (`fix(scope):`, etc.)
- [x] I searched for existing PRs to make sure this isn't a duplicate
- [x] My PR contains only changes related to this fix
- [x] I've run the affected `tests/cron/` suites and all tests pass
- [x] I've added tests for my changes (required for bug fixes)
- [x] I've tested on my platform: macOS 15 (Darwin 25.5)

### Documentation & Housekeeping

- [x] I've updated relevant documentation (docstrings/comments) — or N/A
- [x] I've updated `cli-config.yaml.example` if I added/changed config keys — N/A
- [x] I've updated `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture — N/A
- [x] I've considered cross-platform impact (Windows, macOS) — uses stdlib `threading` only
- [x] I've updated tool descriptions/schemas if I changed tool behavior — N/A
2026-07-01 15:39:48 +05:30
srojk34
db0fd8f290 fix(security): use caller package root for deregister opt-in policy lookup
_plugin_override_policy is keyed by the plugin package root
(e.g. hermes_plugins.allowed), but the lookup used caller_mod
(the exact leaf module string). A call from hermes_plugins.allowed.cleanup
would evaluate _plugin_override_policy.get("hermes_plugins.allowed.cleanup")
→ False and raise PermissionError even when the plugin registered opt-in
under its package root.

Switch the policy lookup to caller_root (.join of the first two segments)
so submodule callers inherit the package-level allow_tool_override grant.

Adds a focused regression test for the opted-in submodule case.
2026-07-01 15:37:58 +05:30
testingbuddies24
e07768a53f fix(gateway): strip orphan think-tag close tags in progressive stream
When a model emits an inline <think>...</think> block but the opening
tag is dropped upstream (thinking-mode toggle, truncated stream, or
incomplete upstream filtering), the bare </think> close tag leaked
through to the user in the live progressive edit. The agent-side final
scrubber (agent/think_scrubber.py) already had _strip_orphan_close_tags;
this ports the same logic into GatewayStreamConsumer so the streaming
display stays clean too.

- _filter_and_accumulate: strip orphan close tags before appending the
  'no-opening-tag' branch text to _accumulated.
- _flush_think_buffer: same on stream end for held-back partials.
- 14 regression tests (TestStripOrphanCloseTags): all 6 close-tag
  variants, multi-tag, partial-tag-untouched, trailing whitespace,
  and end-to-end through _filter_and_accumulate / _flush_think_buffer.

Only strips KNOWN close-tag names (case-insensitive) — never arbitrary
tag-shaped substrings — so comparison operators and unrelated prose are
preserved.

Salvaged from PR #43192 by @testingbuddies24.
2026-07-01 03:04:01 -07:00
amathxbt
6a6fd42111 fix(security): block subshell/brace-group wrappers at the hardline floor
Wrapping a catastrophic command in a bare subshell or brace group walked
straight past the unconditional hardline floor -- even under --yolo,
/yolo, approvals.mode=off, and cron approve mode. The command-substitution
forms were already caught; the bare paren / brace-group forms were the gap.

Rather than add the paren and brace openers to the flat _CMDPOS pattern
class (which cannot tell a real subshell opener from one sitting inside a
quoted argument, and would false-positive on ordinary prose such as a PR
title that merely mentions the trigger word), teach the existing
QUOTE-AWARE command-start tokenizer (_iter_shell_command_starts) to treat
the paren and brace openers as command starts, then emit a detection
variant that marks each real command start with a newline (already a
_CMDPOS separator). Openers inside quotes never register as starts, so
quoted arguments are left untouched while real subshell/brace bypasses now
anchor. One place covers every _CMDPOS rule (shutdown/reboot/init/
systemctl/telinit and the rm root/home/system floor).

Tests: subshell/brace bypasses added to the hardline-block, root-wipe, and
yolo-bypass sets; a regression set asserts quoted paren/brace prose is NOT
blocked (guards our own gh-pr-create workflow).
2026-07-01 03:03:05 -07:00
teknium1
6d1291f2cc chore(deps): bump aiohttp to patched 3.14.1 (from 3.14.0)
3.14.1 is the current patched release on the 3.14 line; both CVE-2026-34993
(CookieJar.load RCE) and CVE-2026-47265 (per-request cookie leak on
cross-origin redirect) are fixed as of 3.14.0, and 3.14.1 rolls up the
subsequent point fixes. Re-locked uv.lock.
2026-07-01 02:51:45 -07:00
Wing Huang
6c37b2c785 security(deps): enforce aiohttp CVE floor on all lazy messaging paths + coverage guard
The messaging extra and platform.slack pin aiohttp==3.14.0, but several
lazy messaging features listed only their SDK and let aiohttp come in
transitively. Each of those SDKs caps aiohttp loosely enough that a
vulnerable already-installed aiohttp still satisfies the range, so the
eager extras got the patched floor while the lazy paths did not:

  - discord.py (aiohttp>=3.7.4,<4)
  - mautrix / aiohttp-socks (aiohttp>=3,<4 / aiohttp>=3.10.0)  [Matrix]
  - microsoft-teams-apps (aiohttp<4)                            [Teams]

(Teams additionally shipped an explicit but *stale* aiohttp==3.13.4 in
both the pyproject `teams` extra and platform.teams.)

- tools/lazy_deps.py: add aiohttp==3.14.0 to platform.discord, platform.matrix;
  bump the stale platform.teams pin 3.13.4 -> 3.14.0.
- pyproject.toml: add aiohttp==3.14.0 to the matrix extra; bump the teams extra
  3.13.4 -> 3.14.0 (homeassistant/sms/messaging already at 3.14.0).
- tests/test_packaging_metadata.py: test_security_pins_present_in_mirrored_lazy_features
  now covers platform.discord/slack/matrix/teams. The existing agree-guard only
  compares packages pinned in BOTH sources, so it can't catch a lazy feature
  that omits a pin entirely; this guard is an explicit coverage contract
  (security package -> lazy features that must carry it) and fails with
  'platform.matrix: aiohttp=MISSING' if a floor is dropped again.
- uv.lock: regenerated, zero drift (aiohttp 3.14.0).
2026-07-01 02:51:45 -07:00
Wing Huang
828f33e6b1 fix(ci): map contributor email for attribution check
scripts/release.py AUTHOR_MAP is greped by the Contributor Attribution
Check to resolve a commit author's email -> GitHub username. Add
huangsen365@gmail.com -> huangsen365 so this PR's commits pass the check.

(This commit originally also carried a gateway race-test flake fix; that
edit is now dropped because main independently hardened the same test with
a superior server._sessions snapshot/restore isolation, making ours
redundant.)
2026-07-01 02:51:45 -07:00
Wing Huang
6f956d7405 test(deps): guard pyproject<->lazy_deps pin consistency
Adds two checks to tests/test_packaging_metadata.py:

1. No package is exact-pinned to two different versions across
   pyproject.toml's [project.dependencies] / extras.
2. Every package pinned in BOTH the pyproject extras and the LAZY_DEPS
   allowlist in tools/lazy_deps.py uses the same version.

This is the regression guard for the drift the rest of this PR fixes: the
two pin sources are hand-maintained mirrors (lazy_deps even documents
"update both this map AND the corresponding extra"), and they have silently
diverged on aiohttp and anthropic. Run against the pre-fix tree, check (2)
fails on `anthropic: pyproject=['0.86.0'] lazy_deps=['0.87.0']`.

The lazy_deps side is parsed via AST (not imported) so the test stays free
of tools/lazy_deps.py runtime imports; only exact `==` pins are compared.
2026-07-01 02:51:45 -07:00
Wing Huang
db57cbbaf6 security(deps): bump aiohttp to 3.14.0, anthropic to 0.87.0; pin cryptography floor
- aiohttp 3.13.4 -> 3.14.0 (messaging/slack/homeassistant/sms extras +
  lazy_deps platform.slack) — picks up CVE-2026-34993 (RCE via
  CookieJar.load deserialization) and CVE-2026-47265 (per-request cookie
  leak on cross-origin redirect). Both are fixed only in 3.14.0; there is
  no 3.13.x backport.
- anthropic 0.86.0 -> 0.87.0 (anthropic extra) — CVE-2026-34450 /
  CVE-2026-34452. lazy_deps provider.anthropic was already 0.87.0; the
  extra pin had drifted back to the vulnerable 0.86.0, so this realigns it.
- cryptography pinned explicitly at 46.0.7 in core deps — CVE-2026-39892,
  CVE-2026-34073. It only arrives transitively via PyJWT[crypto]; the
  explicit floor keeps the WeCom/Weixin crypto paths from drifting below
  the fix.

uv.lock regenerated; only aiohttp / anthropic moved (cryptography already
resolved to 46.0.7). Verified 3.14.0 satisfies discord.py 2.7.1
(aiohttp>=3.7.4,<4) and slack-sdk 3.40.1 (aiohttp>=3.7.3,<4).
2026-07-01 02:51:45 -07:00