When a Markdown ordered list has blank lines between items (common in
LLM-authored content), the list run loop breaks on each blank line.
Slack numbers each rich_text_list independently, so N items produce N
lists each starting at 1.
Skip blank lines inside the list run as soft separators instead of
breaking, so ordered items stay in one rich_text_list and Slack renders
the correct numbering.
Fixes#57076
Salvage of #2794 by @CharmingGroot, ported to the relocated
plugins/platforms/email/adapter.py:
- Guard raw_email = msg_data[0][1] against IndexError/TypeError and
non-bytes payloads. UIDs are added to _seen_uids before fetch, so an
exception mid-batch permanently skipped every remaining message in
the batch — now the bad message is logged and skipped instead.
- Message-ID domain generation falls back to 'localhost' when
EMAIL_ADDRESS lacks '@' (now via a shared _message_id_domain() helper
covering all 3 send paths; the PR fixed 2 of 3).
Two Hermes bots sharing a channel could volley replies at each other
indefinitely. Root cause: Discord reply-pings (allowed_mentions
replied_user=true) add the replied-to bot to message.mentions without a
literal <@bot> token in the body, so the existing bot-admission gate
treated a reply chip as an explicit @mention and re-triggered the peer.
Adds opt-in discord.bots_require_inline_mention (default false; env
DISCORD_BOTS_REQUIRE_INLINE_MENTION). When enabled, bot-authored
messages must carry a raw inline <@id>/<@!id> mention in the content;
reply-ping-only mentions no longer admit the message. Human messages and
all existing defaults are unchanged.
The new _self_is_raw_mentioned helper deliberately ignores the resolved
message.mentions list (which reply-ping populates) and checks only the
raw content token via the shared _raw_mentioned_user_ids primitive.
Wrap each Telegram initialize() attempt in asyncio.wait_for(HERMES_TELEGRAM_INIT_TIMEOUT,
default 30s). When api.telegram.org and all fallback IPs are unreachable, the connect
chain has no outer bound, so a single initialize() blocks for minutes and the
retry-on-exception loop never fires — the gateway appears to hang after the banner.
The timeout guarantees each attempt is bounded, then retries with backoff, then fails
with an actionable error. Also adds WARNING-level progress logs before DoH discovery
and each connect attempt (visible at default log level).
Salvaged onto plugins/platforms/telegram/adapter.py (Telegram moved from
gateway/platforms/ since the PR was opened). Adds env var to docs + AUTHOR_MAP.
Co-authored-by: Hermes Agent <127238744+teknium1@users.noreply.github.com>
Register the Matrix room-message, reaction, and invite handlers with
mautrix's wait_sync=True. mautrix's handle_sync() only returns the tasks
for handlers registered as sync-awaited; non-waited handlers are
fire-and-forget via background_task.create() and are NOT returned. Since
_dispatch_sync() awaits only the returned tasks (await asyncio.gather),
the inbound handlers previously had no completion point, so Tuwunel/
mautrix homeservers connected and completed initial sync but dispatched
zero inbound messages.
Fixes#46142.
Co-authored-by: Zeheng Huang <153708448+hunjaiboy@users.noreply.github.com>
Per review feedback on #53997 from @teknium1: the flag was set True
on failed device_id resolution but never reset, so a same-adapter
reconnect that successfully resolves a real device_id would keep
skipping server-side key verification indefinitely.
Reset now happens at the top of connect(), before resolution runs,
so every connect() attempt starts clean. A repeat failure re-sets
the flag (unchanged behavior); a recovery correctly clears it.
Adds TestDeviceIdRecoveryOnReconnect to cover the transition.
- Resolve device_id via query_keys({mxid: []}) when whoami() returns None
- Guard _verify_device_keys_on_server and _reverify_keys_after_upload
against None/unverified device_id to prevent 'device_keys values must
be a list of strings' serialization failure
- Disconnect existing client before reconnect to prevent dual OlmMachine
instances on the same crypto store
Re-targeted from #39779 (legacy gateway/platforms/matrix.py) onto the
migrated plugins/platforms/matrix/adapter.py path following the
2026-06-20 adapter migration. Logic unchanged from original fix.
242 tests passing (233 upstream + 9 new).
Slack Workflow Builder posts (and other app/bot messages) arrive as
subtype=bot_message with user=None. _is_user_authorized rejected them at
the `if not user_id: return False` guard, which runs *before* the #4466
{PLATFORM}_ALLOW_BOTS bypass — so @mentioning the bot from a Slack
workflow silently did nothing, even with SLACK_ALLOW_BOTS (or
SLACK_ALLOW_ALL_USERS) set. The chat-scoped allowlist for Telegram/QQ
already runs before that guard for the same reason (channel broadcasts
with no from_user); Slack was both missing from the bot-bypass map and
had the bypass running too late.
- gateway/authz_mixin: move the {PLATFORM}_ALLOW_BOTS bypass ahead of the
no-user-id guard and add Platform.SLACK -> SLACK_ALLOW_BOTS.
- plugins/platforms/slack/adapter: set is_bot=True on inbound
bot_message events so the gateway can identify workflow/app senders
(they carry no user_id to match against the allowlist).
Tested: new tests/gateway/test_slack_bot_auth_bypass.py plus the existing
Discord/Feishu bot-auth and gateway authz/gating suites all pass.
Aligns runtime behaviour with SECURITY.md 2.6: externally reachable
messaging adapters must fail closed unless access is explicitly
configured. Closes the confirmed multiplex authorization bypass a
secondary profile's open dm/group policy no longer inherits the default
profile's allowlist trust.
- Own-policy adapters (WhatsApp, WeCom, Weixin, QQBot, Yuanbao) default
dm_policy/group_policy to pairing/allowlist instead of open; open now
requires an explicit GATEWAY_ALLOW_ALL_USERS or per-platform allow-all.
- Startup guard (_own_policy_open_startup_violation) refuses to boot when
an enabled adapter is open without the allow-all opt-in; the guard now
runs for every secondary profile in multiplex mode too.
- Profile-aware own-policy authorization: _authorization_adapter /
_adapter_for_source resolve the live adapter via SessionSource.profile,
so _is_user_authorized and the ingress/pairing/busy/queue paths read the
originating profile's adapter policy, not the default profile's.
- Fail-closed intake for Email, Feishu P2P, and Discord (blank-principal
denial, empty-allowlist deny, missing-interaction.user deny).
Salvaged from #44073 (external-surface hardening), split into a focused
gateway-authz PR per maintainer request. Follow-up fix by Hermes Agent:
the Discord slash-auth channel bypass now matches DISCORD_ALLOWED_CHANNELS
by the same name-inclusive keys (id + name + #name + parent) the on_message
scope gate uses, so a name-form channel allowlist authorizes slash
interactions consistently (was id-only, breaking #name matching).
Co-authored-by: Hermes Agent <agent@nousresearch.com>
Discord's _fetch_channel_context backfills recent channel/thread activity
(from any member who can post there, not just the allowlisted user) into
the agent's context with no sender-trust distinction. Slack's equivalent
_fetch_thread_context was fixed to prefix non-allowlisted senders with
[unverified] and add LLM guidance not to act on their content, mitigating
indirect prompt injection from third parties in shared channels/threads.
Port the same mechanism to Discord using the already-wired
_is_sender_authorized/set_authorization_check plumbing.
A transient Bot API network error during gateway bootstrap (deleteWebhook
or the initial start_polling) currently raises out of connect() and marks
the Telegram adapter fatal, restart-looping the whole gateway even though
the right behavior is to degrade the Telegram channel and let the existing
reconnect ladder recover in the background.
- _delete_webhook_best_effort(): swallow only transient network errors and
continue to polling; non-network errors (e.g. auth failures) still raise.
- _start_polling_resilient(): on a transient conflict/network error at
bootstrap, schedule background recovery and return degraded instead of
raising; non-transient errors still propagate.
- Track the polling error-callback recovery tasks in _background_tasks so
they can't be garbage-collected mid-flight.
- Add a second Telegram Bot API seed fallback IP (149.154.166.110).
Reconnect keeps its existing 10-retry -> supervisor-restart semantics; this
change only fixes the bootstrap raise, it does not alter the retry ladder.
disconnect() reads self._post_connect_task, but several tests build a bare
TelegramAdapter via object.__new__() without calling __init__ (which sets the
attr). Use getattr(..., None) so disconnect() works on those instances too
(pitfall #17).
Command-menu registration (set_my_commands), the status-indicator, and
DM-topic setup make Bot API calls that can stall for certain bot tokens.
They ran inside connect() before/after _mark_connected() but still within
the coroutine the gateway wraps in a connect timeout, so one slow call blew
the whole connect and the adapter never came up — even though polling/webhook
was already live (getMe works via curl). Fixes#46298.
- mark connected as soon as polling/webhook startup succeeds
- move command-menu, status-indicator, and DM-topic setup into a cancellable
background housekeeping task (_run_post_connect_housekeeping)
- cancel that task during disconnect so it can't fire into a torn-down client
- harden scope-name lookup with getattr fallback
Salvaged onto the relocated plugin adapter (plugins/platforms/telegram/
adapter.py) since the original PR #46404 targeted the pre-migration
gateway/platforms/telegram.py path.
Co-authored-by: Hermes Agent <teknium@nousresearch.com>
Add a per-platform `cron_continuable_surface` extra key
(`thread` default | `in_channel`) so a continuable cron job can deliver
FLAT into a Slack channel — no dedicated thread — and still be
replied-to. In `in_channel` mode the scheduler skips the thread-open
branch (leaves `thread_id=None`); the shipped origin-mirror then seeds
the `(slack, chat_id, None)` shared-channel session — the same bucket
`reply_in_thread: false` routes inbound channel replies to — so a plain
channel reply continues the job in context.
Design: specs/cron-inchannel-continuable (D1–D7, F5). Model B
(shared-channel session), NOT anchoring to the delivery `ts` — on Slack
replying to a specific message IS threading, so a `ts` anchor would only
relocate the thread, never deliver true threadless continuable.
- gateway/platforms/base.py: `supports_inchannel_continuable` capability
flag (default False → unsupported platforms fail SAFE to `thread`).
- plugins/platforms/slack/adapter.py: flag=True; `_cron_continuable_surface()`
resolver (coerces to the two-value enum); `_warn_if_inchannel_without_flat_reply`
connect-time warning (D5: warn, not hard-require — the misconfig fails safe).
- gateway/config.py: shared-key bridge line (top-level OR nested config).
- cron/scheduler.py: read the key generically from platform config, gate
the `in_channel` branch on the adapter capability flag, skip thread-open.
No new seed function (reuses the existing mirror — G6).
Pairing (docs): `in_channel` + `reply_in_thread: false` +
`require_mention: false` (or a free-response channel). Missing
`reply_in_thread: false` fails safe to a threaded continuation.
Gateway-side config flag — `/restart` to apply; NO Slack app reinstall.
Tests (from inside the worktree, PYTHONPATH=$PWD):
- +6 cron scheduler tests (in_channel skips thread-open; seeds flat
channel session with thread_id=None; thread-mode regression;
fail-safe on unsupported platform; value coercion). Prove-fail:
removing the `and not in_channel_surface` guard turns the two
load-bearing tests RED; restore → GREEN.
- +10 slack resolver/capability/warning tests; +2 config-bridge tests.
- tests/manual/cron_inchannel_e2e.py: offline E2E driving BOTH real
legs (delivery seed + inbound reply keying) → both converge on
(slack, C, None).
- No regressions: test_slack.py 216 passed alone; broader sweep green
(4 pre-existing cross-file-ordering failures reproduce identically on
pristine origin/main).
Docs: cron.md + slack.md + zh-Hans mirrors of both.
Add an interactive Raft setup flow for hermes gateway setup. The wizard follows the existing platform adapter setup pattern, persists RAFT_PROFILE to the Hermes env file, preserves an existing profile when the user declines reconfiguration, and registers the flow via setup_fn.
Add focused Raft adapter coverage for saving RAFT_PROFILE, keeping an existing profile, and registering setup_fn.
Signed-off-by: skyzh <skyzh@mail.build>
Signed-off-by: HaoHao <HaoHao@mail.build>
Matrix outbound image downloads validated only the final URL after
following redirects, so a public URL that 302-redirects to loopback /
private-network / cloud-metadata endpoints had already connected to the
unsafe hop before the check ran.
Re-validate every redirect hop before following it:
- aiohttp path resolves redirects manually with allow_redirects=False,
validating each Location via is_safe_url (aiohttp can't use the httpx
response event hook).
- httpx fallback installs the shared _ssrf_redirect_guard event hook.
Regression tests cover per-hop blocking of an unsafe redirect, following
a safe redirect chain, and httpx guard wiring.
An invite to a room with no remaining members surfaces as "no servers
in the room have been provided" or "room not found" on join. The pending
invite was never cleared, so every gateway startup re-attempted the join
and re-emitted the warning indefinitely.
Detect that specific failure mode by narrow error-message match and call
leave_room to decline the invite; transient/network errors leave the
invite untouched for the next sync. Adds 5 tests.
Reimplements the matrix portion of #33953 onto the current plugin adapter
(gateway/platforms/matrix.py was relocated to
plugins/platforms/matrix/adapter.py since the PR was opened). The two
gateway/status.py fixes from that PR (wrapper-subcommand rejection,
psutil start-time fallback) already landed on main independently.
Reported by @Bougey; original patch authored by @KiraKatana.
The network-error reconnect ladder (#55992) captured a stable self._app
local across its awaits and failed fast when the adapter was torn down
mid-sleep. The 409-conflict retry path had the identical unguarded
self._app.updater.start_polling() deref — a concurrent disconnect()
during its RETRY_DELAY sleep would raise the same 'NoneType' object has
no attribute 'updater' and, on a non-final retry, land in limbo. Apply
the same stable-local + fail-fast pattern so the existing except block
reschedules or escalates to fatal.
_handle_polling_network_error's chained retry never updated
self._polling_error_task, so the reentrancy guard shared with the
heartbeat loop and the pending-updates probe went stale mid-recovery,
letting more than one recovery attempt run concurrently against the
same adapter. Combined with a TOCTOU window in
_handle_adapter_fatal_error (the adapter was only removed from
self.adapters in a finally block after awaiting disconnect()), two
concurrent fatal notifications for the same adapter could both pass
the "still installed" check and call disconnect() twice, which is
where the reported "'NoneType' object has no attribute 'updater'"
originates once self._app is cleared by the first call.
- Reassign the chained retry task to self._polling_error_task so the
guard reflects an in-flight recovery.
- Capture self._app in a local variable across the stop/start_polling
sequence instead of re-reading self._app between awaits.
- Claim (pop) the adapter from self.adapters before awaiting
disconnect() in _handle_adapter_fatal_error, not after, closing the
TOCTOU window for a concurrent notification on the same adapter.
The forum-topic skill-binding lookup assumed config.extra['group_topics']
was always a list of {chat_id, topics} entries. When an operator writes the
natural mapping shape ({"-100...": [...]}), iterating yields string keys and
chat_entry.get(...) raises AttributeError, breaking dispatch for that group.
Normalize both shapes to a common iterator and guard non-dict/non-list
entries so malformed config falls through cleanly instead of crashing.
Inside httpx AsyncClient response event hooks, response.next_request is
often None even for a genuine redirect, so guards keyed on
`if response.is_redirect and response.next_request` silently never fire.
A public URL that 302s to http://169.254.169.254/ was followed anyway,
defeating the pre-flight is_safe_url() check.
Resolve the redirect target from the Location header (via urljoin, so
relative Locations work too), falling back to next_request only when no
Location is present. Extracted as tools.url_safety.redirect_target_from_response
and wired into every SSRF redirect guard:
- gateway/platforms/base.py (shared image + audio download for all platforms)
- tools/vision_tools.py (two download hooks)
- plugins/platforms/slack/adapter.py
Original fix by @zapabob (PR #35940), which targeted the since-refactored
gateway/platforms/slack.py; reconstructed onto the current shared sites and
widened to the whole bug class.
When discord.auto_thread is enabled and a top-level server-channel message
should be routed to a new thread, a transient thread-create failure (e.g.
Cannot connect to host discord.com:443) returned None and _handle_message
fell through to an inline parent-channel reply — dumping a new task into a
shared channel and breaking thread-first workflows.
- _auto_create_thread retries the primary + seed-message paths once after a
750ms backoff for transient connect errors.
- _handle_message treats None as a hard failure: posts a short visible notice
in the parent channel and returns without invoking the agent. The notify
send is wrapped so a secondary connect error can't raise.
Fixes#20243
Group gating (_should_process_message) read the raw message_thread_id,
while event routing (_build_message_event) normalized it. A plain
non-forum group reply's message_thread_id is a reply-UI anchor, not a
topic, so an anchor id matching an ignored_threads entry wrongly
dropped the message, and the anchor was treated as a routable topic
under allowed_topics.
Extract _effective_message_thread_id and route both gating and
event-building through it, so gating and session routing agree on one
normalized value: real topic/forum messages keep their thread id, reply
anchors are dropped, and forum General-topic messages normalize to the
General-topic id.
The renderer now emits native Block Kit table blocks; the module and
_rich_blocks_enabled docstrings still described the earlier monospace-only
approach.
Replace the interim monospace table fallback with Slack's native `table`
block (rows of rich_text cells). Addresses the core ask in #18918.
- _table_block(): builds type:"table" with rich_text cells, so inline
formatting (bold, links, code) renders inside cells.
- Column alignment parsed from the markdown separator row (:---, :-:, --:)
into column_settings (left = default/null-skip, center/right emitted).
- Escaped pipes (\\|) are not treated as column separators.
- Respects Slack's table limits (100 rows / 20 cols / 10k aggregate chars);
oversized or unparseable tables gracefully fall back to aligned monospace
(rich_text_preformatted), so a big table never breaks the message.
Docs (EN + zh-Hans) updated to describe native tables + the fallback.
Tests: native table shape, alignment->column_settings, inline-formatted
cells, oversized/too-wide monospace fallback, escaped-pipe cell. Prove-
failed against a stubbed _table_block (native-table tests fail, fallback
tests stay green). All existing Slack tests still pass.
Add platforms.slack.extra.rich_blocks (default off). When enabled, the
final agent message is sent as Slack Block Kit blocks — section headers,
dividers, and true nested lists via rich_text — instead of flat mrkdwn.
- New plugins/platforms/slack/block_kit.py: pure markdown->blocks renderer
(headers, dividers, nested ordered/bullet lists, blockquotes, fenced code;
pipe-tables as aligned monospace since Block Kit has no robust table block).
Enforces Slack's 50-block / 3000-char section limits and returns None to
fall back to plain text on empty/oversized/unexpected input. Never raises.
- adapter.send(): render blocks on the single-chunk primary message; a
text= fallback is ALWAYS sent alongside (notifications/accessibility).
- adapter.edit_message(): blocks only on finalize=True, so intermediate
streaming edits stay plain mrkdwn (no per-flush block re-derivation).
- Docs (EN + zh-Hans) + config example. Send-side only: no app reinstall.
Tests: pure-renderer unit suite + adapter integration suite (blocks present
when on, plain text when off, text fallback always set, finalize gating,
multi-chunk fallback). Prove-failed against a stubbed renderer.
Mitigates indirect prompt injection (CWE-863) in Slack thread context.
When the bot is mentioned mid-thread for the first time, _fetch_thread_context
pulls the full thread via conversations.replies and prepends every reply to
the LLM prompt. Replies from senders not on the allowlist were rendered
identically to authorised senders, letting a third party in a shared channel
inject instructions the model might act on when answering the next authorised
message.
- BasePlatformAdapter.set_authorization_check / _is_sender_authorized, registered
by GatewayRunner._make_adapter_auth_check() with a closure over the existing
_is_user_authorized chain (platform/global/group allowlists, allow-all flags,
pairing store all stay the single source of truth — no env-var re-parsing).
- Tags non-bot thread messages whose sender fails the auth check with an
[unverified] prefix; strengthens the header with soft guidance only when at
least one unverified message is present, so setups without an allowlist see
no behaviour change.
- Wired into all three adapter-init sites in run.py (start, reconnect watcher,
restart) so the reconnect path is covered too.
Softened wording: adapted from the original [untrusted] tag to [unverified]
and non-accusatory header framing — the label reflects allowlist status, not
a judgment about the person. Adapter relocated to plugins/platforms/slack/
since the PR was authored.
Salvaged from #17059.
Buffered text/photo/media-group flushes and the polling-error recovery
task sit behind an asyncio.sleep(). On disconnect they kept running and
dispatched handle_message() into a torn-down session, producing stale or
duplicate deliveries. disconnect() only cancelled media-group and photo
batch tasks — text batches and the polling-error task leaked.
Set a _drop_delayed_deliveries flag from _mark_disconnected/_set_fatal_error
(cleared by _mark_connected) and check it in all enqueue+flush paths so a
flush that wins the race against teardown drops instead of dispatching.
_cancel_pending_delivery_tasks() now cancels+clears all four task maps,
skipping the current task. Media-group flush finally-block guarded so a
cancelled stale flush cannot erase a replacement task handle.
Two independent fixes salvaged from #12811 (closing it; one of its three
bundled fixes — Discord free_response — is already on main).
Anthropic max_tokens (#12790): the chat-completions max_tokens fallback only
fired for OpenRouter/Nous URLs, so any other proxy serving a Claude model
(AWS Bedrock, NVIDIA, LiteLLM, vLLM, corporate gateways) shipped requests
with no max_tokens and inherited the proxy's low default (Bedrock: 4096),
exhausting on thinking + large tool calls. Changed the gate in
chat_completion_helpers.build_api_kwargs from URL-gated to model-gated:
fires whenever the model matches an _ANTHROPIC_OUTPUT_LIMITS key. This also
fixes a latent miss — the old 'claude' substring gate skipped MiniMax and
Qwen3 even on OpenRouter. Remains a last-resort fallback (build_kwargs only
applies it after ephemeral/user/profile max_tokens), so it never overrides
an explicit value, and only touches the chat-completions transport (native
Anthropic Messages API is a separate path).
Feishu channel_prompt (#12805): the Feishu adapter never resolved
channel_prompts config, unlike Discord/Slack, so per-channel role prompts
were silently ignored. Added _resolve_channel_prompt() (delegating to the
shared gateway.platforms.base.resolve_channel_prompt) and wired it into all
three MessageEvent construction sites — inbound message, reaction routing,
and card-action routing.
Tests: tests/gateway/test_feishu_channel_prompts.py (6 cases) covering exact
match, parent-thread fallback, no-match, missing-config safety, and event
propagation.
Some legitimate @bot pings were dropped because the mention gates relied on
message.mentions alone, which does not always populate raw <@ID> / <@!ID>
forms (mobile, edited, relayed messages). A bare @bot with no other text
could also spawn a fake empty-text turn.
- add _self_is_explicitly_mentioned() / _raw_mentioned_user_ids() helpers that
treat the bot as mentioned via resolved mentions OR raw content forms
- use them at the allow_bots=mentions gate, multi-agent bot filtering, the
mention-strip/mention_prefix step, and the require_mention gate
- drop bare mention-only pings (no text, no media, no injection, no backfill
context) instead of injecting a placeholder empty turn
Co-authored-by: Teknium <teknium1@gmail.com>
The polling heartbeat's pending-update probe treated a stopped updater
(running=False) as "someone else's job" and silently reset its counter,
so a long-poll task that disappears with no reconnect in flight was never
recovered. get_me() on the general request path stays healthy, so neither
PTB's error_callback nor the connectivity probe ever fires — the gateway
keeps running but stops receiving messages indefinitely (#55769).
Detect the stopped-updater case directly in _probe_pending_updates and feed
it into the existing _handle_polling_network_error ladder, debounced over two
consecutive probes so a just-starting updater or the brief stop()->start_polling()
window of an in-flight reconnect never trips it.
The inbound-media validator _is_allowed_bridge_path() checked against
IMAGE_CACHE_DIR / AUDIO_CACHE_DIR / VIDEO_CACHE_DIR / DOCUMENT_CACHE_DIR
value-imported at module load. After the base.py cache-dir getters became
per-call resolvers, the bridge writes media into the active profile's cache
while the validator still matched the frozen launch-profile constants — so
media was rejected under a profile override (multi-profile gateway).
Resolve the cache roots per-call via the get_*_cache_dir() getters and drop
the now-unused frozen value-imports. Caught by automated review on #55867.
DiscordAdapter.edit_message clipped any formatted payload over the 2,000-char
cap to [:1997]+"..." and returned success=True, so the stream consumer
believed the full reply landed and stopped — the user lost everything past the
boundary and perceived the agent as quitting mid-task.
edit_message is now overflow-aware, mirroring Telegram's proven contract:
- finalize=True: split-and-deliver via _edit_overflow_split — edit chunk 1 in
place, send chunks 2..N as reply-threaded continuations, return the last
visible id in message_id plus continuation_message_ids so the stream
consumer keeps editing the most recent chunk and can clean them all up.
- finalize=False (mid-stream): truncate a one-message preview in place, never
split. A mid-stream split moves the edit target to a continuation and the
next accumulated-token tick re-splits, looping forever (the Telegram #48648
lesson the original port predated).
- Reactive 50035 '2000 or fewer in length' on edit runs the same branch logic.
- Partial continuation failure still reports success with a partial_overflow
raw_response so the consumer retries the tail instead of marking a clipped
reply complete.
Co-authored-by: xxxigm <tuancanhnguyen706@gmail.com>
Co-authored-by: AhmetArif0 <147827411+AhmetArif0@users.noreply.github.com>
When WHATSAPP_FORWARD_OWNER_MESSAGES is enabled and the bridge marks an
inbound message with fromOwner=true, also prefix MessageEvent.text with
"[owner reply] " at construction time. This makes the disambiguation
survive any downstream plugin failure (e.g. handover-rule errors that
bypass silent_ingest), so transcripts never misattribute owner-typed
text to the customer.
Idempotent: re-applies are guarded so a future producer that pre-tags
text won't be double-prefixed.
In `WHATSAPP_MODE=bot` the bridge currently drops every fromMe inbound
message — they are all assumed to be echoes of our own /send calls.
That makes it impossible for plugins / agents to detect when a human
owner has typed directly into a customer chat from the same WhatsApp
Business account (e.g. via a linked phone or WhatsApp Web).
This adds an opt-in `WHATSAPP_FORWARD_OWNER_MESSAGES` env var. When
true, the bridge classifies fromMe inbound by looking up `key.id` in a
bounded LRU of recently-sent message IDs (the existing 50-entry echo
suppressor, bumped to 512 and extracted to a testable
`outbound_ids.js` helper). Hits in the LRU are still dropped (echoes);
misses are forwarded to the Python adapter with `fromOwner: true`.
The Python adapter lifts that flag onto
`MessageEvent.metadata["whatsapp_from_owner"]`. `metadata` is a new
free-form dict on the event so future per-platform signals don't each
need their own field. Default behaviour is unchanged: with the env
flag unset, bot mode still drops every fromMe message exactly as
before.
Use cases for downstream consumers:
- Implicit handover activation when the owner replies manually
- Sliding TTL on owner activity (keep an active session alive while
the owner is engaged)
- Audit trails of owner interventions
- Analytics on human-vs-bot reply ratios
Heuristic limitation (documented in code): the LRU is in-memory. After
a bridge restart, in-flight delivery receipts of pre-restart sends will
briefly look like owner-typed for a few seconds until the set is
repopulated. Persisting isn't worth the disk churn — downstream
consumers should treat the flag as best-effort.
Tests:
- tests/gateway/test_whatsapp_from_owner.py (new): adapter sets the
metadata flag iff the bridge payload has `fromOwner: true`; absent
otherwise.
- scripts/whatsapp-bridge/outbound_ids.test.mjs (new): LRU bounds,
eviction order, falsy-id handling.
Backwards compatibility: with the env flag unset, every code path is
identical to before. No existing deployment is affected.
The Matrix adapter's _upload_file fell back to sending
"(file not found: {file_path})" directly into the room — the same
host-path leak class fixed for the base adapter and Slack in the
previous commit. Replace it with a friendly notice, log the path at
WARN for operators, and preserve any caller-supplied caption.
The base BasePlatformAdapter implementations of send_voice, send_video,
send_document, and send_image_file forwarded their *_path argument
verbatim into the chat text (e.g. "🎬 Video: /home/.../hermes/cache/...").
Telegram, Discord, and Slack adapters all fall back to those base methods
when their native send raises — so a rejected video on Telegram surfaced
the host filesystem layout to the user instead of a useful message.
Replace the path-echo with a friendly notice, log the path for operator
diagnostics, and keep the user-supplied caption intact. The Slack adapter
had three identical sites that fell through to the same path-echo on its
own native upload failures; fix those too. send_document still surfaces
the caller-provided file_name (or the basename derived from it) since
that is the user-facing filename, not a host path.
Add regression tests asserting the *_path argument never appears in the
fallback content while caption text and explicit file_name still do.
Follow-up to the salvaged #8008 fix:
- Sibling-site fix: _evaluate_slash_authorization gated DISCORD_ALLOWED_CHANNELS /
DISCORD_IGNORED_CHANNELS on numeric IDs only, so name/#name config that now works
for on_message still silently failed for slash-command interactions. Refactor the
channel-key helper to _discord_channel_keys_from_channel(channel, parent) and reuse
it at the interaction gate. Fail-closed on missing channel id is preserved.
- The contributor's hardcoded 8s flush deadline could be hard-cancelled mid-flush:
_teardown_adapter already wraps cancel_background_tasks() in the per-adapter
disconnect budget (HERMES_GATEWAY_ADAPTER_DISCONNECT_TIMEOUT, default 5s). The flush
deadline now derives from that budget with headroom so it always completes inside it.
- AUTHOR_MAP: map cypher@augmentl.com -> Nickperillo for CI.
- Tests: slash-auth name/#name allow + name ignore matching.
Two related fixes to the Discord gateway adapter:
1. Channel name matching (free-response, allowed, ignored, no-thread channels)
Previously these config values only matched against numeric channel IDs.
If a user configured free_response_channels: cypher (by name), the adapter
would silently ignore it because it only intersected against channel_ids.
Now the adapter builds a channel_keys set that includes the channel ID,
channel name, and #channel-name form, and checks all three for each gate.
2. Flush pending text-batch tasks before shutdown
The Discord adapter uses _pending_text_batch_tasks (its own dict) for
merging rapid successive message chunks. These tasks were NOT added to
self._background_tasks (the base class list), so the base
cancel_background_tasks() never awaited them on restart/shutdown.
This caused a race: in-flight response deliveries were cancelled before
Discord had a chance to send them, resulting in silent dropped messages
visible to users as tool-log-only replies with no text body.
Fix: override cancel_background_tasks() in DiscordAdapter to await all
pending text-batch tasks (8s deadline) before delegating to the base class.
A Slack user/legacy token (xoxp-...) makes auth.test resolve to the
installing human's member ID with no bot_id, so the adapter binds its
identity (_bot_user_id / _team_bot_user_ids) to that human. Every
"is this the bot?" check then misfires: that person's <@...> mentions
wake the bot and are stripped as the bot's own mention, so the agent is
genuinely told it was @mentioned and replies to messages merely
addressed to that human (symptom: bot responds to "@trevor ..." and
insists it was explicitly mentioned).
There is no runtime API error to catch — a user token still
sends/receives — so the only detectable moment is connect time. Add a
warning-only nudge (_warn_if_not_bot_token) alongside the existing
group-DM scope nudge: when auth.test resolves a user_id but no bot_id,
log that the token is a user token and to use the xoxb-... Bot User
OAuth Token. Warning-only: does not block a working-but-misconfigured
install. Fires once per workspace per process.
Salvages the two still-valid hardenings from #5381 onto the relocated
plugin adapters (the discord/feishu/whatsapp adapters moved to
plugins/platforms/ since the PR was opened, and 4 of its 6 hunks are
already on main or superseded).
- feishu: rate limiter now denies untracked keys when the tracking table
is at capacity after pruning stale entries (was: allow through without
tracking). At-capacity-with-all-fresh-entries only happens under abuse,
so allowing untracked requests let an attacker who flooded the table
bypass the limiter entirely. Already-tracked keys and post-prune room
are unaffected.
- whatsapp: absolute file paths handed back by the Baileys bridge are now
validated to resolve inside a known media cache dir before being
attached. A compromised/buggy bridge could otherwise return an
arbitrary path (e.g. /etc/passwd) that would be sent verbatim to the
model. Guard resolves symlinks and accepts both the canonical
cache/<kind> and legacy <kind>_cache layouts.
Follow-up to the group-DM manifest fix. The manifest change only helps
NEW installs; existing apps keep their old (mpim-less) scopes until the
admin reinstalls. Since a missing message.mpim event delivers nothing
(no runtime API error to catch), detect stale installs at connect time
from the auth.test x-oauth-scopes header and log an actionable reinstall
nudge when im:history is granted but mpim:history is not. Also promote
message.mpim from Recommended to Required in the docs event tables so the
default setup path can't drop it.
The WeCom callback endpoint (internet-facing, 0.0.0.0) parsed untrusted
request bodies before signature verification. defusedxml already guards
the entity-expansion class on main, but there was no cap on raw body
size, so an unauthenticated POST could still force unbounded read work
pre-auth.
Set client_max_size=64KB on the aiohttp app (413 at the framework layer)
plus an explicit length guard in _handle_callback as defense in depth.
WeCom callbacks are small encrypted XML envelopes — media is delivered
out-of-band via MediaId, never inline — so 64KB is ample for legitimate
traffic. Adds tests for oversized (413) and normal-sized (not 413) bodies.
Salvaged from #10192 by @memosr (body-size limit half; defusedxml half
already superseded on main).
Two platform-security hardenings:
- Matrix: _on_invite now checks the inviter against the existing
allow-list (_allowed_user_ids / GATEWAY_ALLOW_ALL_USERS) before
auto-joining. Without this any federated Matrix user could invite
the bot into arbitrary rooms, exposing its presence and metadata.
The message and reaction paths already enforce this allow-list; the
invite path bypassed it.
- Mattermost: _api_get / _api_post / _api_put reject any path
containing '..'. WebSocket-event values (channel_id, post_id,
file_id) are interpolated directly into API paths, so a malicious or
compromised server could craft traversal payloads to make the bot
issue authenticated requests to arbitrary endpoints with its bearer
token.
The configurable-E2EE-passphrase change from the original PR is dropped:
the matrix adapter was rewritten onto mautrix and the passphrase-protected
key-export file no longer exists.
Removed/unauthorized Telegram users could inject prompt content before the
per-user auth gate fired. The adapter ran `_should_process_message`,
`_build_message_event`, and text/photo batching — and dispatched to the
runner — before `_is_user_authorized()` (gateway/authz_mixin.py) rejected
the sender. Unmentioned group chatter from a removed user was also
persisted into the session transcript via `_observe_unmentioned_group_message`,
leaking into the agent's observed context independent of dispatch.
Add `_is_user_authorized_from_message()` as an intake prefilter that runs
in `_handle_text_message`, `_handle_command`, `_handle_location_message`,
and `_handle_media_message` BEFORE batching, event construction, and the
unmentioned-group observe branch. It reuses the runner's
`_is_user_authorized()` with a correctly-shaped SessionSource (group vs
forum vs dm, real chat_id for TELEGRAM_GROUP_ALLOWED_* allowlists),
falls back to env allowlists, and only rejects when an allowlist actually
exists — unknown DMs with no allowlist still reach the pairing flow.
Channel posts authorize via `sender_chat` identity when `from_user` is
absent.
Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
Co-authored-by: Carlos Manuel Cejas <carlosmcejas@gmail.com>
main (cb982ad99) wired windows_hide_flags() into the auxiliary git/gh/wmic/
bash/powershell/taskkill legs but left two it didn't reach, plus the Electron
backend-launch leg it explicitly deferred. Cover them the same way:
- apps/desktop/electron/main.cjs: getNoConsoleVenvPython resolves the BASE
pythonw.exe instead of the venv Scripts\pythonw.exe shim, which re-execs a
console python.exe and flashes a conhost the desktop backend can't suppress.
Both backend creators put the venv site-packages on PYTHONPATH so imports
still resolve under the base interpreter. (main's commit said this Electron
leg "needs a Windows-tested change of its own".)
- tools/tts_tool.py, tools/transcription_tools.py, plugins/platforms/discord:
ffmpeg conversions (voice notes / TTS / STT) via windows_hide_flags().
- plugins/platforms/whatsapp: netstat + taskkill bridge-port cleanup via
windows_hide_flags().
All no-ops on POSIX. Tests assert the base-pythonw preference and the ffmpeg
legs pass CREATE_NO_WINDOW.
The merged CLOSE-WAIT heartbeat (#52744) only probes get_me(), which uses the
general request path and stays healthy while PTB's getUpdates consumer is
silently wedged (updater.running=True but the long-poll task is stuck, observed
on WSL2). DMs then queue in the Bot API and never reach handlers (#42909).
Augment the existing _polling_heartbeat_loop to also probe
get_webhook_info().pending_update_count. After two consecutive probes that see a
non-draining queue while the updater claims to be running, escalate into the
existing _handle_polling_network_error recovery ladder — no new restart
machinery. No-ops in webhook mode, when the updater is not running, or when a
reconnect is already in flight.
Credit to @gazzumatteo, whose PR #42959 identified the pending_update_count
signal as the missing liveness probe. This reuses the existing heartbeat +
recovery path rather than adding a parallel watchdog.
Fixes#42909.
Rebase onto plugins/platforms/matrix/adapter.py (code moved from
gateway/platforms/matrix.py). Same logic: _on_invite checks is_direct
on invite events and calls _record_dm_room to persist in m.direct
account data.
Fixes#44679