hermes-agent

Author	SHA1	Message	Date
teknium1	43edbae638	fix(telegram): widen NoneType reconnect guard to the conflict-retry path The network-error reconnect ladder (#55992) captured a stable self._app local across its awaits and failed fast when the adapter was torn down mid-sleep. The 409-conflict retry path had the identical unguarded self._app.updater.start_polling() deref — a concurrent disconnect() during its RETRY_DELAY sleep would raise the same 'NoneType' object has no attribute 'updater' and, on a non-final retry, land in limbo. Apply the same stable-local + fail-fast pattern so the existing except block reschedules or escalates to fatal.	2026-07-01 02:03:58 -07:00
joaomarcos	a682091044	fix(telegram): close reconnect races that leave adapter half-destroyed _handle_polling_network_error's chained retry never updated self._polling_error_task, so the reentrancy guard shared with the heartbeat loop and the pending-updates probe went stale mid-recovery, letting more than one recovery attempt run concurrently against the same adapter. Combined with a TOCTOU window in _handle_adapter_fatal_error (the adapter was only removed from self.adapters in a finally block after awaiting disconnect()), two concurrent fatal notifications for the same adapter could both pass the "still installed" check and call disconnect() twice, which is where the reported "'NoneType' object has no attribute 'updater'" originates once self._app is cleared by the first call. - Reassign the chained retry task to self._polling_error_task so the guard reflects an in-flight recovery. - Capture self._app in a local variable across the stop/start_polling sequence instead of re-reading self._app between awaits. - Claim (pop) the adapter from self.adapters before awaiting disconnect() in _handle_adapter_fatal_error, not after, closing the TOCTOU window for a concurrent notification on the same adapter.	2026-07-01 02:03:58 -07:00
Glen Workman	5505dbbf43	fix(telegram): accept both list and mapping shapes for group_topics config The forum-topic skill-binding lookup assumed config.extra['group_topics'] was always a list of {chat_id, topics} entries. When an operator writes the natural mapping shape ({"-100...": [...]}), iterating yields string keys and chat_entry.get(...) raises AttributeError, breaking dispatch for that group. Normalize both shapes to a common iterator and guard non-dict/non-list entries so malformed config falls through cleanly instead of crashing.	2026-07-01 01:20:14 -07:00
zapabob	500c2b1e46	fix(security): close SSRF redirect-guard bypass across all httpx download hooks Inside httpx AsyncClient response event hooks, response.next_request is often None even for a genuine redirect, so guards keyed on `if response.is_redirect and response.next_request` silently never fire. A public URL that 302s to http://169.254.169.254/ was followed anyway, defeating the pre-flight is_safe_url() check. Resolve the redirect target from the Location header (via urljoin, so relative Locations work too), falling back to next_request only when no Location is present. Extracted as tools.url_safety.redirect_target_from_response and wired into every SSRF redirect guard: - gateway/platforms/base.py (shared image + audio download for all platforms) - tools/vision_tools.py (two download hooks) - plugins/platforms/slack/adapter.py Original fix by @zapabob (PR #35940), which targeted the since-refactored gateway/platforms/slack.py; reconstructed onto the current shared sites and widened to the whole bug class.	2026-07-01 01:18:53 -07:00
0xsir0000	50a7dce6bd	fix(discord): auto-thread failure must not silently fall back to inline reply When discord.auto_thread is enabled and a top-level server-channel message should be routed to a new thread, a transient thread-create failure (e.g. Cannot connect to host discord.com:443) returned None and _handle_message fell through to an inline parent-channel reply — dumping a new task into a shared channel and breaking thread-first workflows. - _auto_create_thread retries the primary + seed-message paths once after a 750ms backoff for transient connect errors. - _handle_message treats None as a hard failure: posts a short visible notice in the parent channel and returns without invoking the agent. The notify send is wrapped so a secondary connect error can't raise. Fixes #20243	2026-07-01 00:12:17 -07:00
nocturnum91	cc1e4c32c0	fix(telegram): normalize thread id in group gating via shared helper Group gating (_should_process_message) read the raw message_thread_id, while event routing (_build_message_event) normalized it. A plain non-forum group reply's message_thread_id is a reply-UI anchor, not a topic, so an anchor id matching an ignored_threads entry wrongly dropped the message, and the anchor was treated as a routable topic under allowed_topics. Extract _effective_message_thread_id and route both gating and event-building through it, so gating and session routing agree on one normalized value: real topic/forum messages keep their thread id, reply anchors are dropped, and forum General-topic messages normalize to the General-topic id.	2026-07-01 00:11:46 -07:00
teknium1	88c9dfecb2	docs(slack): correct block_kit docstrings to reflect native table blocks The renderer now emits native Block Kit table blocks; the module and _rich_blocks_enabled docstrings still described the earlier monospace-only approach.	2026-07-01 00:10:12 -07:00
Ben	7c7b489813	feat(slack): render markdown tables as native Block Kit table blocks Replace the interim monospace table fallback with Slack's native `table` block (rows of rich_text cells). Addresses the core ask in #18918. - _table_block(): builds type:"table" with rich_text cells, so inline formatting (bold, links, code) renders inside cells. - Column alignment parsed from the markdown separator row (:---, :-:, --:) into column_settings (left = default/null-skip, center/right emitted). - Escaped pipes (\\\|) are not treated as column separators. - Respects Slack's table limits (100 rows / 20 cols / 10k aggregate chars); oversized or unparseable tables gracefully fall back to aligned monospace (rich_text_preformatted), so a big table never breaks the message. Docs (EN + zh-Hans) updated to describe native tables + the fallback. Tests: native table shape, alignment->column_settings, inline-formatted cells, oversized/too-wide monospace fallback, escaped-pipe cell. Prove- failed against a stubbed _table_block (native-table tests fail, fallback tests stay green). All existing Slack tests still pass.	2026-07-01 00:10:12 -07:00
Ben	b080b93ad8	feat(slack): opt-in Block Kit rendering for agent messages Add platforms.slack.extra.rich_blocks (default off). When enabled, the final agent message is sent as Slack Block Kit blocks — section headers, dividers, and true nested lists via rich_text — instead of flat mrkdwn. - New plugins/platforms/slack/block_kit.py: pure markdown->blocks renderer (headers, dividers, nested ordered/bullet lists, blockquotes, fenced code; pipe-tables as aligned monospace since Block Kit has no robust table block). Enforces Slack's 50-block / 3000-char section limits and returns None to fall back to plain text on empty/oversized/unexpected input. Never raises. - adapter.send(): render blocks on the single-chunk primary message; a text= fallback is ALWAYS sent alongside (notifications/accessibility). - adapter.edit_message(): blocks only on finalize=True, so intermediate streaming edits stay plain mrkdwn (no per-flush block re-derivation). - Docs (EN + zh-Hans) + config example. Send-side only: no app reinstall. Tests: pure-renderer unit suite + adapter integration suite (blocks present when on, plain text when off, text fallback always set, finalize gating, multi-chunk fallback). Prove-failed against a stubbed renderer.	2026-07-01 00:10:12 -07:00
syahidfrd	0198713c33	fix(security): reuse auth chain when tagging unverified senders in Slack threads Mitigates indirect prompt injection (CWE-863) in Slack thread context. When the bot is mentioned mid-thread for the first time, _fetch_thread_context pulls the full thread via conversations.replies and prepends every reply to the LLM prompt. Replies from senders not on the allowlist were rendered identically to authorised senders, letting a third party in a shared channel inject instructions the model might act on when answering the next authorised message. - BasePlatformAdapter.set_authorization_check / _is_sender_authorized, registered by GatewayRunner._make_adapter_auth_check() with a closure over the existing _is_user_authorized chain (platform/global/group allowlists, allow-all flags, pairing store all stay the single source of truth — no env-var re-parsing). - Tags non-bot thread messages whose sender fails the auth check with an [unverified] prefix; strengthens the header with soft guidance only when at least one unverified message is present, so setups without an allowlist see no behaviour change. - Wired into all three adapter-init sites in run.py (start, reconnect watcher, restart) so the reconnect path is covered too. Softened wording: adapted from the original [untrusted] tag to [unverified] and non-accusatory header framing — the label reflects allowlist status, not a judgment about the person. Adapter relocated to plugins/platforms/slack/ since the PR was authored. Salvaged from #17059.	2026-06-30 18:05:43 -07:00
CRWuTJ	8ad15ff7dd	fix(telegram): cancel delayed deliveries on disconnect Buffered text/photo/media-group flushes and the polling-error recovery task sit behind an asyncio.sleep(). On disconnect they kept running and dispatched handle_message() into a torn-down session, producing stale or duplicate deliveries. disconnect() only cancelled media-group and photo batch tasks — text batches and the polling-error task leaked. Set a _drop_delayed_deliveries flag from _mark_disconnected/_set_fatal_error (cleared by _mark_connected) and check it in all enqueue+flush paths so a flush that wins the race against teardown drops instead of dispatching. _cancel_pending_delivery_tasks() now cancels+clears all four task maps, skipping the current task. Media-group flush finally-block guarded so a cancelled stale flush cannot erase a replacement task handle.	2026-06-30 17:39:30 -07:00
teknium1	36bfe3a449	fix(anthropic+feishu): model-gate max_tokens fallback; wire Feishu channel_prompt Two independent fixes salvaged from #12811 (closing it; one of its three bundled fixes — Discord free_response — is already on main). Anthropic max_tokens (#12790): the chat-completions max_tokens fallback only fired for OpenRouter/Nous URLs, so any other proxy serving a Claude model (AWS Bedrock, NVIDIA, LiteLLM, vLLM, corporate gateways) shipped requests with no max_tokens and inherited the proxy's low default (Bedrock: 4096), exhausting on thinking + large tool calls. Changed the gate in chat_completion_helpers.build_api_kwargs from URL-gated to model-gated: fires whenever the model matches an _ANTHROPIC_OUTPUT_LIMITS key. This also fixes a latent miss — the old 'claude' substring gate skipped MiniMax and Qwen3 even on OpenRouter. Remains a last-resort fallback (build_kwargs only applies it after ephemeral/user/profile max_tokens), so it never overrides an explicit value, and only touches the chat-completions transport (native Anthropic Messages API is a separate path). Feishu channel_prompt (#12805): the Feishu adapter never resolved channel_prompts config, unlike Discord/Slack, so per-channel role prompts were silently ignored. Added _resolve_channel_prompt() (delegating to the shared gateway.platforms.base.resolve_channel_prompt) and wired it into all three MessageEvent construction sites — inbound message, reaction routing, and card-action routing. Tests: tests/gateway/test_feishu_channel_prompts.py (6 cases) covering exact match, parent-thread fallback, no-match, missing-config safety, and event propagation.	2026-06-30 17:20:41 -07:00
codexGW	608e8a6062	fix(discord): accept raw direct bot mentions and ignore bare mention-only pings Some legitimate @bot pings were dropped because the mention gates relied on message.mentions alone, which does not always populate raw <@ID> / <@!ID> forms (mobile, edited, relayed messages). A bare @bot with no other text could also spawn a fake empty-text turn. - add _self_is_explicitly_mentioned() / _raw_mentioned_user_ids() helpers that treat the bot as mentioned via resolved mentions OR raw content forms - use them at the allow_bots=mentions gate, multi-agent bot filtering, the mention-strip/mention_prefix step, and the require_mention gate - drop bare mention-only pings (no text, no media, no injection, no backfill context) instead of injecting a placeholder empty turn Co-authored-by: Teknium <teknium1@gmail.com>	2026-06-30 16:38:31 -07:00
PRATHAMESH75	e55e9fad2c	fix(telegram): recover when polling updater stops while process stays alive The polling heartbeat's pending-update probe treated a stopped updater (running=False) as "someone else's job" and silently reset its counter, so a long-poll task that disappears with no reconnect in flight was never recovered. get_me() on the general request path stays healthy, so neither PTB's error_callback nor the connectivity probe ever fires — the gateway keeps running but stops receiving messages indefinitely (#55769). Detect the stopped-updater case directly in _probe_pending_updates and feed it into the existing _handle_polling_network_error ladder, debounced over two consecutive probes so a just-starting updater or the brief stop()->start_polling() window of an in-flight reconnect never trips it.	2026-06-30 15:36:58 -07:00
Erosika	1f1d346ced	fix(profile): resolve WhatsApp media-path cache roots per-call The inbound-media validator _is_allowed_bridge_path() checked against IMAGE_CACHE_DIR / AUDIO_CACHE_DIR / VIDEO_CACHE_DIR / DOCUMENT_CACHE_DIR value-imported at module load. After the base.py cache-dir getters became per-call resolvers, the bridge writes media into the active profile's cache while the validator still matched the frozen launch-profile constants — so media was rejected under a profile override (multi-profile gateway). Resolve the cache roots per-call via the get_*_cache_dir() getters and drop the now-unused frozen value-imports. Caught by automated review on #55867.	2026-06-30 15:30:06 -07:00
konsisumer	46ab06c238	fix(gateway): honor Discord connect timeout for ready wait	2026-06-30 15:03:25 -07:00
teknium1	af5cea04ab	fix(discord): split oversized final edits, truncate mid-stream previews (#27881 ) DiscordAdapter.edit_message clipped any formatted payload over the 2,000-char cap to [:1997]+"..." and returned success=True, so the stream consumer believed the full reply landed and stopped — the user lost everything past the boundary and perceived the agent as quitting mid-task. edit_message is now overflow-aware, mirroring Telegram's proven contract: - finalize=True: split-and-deliver via _edit_overflow_split — edit chunk 1 in place, send chunks 2..N as reply-threaded continuations, return the last visible id in message_id plus continuation_message_ids so the stream consumer keeps editing the most recent chunk and can clean them all up. - finalize=False (mid-stream): truncate a one-message preview in place, never split. A mid-stream split moves the edit target to a continuation and the next accumulated-token tick re-splits, looping forever (the Telegram #48648 lesson the original port predated). - Reactive 50035 '2000 or fewer in length' on edit runs the same branch logic. - Partial continuation failure still reports success with a partial_overflow raw_response so the consumer retries the tail instead of marking a clipped reply complete. Co-authored-by: xxxigm <tuancanhnguyen706@gmail.com> Co-authored-by: AhmetArif0 <147827411+AhmetArif0@users.noreply.github.com>	2026-06-30 03:49:52 -07:00
Keira Voss	a61cf774ce	feat(whatsapp): tag owner-typed inbound text with [owner reply] prefix When WHATSAPP_FORWARD_OWNER_MESSAGES is enabled and the bridge marks an inbound message with fromOwner=true, also prefix MessageEvent.text with "[owner reply] " at construction time. This makes the disambiguation survive any downstream plugin failure (e.g. handover-rule errors that bypass silent_ingest), so transcripts never misattribute owner-typed text to the customer. Idempotent: re-applies are guarded so a future producer that pre-tags text won't be double-prefixed.	2026-06-30 03:41:43 -07:00
keiravoss94	84f350efe0	feat(whatsapp): opt-in forwarding of owner-typed messages in bot mode In `WHATSAPP_MODE=bot` the bridge currently drops every fromMe inbound message — they are all assumed to be echoes of our own /send calls. That makes it impossible for plugins / agents to detect when a human owner has typed directly into a customer chat from the same WhatsApp Business account (e.g. via a linked phone or WhatsApp Web). This adds an opt-in `WHATSAPP_FORWARD_OWNER_MESSAGES` env var. When true, the bridge classifies fromMe inbound by looking up `key.id` in a bounded LRU of recently-sent message IDs (the existing 50-entry echo suppressor, bumped to 512 and extracted to a testable `outbound_ids.js` helper). Hits in the LRU are still dropped (echoes); misses are forwarded to the Python adapter with `fromOwner: true`. The Python adapter lifts that flag onto `MessageEvent.metadata["whatsapp_from_owner"]`. `metadata` is a new free-form dict on the event so future per-platform signals don't each need their own field. Default behaviour is unchanged: with the env flag unset, bot mode still drops every fromMe message exactly as before. Use cases for downstream consumers: - Implicit handover activation when the owner replies manually - Sliding TTL on owner activity (keep an active session alive while the owner is engaged) - Audit trails of owner interventions - Analytics on human-vs-bot reply ratios Heuristic limitation (documented in code): the LRU is in-memory. After a bridge restart, in-flight delivery receipts of pre-restart sends will briefly look like owner-typed for a few seconds until the set is repopulated. Persisting isn't worth the disk churn — downstream consumers should treat the flag as best-effort. Tests: - tests/gateway/test_whatsapp_from_owner.py (new): adapter sets the metadata flag iff the bridge payload has `fromOwner: true`; absent otherwise. - scripts/whatsapp-bridge/outbound_ids.test.mjs (new): LRU bounds, eviction order, falsy-id handling. Backwards compatibility: with the env flag unset, every code path is identical to before. No existing deployment is affected.	2026-06-30 03:41:43 -07:00
teknium1	f5eb4c307b	fix(gateway): stop Matrix upload fallback from leaking host path The Matrix adapter's _upload_file fell back to sending "(file not found: {file_path})" directly into the room — the same host-path leak class fixed for the base adapter and Slack in the previous commit. Replace it with a friendly notice, log the path at WARN for operators, and preserve any caller-supplied caption.	2026-06-30 03:24:36 -07:00
UgwujaGeorge	cb9d18c759	fix(gateway): stop media-send fallbacks from leaking host paths into chat The base BasePlatformAdapter implementations of send_voice, send_video, send_document, and send_image_file forwarded their _path argument verbatim into the chat text (e.g. "🎬 Video: /home/.../hermes/cache/..."). Telegram, Discord, and Slack adapters all fall back to those base methods when their native send raises — so a rejected video on Telegram surfaced the host filesystem layout to the user instead of a useful message. Replace the path-echo with a friendly notice, log the path for operator diagnostics, and keep the user-supplied caption intact. The Slack adapter had three identical sites that fell through to the same path-echo on its own native upload failures; fix those too. send_document still surfaces the caller-provided file_name (or the basename derived from it) since that is the user-facing filename, not a host path. Add regression tests asserting the _path argument never appears in the fallback content while caption text and explicit file_name still do.	2026-06-30 03:24:36 -07:00
teknium1	b6045170bb	fix(discord): extend channel-name matching to slash-command auth; clamp flush deadline to disconnect budget Follow-up to the salvaged #8008 fix: - Sibling-site fix: _evaluate_slash_authorization gated DISCORD_ALLOWED_CHANNELS / DISCORD_IGNORED_CHANNELS on numeric IDs only, so name/#name config that now works for on_message still silently failed for slash-command interactions. Refactor the channel-key helper to _discord_channel_keys_from_channel(channel, parent) and reuse it at the interaction gate. Fail-closed on missing channel id is preserved. - The contributor's hardcoded 8s flush deadline could be hard-cancelled mid-flush: _teardown_adapter already wraps cancel_background_tasks() in the per-adapter disconnect budget (HERMES_GATEWAY_ADAPTER_DISCONNECT_TIMEOUT, default 5s). The flush deadline now derives from that budget with headroom so it always completes inside it. - AUTHOR_MAP: map cypher@augmentl.com -> Nickperillo for CI. - Tests: slash-auth name/#name allow + name ignore matching.	2026-06-30 02:48:42 -07:00
Cypher	cb9308f0a6	fix(discord): channel name matching and flush pending sends on shutdown Two related fixes to the Discord gateway adapter: 1. Channel name matching (free-response, allowed, ignored, no-thread channels) Previously these config values only matched against numeric channel IDs. If a user configured free_response_channels: cypher (by name), the adapter would silently ignore it because it only intersected against channel_ids. Now the adapter builds a channel_keys set that includes the channel ID, channel name, and #channel-name form, and checks all three for each gate. 2. Flush pending text-batch tasks before shutdown The Discord adapter uses _pending_text_batch_tasks (its own dict) for merging rapid successive message chunks. These tasks were NOT added to self._background_tasks (the base class list), so the base cancel_background_tasks() never awaited them on restart/shutdown. This caused a race: in-flight response deliveries were cancelled before Discord had a chance to send them, resulting in silent dropped messages visible to users as tool-log-only replies with no text body. Fix: override cancel_background_tasks() in DiscordAdapter to await all pending text-batch tasks (8s deadline) before delegating to the base class.	2026-06-30 02:48:42 -07:00
Ben	184c10cf97	fix(slack): warn when configured token is a user token, not a bot token A Slack user/legacy token (xoxp-...) makes auth.test resolve to the installing human's member ID with no bot_id, so the adapter binds its identity (_bot_user_id / _team_bot_user_ids) to that human. Every "is this the bot?" check then misfires: that person's <@...> mentions wake the bot and are stripped as the bot's own mention, so the agent is genuinely told it was @mentioned and replies to messages merely addressed to that human (symptom: bot responds to "@trevor ..." and insists it was explicitly mentioned). There is no runtime API error to catch — a user token still sends/receives — so the only detectable moment is connect time. Add a warning-only nudge (_warn_if_not_bot_token) alongside the existing group-DM scope nudge: when auth.test resolves a user_id but no bot_id, log that the token is a user token and to use the xoxb-... Bot User OAuth Token. Warning-only: does not block a working-but-misconfigured install. Fires once per workspace per process.	2026-06-29 20:57:43 -07:00
Mibayy	9e490138a0	fix(security): fail-closed feishu webhook rate limiter + whatsapp bridge path guard Salvages the two still-valid hardenings from #5381 onto the relocated plugin adapters (the discord/feishu/whatsapp adapters moved to plugins/platforms/ since the PR was opened, and 4 of its 6 hunks are already on main or superseded). - feishu: rate limiter now denies untracked keys when the tracking table is at capacity after pruning stale entries (was: allow through without tracking). At-capacity-with-all-fresh-entries only happens under abuse, so allowing untracked requests let an attacker who flooded the table bypass the limiter entirely. Already-tracked keys and post-prune room are unaffected. - whatsapp: absolute file paths handed back by the Baileys bridge are now validated to resolve inside a known media cache dir before being attached. A compromised/buggy bridge could otherwise return an arbitrary path (e.g. /etc/passwd) that would be sent verbatim to the model. Guard resolves symlinks and accepts both the canonical cache/<kind> and legacy <kind>_cache layouts.	2026-06-29 04:25:31 -07:00
teknium1	34e616e778	feat(slack): nudge stale installs to add mpim scopes; mark message.mpim required Follow-up to the group-DM manifest fix. The manifest change only helps NEW installs; existing apps keep their old (mpim-less) scopes until the admin reinstalls. Since a missing message.mpim event delivers nothing (no runtime API error to catch), detect stale installs at connect time from the auth.test x-oauth-scopes header and log an actionable reinstall nudge when im:history is granted but mpim:history is not. Also promote message.mpim from Recommended to Required in the docs event tables so the default setup path can't drop it.	2026-06-29 01:02:53 -07:00
Teknium	74541beb9c	fix(security): cap WeCom callback body size before pre-auth XML parse (#54615 ) The WeCom callback endpoint (internet-facing, 0.0.0.0) parsed untrusted request bodies before signature verification. defusedxml already guards the entity-expansion class on main, but there was no cap on raw body size, so an unauthenticated POST could still force unbounded read work pre-auth. Set client_max_size=64KB on the aiohttp app (413 at the framework layer) plus an explicit length guard in _handle_callback as defense in depth. WeCom callbacks are small encrypted XML envelopes — media is delivered out-of-band via MediaId, never inline — so 64KB is ample for legitimate traffic. Adds tests for oversized (413) and normal-sized (not 413) bodies. Salvaged from #10192 by @memosr (body-size limit half; defusedxml half already superseded on main).	2026-06-28 22:35:43 -07:00
aaronagent	d836b2bac4	fix(matrix,mattermost): invite auth check + API path traversal guard Two platform-security hardenings: - Matrix: _on_invite now checks the inviter against the existing allow-list (_allowed_user_ids / GATEWAY_ALLOW_ALL_USERS) before auto-joining. Without this any federated Matrix user could invite the bot into arbitrary rooms, exposing its presence and metadata. The message and reaction paths already enforce this allow-list; the invite path bypassed it. - Mattermost: _api_get / _api_post / _api_put reject any path containing '..'. WebSocket-event values (channel_id, post_id, file_id) are interpolated directly into API paths, so a malicious or compromised server could craft traversal payloads to make the bot issue authenticated requests to arbitrary endpoints with its bearer token. The configurable-E2EE-passphrase change from the original PR is dropped: the matrix adapter was rewritten onto mautrix and the passphrase-protected key-export file no longer exists.	2026-06-28 20:47:33 -07:00
teknium1	c648ecdca5	fix(telegram): reject unauthorized users before event construction (#40863 ) Removed/unauthorized Telegram users could inject prompt content before the per-user auth gate fired. The adapter ran `_should_process_message`, `_build_message_event`, and text/photo batching — and dispatched to the runner — before `_is_user_authorized()` (gateway/authz_mixin.py) rejected the sender. Unmentioned group chatter from a removed user was also persisted into the session transcript via `_observe_unmentioned_group_message`, leaking into the agent's observed context independent of dispatch. Add `_is_user_authorized_from_message()` as an intake prefilter that runs in `_handle_text_message`, `_handle_command`, `_handle_location_message`, and `_handle_media_message` BEFORE batching, event construction, and the unmentioned-group observe branch. It reuses the runner's `_is_user_authorized()` with a correctly-shaped SessionSource (group vs forum vs dm, real chat_id for TELEGRAM_GROUP_ALLOWED_* allowlists), falls back to env allowlists, and only rejects when an allowlist actually exists — unknown DMs with no allowlist still reach the pairing flow. Channel posts authorize via `sender_chat` identity when `from_user` is absent. Co-authored-by: liuhao1024 <sunsky.lau@gmail.com> Co-authored-by: Carlos Manuel Cejas <carlosmcejas@gmail.com>	2026-06-28 14:25:15 -07:00
Brooklyn Nicholson	eeca59f489	fix(windows): hide remaining backend console-flash legs missed on main main (`cb982ad99`) wired windows_hide_flags() into the auxiliary git/gh/wmic/ bash/powershell/taskkill legs but left two it didn't reach, plus the Electron backend-launch leg it explicitly deferred. Cover them the same way: - apps/desktop/electron/main.cjs: getNoConsoleVenvPython resolves the BASE pythonw.exe instead of the venv Scripts\pythonw.exe shim, which re-execs a console python.exe and flashes a conhost the desktop backend can't suppress. Both backend creators put the venv site-packages on PYTHONPATH so imports still resolve under the base interpreter. (main's commit said this Electron leg "needs a Windows-tested change of its own".) - tools/tts_tool.py, tools/transcription_tools.py, plugins/platforms/discord: ffmpeg conversions (voice notes / TTS / STT) via windows_hide_flags(). - plugins/platforms/whatsapp: netstat + taskkill bridge-port cleanup via windows_hide_flags(). All no-ops on POSIX. Tests assert the base-pythonw preference and the ffmpeg legs pass CREATE_NO_WINDOW.	2026-06-28 10:19:21 -05:00
teknium1	d5ba374c03	fix(telegram): detect wedged getUpdates consumer via pending_update_count The merged CLOSE-WAIT heartbeat (#52744) only probes get_me(), which uses the general request path and stays healthy while PTB's getUpdates consumer is silently wedged (updater.running=True but the long-poll task is stuck, observed on WSL2). DMs then queue in the Bot API and never reach handlers (#42909). Augment the existing _polling_heartbeat_loop to also probe get_webhook_info().pending_update_count. After two consecutive probes that see a non-draining queue while the updater claims to be running, escalate into the existing _handle_polling_network_error recovery ladder — no new restart machinery. No-ops in webhook mode, when the updater is not running, or when a reconnect is already in flight. Credit to @gazzumatteo, whose PR #42959 identified the pending_update_count signal as the missing liveness probe. This reuses the existing heartbeat + recovery path rather than adding a parallel watchdog. Fixes #42909.	2026-06-28 02:44:17 -07:00
liuhao1024	14baeefe1d	fix(matrix): record DM rooms in m.direct on invite to prevent group misclassification Rebase onto plugins/platforms/matrix/adapter.py (code moved from gateway/platforms/matrix.py). Same logic: _on_invite checks is_direct on invite events and calls _record_dm_room to persist in m.direct account data. Fixes #44679	2026-06-28 02:37:52 -07:00
yungchentang	7e2ca7f68d	fix(telegram): reset send pool after pool timeouts	2026-06-28 02:34:17 -07:00
Teknium	2ecb6f7fe6	fix(telegram): clear send_path_degraded on successful reconnect (#35205 ) (#54076 ) * fix(telegram): clear send_path_degraded on successful reconnect _send_path_degraded was cleared only in _verify_polling_after_reconnect, 60s after reconnect and only if scheduled. A clean start_polling() reconnect left the flag stuck True, short-circuiting send() and blocking all outbound messages until the deferred probe ran (or forever if it never did). Clear the flag the moment start_polling() succeeds — that is the recovery signal. The deferred probe remains a defensive re-check that re-enters the reconnect ladder (re-setting the flag) if it detects a silent wedge. Fixes #35205. * docs: add infographic for #35205 telegram send-path fix	2026-06-28 01:38:17 -07:00
konsisumer	3f543229f2	fix(telegram): notify user when clarify button tap arrives after expiry	2026-06-28 01:07:53 -07:00
sweetcornna	fc70d023d8	fix(telegram): apply bot auth policy to Telegram sources # Conflicts: # gateway/config.py	2026-06-28 00:57:03 -07:00
Teknium	f03823014b	fix(telegram): kill 409 polling conflict loop by disarming PTB retry synchronously (#53941 ) Telegram polling entered a self-inflicted ~31s loop of 409 Conflict -> retry -> resume -> Conflict. The error_callback PTB invokes synchronously inside its internal network_retry_loop only scheduled our async recovery task (loop.create_task) and returned, so PTB kept polling getUpdates on its own while our handler concurrently ran stop -> sleep -> start_polling. The two polling sessions overlapped and Telegram returned a fresh 409. Fix: in the conflict branch of the error_callback, synchronously set PTB's private polling stop_event before scheduling recovery. PTB's loop exits on its next tick (it races that event in do_action), so our handler owns polling alone. The handler's await updater.stop() drains the task and PTB clears the event, so the subsequent start_polling() builds a fresh event and is not poisoned. Keeps the existing reconnect ladder intact (option B) — fixes only the race. Defensive: probes mangled + unmangled stop_event spellings and no-ops (prior behaviour) if neither exists; never flips _running, which would make the handler skip stop() and leave the loop wedged.	2026-06-27 20:46:08 -07:00
konsisumer	11b0be8d15	fix(gateway): avoid Matrix pending invite boot loops	2026-06-27 20:45:51 -07:00
xxxigm	6f1a176b33	fix(gateway/discord): REST liveness probe to detect zombie clients (#26656 ) The Discord adapter could enter a silent zombie state after a network outage / proxy stall: the process is alive, _client looks open, but the underlying socket is dead. discord.py's WebSocket reconnect never sees a RST through a wedged proxy/NAT, so client.start() spins forever without exiting — which means the bot-task done callback (which only fires on task completion) never trips either. The bot stays "offline" in Discord until a manual `hermes gateway restart`. Reported offline for 13-17h. Adds an out-of-band REST liveness probe in DiscordAdapter. Every `discord.liveness_interval_seconds` (default 60s) the adapter issues a cheap fetch_user(bot_id) — the same REST path as message delivery, so it fails when the proxy/NAT is wedged. After `discord.liveness_failure_threshold` consecutive failures (default 3) the probe closes the wedged client and surfaces a retryable fatal error, which trips the gateway's existing _platform_reconnect_watcher and rebuilds the adapter. Operators disable it by setting either knob to 0. Config lives in config.yaml (discord.liveness_) per the .env-is-secrets policy; _apply_yaml_config bridges it to internal env vars the adapter reads, matching the existing HERMES_DISCORD_TEXT_BATCH_ pattern. Co-authored-by: Hermes Agent <agent@nousresearch.com>	2026-06-27 19:30:32 -07:00
Teknium	db16854f34	fix(telegram): surface failed media downloads to user and agent, not a silent empty turn (#53912 ) When a Telegram attachment download/cache fails (typically a transient httpx.ConnectError to Telegram's CDN), the except handler logged a warning and fell through to handle_message() with empty media and no text — the user thought the file was delivered, the agent saw a content-less turn with no signal an attachment was attempted, and the only record was a buried log line. Adds _surface_media_cache_failure(): replies to the user in Telegram so they know to retry, and appends an agent-visible notice to event.text via the existing _append_observed_note channel so the agent knows an attachment was attempted and failed. No new event fields (structured-event refactor is out of scope per #23045). Wired into all five cache-failure sites — photo, voice, audio, video, document — since they shared the identical silent fall-through. Bug 1 from #23045 (unsupported types routed as fake user messages) no longer exists on main: the document handler now accepts any file type, so there is no rejection branch to fix. Closes #23045	2026-06-27 19:12:57 -07:00
bykim0119	851f75d4df	fix(discord): honor "" wildcard in DISCORD_ALLOWED_USERS (#22334 ) DISCORD_ALLOWED_USERS="" now means "allow everyone", matching the SIGNAL_ALLOWED_USERS / DISCORD_ALLOWED_CHANNELS wildcard convention and the value `claw migrate` emits. Previously _is_allowed_user did exact ID matching only, so "" matched no user and blocked every non-self sender — a P1 with no workaround. Three sites, all required for the fix to hold at runtime: - _is_allowed_user: short-circuit when "" is in the allowlist. - connect(): exclude "" from the intents.members trigger so the wildcard does not request the privileged Server Members intent (which can block the bot from coming online). - _resolve_allowed_usernames: preserve "" verbatim; otherwise it lands in the username-resolution bucket, matches no member, and is silently dropped from the set and env var on the first on_ready — quietly undoing the fix. Slash auth delegates to _is_allowed_user (auto-covered); component auth already honors "*" on main.	2026-06-27 19:11:30 -07:00
Teknium	d3d621f7c3	revert(windows): roll back terminal-popup PRs #53791 #53810 #53829 (#53853 ) * Revert "fix(windows): capture is not a no-window boundary; route flashing spawns through chokepoint (#53829)" This reverts commit `2ecca1e7d3`. * Revert "fix(windows): stop terminal-window popups from background spawns (#53810)" This reverts commit `5db1430af9`. * Revert "fix(windows): stop subprocess console-window popups + add CI guard (#53791)" This reverts commit `ef17cd204d`.	2026-06-27 15:59:00 -07:00
Teknium	2ecca1e7d3	fix(windows): capture is not a no-window boundary; route flashing spawns through chokepoint (#53829 ) Follow-up to #53791 addressing review feedback: the footgun checker treated capture_output=/stdout=/stderr=/check_output as proof a subprocess can't pop a Windows console. That invariant is false — stream redirection controls where a child's output goes, not whether a console is allocated. From a console-less parent (Desktop/Electron, pythonw.exe, detached gateway/cron) a console-subsystem child still flashes a window even when fully captured. - check-windows-footguns.py: capture/redirect/check_output is no longer a blanket safe-pass. Added _WINDOWS_FLASHING_PROGRAMS (git/gh/npm/node/python/uv/ffmpeg/ docker/powershell/…); calls to those are flagged even when captured. Non-flashing programs keep the capture exemption (no 271-site noise). _subprocess_compat.run/ popen calls are inherently safe (wrapper injects CREATE_NO_WINDOW). - Routed the 35 genuine flashing git/gh/npm/uv/ffmpeg/docker spawns through the _subprocess_compat.run/popen chokepoint (Brooklyn's wrapper from #53810) — the durable fix, not per-site annotations. cmd.exe /c start stays # ok (intentional). - Updated tests + CONTRIBUTING.md rule #17 to the corrected invariant.	2026-06-27 14:49:41 -07:00
brooklyn!	5db1430af9	fix(windows): stop terminal-window popups from background spawns (#53810 ) * fix(windows): stop terminal-window popups from background spawns Native-Windows desktop/gateway users saw cmd/conhost windows flash on gateway restart, image paste, the dashboard Projects tree, voice notes, and ~5 min after closing the app (detached cron). Two root causes: - Console-subsystem exes (taskkill, schtasks, wmic, netstat, tasklist, agent-browser, git, ffmpeg, powershell, git-bash) spawned via raw subprocess allocate a fresh console when the launching process has none (pythonw desktop backend / detached gateway) - even with output captured. - uv venv pythonw shims re-exec console python.exe, so Python children get a console regardless of how they're launched. Fixes: - Single hidden-spawn primitive (_subprocess_compat.run/.popen) that ORs CREATE_NO_WINDOW on Windows, no-op on POSIX. Route every Hermes-owned console-exe spawn through it. - FreeConsole() catch-all in hermes_bootstrap: any Python child that exclusively owns an auto-allocated console detaches it at startup (GetConsoleProcessList()==1 gate leaves shared interactive consoles untouched). - Replace PowerShell/wmic gateway PID scans with in-process psutil. - Skip schtasks queries on non-interactive desktop restarts. - Prefer native agent-browser .exe over .cmd shims. - Guard test bans raw subprocess spawns of the Windows-only console tools repo-wide so the popup class can't regress. * fix(windows): scope FreeConsole to background entry points; fix merge fallout Console detach review (per #53810 feedback): GetConsoleProcessList()==1 can't tell a uv pythonw->python phantom console apart from a user opening the interactive CLI/TUI in its own fresh console (double-click, shortcut, ConPTY) — both report a single attached process with a tty. Running FreeConsole() in the import-time bootstrap therefore risked detaching a legitimately-interactive terminal. - Extract FreeConsole into explicit hermes_bootstrap.detach_orphan_console(); remove it from apply_windows_utf8_bootstrap() (import side effect). - Call it only from known background mains: gateway run, dashboard backend (start_server, what the desktop spawns), cron standalone, tui_gateway entry, slash worker. Interactive CLI/TUI never calls it. - Behavior-contract tests: frees only when solo owner, leaves shared console, no-op without console / on POSIX, and asserts it's not an import side effect. Merge fallout from origin/main (#53791): - local.py: 3-way merge left a dangling *_popen_kwargs (NameError crashing every terminal init). _subprocess_compat.popen already hides the window, so drop it. - discord adapter: merge stacked an undefined windows_hide_flags() onto the primitive call; drop the redundant arg. - test_gateway: scan now goes psutil-first (zero spawn); rewrite the case-variant test to drive that production path. test(claw): mock _subprocess_compat.run seam for Windows process scan claw.py's Windows tasklist/powershell scan routes through the hidden-spawn primitive; the tests still patched claw_mod.subprocess, so on win32 the mock was never hit and real spawns returned nothing. Patch the actual seam.	2026-06-27 14:02:24 -07:00
Teknium	ef17cd204d	fix(windows): stop subprocess console-window popups + add CI guard (#53791 ) * fix(windows): stop subprocess console-window popups + add CI guard The single biggest source of Windows 'terminal popup' bug reports was bare subprocess.run/Popen calls spawning a console window. The compat helpers (windows_hide_flags / windows_detach_popen_kwargs) already existed but the footgun checker had no rule to stop new bare calls from reintroducing the flash. - scripts/check-windows-footguns.py: new AST-based rule flagging subprocess calls that can create a new console — output-redirection-aware (capture/ redirect/check_output exempt) and POSIX-only-program-aware (launchctl/ systemctl/brew/etc. exempt). Comprehensive on real popups, no annotation burden on calls that can't flash. - Swept all genuine window-spawning sites through windows_hide_flags()/ windows_detach_popen_kwargs(); marked intentionally-visible launches (editor/terminal/foreground re-exec) with '# windows-footgun: ok'. - tests/scripts/test_windows_footgun_subprocess_rule.py: behavior-contract tests + full-repo cleanliness invariant. - CONTRIBUTING.md: documents the rule + the helper pattern. * test: accept creationflags kwarg in psutil_android fake_subprocess_run The Windows no-window sweep added creationflags=windows_hide_flags() to install_psutil_android.py's subprocess.run call; the test's fake stub had a fixed (cmd) signature and raised TypeError on the new kwarg.	2026-06-27 13:03:51 -07:00
Teknium	cd592c105c	feat(send_message): native WhatsApp media delivery via Baileys bridge (#53598 ) send_message with MEDIA:/path to a WhatsApp target previously dropped the attachment: the WhatsApp branch never passed media_files, the plugin's _standalone_send accepted the param but only POSTed text, and WhatsApp was absent from the media-supported platform list. - send_message_tool: add a Platform.WHATSAPP media block (mirrors Feishu) that routes media_files through the whatsapp plugin's standalone_sender_fn, and add whatsapp to the supported-media list strings. - whatsapp adapter: _standalone_send now sends text first (skipped when the chunk is media-only), then uploads each file via the bridge /send-media endpoint with a mediaType derived from extension/is_voice/force_document, so images/videos/voice arrive as native bubbles instead of documents. - _bridge_media_type classifier maps ext -> image\|video\|audio\|document. Closes #19105 (remaining send_message gap). Other items in the report (inbound video paths, image_generate auto-deliver, history dedup, native gateway bubbles) already landed on main.	2026-06-27 04:40:05 -07:00
r266-tech	dbc925b755	Guard oversized Telegram video downloads	2026-06-27 04:39:48 -07:00
teknium1	7ee0b68973	fix(gateway,feishu): refuse executor resurrection during real shutdown Add an explicit _closing guard to both owned executors so the recreate-on-shutdown path only recovers from an external teardown of the loop default — never resurrects a pool the gateway/adapter itself stopped. _shutdown_executor() sets the flag; _get_executor() raises if closing; feishu connect() re-arms on reconnect. Updates the gateway recreate test to assert the refusal contract and adds feishu coverage.	2026-06-27 04:13:09 -07:00
teknium1	b296915c82	fix(feishu): route blocking SDK calls through an adapter-owned executor Feishu SDK calls ran on asyncio's shared default executor, so a torn-down default executor wedged every send with 'Executor shutdown has been called' and left the gateway a zombie (#10849). The adapter now owns a ThreadPoolExecutor recreated on demand if shut down, mirroring the gateway-owned executor change. Routes all 17 self._client SDK calls through _run_blocking; shuts the pool down on disconnect.	2026-06-27 04:13:09 -07:00
teknium1	ab1f9b94c5	fix(telegram): accept @username chat_id in delivery paths (#13206 ) TELEGRAM_HOME_CHANNEL set to an @username (not a numeric chat ID) crashed all webhook/cron->Telegram home-channel delivery with 'ValueError: invalid literal for int()'. The Telegram Bot API accepts both a numeric chat_id and an @username string; Hermes was force-coercing every chat_id with int(). Add normalize_telegram_chat_id() (returns int for numeric values, passes @username strings through) and apply it at the Bot API send/edit sites in the Telegram adapter and the send_message tool. Username targets are now recognized as explicit targets in _parse_target_ref. Reapplies the approach from #13274 (season179), whose branch predated the gateway/platforms/telegram.py -> plugins/platforms/telegram/adapter.py relocation. Dupes: #13535 (Tranquil-Flow), #37572 (chewkaah). Co-authored-by: season179 <season.saw@gmail.com>	2026-06-27 04:01:58 -07:00

1 2 3 4 5

238 commits