hermes-agent/gateway
Tranquil-Flow e7562c394f fix(gateway): skip cross-process guard on session_id switch under same session_key (#54947)
The cross-process coherence guard (#45966) compares the session's
on-disk message_count against the snapshot stored next to the cached
agent, and rebuilds the agent on a mismatch.  The guard is correct
when the cache snapshot and the live count both refer to the same
DB row.  But the agent cache is keyed by session_key, which can
group multiple conversation threads (different session_ids) under
the same key — and the message_count values belong to DIFFERENT
DB rows.

When the user switches from session A to session B under the same
session_key, the cache hit returns A's cached agent.  The guard then
compares A's snapshot count (A.message_count) against B's live count
(B.message_count).  The two counts track different conversations and
match only by coincidence, so the guard invalidates the cache.  In
practice every session switch busts the prompt cache and forces a
fresh agent build.  The
post-turn re-baseline (#46237) made it worse: it reads the live
count from the CURRENT session_entry.session_id, so each switch
overwrites the original snapshot with the new session's count,
causing the very next switch BACK to the original session to fire
the guard again.
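
The failure mode described above can be reproduced with a minimal
sketch.  Everything here is hypothetical and for illustration only:
the message_counts dict stands in for the per-session DB rows, and
legacy_guard_fires is an assumed name for the pre-fix comparison, not
the function in run.py.

```python
# Hypothetical stand-in for the per-session_id message_count DB rows.
message_counts = {"A": 12, "B": 3}


def legacy_guard_fires(snapshot_count: int, active_session_id: str) -> bool:
    """Pre-fix guard: compares the snapshot to the ACTIVE session's live count."""
    return snapshot_count != message_counts[active_session_id]


# The agent was cached while session A was active.
snapshot = message_counts["A"]

# Switch to session B under the same session_key: A's snapshot is
# compared against B's row, so the guard fires.
fires_on_switch = legacy_guard_fires(snapshot, "B")

# The post-turn re-baseline reads the CURRENT session's count and
# overwrites A's snapshot with B's value.
snapshot = message_counts["B"]

# Switching back to A fires the guard again.
fires_on_switch_back = legacy_guard_fires(snapshot, "A")
```

Both flags end up True, so neither direction of the switch can reuse
the cached agent.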

This is the bug from #54947 (P0, sweeper:risk-session-state,
sweeper:risk-caching).

Fix:
  * Record the snapshot's session_id alongside the message_count in
    the cache tuple: (agent, sig, mc, session_id) — a 4-tuple.  The
    cache build at the AIAgent construction site stores the active
    session_id.
  * The cache-hit guard skips the cross-process count comparison
    when the active session_id differs from the snapshot's
    session_id — the comparison is meaningless across different DB
    rows, so the agent is REUSED without invalidation.  The cross-
    process guard still fires when the session_id matches and the
    live count differs (genuine cross-process write on the SAME
    session).
  * _refresh_agent_cache_message_count checks the snapshot's
    session_id: when it differs from the current session_id, the
    snapshot is intentionally left untouched (overwriting it would
    corrupt the original conversation's baseline and cause the
    switch-back to fire the guard).  The legacy 3-tuple shape (no
    session_id) is still re-baselined as before.
  * Backward-compat:
      - 2-tuple (agent, sig) — unchanged, opts out of the guard.
      - 3-tuple (agent, sig, mc) — unchanged behavior, standard
        cross-process check.
      - pending sentinel — unchanged, untouched by re-baseline.
      - new 4-tuple (agent, sig, mc, session_id) — full session_id-
        aware guard with skip on mismatch.
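
The tuple-shape rules above can be sketched as follows.  This is a
minimal illustration under stated assumptions, not the code in run.py:
PENDING, should_invalidate, and rebaseline are hypothetical names, and
the real guard and _refresh_agent_cache_message_count may be structured
differently.

```python
from typing import Any

# Hypothetical stand-in for the pending sentinel stored in the cache.
PENDING = object()


def should_invalidate(entry: Any, active_session_id: str, live_count: int) -> bool:
    """Cache-hit guard: True means the cached agent must be rebuilt."""
    if entry is PENDING or not isinstance(entry, tuple):
        return False
    if len(entry) == 2:
        # (agent, sig): opts out of the guard.
        return False
    if len(entry) == 3:
        # (agent, sig, mc): legacy shape, standard cross-process check.
        return entry[2] != live_count
    _agent, _sig, mc, snapshot_session_id = entry
    if snapshot_session_id != active_session_id:
        # Counts belong to different DB rows; skip the comparison and reuse.
        return False
    # Same session: a differing count is a genuine cross-process write.
    return mc != live_count


def rebaseline(entry: Any, active_session_id: str, live_count: int) -> Any:
    """Post-turn re-baseline: returns the entry to store back in the cache."""
    if entry is PENDING or not isinstance(entry, tuple) or len(entry) == 2:
        return entry
    if len(entry) == 3:
        # Legacy shape is re-baselined as before and stays a 3-tuple.
        agent, sig, _old = entry
        return (agent, sig, live_count)
    agent, sig, _old, snapshot_session_id = entry
    if snapshot_session_id != active_session_id:
        # Leave the original conversation's baseline untouched.
        return entry
    return (agent, sig, live_count, snapshot_session_id)
```

With a cached entry ("agent", "sig", 12, "A"), a turn on session B
neither invalidates the agent nor overwrites the snapshot, so a later
switch back to A still sees the baseline of 12.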

Tests:
  * tests/gateway/test_session_id_cache_coherence.py — 7 tests
    covering L1-L5 from LAYERS.md:
      - L1 session_id switch must REUSE
      - L2 cache tuple records snapshot's session_id
      - L3 re-baseline skips when session_id differs
      - L4 same-session_id turns still re-baseline (#46237 holds)
      - L5 legacy 2-tuples and pending sentinels untouched
      - legacy 3-tuple (no session_id) still guarded (#45966 holds)
      - 3-tuple transitions to 3-tuple (not 4-tuple) on re-baseline

No regressions in 70 existing tests in test_agent_cache.py or 137
related session tests.  Co-authored with #52197 (deferred cleanup
of evicted agents); both fixes compose cleanly.
2026-07-01 02:29:24 -07:00
assets fix: improve telegram topic mode setup 2026-05-04 12:07:17 -07:00
builtin_hooks remove: BOOT.md built-in hook (#17093) 2026-04-28 09:50:27 -07:00
platforms fix(gateway): await async post-delivery callbacks in chained wrapper 2026-07-01 02:12:25 -07:00
relay refactor(relay): purge platform-specific scope terminology from the relay adapter (D-Q2.5c) (#56016) 2026-07-01 12:30:59 +10:00
__init__.py docs(gateway): mention Weixin in gateway help and docstrings 2026-05-12 17:08:51 -07:00
authz_mixin.py fix(telegram): apply bot auth policy to Telegram sources 2026-06-28 00:57:03 -07:00
cgroup_cleanup.py fix: satisfy ruff encoding + windows-footgun lints for cgroup reaper 2026-06-28 02:05:50 -07:00
channel_directory.py docs(sessions): clarify sessions.json is the gateway routing index, not the session list (#51726) 2026-06-23 23:56:36 -07:00
code_skew.py fix(gateway): refuse model switch on stale checkout to avoid env_float ImportError 2026-06-24 04:16:54 +05:30
config.py feat(gateway): per-platform typing_indicator toggle 2026-06-29 21:12:57 -07:00
dead_targets.py fix(gateway): skip confirmed-dead delivery targets (deleted groups, blocked bots) (#55115) 2026-06-29 13:23:29 -07:00
delivery.py refactor(gateway): reuse looks_like_telegram_private_chat_id helper 2026-07-01 01:01:36 -07:00
display_config.py feat(discord): render reasoning as -# subtext via display.reasoning_style (#51168) 2026-06-23 10:44:02 -07:00
drain_control.py feat(gateway): suppress home-channel shutdown broadcast on flagged drains (#54824) 2026-06-29 12:18:11 -07:00
hooks.py feat(hooks): expose thread_id and chat_type in agent:start/end context (#41672) 2026-06-07 19:16:36 -07:00
kanban_watchers.py fix(kanban): honor kanban.auto_decompose toggle live, without a gateway restart (#50358) 2026-06-21 12:43:44 -07:00
memory_monitor.py Port from cline/cline#10343: periodic gateway memory logging (#27102) 2026-05-16 12:55:23 -07:00
message_timestamps.py feat(gateway): inject stable human-readable message timestamps 2026-06-16 15:49:59 -07:00
mirror.py fix(cron): mirror continuable cron as a labelled user turn (alternation-safe) 2026-06-24 20:27:05 -07:00
pairing.py fix(gateway): preserve WhatsApp pairing approvals across JID/LID alias flips 2026-05-23 01:46:34 -07:00
platform_registry.py perf(startup): lazy-load gateway platform adapters (#54448) 2026-06-28 15:11:59 -07:00
response_filters.py fix(gateway): suppress NO_REPLY/[SILENT] markers on the streaming path 2026-06-30 23:37:04 -07:00
restart.py fix(gateway): exit 78 (EX_CONFIG) on fatal startup errors, s6 finish script stops restart loop 2026-06-24 16:34:51 +10:00
rich_sent_store.py style(profile): frame comments around what the code does 2026-06-30 15:30:06 -07:00
run.py fix(gateway): skip cross-process guard on session_id switch under same session_key (#54947) 2026-07-01 02:29:24 -07:00
runtime_footer.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
scale_to_zero.py feat(gateway): scale-to-zero idle detection + dormant-quiesce (Phase 0) 2026-06-24 18:47:18 -07:00
session.py fix(gateway): persist compressed transcript before repointing /compress session 2026-07-01 01:39:23 -07:00
session_context.py fix(api-server): stop silently promising async delivery on stateless HTTP path (#50319) 2026-06-21 12:15:14 -07:00
shutdown_forensics.py chore: ruff auto-fixes — collapsible-else-if, if-stmt-min-max, dict.fromkeys (#23926) 2026-05-11 11:03:29 -07:00
slash_access.py feat(gateway): per-platform admin/user split for slash commands (salvage of #4443) (#23373) 2026-05-10 12:33:54 -07:00
slash_commands.py fix(gateway): persist compressed transcript before repointing /compress session 2026-07-01 01:39:23 -07:00
status.py fix(windows): cover remaining console-flash spawn legs (#54417) 2026-06-28 13:49:08 -07:00
sticker_cache.py fix: guard yaml.safe_load, flock unlock, TOCTOU races, and atomic writes 2026-05-19 00:12:41 -07:00
stream_consumer.py fix(gateway): suppress NO_REPLY/[SILENT] markers on the streaming path 2026-06-30 23:37:04 -07:00
stream_dispatch.py feat(gateway): structured stream-event protocol + Telegram draft formatting parity (#37250) 2026-06-02 00:33:50 -07:00
stream_events.py feat(gateway): structured stream-event protocol + Telegram draft formatting parity (#37250) 2026-06-02 00:33:50 -07:00
whatsapp_identity.py fix(whatsapp): resolve LID aliases on modern platforms/ session layout 2026-06-28 02:05:26 -07:00