The cross-process coherence guard (#45966) compares the session's
on-disk message_count against the snapshot stored next to the cached
agent, and rebuilds the agent on a mismatch. The guard is correct
when the cache snapshot and the live count both refer to the same
DB row. But the agent cache is keyed by session_key, which can
group multiple conversation threads (different session_ids) under
the same key, so the snapshot and the live count can belong to
DIFFERENT DB rows.
When the user switches from session A to session B under the same
session_key, the cache hit returns A's cached agent. The guard then
compares A's snapshot count (A.message_count) against B's live count
(B.message_count). The two track different conversations, so they
match only by coincidence, and the guard invalidates the cache. In
practice every session switch busts the prompt cache and forces a
fresh agent build. The
post-turn re-baseline (#46237) made it worse: it reads the live
count from the CURRENT session_entry.session_id, so each switch
overwrites the original snapshot with the new session's count,
causing the very next switch BACK to the original session to fire
the guard again.
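A minimal sketch of the failure mode described above, assuming a
simplified cache keyed by session_key and a plain dict standing in for
the session DB. The names `db`, `cache`, `legacy_guard_hit`, and
`legacy_rebaseline`, and the counts 12 and 3, are hypothetical and are
not the gateway's actual code. The sketch also skips the agent rebuild
that follows an invalidation:

```python
# Hypothetical stand-ins: DB rows keyed by session_id, cache keyed by session_key.
db = {"A": 12, "B": 3}                       # session_id -> message_count
cache = {"key": ("agent", "sig", db["A"])}   # legacy 3-tuple built during session A


def legacy_guard_hit(session_key: str, active_session_id: str) -> bool:
    """Pre-fix guard: compare the snapshot with the ACTIVE session's live count."""
    _agent, _sig, snapshot_mc = cache[session_key]
    return snapshot_mc == db[active_session_id]


def legacy_rebaseline(session_key: str, active_session_id: str) -> None:
    """Pre-fix re-baseline: overwrite the snapshot with the active session's count."""
    agent, sig, _old = cache[session_key]
    cache[session_key] = (agent, sig, db[active_session_id])


same_session = legacy_guard_hit("key", "A")   # snapshot and live count share a DB row
switch_to_b = legacy_guard_hit("key", "B")    # 12 vs 3: guard fires, cache is busted
legacy_rebaseline("key", "B")                 # snapshot now holds B's count
switch_back = legacy_guard_hit("key", "A")    # 3 vs 12: guard fires again
```

Only the same-session lookup passes; both the switch to B and the
switch back to A are treated as cross-process writes.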
This is the bug from #54947 (P0, sweeper:risk-session-state,
sweeper:risk-caching).
Fix:
* Record the snapshot's session_id alongside the message_count in
the cache tuple: (agent, sig, mc, session_id) — a 4-tuple. The
cache build at the AIAgent construction site stores the active
session_id.
* The cache-hit guard skips the cross-process count comparison
when the active session_id differs from the snapshot's
session_id — the comparison is meaningless across different DB
rows, so the agent is REUSED without invalidation. The cross-
process guard still fires when the session_id matches and the
live count differs (genuine cross-process write on the SAME
session).
* _refresh_agent_cache_message_count checks the snapshot's
session_id: when it differs from the current session_id, the
snapshot is intentionally left untouched (overwriting it would
corrupt the original conversation's baseline and cause the
switch-back to fire the guard). The legacy 3-tuple shape (no
session_id) is still re-baselined as before.
* Backward-compat:
- 2-tuple (agent, sig) — unchanged, opts out of the guard.
- 3-tuple (agent, sig, mc) — unchanged behavior, standard
cross-process check.
- pending sentinel — unchanged, untouched by re-baseline.
- new 4-tuple (agent, sig, mc, session_id) — full session_id-
aware guard with skip on mismatch.
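The guard rules above can be sketched as a pure function over the
cache entry. This is an illustration under stated assumptions, not the
gateway's implementation: `should_invalidate`, the `PENDING` object,
and the argument names are hypothetical, and the real guard lives at
the cache-hit site rather than in a standalone helper.

```python
from typing import Any

PENDING = object()  # hypothetical stand-in for the pending sentinel


def should_invalidate(entry: Any, active_session_id: str, live_mc: int) -> bool:
    """Return True when the cached agent should be rebuilt."""
    if entry is PENDING or not isinstance(entry, tuple):
        return False  # nothing to compare against
    if len(entry) == 2:
        return False  # (agent, sig): opts out of the guard
    if len(entry) == 3:
        # Legacy (agent, sig, mc): standard cross-process check.
        return entry[2] != live_mc
    if len(entry) == 4:
        _agent, _sig, snapshot_mc, snapshot_sid = entry
        if snapshot_sid != active_session_id:
            # Counts come from different DB rows: skip the check, reuse the agent.
            return False
        # Same session: a differing count means a genuine cross-process write.
        return snapshot_mc != live_mc
    return False  # unknown shape: assumed here to be left alone


entry = ("agent", "sig", 12, "A")
switch_result = should_invalidate(entry, "B", 3)          # session switch: reuse
unchanged_result = should_invalidate(entry, "A", 12)      # same session, same count
cross_process_result = should_invalidate(entry, "A", 14)  # same session, new writes
```

Keeping the decision in one function makes each tuple shape a single
branch, which is one way to keep the backward-compat cases easy to test
in isolation.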
Tests:
* tests/gateway/test_session_id_cache_coherence.py — 7 tests
covering L1-L5 from LAYERS.md:
- L1 session_id switch must REUSE
- L2 cache tuple records snapshot's session_id
- L3 re-baseline skips when session_id differs
- L4 same-session_id turns still re-baseline (#46237 holds)
- L5 legacy 2-tuples and pending sentinels untouched
- legacy 3-tuple (no session_id) still guarded (#45966 holds)
- 3-tuple stays a 3-tuple (not a 4-tuple) on re-baseline
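The re-baseline cases in the list above can be sketched as follows.
This is a hypothetical pure-function variant written for illustration.
The real logic is in _refresh_agent_cache_message_count, whose
signature is not shown here, so `rebaseline`, `PENDING`, and the
argument names are assumptions.

```python
from typing import Any

PENDING = object()  # hypothetical stand-in for the pending sentinel


def rebaseline(entry: Any, current_session_id: str, live_mc: int) -> Any:
    """Return the cache entry to store after a turn completes."""
    if entry is PENDING or not isinstance(entry, tuple):
        return entry  # pending sentinel: untouched
    if len(entry) == 3:
        agent, sig, _old_mc = entry
        return (agent, sig, live_mc)  # legacy shape stays a 3-tuple
    if len(entry) == 4:
        agent, sig, _old_mc, snapshot_sid = entry
        if snapshot_sid != current_session_id:
            return entry  # different session: keep the original baseline
        return (agent, sig, live_mc, snapshot_sid)
    return entry  # 2-tuples and unknown shapes: untouched


skipped = rebaseline(("agent", "sig", 12, "A"), "B", 3)     # L3: snapshot kept
refreshed = rebaseline(("agent", "sig", 12, "A"), "A", 14)  # L4: count updated
legacy = rebaseline(("agent", "sig", 12), "B", 3)           # still a 3-tuple
```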
No regressions in 70 existing tests in test_agent_cache.py or 137
related session tests. Co-authored with #52197 (deferred cleanup
of evicted agents); both fixes compose cleanly.