fix(mcp): stop EventBridge silently dropping sessions.json-only changes

The MCP serve event bridge polls two files to decide whether there is new
conversation activity to surface to MCP clients: the gateway sessions.json
index and state.db. Its skip-when-unchanged guard was self-defeating — it
refreshed self._sessions_json_mtime with the current value *before*
comparing against it, so the sessions.json term was always true and the
guard collapsed to a state.db-only check.

The impact is silent message loss on the event stream. The gateway commonly
persists a message to state.db on one tick and registers the owning
conversation in sessions.json a moment later. On that later tick only
sessions.json has changed, so the broken guard takes the early return and
never processes the freshly-registered chat. Its messages are withheld from
every connected MCP client (events_poll / events_wait) until state.db
happens to change again — which, for an otherwise-idle conversation, may be
never. A polling bridge that quietly swallows new conversations is exactly
the failure mode this watcher exists to prevent.

The fix is minimal and low-risk: capture the previously-seen sessions.json
mtime before the cache refresh and compare against that, so the guard skips
only when NEITHER file changed since the last poll. The hot-path mtime
optimization is fully preserved (a genuinely idle tick still short-circuits),
and all existing EventBridge polling tests continue to pass unchanged.

## What does this PR do?

Fixes a logic error in `EventBridge._poll_once` (`mcp_serve.py`) where the
"nothing changed, skip this poll" guard compared `sj_mtime` against
`self._sessions_json_mtime` *after* that attribute had already been
overwritten with `sj_mtime`. The comparison was therefore always true,
reducing the intended "skip only if both files are unchanged" check to a
state.db-only check and discarding any tick in which only sessions.json
changed. The guard now compares against the mtime observed on the previous
poll, restoring the intended behavior.

## Related Issue

N/A

## Type of Change

- [x] 🐛 Bug fix (non-breaking change that fixes an issue)
- [ ]  New feature (non-breaking change that adds functionality)
- [ ] 🔒 Security fix
- [ ] 📝 Documentation update
- [ ]  Tests (adding or improving test coverage)
- [ ] ♻️ Refactor (no behavior change)
- [ ] 🎯 New skill (bundled or hub)

## Changes Made

- `mcp_serve.py`: in `EventBridge._poll_once`, snapshot
  `prev_sessions_json_mtime = self._sessions_json_mtime` before refreshing the
  cached index, and use it in the skip guard
  (`sj_mtime == prev_sessions_json_mtime`) so a sessions.json-only change no
  longer triggers the early return. Added a comment explaining the seam.
- `tests/test_mcp_serve.py`: added
  `TestEventBridgePollE2E::test_poll_picks_up_new_conversation_when_only_sessions_json_changed`,
  a regression test that reproduces the boundary state (state.db unchanged,
  sessions.json newly updated) and asserts the new conversation's message is
  emitted.

## How to Test

1. Reproduce the failure on the old code: with the guard comparing against
   `self._sessions_json_mtime`, the new test fails — the freshly-registered
   conversation yields `0` events instead of `1`.
2. Apply the fix and run `pytest tests/test_mcp_serve.py -q` — all 46 tests
   pass (40 skipped require the optional `mcp` SDK), including the three
   pre-existing `TestEventBridgePollE2E` polling tests and the new regression
   guard.
3. `ruff check mcp_serve.py tests/test_mcp_serve.py` and
   `python scripts/check-windows-footguns.py mcp_serve.py` both report clean.

## Checklist

### Code

- [x] I've read the [Contributing Guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md)
- [x] My commit messages follow [Conventional Commits](https://www.conventionalcommits.org/) (`fix(scope):`, `feat(scope):`, etc.)
- [x] I searched for [existing PRs](https://github.com/NousResearch/hermes-agent/pulls) to make sure this isn't a duplicate
- [x] My PR contains **only** changes related to this fix/feature (no unrelated commits)
- [x] I've run `pytest tests/test_mcp_serve.py -q` and all tests pass
- [x] I've added tests for my changes (required for bug fixes, strongly encouraged for features)
- [x] I've tested on my platform: macOS 15 (Darwin)

### Documentation & Housekeeping

- [x] I've updated relevant documentation (README, `docs/`, docstrings) — or N/A
- [x] I've updated `cli-config.yaml.example` if I added/changed config keys — or N/A
- [x] I've updated `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows — or N/A
- [x] I've considered cross-platform impact (Windows, macOS) per the [compatibility guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md#cross-platform-compatibility) — or N/A
- [x] I've updated tool descriptions/schemas if I changed tool behavior — or N/A
This commit is contained in:
rrevenanttt 2026-06-21 23:20:05 +03:00 committed by Teknium
parent 55c8b2c81f
commit 03bbd37dd7
2 changed files with 71 additions and 2 deletions

View file

@ -356,7 +356,10 @@ class EventBridge:
Uses mtime checks on sessions.json and state.db to skip work
when nothing has changed makes 200ms polling essentially free.
"""
# Check if sessions.json has changed (mtime check is ~1μs)
# Check if sessions.json has changed (mtime check is ~1μs).
# Capture the previously-seen mtime *before* refreshing the cache so the
# skip guard below can still tell whether sessions.json changed this tick.
prev_sessions_json_mtime = self._sessions_json_mtime
sessions_file = _get_sessions_dir() / "sessions.json"
try:
sj_mtime = sessions_file.stat().st_mtime if sessions_file.exists() else 0.0
@ -379,7 +382,16 @@ class EventBridge:
except OSError:
db_mtime = 0.0
if db_mtime == self._state_db_mtime and sj_mtime == self._sessions_json_mtime:
# Skip only when NEITHER file changed since the last poll. Comparing
# against ``prev_sessions_json_mtime`` (not the freshly-stored
# ``self._sessions_json_mtime``) is essential: the latter was just set to
# ``sj_mtime`` above, so using it would make the sessions.json term
# always true and collapse the guard to a db-only check. That would
# discard a tick where only sessions.json changed — e.g. a brand-new
# conversation registered after its first message already landed in
# state.db on an earlier tick — and the new chat's messages would never
# be emitted until state.db happened to change again.
if db_mtime == self._state_db_mtime and sj_mtime == prev_sessions_json_mtime:
return # Nothing changed since last poll — skip entirely
self._state_db_mtime = db_mtime

View file

@ -1232,6 +1232,63 @@ class TestEventBridgePollE2E:
assert len(r2["events"]) == 1
assert r2["events"][0]["content"] == "New reply!"
def test_poll_picks_up_new_conversation_when_only_sessions_json_changed(
self, tmp_path, monkeypatch
):
"""A conversation registered in sessions.json *after* its first message
already landed in state.db must still be picked up, even when state.db's
mtime did not move on this tick.
Regression guard: the skip-when-unchanged check must compare against the
sessions.json mtime seen on the *previous* poll. If it instead compares
against the just-refreshed value, the sessions.json term is always true,
the guard collapses to a db-only check, and a sessions.json-only change
is silently dropped the new chat's messages never reach MCP clients.
"""
import mcp_serve
sessions_dir = tmp_path / "sessions"
sessions_dir.mkdir()
monkeypatch.setattr(mcp_serve, "_get_sessions_dir", lambda: sessions_dir)
# _poll_once reads <HERMES_HOME>/state.db for its mtime gate; the autouse
# fixture points HERMES_HOME at tmp_path.
db_path = tmp_path / "state.db"
db_path.write_text("placeholder")
session_id = "20260329_150000_late_register"
sessions_json = sessions_dir / "sessions.json"
sessions_json.write_text(json.dumps({
"agent:main:telegram:dm:late": {
"session_id": session_id,
"platform": "telegram",
"origin": {"platform": "telegram", "chat_id": "late"},
}
}))
class DB:
def get_messages(self, sid):
return [{
"id": 1, "role": "user",
"content": "Hello from a freshly-registered conversation",
"timestamp": "2026-03-29T15:00:00",
}]
bridge = mcp_serve.EventBridge()
# Simulate the boundary state: state.db has NOT changed since the last
# poll (its message landed on an earlier tick), but sessions.json was
# only updated with this conversation now — the bridge has not yet seen
# the current sessions.json content.
bridge._state_db_mtime = db_path.stat().st_mtime
bridge._sessions_json_mtime = 0.0
bridge._poll_once(DB())
result = bridge.poll_events(after_cursor=0)
assert len(result["events"]) == 1
assert result["events"][0]["session_key"] == "agent:main:telegram:dm:late"
assert result["events"][0]["content"].startswith("Hello from a freshly")
def test_poll_interval_is_200ms(self):
"""Verify the poll interval constant."""
from mcp_serve import POLL_INTERVAL