From 03bbd37dd7fbef796d2f9d9e8c54ee190d2ccc57 Mon Sep 17 00:00:00 2001 From: rrevenanttt <290873280+rrevenanttt@users.noreply.github.com> Date: Sun, 21 Jun 2026 23:20:05 +0300 Subject: [PATCH] fix(mcp): stop EventBridge silently dropping sessions.json-only changes MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The MCP serve event bridge polls two files to decide whether there is new conversation activity to surface to MCP clients: the gateway sessions.json index and state.db. Its skip-when-unchanged guard was self-defeating — it refreshed self._sessions_json_mtime with the current value *before* comparing against it, so the sessions.json term was always true and the guard collapsed to a state.db-only check. The impact is silent message loss on the event stream. The gateway commonly persists a message to state.db on one tick and registers the owning conversation in sessions.json a moment later. On that later tick only sessions.json has changed, so the broken guard takes the early return and never processes the freshly-registered chat. Its messages are withheld from every connected MCP client (events_poll / events_wait) until state.db happens to change again — which, for an otherwise-idle conversation, may be never. A polling bridge that quietly swallows new conversations is exactly the failure mode this watcher exists to prevent. The fix is minimal and low-risk: capture the previously-seen sessions.json mtime before the cache refresh and compare against that, so the guard skips only when NEITHER file changed since the last poll. The hot-path mtime optimization is fully preserved (a genuinely idle tick still short-circuits), and all existing EventBridge polling tests continue to pass unchanged. ## What does this PR do? Fixes a logic error in `EventBridge._poll_once` (`mcp_serve.py`) where the "nothing changed, skip this poll" guard compared `sj_mtime` against `self._sessions_json_mtime` *after* that attribute had already been overwritten with `sj_mtime`. The comparison was therefore always true, reducing the intended "skip only if both files are unchanged" check to a state.db-only check and discarding any tick in which only sessions.json changed. The guard now compares against the mtime observed on the previous poll, restoring the intended behavior. ## Related Issue N/A ## Type of Change - [x] 🐛 Bug fix (non-breaking change that fixes an issue) - [ ] âœĻ New feature (non-breaking change that adds functionality) - [ ] 🔒 Security fix - [ ] 📝 Documentation update - [ ] ✅ Tests (adding or improving test coverage) - [ ] â™ŧïļ Refactor (no behavior change) - [ ] ðŸŽŊ New skill (bundled or hub) ## Changes Made - `mcp_serve.py`: in `EventBridge._poll_once`, snapshot `prev_sessions_json_mtime = self._sessions_json_mtime` before refreshing the cached index, and use it in the skip guard (`sj_mtime == prev_sessions_json_mtime`) so a sessions.json-only change no longer triggers the early return. Added a comment explaining the seam. - `tests/test_mcp_serve.py`: added `TestEventBridgePollE2E::test_poll_picks_up_new_conversation_when_only_sessions_json_changed`, a regression test that reproduces the boundary state (state.db unchanged, sessions.json newly updated) and asserts the new conversation's message is emitted. ## How to Test 1. Reproduce the failure on the old code: with the guard comparing against `self._sessions_json_mtime`, the new test fails — the freshly-registered conversation yields `0` events instead of `1`. 2. Apply the fix and run `pytest tests/test_mcp_serve.py -q` — all 46 tests pass (40 skipped require the optional `mcp` SDK), including the three pre-existing `TestEventBridgePollE2E` polling tests and the new regression guard. 3. `ruff check mcp_serve.py tests/test_mcp_serve.py` and `python scripts/check-windows-footguns.py mcp_serve.py` both report clean. ## Checklist ### Code - [x] I've read the [Contributing Guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md) - [x] My commit messages follow [Conventional Commits](https://www.conventionalcommits.org/) (`fix(scope):`, `feat(scope):`, etc.) - [x] I searched for [existing PRs](https://github.com/NousResearch/hermes-agent/pulls) to make sure this isn't a duplicate - [x] My PR contains **only** changes related to this fix/feature (no unrelated commits) - [x] I've run `pytest tests/test_mcp_serve.py -q` and all tests pass - [x] I've added tests for my changes (required for bug fixes, strongly encouraged for features) - [x] I've tested on my platform: macOS 15 (Darwin) ### Documentation & Housekeeping - [x] I've updated relevant documentation (README, `docs/`, docstrings) — or N/A - [x] I've updated `cli-config.yaml.example` if I added/changed config keys — or N/A - [x] I've updated `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows — or N/A - [x] I've considered cross-platform impact (Windows, macOS) per the [compatibility guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md#cross-platform-compatibility) — or N/A - [x] I've updated tool descriptions/schemas if I changed tool behavior — or N/A --- mcp_serve.py | 16 ++++++++++-- tests/test_mcp_serve.py | 57 +++++++++++++++++++++++++++++++++++++++++ 2 files changed, 71 insertions(+), 2 deletions(-) diff --git a/mcp_serve.py b/mcp_serve.py index fdbe6d7b5..4ebda2253 100644 --- a/mcp_serve.py +++ b/mcp_serve.py @@ -356,7 +356,10 @@ class EventBridge: Uses mtime checks on sessions.json and state.db to skip work when nothing has changed — makes 200ms polling essentially free. """ - # Check if sessions.json has changed (mtime check is ~1Ξs) + # Check if sessions.json has changed (mtime check is ~1Ξs). + # Capture the previously-seen mtime *before* refreshing the cache so the + # skip guard below can still tell whether sessions.json changed this tick. + prev_sessions_json_mtime = self._sessions_json_mtime sessions_file = _get_sessions_dir() / "sessions.json" try: sj_mtime = sessions_file.stat().st_mtime if sessions_file.exists() else 0.0 @@ -379,7 +382,16 @@ class EventBridge: except OSError: db_mtime = 0.0 - if db_mtime == self._state_db_mtime and sj_mtime == self._sessions_json_mtime: + # Skip only when NEITHER file changed since the last poll. Comparing + # against ``prev_sessions_json_mtime`` (not the freshly-stored + # ``self._sessions_json_mtime``) is essential: the latter was just set to + # ``sj_mtime`` above, so using it would make the sessions.json term + # always true and collapse the guard to a db-only check. That would + # discard a tick where only sessions.json changed — e.g. a brand-new + # conversation registered after its first message already landed in + # state.db on an earlier tick — and the new chat's messages would never + # be emitted until state.db happened to change again. + if db_mtime == self._state_db_mtime and sj_mtime == prev_sessions_json_mtime: return # Nothing changed since last poll — skip entirely self._state_db_mtime = db_mtime diff --git a/tests/test_mcp_serve.py b/tests/test_mcp_serve.py index 11c3b65b6..0ae403370 100644 --- a/tests/test_mcp_serve.py +++ b/tests/test_mcp_serve.py @@ -1232,6 +1232,63 @@ class TestEventBridgePollE2E: assert len(r2["events"]) == 1 assert r2["events"][0]["content"] == "New reply!" + def test_poll_picks_up_new_conversation_when_only_sessions_json_changed( + self, tmp_path, monkeypatch + ): + """A conversation registered in sessions.json *after* its first message + already landed in state.db must still be picked up, even when state.db's + mtime did not move on this tick. + + Regression guard: the skip-when-unchanged check must compare against the + sessions.json mtime seen on the *previous* poll. If it instead compares + against the just-refreshed value, the sessions.json term is always true, + the guard collapses to a db-only check, and a sessions.json-only change + is silently dropped — the new chat's messages never reach MCP clients. + """ + import mcp_serve + + sessions_dir = tmp_path / "sessions" + sessions_dir.mkdir() + monkeypatch.setattr(mcp_serve, "_get_sessions_dir", lambda: sessions_dir) + + # _poll_once reads /state.db for its mtime gate; the autouse + # fixture points HERMES_HOME at tmp_path. + db_path = tmp_path / "state.db" + db_path.write_text("placeholder") + + session_id = "20260329_150000_late_register" + sessions_json = sessions_dir / "sessions.json" + sessions_json.write_text(json.dumps({ + "agent:main:telegram:dm:late": { + "session_id": session_id, + "platform": "telegram", + "origin": {"platform": "telegram", "chat_id": "late"}, + } + })) + + class DB: + def get_messages(self, sid): + return [{ + "id": 1, "role": "user", + "content": "Hello from a freshly-registered conversation", + "timestamp": "2026-03-29T15:00:00", + }] + + bridge = mcp_serve.EventBridge() + # Simulate the boundary state: state.db has NOT changed since the last + # poll (its message landed on an earlier tick), but sessions.json was + # only updated with this conversation now — the bridge has not yet seen + # the current sessions.json content. + bridge._state_db_mtime = db_path.stat().st_mtime + bridge._sessions_json_mtime = 0.0 + + bridge._poll_once(DB()) + + result = bridge.poll_events(after_cursor=0) + assert len(result["events"]) == 1 + assert result["events"][0]["session_key"] == "agent:main:telegram:dm:late" + assert result["events"][0]["content"].startswith("Hello from a freshly") + def test_poll_interval_is_200ms(self): """Verify the poll interval constant.""" from mcp_serve import POLL_INTERVAL