fix(mcp): stop EventBridge silently dropping sessions.json-only changes
The MCP serve event bridge polls two files to decide whether there is new conversation activity to surface to MCP clients: the gateway sessions.json index and state.db. Its skip-when-unchanged guard was self-defeating — it refreshed self._sessions_json_mtime with the current value *before* comparing against it, so the sessions.json term was always true and the guard collapsed to a state.db-only check. The impact is silent message loss on the event stream. The gateway commonly persists a message to state.db on one tick and registers the owning conversation in sessions.json a moment later. On that later tick only sessions.json has changed, so the broken guard takes the early return and never processes the freshly-registered chat. Its messages are withheld from every connected MCP client (events_poll / events_wait) until state.db happens to change again — which, for an otherwise-idle conversation, may be never. A polling bridge that quietly swallows new conversations is exactly the failure mode this watcher exists to prevent. The fix is minimal and low-risk: capture the previously-seen sessions.json mtime before the cache refresh and compare against that, so the guard skips only when NEITHER file changed since the last poll. The hot-path mtime optimization is fully preserved (a genuinely idle tick still short-circuits), and all existing EventBridge polling tests continue to pass unchanged. ## What does this PR do? Fixes a logic error in `EventBridge._poll_once` (`mcp_serve.py`) where the "nothing changed, skip this poll" guard compared `sj_mtime` against `self._sessions_json_mtime` *after* that attribute had already been overwritten with `sj_mtime`. The comparison was therefore always true, reducing the intended "skip only if both files are unchanged" check to a state.db-only check and discarding any tick in which only sessions.json changed. The guard now compares against the mtime observed on the previous poll, restoring the intended behavior. ## Related Issue N/A ## Type of Change - [x] 🐛 Bug fix (non-breaking change that fixes an issue) - [ ] ✨ New feature (non-breaking change that adds functionality) - [ ] 🔒 Security fix - [ ] 📝 Documentation update - [ ] ✅ Tests (adding or improving test coverage) - [ ] ♻️ Refactor (no behavior change) - [ ] 🎯 New skill (bundled or hub) ## Changes Made - `mcp_serve.py`: in `EventBridge._poll_once`, snapshot `prev_sessions_json_mtime = self._sessions_json_mtime` before refreshing the cached index, and use it in the skip guard (`sj_mtime == prev_sessions_json_mtime`) so a sessions.json-only change no longer triggers the early return. Added a comment explaining the seam. - `tests/test_mcp_serve.py`: added `TestEventBridgePollE2E::test_poll_picks_up_new_conversation_when_only_sessions_json_changed`, a regression test that reproduces the boundary state (state.db unchanged, sessions.json newly updated) and asserts the new conversation's message is emitted. ## How to Test 1. Reproduce the failure on the old code: with the guard comparing against `self._sessions_json_mtime`, the new test fails — the freshly-registered conversation yields `0` events instead of `1`. 2. Apply the fix and run `pytest tests/test_mcp_serve.py -q` — all 46 tests pass (40 skipped require the optional `mcp` SDK), including the three pre-existing `TestEventBridgePollE2E` polling tests and the new regression guard. 3. `ruff check mcp_serve.py tests/test_mcp_serve.py` and `python scripts/check-windows-footguns.py mcp_serve.py` both report clean. ## Checklist ### Code - [x] I've read the [Contributing Guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md) - [x] My commit messages follow [Conventional Commits](https://www.conventionalcommits.org/) (`fix(scope):`, `feat(scope):`, etc.) - [x] I searched for [existing PRs](https://github.com/NousResearch/hermes-agent/pulls) to make sure this isn't a duplicate - [x] My PR contains **only** changes related to this fix/feature (no unrelated commits) - [x] I've run `pytest tests/test_mcp_serve.py -q` and all tests pass - [x] I've added tests for my changes (required for bug fixes, strongly encouraged for features) - [x] I've tested on my platform: macOS 15 (Darwin) ### Documentation & Housekeeping - [x] I've updated relevant documentation (README, `docs/`, docstrings) — or N/A - [x] I've updated `cli-config.yaml.example` if I added/changed config keys — or N/A - [x] I've updated `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows — or N/A - [x] I've considered cross-platform impact (Windows, macOS) per the [compatibility guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md#cross-platform-compatibility) — or N/A - [x] I've updated tool descriptions/schemas if I changed tool behavior — or N/A
This commit is contained in:
parent
55c8b2c81f
commit
03bbd37dd7
2 changed files with 71 additions and 2 deletions
16
mcp_serve.py
16
mcp_serve.py
|
|
@ -356,7 +356,10 @@ class EventBridge:
|
|||
Uses mtime checks on sessions.json and state.db to skip work
|
||||
when nothing has changed — makes 200ms polling essentially free.
|
||||
"""
|
||||
# Check if sessions.json has changed (mtime check is ~1μs)
|
||||
# Check if sessions.json has changed (mtime check is ~1μs).
|
||||
# Capture the previously-seen mtime *before* refreshing the cache so the
|
||||
# skip guard below can still tell whether sessions.json changed this tick.
|
||||
prev_sessions_json_mtime = self._sessions_json_mtime
|
||||
sessions_file = _get_sessions_dir() / "sessions.json"
|
||||
try:
|
||||
sj_mtime = sessions_file.stat().st_mtime if sessions_file.exists() else 0.0
|
||||
|
|
@ -379,7 +382,16 @@ class EventBridge:
|
|||
except OSError:
|
||||
db_mtime = 0.0
|
||||
|
||||
if db_mtime == self._state_db_mtime and sj_mtime == self._sessions_json_mtime:
|
||||
# Skip only when NEITHER file changed since the last poll. Comparing
|
||||
# against ``prev_sessions_json_mtime`` (not the freshly-stored
|
||||
# ``self._sessions_json_mtime``) is essential: the latter was just set to
|
||||
# ``sj_mtime`` above, so using it would make the sessions.json term
|
||||
# always true and collapse the guard to a db-only check. That would
|
||||
# discard a tick where only sessions.json changed — e.g. a brand-new
|
||||
# conversation registered after its first message already landed in
|
||||
# state.db on an earlier tick — and the new chat's messages would never
|
||||
# be emitted until state.db happened to change again.
|
||||
if db_mtime == self._state_db_mtime and sj_mtime == prev_sessions_json_mtime:
|
||||
return # Nothing changed since last poll — skip entirely
|
||||
|
||||
self._state_db_mtime = db_mtime
|
||||
|
|
|
|||
|
|
@ -1232,6 +1232,63 @@ class TestEventBridgePollE2E:
|
|||
assert len(r2["events"]) == 1
|
||||
assert r2["events"][0]["content"] == "New reply!"
|
||||
|
||||
def test_poll_picks_up_new_conversation_when_only_sessions_json_changed(
|
||||
self, tmp_path, monkeypatch
|
||||
):
|
||||
"""A conversation registered in sessions.json *after* its first message
|
||||
already landed in state.db must still be picked up, even when state.db's
|
||||
mtime did not move on this tick.
|
||||
|
||||
Regression guard: the skip-when-unchanged check must compare against the
|
||||
sessions.json mtime seen on the *previous* poll. If it instead compares
|
||||
against the just-refreshed value, the sessions.json term is always true,
|
||||
the guard collapses to a db-only check, and a sessions.json-only change
|
||||
is silently dropped — the new chat's messages never reach MCP clients.
|
||||
"""
|
||||
import mcp_serve
|
||||
|
||||
sessions_dir = tmp_path / "sessions"
|
||||
sessions_dir.mkdir()
|
||||
monkeypatch.setattr(mcp_serve, "_get_sessions_dir", lambda: sessions_dir)
|
||||
|
||||
# _poll_once reads <HERMES_HOME>/state.db for its mtime gate; the autouse
|
||||
# fixture points HERMES_HOME at tmp_path.
|
||||
db_path = tmp_path / "state.db"
|
||||
db_path.write_text("placeholder")
|
||||
|
||||
session_id = "20260329_150000_late_register"
|
||||
sessions_json = sessions_dir / "sessions.json"
|
||||
sessions_json.write_text(json.dumps({
|
||||
"agent:main:telegram:dm:late": {
|
||||
"session_id": session_id,
|
||||
"platform": "telegram",
|
||||
"origin": {"platform": "telegram", "chat_id": "late"},
|
||||
}
|
||||
}))
|
||||
|
||||
class DB:
|
||||
def get_messages(self, sid):
|
||||
return [{
|
||||
"id": 1, "role": "user",
|
||||
"content": "Hello from a freshly-registered conversation",
|
||||
"timestamp": "2026-03-29T15:00:00",
|
||||
}]
|
||||
|
||||
bridge = mcp_serve.EventBridge()
|
||||
# Simulate the boundary state: state.db has NOT changed since the last
|
||||
# poll (its message landed on an earlier tick), but sessions.json was
|
||||
# only updated with this conversation now — the bridge has not yet seen
|
||||
# the current sessions.json content.
|
||||
bridge._state_db_mtime = db_path.stat().st_mtime
|
||||
bridge._sessions_json_mtime = 0.0
|
||||
|
||||
bridge._poll_once(DB())
|
||||
|
||||
result = bridge.poll_events(after_cursor=0)
|
||||
assert len(result["events"]) == 1
|
||||
assert result["events"][0]["session_key"] == "agent:main:telegram:dm:late"
|
||||
assert result["events"][0]["content"].startswith("Hello from a freshly")
|
||||
|
||||
def test_poll_interval_is_200ms(self):
|
||||
"""Verify the poll interval constant."""
|
||||
from mcp_serve import POLL_INTERVAL
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue