Second review pass (Codex + Hermes subagent). Codex reproduced a real race with
a two-thread harness; both converged on the remaining issues.
- Generation-aware publish (fixes a lost-update race): two refresh callers (the
late-refresh daemon and the between-turns prologue around turn 1) could each
compute a snapshot outside the lock; a SLOWER caller holding an OLDER registry
generation could acquire the publish lock after a newer caller and clobber it,
deleting just-landed tools. refresh_agent_mcp_tools now captures
registry._generation before computing and refuses to publish a stale set;
agent._tool_snapshot_generation tracks the published generation.
- Context-engine routing names (_context_engine_tool_names) are now staged on a
local and published atomically with the snapshot, and only claimed when this
rebuild actually appended the schema — matching agent_init's dedup so a
registry/plugin tool of the same name keeps its own dispatch. (Previously
mutated live, before the publish lock, and on no-change refreshes.)
- CLI /reload-mcp: self.enabled_toolsets is resolved once at startup, so a
server newly ENABLED in config mid-session wasn't picked up (TUI already
re-resolved). Merge now-connected MCP server names into the override (unless
the user pinned all/*), mirroring startup, and keep self.enabled_toolsets in
sync. Closes the CLI/TUI parity hole.
- ACP (acp_adapter/server.py) routed through the shared helper — it was a 5th
sibling rebuild that re-injected memory tools but NOT context-engine tools and
bypassed the atomic/name-diff path (inert today, fragile).
- mcp_startup._resolve_discovery_timeout pulls its default from DEFAULT_CONFIG
(single source of truth) instead of a stale hardcoded 5.0 literal.
- Tests: stale-generation-no-clobber, _skip_mcp_refresh honored, timeout
fallback uses DEFAULT_CONFIG.