Three CLI reliability fixes: 1. Interrupt reliability: chat() only re-queued the user's interrupt message when the turn result carried interrupted=True. When the agent thread raced past its last interrupt check (or finished) before the interrupt landed, the message was silently dropped — and the stale _interrupt_requested flag left on the agent instantly aborted the NEXT turn. Un-acknowledged interrupt messages are now re-queued as the next turn and the stale flag is cleared (only when the agent thread actually exited). The clarify-race path also parks the message in _pending_input instead of dropping it. 2. Slow exit (5+ min): stdlib ThreadPoolExecutor workers are non-daemon and joined unconditionally by concurrent.futures' atexit hook — even after shutdown(wait=False). One wedged tool worker (abandoned after interrupt/timeout) held the process open forever. Promoted async_delegation's daemon executor to a shared tools/daemon_pool module and adopted it in tool_executor (concurrent tool batches), memory_manager (background sync), delegate_tool (child timeout wrapper + batch fan-out), and skills_hub (source fan-out). Added a 30s exit watchdog (HERMES_EXIT_WATCHDOG_S) armed at _run_cleanup start as a backstop for wedged cleanup steps. 3. Exit jank: after prompt_toolkit tears down the input/status bars the terminal sat silent for the whole cleanup window, looking hung. Print 'Shutting down… (finalizing session)' immediately at exit start. E2E: live PTY interrupt of a foreground 'sleep 120' terminal tool now aborts in ~1s and the typed message runs as the next turn; wedged-worker + wedged-cleanup subprocess exits in 5.8s (watchdog) instead of hanging.
64 lines
2.4 KiB
Python
64 lines
2.4 KiB
Python
"""Shared daemon-thread ThreadPoolExecutor.
|
||
|
||
Stdlib ``ThreadPoolExecutor`` workers are non-daemon AND are registered in
|
||
``concurrent.futures.thread._threads_queues``, whose atexit hook
|
||
(``_python_exit``) joins every worker unconditionally — even after
|
||
``shutdown(wait=False)``. A single wedged worker (tool blocked on network
|
||
I/O, hung provider daemon, stuck subagent) therefore blocks interpreter
|
||
exit forever. This is the root cause of multi-minute CLI exits on long
|
||
sessions: every abandoned concurrent-tool batch leaves workers that the
|
||
exit hook insists on joining.
|
||
|
||
``DaemonThreadPoolExecutor`` spawns daemon workers and skips the
|
||
``_threads_queues`` registration, so:
|
||
|
||
- ``_python_exit`` never joins them, and
|
||
- the interpreter's non-daemon thread join at shutdown skips them.
|
||
|
||
Semantics are otherwise identical (initializer/initargs, work queue,
|
||
idle-thread reuse). Use it for any pool whose work is best-effort or
|
||
independently interruptible and must never hold the process open:
|
||
concurrent tool execution, background memory sync, catalog fan-out,
|
||
subagent timeout wrappers. Do NOT use it for work that must complete
|
||
before exit (durable writes) — those belong on foreground threads with
|
||
explicit bounded joins.
|
||
"""
|
||
|
||
from __future__ import annotations
|
||
|
||
import threading
|
||
import weakref
|
||
from concurrent.futures import ThreadPoolExecutor
|
||
from concurrent.futures.thread import _worker
|
||
|
||
__all__ = ["DaemonThreadPoolExecutor"]
|
||
|
||
|
||
class DaemonThreadPoolExecutor(ThreadPoolExecutor):
|
||
"""ThreadPoolExecutor variant whose workers do not block process exit."""
|
||
|
||
def _adjust_thread_count(self) -> None:
|
||
# Mirrors CPython's implementation (3.8–3.13) with two changes:
|
||
# daemon=True and no _threads_queues registration.
|
||
if self._idle_semaphore.acquire(timeout=0):
|
||
return
|
||
|
||
def weakref_cb(_, q=self._work_queue):
|
||
q.put(None)
|
||
|
||
num_threads = len(self._threads)
|
||
if num_threads < self._max_workers:
|
||
thread_name = "%s_%d" % (self._thread_name_prefix or self, num_threads)
|
||
t = threading.Thread(
|
||
name=thread_name,
|
||
target=_worker,
|
||
args=(
|
||
weakref.ref(self, weakref_cb),
|
||
self._work_queue,
|
||
self._initializer,
|
||
self._initargs,
|
||
),
|
||
daemon=True,
|
||
)
|
||
t.start()
|
||
self._threads.add(t)
|