hermes-agent/tools/daemon_pool.py
Teknium 3f2a56d1a4
fix(cli): reliable interrupts, bounded exit, and exit feedback (#57000)
Three CLI reliability fixes:

1. Interrupt reliability: chat() only re-queued the user's interrupt
   message when the turn result carried interrupted=True. When the agent
   thread raced past its last interrupt check (or finished) before the
   interrupt landed, the message was silently dropped — and the stale
   _interrupt_requested flag left on the agent instantly aborted the
   NEXT turn. Un-acknowledged interrupt messages are now re-queued as
   the next turn and the stale flag is cleared (only when the agent
   thread actually exited). The clarify-race path also parks the message
   in _pending_input instead of dropping it.

2. Slow exit (5+ min): stdlib ThreadPoolExecutor workers are non-daemon
   and joined unconditionally by concurrent.futures' atexit hook — even
   after shutdown(wait=False). One wedged tool worker (abandoned after
   interrupt/timeout) held the process open forever. Promoted
   async_delegation's daemon executor to a shared tools/daemon_pool
   module and adopted it in tool_executor (concurrent tool batches),
   memory_manager (background sync), delegate_tool (child timeout wrapper
   + batch fan-out), and skills_hub (source fan-out). Added a 30s exit
   watchdog (HERMES_EXIT_WATCHDOG_S) armed at _run_cleanup start as a
   backstop for wedged cleanup steps.

3. Exit jank: after prompt_toolkit tears down the input/status bars the
   terminal sat silent for the whole cleanup window, looking hung. Print
   'Shutting down… (finalizing session)' immediately at exit start.

E2E: live PTY interrupt of a foreground 'sleep 120' terminal tool now
aborts in ~1s and the typed message runs as the next turn; wedged-worker
+ wedged-cleanup subprocess exits in 5.8s (watchdog) instead of hanging.
2026-07-02 04:20:43 -07:00

64 lines
2.4 KiB
Python
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

"""Shared daemon-thread ThreadPoolExecutor.
Stdlib ``ThreadPoolExecutor`` workers are non-daemon AND are registered in
``concurrent.futures.thread._threads_queues``, whose atexit hook
(``_python_exit``) joins every worker unconditionally — even after
``shutdown(wait=False)``. A single wedged worker (tool blocked on network
I/O, hung provider daemon, stuck subagent) therefore blocks interpreter
exit forever. This is the root cause of multi-minute CLI exits on long
sessions: every abandoned concurrent-tool batch leaves workers that the
exit hook insists on joining.
``DaemonThreadPoolExecutor`` spawns daemon workers and skips the
``_threads_queues`` registration, so:
- ``_python_exit`` never joins them, and
- the interpreter's non-daemon thread join at shutdown skips them.
Semantics are otherwise identical (initializer/initargs, work queue,
idle-thread reuse). Use it for any pool whose work is best-effort or
independently interruptible and must never hold the process open:
concurrent tool execution, background memory sync, catalog fan-out,
subagent timeout wrappers. Do NOT use it for work that must complete
before exit (durable writes) — those belong on foreground threads with
explicit bounded joins.
"""
from __future__ import annotations
import threading
import weakref
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures.thread import _worker
__all__ = ["DaemonThreadPoolExecutor"]
class DaemonThreadPoolExecutor(ThreadPoolExecutor):
"""ThreadPoolExecutor variant whose workers do not block process exit."""
def _adjust_thread_count(self) -> None:
# Mirrors CPython's implementation (3.83.13) with two changes:
# daemon=True and no _threads_queues registration.
if self._idle_semaphore.acquire(timeout=0):
return
def weakref_cb(_, q=self._work_queue):
q.put(None)
num_threads = len(self._threads)
if num_threads < self._max_workers:
thread_name = "%s_%d" % (self._thread_name_prefix or self, num_threads)
t = threading.Thread(
name=thread_name,
target=_worker,
args=(
weakref.ref(self, weakref_cb),
self._work_queue,
self._initializer,
self._initargs,
),
daemon=True,
)
t.start()
self._threads.add(t)