Follow-up to the salvaged early-exit retry fix (#35617): the debug-browser
launch path was fire-and-forget (stderr to DEVNULL, no logging), so every
platform failure — Windows singleton forward to an existing instance, bad
profile dir, missing shared libraries, policy blocks — collapsed into the
same unactionable 'port 9222 isn't responding yet' message and debug
reports contained nothing.
- launch_chrome_debug() returns a structured ChromeDebugLaunch with
per-candidate attempts (state, exit code, stderr tail)
- browser stderr is captured to <hermes_home>/chrome-debug/launch-stderr.log
- clean exit (code 0) without the port opening is detected as Chromium's
single-instance forward and produces a targeted user hint to close all
running instances of that browser
- crash exits surface the stderr tail (e.g. missing libnspr4.so)
- every spawn/exit is logged to agent.log so hermes debug share captures it
- CLI (/browser connect) and TUI/desktop (browser.manage) both print the hint
Salvaged from PR #3243 by @Mibayy, reimplemented against current main
(the original diff targeted a removed gateway/run.py handler).
- /compact is now a first-class alias of /compress (CLI, gateway,
Telegram/Slack/Discord command lists, autocomplete) — also fixes the
dangling '/compact' references in gateway error messages
(gateway/run.py context-exhausted banners).
- --preview / --dry-run: report what WOULD be compressed (message
counts, token estimate, 'here [N]' boundary) without touching the
transcript. Flags coexist with the existing 'here [N]' / focus-topic
args on both the CLI and gateway surfaces via shared pure helpers in
hermes_cli/partial_compress.py.
- --aggressive (LLM-free hard truncation) is intentionally NOT
implemented: it would need its own transcript-persistence branch
outside the guarded _compress_context rotation machinery (#44794
data-loss class). The flag is recognized and returns an explanatory
message pointing at '/compress here [N]' and /undo instead of being
mis-parsed as a focus topic.
- locales: gateway.compress.aggressive_unsupported added to all 16
catalogs (parity test enforced).
- release.py: AUTHOR_MAP entry for contributor credit.
Three CLI reliability fixes:
1. Interrupt reliability: chat() only re-queued the user's interrupt
message when the turn result carried interrupted=True. When the agent
thread raced past its last interrupt check (or finished) before the
interrupt landed, the message was silently dropped — and the stale
_interrupt_requested flag left on the agent instantly aborted the
NEXT turn. Un-acknowledged interrupt messages are now re-queued as
the next turn and the stale flag is cleared (only when the agent
thread actually exited). The clarify-race path also parks the message
in _pending_input instead of dropping it.
2. Slow exit (5+ min): stdlib ThreadPoolExecutor workers are non-daemon
and joined unconditionally by concurrent.futures' atexit hook — even
after shutdown(wait=False). One wedged tool worker (abandoned after
interrupt/timeout) held the process open forever. Promoted
async_delegation's daemon executor to a shared tools/daemon_pool
module and adopted it in tool_executor (concurrent tool batches),
memory_manager (background sync), delegate_tool (child timeout wrapper
+ batch fan-out), and skills_hub (source fan-out). Added a 30s exit
watchdog (HERMES_EXIT_WATCHDOG_S) armed at _run_cleanup start as a
backstop for wedged cleanup steps.
3. Exit jank: after prompt_toolkit tears down the input/status bars the
terminal sat silent for the whole cleanup window, looking hung. Print
'Shutting down… (finalizing session)' immediately at exit start.
E2E: live PTY interrupt of a foreground 'sleep 120' terminal tool now
aborts in ~1s and the typed message runs as the next turn; wedged-worker
+ wedged-cleanup subprocess exits in 5.8s (watchdog) instead of hanging.
compress_context() and /new already flush un-persisted messages before
calling end_session() (fixed in #47202), but /resume and /branch still
call end_session() directly. When a turn is interrupted mid-flight and
the user immediately runs /resume or /branch, messages generated during
that turn have not yet been written to state.db and are silently lost on
session rotation.
Add the same best-effort _flush_messages_to_session_db() call before
end_session() in both _handle_resume_command and _handle_branch_command,
mirroring the pattern established in cli.py:new_session().
Regression tests verify the flush is called when an agent is present.
hermes -w locks each worktree (reason 'hermes pid=<pid>'). git worktree
remove --force (single -f) refuses a locked tree, so a crashed session's
lock was never released and its worktree accumulated forever — a real
contributor to .worktrees/ bloat.
_prune_stale_worktrees now classifies each lock via _worktree_lock_is_live:
a live-owner pid is skipped at any age; a dead-owner (or foreign) lock is
unlocked first so the aggressive age-based cleanup can actually reap it.
The >72h reap tier is kept (that cleanup is intentional) but now guarded so
dirty/unpushed work is preserved, and branch deletion is gated on
git worktree remove succeeding. New fail-safe helpers _worktree_is_dirty
and _worktree_lock_is_live (pid liveness via gateway.status._pid_exists,
Windows-safe).
The streaming think-tag suppressors in cli.py (_stream_delta) and
gateway/stream_consumer.py (_filter_and_accumulate) matched tag names
with case-sensitive str.find(), so only the exact-case literals in the
tag tuples were caught. Mixed-case variants a model may emit — <Think>,
<ThInK>, <REASONING>, <Thought> — slipped through and leaked raw
reasoning into the user-visible stream.
Match against a lowercased view of the buffer with lowercased tag names
at all three sites (open-tag boundary search, partial-tag hold-back,
close-tag search) in both paths. Only KNOWN tag names are matched — no
substring matching — and the block-boundary gating that protects prose
mentions of <think> is preserved.
- 6 parametrized case-insensitive regression tests in each of
tests/gateway/test_stream_consumer.py and
tests/cli/test_stream_delta_think_tag.py.
Salvaged from PR #27289 by @YLChen-007.
Bare print() output is swallowed by patch_stdout while an interactive
prompt_toolkit Application owns the terminal, so /sessions and /history
rendered nothing. Route those emissions through _cprint (prompt_toolkit's
native renderer) when an app is running, and fall back to print otherwise.
Fixes#36815
The CLI routes user input typed while the agent is running into
``_interrupt_queue`` (separate from ``_pending_input``) so the explicit
interrupt path can opt to deliver them as a single combined message.
That path only drains the queue when ``busy_input_mode == "interrupt"``
AND a ``pending_message`` was acknowledged.
If the agent's turn finishes naturally (no interrupt fires), any
messages typed during the turn stay stuck in ``_interrupt_queue``
forever. Subsequent ``Enter`` presses route input to the same blocked
queue and the CLI appears to hang. Original report: lunarnexus in
The fix restores the post-turn drain that was originally part of
drain off as "worth its own review" and never re-landed it; the user-
visible regression is that any non-interrupt-mode user typing during
a turn is silently dropped.
Implementation: extract the drain to a small helper
``_drain_interrupt_queue_to_pending_input`` matching the existing
``_maybe_continue_goal_after_turn`` style. ``process_loop``'s
``finally`` block calls it once per turn after the status-line refresh
and before goal continuation (so re-queued user input preempts an
auto-continuation prompt). The helper swallows ``Exception`` so it
can never break the main loop.
Addresses #20271.
Interrupting the agent while an approval/clarify/sudo/secret prompt is up
left the overlay state dict set with no thread servicing it. The prompt's
worker thread is torn down on interrupt, but read_only (gated on
_command_running) plus the keypress filter kept the CLI input locked until
the prompt's own timeout expired — the terminal appeared frozen.
Drain and clear all four input-blocking overlays on interrupt via a single
helper (_clear_active_overlays_for_interrupt): approval -> deny,
clarify/sudo/secret -> cancel, each guarded so a dead queue can't block the
others; sudo restores the pre-modal draft. Wired into all three interrupt
paths — new-message interrupt, Ctrl+C, and Ctrl+Q. Blocking overlays now
clear AND fall through so one keypress both clears a stale overlay and
interrupts a still-running agent; the /model picker and slash-confirm
foreground prompts keep their cancel-and-return behavior.
Closes#13618.
test_background_task_registers_thread_local_approval_callbacks polled a
2s wall-clock deadline waiting for the background daemon thread to pop
its entry from _background_tasks. Under loaded CI the thread's
finally-block cleanup could lag the deadline, flaking the final
'assert not cli._background_tasks'. Join the actual worker thread
(timeout=10) so the wait ends exactly when the thread finishes.
test_spinner_elapsed_format_is_fixed_width_to_reduce_wrap_jitter derived
_tool_start_time from the live time.monotonic() clock (now - 65.2 / now - 9.2).
monotonic()'s epoch is arbitrary — on a host where monotonic() < 65.2 (fresh
subprocess on a freshly-booted CI runner) the start time went negative, the
(t0 > 0) guard in _render_spinner_text() dropped the '(elapsed)' suffix, and
short.split('(',1)[1] raised IndexError: list index out of range. Deterministic
given a small clock, so it would keep flaking, not clear on rerun.
Pin time.monotonic to a fixed 1000.0 and offset _tool_start_time from it so both
the <60s and >=60s paths always render the elapsed suffix regardless of the
runner's monotonic epoch.
Pre-existing main flake (surfaced in CI test slice 1/8).
A custom_providers config that names the model under model.name (or
model.model) resolved to an empty model, so the API request went out
with model= — HTTP 400 from OpenAI-compatible backends. Display paths
(hermes status/dump) already read model.name and showed the model,
making the failure silent.
The model id was read via 'default or model' at ~14 independent sites
(cli, gateway, cron, curator, oneshot, fallback, profiles, ...), none
of which honored 'name'. Rather than patch every site, canonicalize at
the single load/save chokepoint: _normalize_root_model_keys() now
promotes model.model/model.name -> model.default (precedence
default > model > name) and drops the stale alias, so every reader —
present and future — sees a populated default and config.yaml is
migrated canonical on next save. The gateway, which bypasses
load_config(), replays the same normalization in _load_gateway_config().
Co-authored-by: Bartok9 <danielrpike9@gmail.com>
Credit: root-cause analysis and fix direction from @Bartok9 (#34502,
first) and @v86861062 (#34527).
Pull the #33271 post-interrupt recovery (flush_stdin + _force_full_redraw)
out of process_loop's finally block into _recover_terminal_after_interrupt(),
and replace the inline-logic-copy tests with ones that exercise the real
helper plus a source guard that process_loop still invokes it behind the
_last_turn_interrupted gate.
When the agent is interrupted during processing, prompt_toolkit's
renderer and VT100 input parser can be left in an inconsistent state.
CSI 6n cursor position report responses leak as literal text
(^[[19;1R) and the terminal stops accepting keyboard input.
Fix: in process_loop's finally block, after an interrupted turn:
- flush_stdin() to drain stray escape bytes from the OS input buffer
- _force_full_redraw() to reset prompt_toolkit's renderer cache
Closes#33271
When a MoA preset is selected, each reference model's answer now renders in the
CLI as a thinking-style block labelled with its source model, BEFORE the
aggregator responds — so the mixture-of-agents process is visible instead of a
silent pause. The aggregator's response (and its tool actions) follow as normal.
Mechanism (shared seam, all surfaces):
- MoAChatCompletions/MoAClient take an optional reference_callback and emit
'moa.reference' (index/count/label/text) per reference, then 'moa.aggregating'
(aggregator label) once. agent_init wires this to the agent's
tool_progress_callback, which every surface already consumes — so the events
reach CLI/TUI/desktop/gateway with no new plumbing.
- CLI _on_tool_progress renders 'moa.reference' as a labelled '┊ ◇ Reference
i/n — <model>' header + a thinking-style preview (reusing _emit_reasoning_
preview), and 'moa.aggregating' as a spinner transition. Display-only; never
touches message history (cache-safe).
Turn-scoped reference cache: the agent loop calls the facade once per tool-loop
iteration, but the advisory message view is identical across iterations within a
turn, so references are now run AND displayed once per user turn (keyed by the
advisory view's signature) instead of re-running/re-spamming on every iteration.
This also cuts reference API cost from O(iterations) back to O(turns).
Verified live via interactive PTY on the opus-gpt preset (gpt-5.5 + opus refs):
reference blocks render once per turn, labelled by model, before the aggregator;
fresh blocks on each new turn; aggregator tool actions still execute.
Follow-up: TUI/desktop rich rendering + gateway batched-summary already receive
the events via tool_progress_callback; their surface-specific renderers are a
separate change.
In the interactive CLI, the aggregator's tool calls under a MoA preset (or
any non-streaming model call, e.g. copilot-acp) appeared to overwrite each
other instead of building scrollable history. Each tool only updated the
transient spinner line; no committed scrollback line was printed.
Root cause: persistent tool lines in _on_tool_progress's tool.completed
branch were gated on tool_progress_mode in {all, new}, omitting 'verbose'.
Streaming models hid the bug because _on_tool_gen_start commits a 'preparing'
line per tool during streaming; non-streaming calls (MoA forces
_use_streaming=False) never emit that, so under 'verbose' there was no
committed line at all — only the self-overwriting spinner.
'verbose' is strictly more than 'all', so it now commits the same scrollback
line. Verified live via interactive PTY on the MoA opus-gpt preset: three
terminal calls in turn 1 and two in turn 2 each render as separate persistent
lines.
prompt_toolkit's renderer sends ESC[6n cursor-position queries before
painting in non-fullscreen mode; the terminal replies ESC[<row>;<col>R.
Over SSH/cloudflared tunnels and slow PTYs these replies race past the
input parser and land in the display as raw '20;1R21;1R' text, and the
pending-CPR future can stall the renderer so the prompt freezes after the
agent's final answer.
Build the prompt_toolkit output with enable_cpr=False so CPR is marked
NOT_SUPPORTED up front and ESC[6n is never sent. This is the root-cause
counterpart to the existing input-side _strip_leaked_terminal_responses
scrubbing. Vt100_Output.from_pty() does not expose enable_cpr in
prompt_toolkit 3.x, so _build_cpr_disabled_output() reproduces its
get_size setup and calls the constructor directly; it returns None on any
failure so startup falls back to the default output.
Verified in a real PTY: baseline emits 1 ESC[6n query, the fix emits 0,
banner/UI render identically. Layout is unaffected — with CPR off the
renderer sizes the prompt to its preferred height (the same fallback
prompt_toolkit uses on any terminal that doesn't answer CPR).
Co-authored-by: Hermes Agent <noreply@nousresearch.com>
/moa no longer does a sticky model switch. It now always runs a single
prompt through the default MoA preset and restores the prior model
afterward; the whole argument is the prompt (no preset-name matching).
To switch to a MoA preset for the session, select it from the model
picker, where presets already surface under a virtual Mixture of Agents
provider on every model-selection surface.
Also fixes#53444: the TUI one-shot only set session[model_override],
which the already-built cached agent ignored, so MoA silently never ran
and the turn used the original model. The TUI now does a real in-place
agent.switch_model() via _apply_model_switch() when a live agent exists
(with a proper restore after the turn), and falls back to a model_override
for lazy/unbuilt sessions.
Removes the redundant sticky-switch branch from the CLI, gateway, and TUI
/moa handlers; updates the command description, usage string, and docs.
A top-level delegate_task dispatches in the background and re-enters as a
fresh turn when done. Print a one-line dispatch-time note — no spinner,
nothing to poll — so the idle prompt doesn't read as "nothing happened."
* feat(moa): expose MoA presets as selectable virtual models
Reconstructed onto current main (PR #46081's base had diverged with no common
ancestor, marking the PR dirty so CI never dispatched). MoA is now a virtual
provider: each named preset is a selectable model under provider 'moa', and the
preset's aggregator is the acting model that answers and calls tools.
Reference models fan out in parallel via a bounded ThreadPoolExecutor (the same
batch pattern delegate_task uses) — all references dispatched at once, collected
when every one finishes, then handed to the aggregator. Output order is
preserved, failures and the MoA-recursion guard stay isolated per reference.
- Removed the old mixture_of_agents model tool and moa toolset.
- Added moa as a virtual provider in the provider/model inventory.
- /moa is shortcut behavior over model selection (default preset / named preset
/ one-shot prompt).
- Dashboard + Desktop manage named presets; presets appear in model pickers.
- Parallel reference fan-out in agent/moa_loop.py with regression test.
* fix(moa): thread moa_config through _run_agent to _run_agent_inner
The reconstructed gateway MoA wiring declared moa_config on _run_agent (the
profile-scoping wrapper) and used it inside _run_agent_inner, but the wrapper
never forwarded it — _run_agent_inner had no such parameter, so the runtime hit
NameError: name 'moa_config' is not defined on the compression-failure session
sync path. Add moa_config to _run_agent_inner's signature and forward it from
both wrapper call sites (multiplex and non-multiplex). Caught by
tests/gateway/test_compression_failure_session_sync.py on CI shard test(4).
* fix(moa): classify moa as a virtual provider in the catalog
The moa virtual provider has no PROVIDER_REGISTRY/ProviderProfile entry, so
provider_catalog() fell through to the default auth_type="api_key" with no
env vars — tripping two catalog invariants:
- test_provider_catalog: api_key providers must expose a credential env var
- test_provider_parity: every hermes-model provider must be desktop-configurable
moa already declares auth_type="virtual" in HERMES_OVERLAYS; consult that
overlay as an auth_type fallback so the catalog reports moa as virtual (no real
credential, no network endpoint). Exempt virtual providers from the desktop
parity union check the same way 'custom' is exempt — derived from the catalog,
not a hardcoded slug, so future virtual providers are covered too.
The classic prompt_toolkit status bar already shows two background
indicators: ▶ N (/background agent threads) and ⚙ N (shell processes
spawned by terminal(background=true)). Background/async subagents
(delegate_task batches and background single delegations) had no
indicator despite being long-running work the user should be able to
see at a glance.
Add a third indicator ⛓ N sourced from
tools.async_delegation.active_count() — the count of delegations still
in the 'running' state. Renders in the plain-text builder and the
styled-fragment builder across the same width tiers as the other two
(omitted on the narrow <52 tier), guarded so a raising active_count()
leaves the snapshot at 0.
* feat(goals): add /goal wait <pid> barrier to park the loop on a background process
The /goal loop re-pokes the agent every turn via the post-turn judge. When a
goal is gated on a long-running background process (CI poller, build, test
matrix, deploy) that produces nothing to judge yet, this spins the agent into
'is it done?' busy-work and burns the turn budget.
/goal wait <pid> [reason] parks the loop: while the PID is alive, the judge is
skipped, no turn is consumed, no continuation fires, and /goal status shows a
parked indicator. The barrier auto-clears the moment the process exits (the
agent's notify_on_complete watcher is the natural wake signal), then the next
turn resumes normal judging. /goal unwait clears it manually; pause/resume/clear
drop it; a dead/stale PID can never wedge the loop.
Wired across CLI, gateway, and the mid-run command guard for parity. Barrier
persists in SessionDB.state_meta (survives /resume); GoalState gains
backward-compatible waiting_on_pid/waiting_reason/waiting_since fields. 12 new
tests; docs updated.
* fix(goals): use gateway.status._pid_exists for liveness, not os.kill(pid,0)
The Windows-footguns CI guard flagged os.kill(pid, 0) in _pid_alive — on
Windows that's not a no-op, it routes to CTRL_C_EVENT and hard-kills the
target's console process group (bpo-14484). Delegate to the canonical
footgun-safe gateway.status._pid_exists (psutil + ctypes/POSIX fallback)
instead, with a direct-psutil last resort.
* feat(goals): judge-driven auto-wait — the loop parks itself, no manual /goal wait
Makes the wait barrier automatic. Every turn the judge is shown the agent's
live background processes (pid, command, uptime, output tail from the
process_registry) alongside the goal + response, and can return a new 'wait'
verdict instead of continue:
{"verdict":"wait","wait_on_pid":N} → park until that process exits
{"verdict":"wait","wait_for_seconds":N} → park until the deadline passes
evaluate_after_turn acts on the directive (sets the barrier, parks the loop)
so the agent isn't re-poked into busy-work while CI/builds/deploys run. Adds a
time-based waiting_until barrier alongside the pid barrier; both auto-clear and
can never wedge the loop. Drivers (CLI, gateway, tui_gateway) feed the live
registry in via gather_background_processes(). Manual /goal wait stays as an
override. Judge verdict contract widened to (verdict, reason, parse_failed,
wait_directive); legacy {"done":bool} shape still accepted.
* test(goals): update kanban _fake_judge to the 4-tuple judge contract
CI test(3) caught it: test_kanban_goal_mode's _fake_judge still returned the
3-tuple (verdict, reason, parse_failed), but the kanban loop now unpacks the
4-tuple (+ wait_directive). Update the fake to return None for the directive
and accept the background_processes kwarg.
* feat(goals): trigger-based wait — park on a process's own signal, not just exit
Addresses two gaps in the judge-driven wait: (1) the judge could only express
'wait until PID exits' or 'wait N seconds', so a long-lived watcher/server that
fires a trigger MID-RUN (and may never exit) couldn't be waited on; (2) the
process's own watch_patterns/notify_on_complete trigger was invisible to the judge.
Adds a session-based barrier (waiting_on_session) that releases on the process's
OWN trigger via process_registry.is_session_waiting(): the session exits, OR (if
started with watch_patterns) its pattern matches — even while the process keeps
running. list_sessions() now surfaces session_id + watch_patterns/watch_hit/
notify_on_complete so the judge sees the trigger and is told to prefer
wait_on_session for trigger processes. Judge verdict gains a {wait_on_session}
directive (preferred over pid). Backward-compatible GoalState field; pid + time
barriers unchanged.
Tests: TestSessionTriggerBarrier (release on mid-run pattern match while alive,
release on exit, unknown-session, full park→trigger→resume, parse, validation,
backcompat load). 105 goal-surface + 85 process_registry tests green.
* feat(providers): remove google-gemini-cli + google-antigravity OAuth providers
Google now actively bans accounts for third-party tools that piggyback on
Gemini CLI / Antigravity / Code Assist OAuth, and because abuse prevention
sits at a backend layer the ban can extend to the entire Google account
(Gmail/Drive), with a second violation being permanent.
Ref: https://github.com/google-gemini/gemini-cli/discussions/20632
Removes both OAuth inference providers entirely (modules, provider profiles,
auth/runtime/config/models wiring, the /gquota Code Assist quota command,
the antigravity-cli optional skill, desktop + docs surface in en + zh-Hans).
The API-key 'gemini' provider (GOOGLE_API_KEY/GEMINI_API_KEY against
generativelanguage.googleapis.com) is unaffected and stays fully supported.
* fix(skills): keep the antigravity-cli skill — only the OAuth provider is removed
The antigravity-cli optional skill orchestrates the external `agy` binary as
a coding-agent tool via the terminal tool — it does NOT wrap Hermes inference
through the banned google-antigravity OAuth provider, so it carries none of
the account-ban risk that motivated removing that provider. Restore the skill,
its docs page, the sidebar entry, and the optional-skills catalog row. The
google-antigravity / google-gemini-cli inference providers stay fully removed.
A bare custom provider configured via `model.api_base` (the intuitive name
OpenAI-SDK / LiteLLM users reach for) was silently ignored: `hermes config set`
accepts any dotted key, so `model.api_base` got written and confirmed, but the
runtime resolver reads only `model.base_url`. Requests fell back to OpenRouter
with an empty key -> 401, zero hits to the custom endpoint (issue #8919).
Now api_base is migrated to base_url at load time (fixes existing broken
configs) and at set time (with a notice), never overriding an explicit
base_url. Closes#8919.
hermes -w created the worktree branch from the standalone clone's HEAD, which
lags origin when the clone isn't freshly updated (it's only refreshed by
hermes update, not per session). Every worktree branch then rooted on a stale
base, so the PR diff GitHub computes against current main ballooned with
unrelated changes and the agent had to discover the staleness at push time and
rebase.
_resolve_worktree_base() now fetches and branches from the freshest available
ref: the current branch's upstream if it tracks one (so a deliberate
feature-branch worktree tracks its own remote), else the remote's default
branch (origin/HEAD), else local HEAD as a fail-soft fallback (offline / no
remote / detached). A bogus 'origin/(unknown)' default is guarded, and worktree
creation retries from HEAD if branching off the remote ref fails — so this is
never worse than the old behavior.
Gated by worktree_sync (default true); set worktree_sync: false to keep the
old branch-from-local-HEAD behavior. The resolved base is printed in the
session banner.
This is the follow-up to the #50319 session, where the standalone clone was
213 commits behind origin and the worktree inherited that stale base.
Render the reactive pet pane in the classic CLI (steady redraw,
right-aligned) and wire the /pet command to list and switch pets, plus an
enable/disable toggle. Backed by hermes_cli/pets.py and the CLI commands
mixin, registered in the central command registry. Covered by the CLI pet
pane and toggle tests.
The god-file Phase 4 refactor (094aa85c37) moved agent construction into
CLIAgentSetupMixin, which set the atexit shutdown reference with a bare
`global _active_agent_ref`. After extraction that global binds the *mixin
module's* namespace, not cli.py's. cli._run_cleanup reads
cli._active_agent_ref to decide whether to fire the memory provider's
on_session_end hook — and it stayed None for the whole session, so the
`if _active_agent_ref:` branch was dead and on_session_end never ran on
/exit. Custom memory providers silently lost end-of-session extraction.
Fix: publish the reference onto the cli module explicitly
(`import cli as _cli; _cli._active_agent_ref = self.agent`), using the
deferred-import pattern already established in the mixin.
Regression test asserts cli._active_agent_ref is populated by the mixin's
publish line and guards against a relapse to the bare `global` form. The
existing shutdown tests passed only because they hand-assigned the ref,
which is exactly what masked this.
The classic CLI status bar could appear twice after a horizontal terminal
resize — two bars at two widths with two different elapsed readings.
Root cause: prompt_toolkit's Application._on_resize() calls renderer.erase(),
which does cursor_up(_cursor_pos.y) + erase_down() using the _cursor_pos.y
cached from the LAST render at the OLD width (renderer.py:745). On a column
shrink the terminal reflows the already-painted full-width chrome into extra
physical rows, so the cached y undershoots: cursor_up doesn't climb past the
reflowed rows and erase_down leaves the old bar stranded ABOVE the live
origin. The next paint stacks a fresh bar below it. The existing post-resize
suppression hides the NEW bar for ~0.35s but never erases the already-reflowed
OLD one, so the ghost survives the whole window. Ctrl+L / /redraw clears it,
confirming a viewport wipe is the fix.
Fix: on a WIDTH change, _recover_after_resize now routes through the same
recovery as Ctrl+L — _clear_prompt_toolkit_screen(rebuild_scrollback=False)
(CSI 2J, visible viewport only) + _replay_output_history() — BEFORE delegating
to prompt_toolkit's resize. Banner-safe: 2J never touches scrollback history
(that's CSI 3J, which we don't send here), so the startup banner is preserved.
Rows-only resizes skip the clear (no reflow → no ghost) to avoid an extra
repaint. Tracks _last_resize_width to distinguish the two.
Tests: replace the now-obsolete 'never clears on resize' assertion with two
tests — rows-only resize delegates without clearing; width change clears the
viewport + replays and never wipes scrollback.
The classic CLI status bar could vanish for the rest of a session: any
terminal reflow (SIGWINCH from a tmux pane change, SSH window restore, font
zoom) set _status_bar_suppressed_after_resize=True, but the flag was ONLY
cleared on the next *submitted* user input. Resize then sit idle and the
bottom chrome rendered at height 0 on every repaint — even with the
refresh clock ticking — so the bar was gone until you typed and hit enter.
Fix: _recover_after_resize now schedules a debounced unsuppress timer that
clears the flag and repaints once the reflow settles (~0.35s), so the bar
returns on its own during idle. The next-submit clear stays as a fast path.
Fails open: any error in scheduling clears the flag immediately rather than
leaving the bar stuck hidden.
* feat(billing): nous_billing http client + BillingState core (phase 2b)
Phase 2b terminal-billing client foundation:
- hermes_cli/nous_billing.py: typed client for the 4 /api/billing/* endpoints
(state/charge/poll/auto-top-up). Raises typed errors (BillingScopeRequired,
BillingRateLimited, BillingAuthError) mapped from the live-verified contract;
fail-open is the caller's job. Idempotency-Key enforced client-side.
- agent/billing_view.py: surface-agnostic BillingState core + Decimal money
parsing (server emits decimal strings, not 2dp), fail-open builder,
idempotency-key gen, custom-amount validation.
- 51 unit tests (decimal parse/format, payload tiering, error->exception
matrix, fail-open, amount validation).
Plan: docs/plans/2026-06-13-001-phase-2b-terminal-billing-tui-plan.md
* feat(billing): billing:manage scope + lazy step-up re-auth (phase 2b)
- NOUS_BILLING_MANAGE_SCOPE constant.
- nous_token_has_billing_scope(): split-based scope check (no false-positive
substring match).
- step_up_nous_billing_scope(): re-runs the device flow requesting
billing:manage, reusing the held credential's portal/inference URLs + client_id
(so a preview stays a preview), persists like _login_nous but WITHOUT the model
picker. Returns True iff the minted token carries the scope (False when NAS
silently downscopes a non-admin / unticked grant).
Lazy step-up (plan D-A): normal login path unchanged; 403 insufficient_scope
from a billing call triggers this. 7 unit tests.
* feat(billing): billing JSON-RPC methods for the TUI (phase 2b)
billing.state / charge / charge_status / auto_reload / step_up in
tui_gateway/server.py. Return STRUCTURED success envelopes (result.ok +
result.error=<code>) rather than JSON-RPC-level errors, so the Ink rpc() promise
always resolves and the TUI branches on the typed billing error code
(insufficient_scope, rate_limited, no_payment_method, …) to render the right
affordance. Money serialized as decimal STRINGS + display strings. charge mints
+ echoes an idempotency_key for retry reuse. 16 unit tests.
* feat(billing): /billing CLI handler + command registry (phase 2b)
- CommandDef("billing", subcommands=buy|auto-reload|limit), added to
_SLACK_VIA_HERMES_ONLY so it routes via /hermes on Slack (keeps the 50-cap
parity test green, same as /credits).
- cli.py::_show_billing + screen helpers: all 5 screens (overview, buy→confirm→
poll, auto-reload, monthly-limit read-only). Reuses _prompt_text_input_modal /
_prompt_text_input (D-C). Non-interactive (_app is None) renders text + portal
deep-link, never prompts (R7). Decimal money end-to-end. 2s/5-min cancellable
poll loop; 429/503 = retry not failure; settled = ledger truth. Lazy step-up on
403 insufficient_scope. no_payment_method treated as mainline funnel-to-portal.
- 6 CLI tests; 156 command tests (incl. Slack/Telegram parity) green.
* feat(billing): /billing Ink TUI screens + tests (phase 2b)
- ui-tui/src/app/slash/commands/billing.ts: /billing TUI command covering all 5
screens — overview (text), buy <amt> → ConfirmReq → charge → non-blocking 2s/
5-min poll loop → settled/failed/timeout branches, auto-reload <below> <to> →
ConfirmReq → PATCH, limit (read-only). Reuses the existing ConfirmReq overlay
(D-C) — no bespoke component. Typed-error envelope branching: insufficient_scope
arms the lazy step-up confirm; no_payment_method/rate_limited/cap funnel to
portal. Client-side amount validation mirrors the server (bounds + 2dp).
- gatewayTypes.ts: Billing* response interfaces.
- registry.ts: register billingCommands.
- billingCommand.test.ts: 12 vitest cases (overview/gating/buy-confirm-poll-
settled/no_payment_method/step-up/limit/auto-reload/validation).
TUI build green; 12/12 vitest pass; slash tests pass once @hermes/ink is built.
* docs(billing): scrub private cross-repo references
NAS is a private repo — remove all references to it from the public PR:
- drop the cross-repo planning doc (planning scaffolding, not a deliverable;
the PR description documents the design)
- replace 'NAS' / 'PR #412 preview' mentions in code + test comments with
generic 'the server' / 'a preview deployment'
* docs(billing): scrub final NAS reference in step-up docstring
* docs(billing): drop dangling plan-doc refs
The phase-2b plan doc was removed in the cross-repo scrub (300afcc0b)
but two module docstrings still pointed at it. Drop the dead refs.
* feat(billing): interactive /billing overlay + step-up UX, portal-URL & token fixes
Adds the interactive /billing TUI overlay and hardens the terminal-billing
client across CLI and TUI.
- TUI: full /billing overlay state machine (overview to buy to confirm,
auto-reload, read-only monthly limit) reusing the existing confirm overlay.
- Step-up: surface the verification link in-transcript and open the browser
via the TUI's own opener (the device flow runs in the headless gateway, so a
printed URL was being dropped); run the step-up handler off the main loop and
emit the link as an out-of-band event so the gateway stays responsive.
- Step-up copy is scope-accurate ("Billing permission granted") and re-checks
/state so it never claims "enabled" when the org kill-switch is still off.
- Portal deep-links resolve to absolute URLs against the active portal base
(the server emits them relative) - fixes a bare "/billing?topup=open" link.
- Billing calls refresh an expired access token via the stored refresh token
instead of reporting a false "not logged in".
- Optimistic funnel: advise "set up a saved card on the portal" up front when
no card is on file (advisory, not a hard gate).
- Token resolution is cached briefly so the 2s charge poll loop stops
re-locking + re-reading the auth store on every tick; 401 re-resolves fresh.
- Remove the temporary demo-mode shims.
Validation: 87 Python billing tests, 88 TS tests (billing command + gateway
event handler), tsc clean, ink + ui-tui builds green.
* docs(billing): add /billing TUI screenshots for PR
* fix(cli): guard _last_invalidate on bare instances; update stale prompt-fallback test
The UI-invalidate throttle read self._last_invalidate unconditionally, which
raised AttributeError on HermesCLI instances built without __init__ (the
thread-safety test's object.__new__ shell). Guard the read with getattr.
The off-main-thread branch of _prompt_text_input was changed (#23185) to cancel
cleanly to None instead of falling back to a bare input() that would hang on the
slash-worker thread; the test still asserted the old direct-input fallback.
Update it to assert the current intended behavior: returns None, calls neither
run_in_terminal nor input(), and does not hang.
Modal prompt panels (dangerous-command approval, clarify questions)
live in the prompt_toolkit layout and vanish on the next repaint,
leaving no trace of the question or the decision in chat history.
Emit a dim one-line summary after each prompt resolves:
⚠ Approval: <command> → allowed for session
? Clarify: <question> → <answer>
Gated on display.persist_prompts (default true). Detail and outcome
are whitespace-collapsed and capped at 120 chars.
Adds an idle clock to the context/status bar in both the prompt_toolkit CLI
and the Ink TUI: once a turn completes, a dim '✓ <elapsed>' segment shows how
long the session has been idle since the last final agent response. Hidden
while a turn is live (the per-prompt elapsed timer covers that) and before
the first turn completes.
- cli.py: track _last_turn_finished_at when the agent thread exits, surface
it via _format_idle_since() in the snapshot, render in both the wide
fragments path and the plain-text fallback.
- ui-tui: stamp lastTurnEndedAt when busy flips false after a live turn,
thread it through appStatus -> StatusRule, render via a ticking IdleSince
segment sharing the duration breakpoint/width budget.
The existing #33961 tests mock _prompt_text_input away, so they only assert
modal-vs-stdin routing — they cannot observe the actual hang. Add a guard
class that drives the real helper chain with a blocking input() on a win32
daemon thread and asserts the worker never hangs. Fails on the pre-#33961
code (win32 -> _prompt_text_input -> off-main input() -> deadlock), passes
on the modal path. Also covers the scheduling-failure degraded branch
(must clean-cancel to None, never call input()).
The four win32 tests asserted the old deadlocking behavior (win32 -> raw
input()). Rewrite them to the corrected contract: native Windows uses the
modal via the app loop, and stdin is kept only for the safe no-app /
scheduling-failure cases. Consolidate three near-identical daemon-thread
tests into one parametrized (linux/win32) test behind a shared _run_on_daemon
harness, and drop dead code from the old main-thread test.
Refs #33961
Lift the `_handle_*_command` cluster (2,077 LOC) out of HermesCLI into
hermes_cli/cli_commands_mixin.py; HermesCLI now inherits CLICommandsMixin so
every self.<handler> call resolves unchanged via the MRO. Behavior-neutral.
Import discipline mirrors gateway/slash_commands.py (PR #41886): neutral deps
imported at the mixin module top level; cli.py-internal helpers/constants
(_cprint, _ACCENT, save_config_value, ...) imported lazily inside each handler
via 'from cli import ...' so the mixin never imports cli at module scope.
cli.py 16215 -> 14139 LOC. One test mock repointed (cli.is_browser_debug_ready
-> hermes_cli.cli_commands_mixin.is_browser_debug_ready).
In classic CLI mode the dangerous-command approval prompt (and the clarify,
sudo, and secret-capture prompts) could fail to render: the user saw
'⏱ Timeout — denying command' after 60s without ever seeing the panel,
making approvals.mode: manual unusable.
Root cause. These prompts run their wait loop on the agent/background thread:
they set modal state that a ConditionalContainer's filter reads, then call
self._invalidate() to repaint so the panel appears. _invalidate() is a
THROTTLED wrapper built for high-frequency background repaints (spinner frames,
streaming) — it (a) returns early while a SIGWINCH resize-recovery is pending,
and (b) otherwise only repaints if 250ms elapsed since the last paint. Under
either condition the modal's entry paint is silently dropped, the
ConditionalContainer never re-evaluates, and the prompt times out unseen.
The throttle never belonged on these paths. Originally the callbacks painted
with a direct self._app.invalidate() and worked; a throttle PR blanket-replaced
every invalidate (including these rare, one-shot, user-blocking modal paints)
with the throttled _invalidate(); a later commit removed an idle 1Hz repaint
that had been masking dropped modal paints, surfacing the bug. Notably the
modal KEY-BINDING handlers (↑/↓/Enter) already paint with a direct
event.app.invalidate(), never the throttle — the background-thread callbacks
were the inconsistent ones.
Fix. Add a small _paint_now() helper that paints directly (guarded for a
missing _app, exception-safe) and route the four modal paths' entry, response,
countdown, and teardown paints through it — matching the key-handler idiom.
This covers approval, clarify, sudo, and the secret-capture teardown
(_submit_secret_response, which previously used the throttled _invalidate() so
its panel could linger after submit). _invalidate() is left untouched and its
docstring now states it is for high-frequency background repaints only;
modal/interactive paints must use _paint_now()/_app.invalidate() directly. This
also fixes the resize-recovery edge case for free (a direct paint never
consults the resize guard) without a throttle-bypass flag that could be
cargo-culted onto hot paths. Countdown refresh cadence tightened 5s->1s so the
timer stays visible while waiting, and a copy-pasted duplicate countdown block
in _clarify_callback is removed.
Tests: TestModalPaintNow drives all three wait-loop callbacks on a background
thread with BOTH gates active (_resize_recovery_pending=True + a recent
_last_invalidate in the throttle window) and asserts the panel paints on entry
AND repaints on teardown; plus a secret-teardown test, a direct
_paint_now-vs-_invalidate gate test, and a no-_app safety test. Each modal test
fails if its paint is reverted to _invalidate(). 17 in-file tests pass; full
tests/cli suite green (900).
Diagnosis credit: the throttle-drop root cause was identified by @sanidhyasin
in #41116; @islam666 independently reached the same direct-invalidate approach
in #41166; original report #41098 by @jodonnel.