The renderer kept a braille canvas, char-field scene, star-glyph/orbital
helpers, and seed/links params from earlier visual iterations that the
final timeline bar chart never uses. Remove them (~190 lines), simplify
the empty-state placeholder, and refresh the module + RPC docstrings to
describe what actually ships.
A Kanban task referencing a non-existent skill (e.g. a typo'd name)
crashed the worker on startup via ValueError, which the dispatcher
retried until the task auto-blocked. Both cli.py and tui_gateway/server.py
now skip the unknown skill(s), log a warning, and continue with whatever
loaded — but still hard-fail when EVERY requested skill is missing, so a
fully-misconfigured worker fails loudly instead of running blind.
Closes#27136
Co-authored-by: Jimmy Johansson <jimmyjohansson84@users.noreply.github.com>
session.info is only ever an emitted event (_emit), never a dispatched
@method RPC, so listing it in _LONG_HANDLERS is dead weight that can
never match a dispatched method name. Remove it from the set and the
test's frontend-polled list to keep _LONG_HANDLERS to real RPCs.
Frontend-polled read-only RPCs (session.list, pet.info, process.list)
ran inline in the WS read loop. Under GIL pressure from concurrent agent
turns they block the loop, timing out frontend polls and surfacing as a
false "needs setup" / dropped session (#50005, #48445). Route them
through _LONG_HANDLERS so dispatch() returns immediately, and raise the
default RPC pool to 8 workers so the added long handlers don't queue.
Co-authored-by: Hermes Agent <noreply@nousresearch.com>
Three targeted fixes for Desktop GUI WebSocket stability when agent
turns starve the uvicorn event loop of CPU (GIL contention):
1. Loosen ws_ping_timeout for loopback binds (QW-1)
- Loopback (Desktop): ping 30s interval / 60s timeout
- Non-loopback (Cloudflare Tunnel): unchanged 20/20
- A GIL-heavy agent turn can stall the event loop past 20s;
uvicorn's keepalive ping runs on that same starved loop, so a
20s timeout kills an otherwise-healthy local connection over a
recoverable stall. 60s rides out the stall without affecting
half-open detection on public binds.
2. Coalesce streaming token frames in WSTransport (CF-2)
- Buffer high-frequency delta frames (message.delta, reasoning.delta,
thinking.delta) and flush as a batch every ~33ms (~30fps)
- Non-streaming frames (RPC responses, control/tool/completion events)
flush pending tokens first — wire ordering preserved
- Thread-safe via threading.Lock; worker threads return immediately
instead of blocking on per-token loop wakeups
- Reduces event-loop wakeup churn by orders of magnitude during model
streaming, directly cutting GIL pressure
3. Loop heartbeat watchdog (CF-1)
- Self-rearming call_later tick (2s) measures drift between expected
and actual fire time using loop.time() (monotonic)
- Logs 'event loop stalled Ns (GIL pressure suspected)' when drift >5s
- Turns mysterious WS drops into diagnosable log entries
- Uses call_later chain (not a task) — dies with the loop, nothing
to cancel on shutdown
Root cause: uvicorn's ws keepalive ping (20/20s) runs on the same
starved event loop as agent turns. Under GIL pressure from heavy agent
turns or delegation, the loop can't service the ping within 20s, so
the websockets protocol declares the connection dead. Reconnects fail
with ready_send_failed because the old process's loop is still wedged.
None of these fixes touch the model-facing message array, prompt
caching, message role alternation, or the wire protocol — they are
strictly display-transport improvements plus a config tweak and a
diagnostic log.
Tests: 762 passed, 17 skipped (0 failures) across test_tui_gateway_ws,
test_tui_gateway_server, test_web_server, and tui_gateway/ suites.
Terminal rendition of the desktop Star Map / Memory Graph: learned skills
and memories on a timeline, shared by `hermes journey` and the TUI
`/journey` overlay via one size-aware Python renderer
(agent/learning_graph_render.py).
- TUI overlay mirrors /agents: static chart overview + selectable slice
list → slice detail → single skill/memory body, with the shared
inverse-row selection treatment and a pinned footer.
- Reuse primitives: extract OverlayScrollbar into its own module (now
shared with agentsOverlay), scroll the item body via ScrollBox, and
unify both lists through one table-driven ListRow.
- No animation/playback in the TUI — pure data; the renderer's reveal
scrubber stays available in the CLI (`--play`, `--reveal`).
MCP tools connected and enabled but never surfaced into the agent's
session toolset on the desktop app + dashboard WebUI (#51587).
There are two independent background MCP discovery thread owners by
surface: tui_gateway.entry (stdio 'hermes --tui') and hermes_cli.mcp_startup
(desktop app + dashboard WS sidecar via tui_gateway/ws.py, and 'hermes
dashboard'). The late-refresh scheduler gates on
tui_gateway.entry.mcp_discovery_in_flight(), which read ONLY the entry
thread global. On the desktop/dashboard surfaces that global is None, so a
server slower than the bounded build-time wait never triggered a late
refresh and its tools stayed invisible for the whole session.
Make mcp_discovery_in_flight() / join_mcp_discovery() consult BOTH thread
owners. Adds the matching in-flight/join helpers to hermes_cli.mcp_startup
and has tui_gateway.entry delegate to them as a second owner.
* feat(display): friendly human-phrased tool labels for built-in tools
Built-in tools now render ChatGPT-style status verbs ('Searching the web
for ...', 'Reading <file>', 'Browsing <url>') on the CLI spinner and
gateway/desktop tool-progress instead of the raw tool name.
- agent/display.py: _TOOL_VERBS map + build_tool_label() + set/get
friendly-labels flag (default on). Custom/plugin/MCP tools fall back to
the raw preview; verbose gateway mode left untouched (debug surface).
- tool_executor.py / tui_gateway / gateway: route the three spinner sites,
the TUI _tool_ctx, and the gateway all/new progress line through the label.
- config: display.friendly_tool_labels (default True, per-platform aware).
Zero new core tool / schema footprint — pure display layer.
* docs: add PR infographic for friendly tool labels
* fix(display): preserve arg preview in gateway friendly labels + update tests
The first gateway pass re-derived the label from the callback's `args`, which
is empty ({}) at the gateway tool.started callsite — the command/query lives in
the `preview` string, so terminal rendered as a bare '💻 Running' and dedup
collapsed consecutive commands. Now the gateway prefixes the verb onto the
already-computed preview via get_tool_verb/tool_verb_connector/verb_drops_preview,
preserving the command/url/query. CLI spinner path (real args) keeps build_tool_label.
Tests: update test_run_progress_topics exact-format assertions to the friendly
form ('💻 Running pwd'), add a format-agnostic preview extractor for the
truncation tests (works for both quoted-legacy and verb-prefixed output).
* test(tui): update resume-display context to friendly tool label
_tool_ctx now uses build_tool_label, so the desktop resume-view context for a
search_files turn reads 'Searching files for resume' instead of the bare
'resume' preview — consistent with live tool-progress. Update the assertion.
* test(tui): harden no-race worker test against sibling shard leakage
test_session_create_no_race_keeps_worker_alive flaked under -j 8: a daemon
build thread leaked from a prior session.create test in the same shard process
fires close/unregister against its own (foreign) session_key after this test
patches the global approval hooks, polluting the captured lists. Scope the
assertions to this session's own session_key so the regression intent
(this session's worker/notify must survive) is preserved while the test
becomes immune to shard composition. Not related to friendly-tool-labels.
Let users click the status bar context indicator to see how tokens are
split across system prompt, tools, rules, skills, MCP, and conversation.
Co-authored-by: Cursor <cursoragent@cursor.com>
Register read-only agent terminals with the same renderer-side terminal reader
as user terminals so read_terminal works on whichever tab is active.
Also bring agent xterm rendering closer to user-terminal parity (unicode 11,
web links, font weights/spacing) and make the gateway sink wiring resilient if
only one terminal event sink was already installed.
Make the read-only agent terminal mirrors stream in real time and give
the agent a desktop-only way to dismiss its own tabs.
- Stream background output live: the local reader used a blocking
read(4096) that buffered small periodic output until EOF, so agent
tabs only "filled in" at process exit. Switch to buffer.read1(4096)
(decoded) for incremental chunks.
- Route agent.terminal.output / terminal.close to the window that owns
the process (its gateway session) instead of an empty session id, so
events actually reach the desktop renderer.
- Add close_terminal: a HERMES_DESKTOP-gated tool (sibling of
read_terminal) that drops a process's read-only tab WITHOUT killing it
via process_registry.on_close; output keeps buffering and the user can
reopen from the status stack.
- ⌘W now closes a focused agent tab: mark the agent instance
data-terminal and focus it on activation so isFocusWithin routes there.
- ensureTerminal() no longer spawns an extra user shell when a tab
already exists (e.g. opening a background task from the status stack).
Replace the 5s output_tail poll (which often showed nothing) with a real push
stream. The process registry gains an on_output sink called from its reader
threads with each chunk; the tui_gateway wires it to emit agent.terminal.output
{process_id, chunk} (write_json is _stdout_lock-guarded, so emitting from the
reader thread is safe). The desktop routes chunks by process id straight into
the read-only agent xterm via a small writer registry, with a capped backlog so
a tab opened mid-stream (or reopened) replays what it missed.
Drops the fragile poll/tail path: no session-key matching, no truncation, no
lag — full-fidelity ANSI, env-agnostic (local/docker/ssh).
server.py's PDF-attach handler shells out to `pdftoppm` from the
console-less desktop/gateway backend; on Windows that pops a conhost
window each attach. Route it through windows_hide_flags() like the
sibling _list_repo_files git calls (no-op on POSIX).
The Windows desktop GUI runs its backend headless via pythonw.exe. Several
auxiliary subprocess sites that run inside that windowless backend spawned
console-subsystem children (git, gh, wmic, powershell, bash, rg, taskkill)
WITHOUT CREATE_NO_WINDOW, so Windows allocated a fresh conhost per call and
flashed a black window on screen — sometimes continuously (the dashboard
Projects-tree git probe alone fired ~118 spawns in 60s on startup).
The terminal tool, cron, browser, code_execution, and gateway-spawn paths
already carry windows_hide_flags(); these auxiliary probe/scan/launcher legs
were missed. Wire the existing helper into them:
- tui_gateway/git_probe.py: run_git (+ encoding=utf-8/errors=replace, fixes the
cp950 UnicodeDecodeError on CJK paths from the same site)
- agent/coding_context.py: _git (per-turn git status/log/diff)
- agent/context_references.py: _run_git + _rg_files (@file/@ref resolution)
- hermes_cli/copilot_auth.py: gh auth token probe (auxiliary provider:auto)
- hermes_cli/gateway.py: wmic + PowerShell Get-CimInstance PID scan
- hermes_cli/main.py: wmic stale-dashboard PID scan
- gateway/status.py: taskkill /T /F force-kill
windows_hide_flags() returns 0 on POSIX, so every changed call is a no-op on
Linux/macOS (verified: real git/rg probes still work; Windows-simulated calls
all pass creationflags=CREATE_NO_WINDOW).
Scoped to the windowless-backend paths that cause the reported flashing. The
Electron updater-handoff leg (main.cjs windowsHide:false) and the
interactive-CLI banner probes (cli.py) are intentionally NOT touched here —
the former needs a Windows-tested change of its own, the latter runs in a
visible console anyway.
Tracking: #54220
Refs: #53178#53631#53781#53957#49602#52982#53424#53053#53016
_append_model_switch_marker() appended the post-/model-switch context marker
to session history as {"role": "system"}. The cached system prompt is
prepended to the API message list (conversation_loop.py), so this marker
became a SECOND system message mid-array after prior user/assistant turns.
Strict OpenAI-compatible providers (vLLM, Qwen) reject any system message
that is not at the beginning of the array, returning HTTP 400 and killing
the conversation on the next turn.
Flip the marker to role="user" (history entry + both session-DB persist
sites), matching the existing personality-overlay marker which already uses
role="user". repair_message_sequence() then coalesces it with adjacent user
turns as needed.
Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
Co-authored-by: Lucas Nicolas <lucas.nicolas@proton.me>
The desktop app and dashboard chat reach the agent through the /api/ws
JSON-RPC sidecar (tui_gateway.ws.handle_ws), NOT through
tui_gateway.entry.main() — the stdio-TUI path that spawns the background
MCP discovery thread. In the WS process discovery was therefore never
started: _make_agent only *waits* (wait_for_mcp_discovery), which no-ops
when the thread was never created, so the agent snapshotted an MCP-less
tool list. The only discovery trigger reachable was a manual /reload-mcp,
which is why tools appeared after a reload but vanished on restart.
Start the shared, idempotent, config-gated background discovery in
handle_ws right after accept() and before gateway.ready, so the first
agent build picks up already-spawning servers (and the existing
late-binding refresh handles slow ones).
Fixes#38945.
When the Desktop forcibly closes its WebSocket mid-write, asyncio logs a
full traceback for every pending connection-lost callback — 50+ identical
WinError 10054 (ConnectionResetError) lines per disconnect on Windows, the
equivalent ConnectionResetError/BrokenPipeError on POSIX. These are not
actionable: they are the expected side effect of the peer hanging up before
our writes drained.
Install a loop exception handler on the gateway serving loop that collapses
exactly this teardown class (ConnectionResetError/ConnectionAbortedError/
BrokenPipeError originating from _call_connection_lost) to a single debug
line, forwarding every other loop error to the existing/default handler
unchanged so genuine loop bugs still surface. Idempotent per loop.
A WebUI/TUI session whose last turn died mid-tool-loop (stale-timeout kill,
interrupt, or process restart before the tool result was written) persists a
dangling assistant(tool_calls) or interrupted assistant->tool tail. The
messaging gateway already strips these tails before replay (the #49201 fix),
but the TUI/WebUI resume path fed db.get_messages_as_conversation() straight
in as the agent's conversation_history with no cleanup. The model re-issued
the unanswered call on every resume -- including after a full WebUI + Gateway
restart, since the poison lives in the SessionDB, not memory -- leaving the
session permanently 'thinking'. Only deleting the session recovered it.
- Extract the two strippers + helper from gateway/run.py into a shared
agent/replay_cleanup.py (sanitize_replay_history wraps both).
- gateway/run.py re-exports under the historical private names; messaging
behavior unchanged.
- Both TUI cold-resume sites now sanitize the model-fed history while leaving
the display transcript untouched, so the user still sees their full history.
Verified E2E against a real SessionDB: dangling and interrupted tails are
stripped from the model feed, healthy mid-progress tool sequences are
preserved, and the display transcript is always the full raw history.
Subprocesses spawned outside the terminal/execute_code path (agent-browser,
copilot ACP, dep-ensure, lazy_deps uv install, TUI Node host, cli.exec)
inherited the operator's full credential environment via os.environ.copy().
The terminal path was already scrubbed by _HERMES_PROVIDER_ENV_BLOCKLIST
(#1002/#1264/#32314); these spawn sites bypassed it.
Adds hermes_subprocess_env(inherit_credentials=) in tools/environments/local.py
reusing the existing dynamic blocklist as the single source of truth:
- Tier 1 (_ALWAYS_STRIP_KEYS): gateway bot tokens, GitHub auth, infra
secrets -- stripped even for credential-inheriting children.
- Tier 2 (_HERMES_PROVIDER_ENV_BLOCKLIST): provider/tool keys -- stripped
unless inherit_credentials=True. The opt-in is grep-able for audit.
Browser worker keeps a _BROWSER_PASSTHROUGH_KEYS allowlist (BROWSERBASE/
FIRECRAWL) re-added after the strip. Model-driving children (ACP, TUI Node
host, cli.exec) use inherit_credentials=True so they still get provider keys
while losing Tier-1 secrets. Installers (dep-ensure, lazy_deps) inherit
nothing sensitive. cua_backend already routed through _sanitize_subprocess_env
on main -- left as-is. Gateway adapter utility spawns (gh pr comment, ffmpeg)
are left inheriting env: gh needs GH_TOKEN by design, ffmpeg is a trusted
system binary -- no untrusted-dependency exposure.
This is defense-in-depth (personal-assistant trust model: same-user spawns),
making the existing scrub policy uniform across the spawn surface; the main
real payoff is shrinking the blast radius if a transitive npm dep in
agent-browser is compromised.
Reconstructed on current main from the design in #31959 (Tranquil-Flow);
also credits #39003 (rodboev), #37843 (coygeek), #35769 (egilewski).
Co-authored-by: Tranquil-Flow <tranquil_flow@protonmail.com>
Co-authored-by: rodboev <rod.boev@gmail.com>
Co-authored-by: egilewski <egilewski@egilewski.com>
complete.path and complete.slash ran inline on the tui_gateway stdin
reader thread. complete.path spawns git ls-files and fuzzy-ranks the
whole repo; complete.slash does first-call prompt_toolkit imports plus a
skill-dir scan. While either ran, prompt.submit / session.interrupt sat
unread in the stdin pipe, freezing the TUI until the 120s RPC timeout
fired — most reliably reproduced by typing @ on a large repo / WSL2 mount.
Add both to _LONG_HANDLERS so completion runs on the existing thread
pool (write_json is already _stdout_lock-guarded). Root-cause fix:
covers any slow completion, not just the bare-@ trigger.
Fixes#21123
The MoA reference-block display (each reference model's output shown as a
labelled thinking block before the aggregator responds) previously existed
only in the classic CLI. The facade already emits moa.reference / moa.aggregating
through tool_progress_callback; this wires the TUI and desktop consumers.
- tui_gateway/server.py: _on_tool_progress relays moa.reference (label / text /
index / count) and moa.aggregating to the Ink/desktop client as their own
events.
- ui-tui: gatewayTypes adds the two event shapes; createGatewayEventHandler
routes them; turnController.recordMoaReference pushes a committed
thinking-style segment tagged with the source model. Shown regardless of
showReasoning — references ARE the mixture-of-agents process the user opted
into, not ordinary reasoning. moa.aggregating is a status-only transition
(no transcript entry).
- apps/desktop: use-message-stream appends each reference as a labelled
reasoning chunk via the existing reasoning disclosure; GatewayEventPayload
gains label/index/aggregator.
Tests: tui_gateway emit (3), Ink handler render + showReasoning-independence +
aggregating-no-segment (3). TUI typecheck/lint clean; desktop typecheck/lint
clean.
* fix(windows): stop terminal-window popups from background spawns
Native-Windows desktop/gateway users saw cmd/conhost windows flash on
gateway restart, image paste, the dashboard Projects tree, voice notes,
and ~5 min after closing the app (detached cron). Two root causes:
- Console-subsystem exes (taskkill, schtasks, wmic, netstat, tasklist,
agent-browser, git, ffmpeg, powershell, git-bash) spawned via raw
subprocess allocate a fresh console when the launching process has
none (pythonw desktop backend / detached gateway) - even with output
captured.
- uv venv pythonw shims re-exec console python.exe, so Python children
get a console regardless of how they're launched.
Fixes:
- Single hidden-spawn primitive (_subprocess_compat.run/.popen) that ORs
CREATE_NO_WINDOW on Windows, no-op on POSIX. Route every Hermes-owned
console-exe spawn through it.
- FreeConsole() catch-all in hermes_bootstrap: any Python child that
exclusively owns an auto-allocated console detaches it at startup
(GetConsoleProcessList()==1 gate leaves shared interactive consoles
untouched).
- Replace PowerShell/wmic gateway PID scans with in-process psutil.
- Skip schtasks queries on non-interactive desktop restarts.
- Prefer native agent-browser .exe over .cmd shims.
- Guard test bans raw subprocess spawns of the Windows-only console
tools repo-wide so the popup class can't regress.
* fix(windows): scope FreeConsole to background entry points; fix merge fallout
Console detach review (per #53810 feedback): GetConsoleProcessList()==1 can't
tell a uv pythonw->python phantom console apart from a user opening the
interactive CLI/TUI in its own fresh console (double-click, shortcut, ConPTY) —
both report a single attached process with a tty. Running FreeConsole() in the
import-time bootstrap therefore risked detaching a legitimately-interactive
terminal.
- Extract FreeConsole into explicit hermes_bootstrap.detach_orphan_console();
remove it from apply_windows_utf8_bootstrap() (import side effect).
- Call it only from known background mains: gateway run, dashboard backend
(start_server, what the desktop spawns), cron standalone, tui_gateway entry,
slash worker. Interactive CLI/TUI never calls it.
- Behavior-contract tests: frees only when solo owner, leaves shared console,
no-op without console / on POSIX, and asserts it's not an import side effect.
Merge fallout from origin/main (#53791):
- local.py: 3-way merge left a dangling **_popen_kwargs (NameError crashing
every terminal init). _subprocess_compat.popen already hides the window, so
drop it.
- discord adapter: merge stacked an undefined windows_hide_flags() onto the
primitive call; drop the redundant arg.
- test_gateway: scan now goes psutil-first (zero spawn); rewrite the
case-variant test to drive that production path.
* test(claw): mock _subprocess_compat.run seam for Windows process scan
claw.py's Windows tasklist/powershell scan routes through the hidden-spawn
primitive; the tests still patched claw_mod.subprocess, so on win32 the mock
was never hit and real spawns returned nothing. Patch the actual seam.
/moa no longer does a sticky model switch. It now always runs a single
prompt through the default MoA preset and restores the prior model
afterward; the whole argument is the prompt (no preset-name matching).
To switch to a MoA preset for the session, select it from the model
picker, where presets already surface under a virtual Mixture of Agents
provider on every model-selection surface.
Also fixes#53444: the TUI one-shot only set session[model_override],
which the already-built cached agent ignored, so MoA silently never ran
and the turn used the original model. The TUI now does a real in-place
agent.switch_model() via _apply_model_switch() when a live agent exists
(with a proper restore after the turn), and falls back to a model_override
for lazy/unbuilt sessions.
Removes the redundant sticky-switch branch from the CLI, gateway, and TUI
/moa handlers; updates the command description, usage string, and docs.
When an MCP server requires OAuth, the interactive `hermes` TUI froze on
startup: background MCP discovery hit the OAuth flow, which on an interactive
TTY spawns a daemon thread doing a blocking `sys.stdin.readline()` (the
"paste the redirect URL" fallback in mcp_oauth._wait_for_callback). That
thread competes with the TUI's own stdin reader for the same terminal, so
keystrokes get swallowed and the TUI appears frozen (up to the 300s OAuth
timeout). Reported symptom: "MCP OAuth: authorization required / Open this URL
... the tui is freezing, not respond to typing."
Add a thread-local `suppress_interactive_oauth()` context manager in
tools/mcp_oauth.py; `_is_interactive()` returns False while it's active, so the
stdin paste-thread and prompt are never created. Background discovery
(hermes_cli/mcp_startup.py, tui_gateway/entry.py) now runs discovery inside
that context, so OAuth-requiring servers soft-skip (raise
OAuthNonInteractiveError, already handled) instead of stealing the TUI's stdin.
A real `hermes mcp login` on the main thread is unaffected (thread-local).
Salvaged from #35945 by @zapabob (authorship preserved via cherry-pick;
resolved a conflict against main's new mcp_discovery_timeout / wait_for_mcp_
discovery refactor, keeping both). Verified E2E: with suppression the paste
prompt is NOT printed and no stdin thread spawns (raises OAuthNonInteractive
soft-skip); without it the prompt shows (the freeze). Mutation-verified
(removing the suppress check in _is_interactive fails the regression test).
76 tests pass, ruff clean.
Closes#35927.
SELF-REVIEW FIX: the original #35945 used threading.local(), which does NOT
propagate to the dedicated mcp-event-loop thread where OAuth actually runs
(discover_mcp_tools dispatches the connect via run_coroutine_threadsafe), so
the suppression was a NO-OP in production (the tests passed only by stubbing
out the cross-thread dispatch). Converted to a contextvars.ContextVar, which
asyncio copies onto the scheduled coroutine — empirically verified suppression
now holds on the mcp-event-loop thread through the real _run_on_mcp_loop path.
Added a cross-thread regression test (fails on threading.local, passes on the
ContextVar) so the no-op can't regress.
Ensure TUI/desktop stop targets the actual conversation thread and cancels any queued next prompt, including the lazy agent-start window, so a stopped session cannot keep running or restart itself.
When the primary provider raises AuthError (e.g. expired OAuth token),
_make_agent now walks the configured fallback_providers/fallback_model
chain before giving up — matching the behavior that cron/scheduler.py
and cli_agent_setup_mixin.py already have.
Fixes#47627
* feat(moa): expose MoA presets as selectable virtual models
Reconstructed onto current main (PR #46081's base had diverged with no common
ancestor, marking the PR dirty so CI never dispatched). MoA is now a virtual
provider: each named preset is a selectable model under provider 'moa', and the
preset's aggregator is the acting model that answers and calls tools.
Reference models fan out in parallel via a bounded ThreadPoolExecutor (the same
batch pattern delegate_task uses) — all references dispatched at once, collected
when every one finishes, then handed to the aggregator. Output order is
preserved, failures and the MoA-recursion guard stay isolated per reference.
- Removed the old mixture_of_agents model tool and moa toolset.
- Added moa as a virtual provider in the provider/model inventory.
- /moa is shortcut behavior over model selection (default preset / named preset
/ one-shot prompt).
- Dashboard + Desktop manage named presets; presets appear in model pickers.
- Parallel reference fan-out in agent/moa_loop.py with regression test.
* fix(moa): thread moa_config through _run_agent to _run_agent_inner
The reconstructed gateway MoA wiring declared moa_config on _run_agent (the
profile-scoping wrapper) and used it inside _run_agent_inner, but the wrapper
never forwarded it — _run_agent_inner had no such parameter, so the runtime hit
NameError: name 'moa_config' is not defined on the compression-failure session
sync path. Add moa_config to _run_agent_inner's signature and forward it from
both wrapper call sites (multiplex and non-multiplex). Caught by
tests/gateway/test_compression_failure_session_sync.py on CI shard test(4).
* fix(moa): classify moa as a virtual provider in the catalog
The moa virtual provider has no PROVIDER_REGISTRY/ProviderProfile entry, so
provider_catalog() fell through to the default auth_type="api_key" with no
env vars — tripping two catalog invariants:
- test_provider_catalog: api_key providers must expose a credential env var
- test_provider_parity: every hermes-model provider must be desktop-configurable
moa already declares auth_type="virtual" in HERMES_OVERLAYS; consult that
overlay as an auth_type fallback so the catalog reports moa as virtual (no real
credential, no network endpoint). Exempt virtual providers from the desktop
parity union check the same way 'custom' is exempt — derived from the catalog,
not a hardcoded slug, so future virtual providers are covered too.
Collapse the duplicated cold-resume / lazy-watch / create scaffolding into
shared helpers: _deferred_session_record (the live-session dict minus the
agent), _lazy_resume_info (the not-yet-built session.info), _claim_or_reuse_live
(lock + double-checked register-or-reuse), and _schedule_agent_build (the
pre-warm timer). Net -12 lines, three copies of the ~30-key session dict and
the lazy-info block down to one each. No behavior change.
Per review: gating the faster path behind a `defer_build` flag that the
only caller always sends is pointless. Flip it — `session.resume` now
defers the agent build by default for every caller (desktop + Ink TUI);
a caller that needs the agent built synchronously passes `eager_build:
true` (used by the build-race test). The desktop no longer sends a flag.
While verifying the flip, fixed two real parity gaps the deferred path
had vs the old eager (`_init_session`) path:
- `_enable_gateway_prompts()` was never called on a deferred resume, so
approvals/clarify wouldn't route through the gateway prompt callbacks.
- `_start_agent_build` never wired `background_review_callback` /
`memory_notifications`, so a deferred-built session's self-improvement
"💾 …" summary leaked to stdout instead of rendering in-transcript.
Wiring it there also fixes it for `session.create` sessions, which
build through the same path.
ACP is unaffected (it uses its own session_manager, not this RPC); the
Ink TUI already consumes the same lazy `info` shape from session.create
and upgrades on the later `session.info` event.
Switching sessions in the desktop app could freeze the whole UI for
several seconds on heavy, tool-rich chats. Root causes and fixes:
- Cold `session.resume` built the AIAgent (MCP discovery, prompt/skill
build) *before* returning, and the desktop awaits that RPC before it
paints — so the entire switch blocked on the build. Add an opt-in
`defer_build` resume path (the contract `session.create` already uses):
return the full display transcript immediately, register an upgradable
live session, and pre-warm the agent on a short timer. The persisted
runtime identity (model/provider/base_url/api_mode/reasoning/tier) is
restored on the deferred build so it can't drop the provider.
- Nothing bounded how many in-memory agents accumulate; a user who
reconnects often piled up detached sessions for the full 6h TTL. Add a
soft LRU cap (`max_live_sessions`, default 16) that evicts the
least-recently-active DETACHED sessions (no live client) — never a
running, awaiting-input, mid-build, or live-transport one. Reopening
re-resumes from disk.
- On the prefetch-hit cold-resume path, skip rebuilding a throwaway
merged-message array (and its 1000-entry Map) when the prefetch already
painted the exact transcript; the downstream sameMessageList guard
already drops the publish, so it was pure main-thread cost.
The desktop opts into `defer_build` for every non-watch cold resume; the
eager path stays for CLI/TUI and existing callers.
atomic_yaml_write used default yaml.dump which emits indentless
sequences (list items at column 0), while atomic_roundtrip_yaml_update
(ruamel.yaml) emits 2-space-indented sequences. Cross-path writes to
the same config.yaml toggled indentation on every save, eventually
producing a mixed-indent file that js-yaml rejects with 'bad indentation
of a mapping entry', silently dropping custom_providers and breaking
model switching.
Add IndentDumper SafeDumper subclass that forces indentless=False,
route atomic_yaml_write through it. Route tui_gateway._save_cfg and
the Telegram adapter's config writer through atomic_yaml_write so all
paths emit the same 2-indent layout.
Salvaged from #32034 by @xxxigm. Adapted to current main which already
has allow_unicode=True (from #51356) but was missing IndentDumper.
Closes#31999
A prompt sent while a turn was in flight got rejected with 4009 "session busy",
which pushed clients (the desktop app) into a deadline-bounded busy-retry. When
turn teardown outlived that deadline — e.g. the user hits stop while a slow,
non-interruptible tool (web_search, read_file, an MCP call) is mid-flight, since
the sequential executor only checks the interrupt flag between tools — the
resubmitted message was silently dropped: "it just doesn't listen".
Wire the previously-dead display.busy_input_mode config into prompt.submit:
instead of rejecting, apply the policy and queue the message to run as the next
turn (drained in run()'s tail, ahead of goal/notification follow-ups). Modes:
interrupt (default) interrupts the live turn so it winds down promptly then runs
the queued message; queue runs it after the current turn finishes; steer injects
it into the live turn when accepted, else queues. The queued slot pins the
sender's transport and losslessly merges a second arrival. No client deadline,
no dropped sends.
The Desktop GUI (tui_gateway) slash worker subprocess has no reader for
the CLI's _pending_input queue. /learn's CLI handler prints the ack and
puts the built prompt onto that queue, so in the TUI the prompt was
silently dropped — ack shown, no LLM turn, no skill created (#51829).
command.dispatch already handles 'learn' correctly (returns
{type: send, message: build_learn_prompt(arg)}), but 'learn' was missing
from _PENDING_INPUT_COMMANDS, so slash.exec fell through to the worker
instead of routing to command.dispatch. Add it to the frozenset, matching
the existing goal/queue/steer/plan pattern.
Ship the final pet-generation UX polish (provider picker behavior, step-2 cancel flow, banner integration, and visual consistency) and make saturated-chroma background removal C-op driven so hatch processing no longer hammers the machine during long runs.
When no reference-capable image backend is configured, generating a pet is
impossible — so instead of a dead prompt + post-hoc error, the overlay now
detects it up front and offers a way out:
- pet.generate.status RPC reports whether a reference-capable provider
(OpenRouter / Nous Portal / OpenAI) is set up; the overlay probes it on
open and swaps the prompt for a friendly setup card (paw, one-line copy,
"Set up image generation" → /settings?tab=providers, key links).
- useRouteOverlayActive(): reusable hook so any portaled modal yields the
screen to a full-screen route overlay (e.g. settings) and reappears —
re-running its mount effects — on return, instead of closing. The probe
re-runs on that remount, so adding a key flips the card to the prompt.
- pet.generate / pet.hatch (parallel rows, off the reader thread) +
cooperative pet.cancel; pet.export / pet.rename.
- pet.gallery localOnly fast path + background manifest prefetch so the
picker never blocks on petdex; rename follows the active-pet config.
- gateway request gains optional timeout + AbortSignal for real Stop.
Let setup.runtime_check accept an optional provider, persist the selected
provider/model before the gate, and validate the provider the user just
connected instead of a stale config entry such as anthropic.
The TUI model-switch persistence (_persist_model_switch) rewrote the entire
model config block via save_config(), destroying sibling keys the user set
under model: (model_slots, model_fallback, base_url, ...) on every switch.
Use targeted, atomic, comment-preserving save_config_value("model.default" /
"model.provider" / "model.base_url") writes instead, so a model switch only
touches the keys it changes.
Salvaged from #48391 by kyssta-exe (authorship preserved).
Fixes#48305