Commit graph

160 commits

Author SHA1 Message Date
Ray
6a58badfdc fix(browser): guard Camofox eval private pages
Extends the browser private-network eval guard to the Camofox backend.
On main, _browser_eval() returned early in Camofox mode before running the
shared private-URL literal pre-scan and before re-checking the page URL
after eval, leaving Camofox as a sibling backend that could execute
browser_console(expression=...) against private/internal targets.

- move the eval private-URL literal pre-scan before the Camofox early return
- add a Camofox current-page private-URL probe via the evaluate endpoint
- withhold Camofox eval results when the page is now private/internal

Follow-up to browser private-network hardening in #56173, #56526, #56664.

Salvage of #56764 by @rayjun (rayoo), cherry-picked to preserve authorship.
2026-07-02 13:10:30 +05:30
srojk34
4612ee9464 security(browser): re-check private-network guard after browser_back navigation
Every other content-returning browser tool entry point
(browser_snapshot/vision/console/eval, and click/type/press via
_blocked_private_page_action) re-checks window.location.href against the
private/internal/cloud-metadata floor after the page could have changed --
because a redirect chain or client-side navigation can land on an address
the initial browser_navigate preflight never saw. browser_back was the one
navigation-triggering entry point missing this: it called
_run_browser_command(..., "back", []) and returned the resulting URL
straight to the model with no re-check.

On a cloud/CDP (non-local) backend, if browser history contains a
private/internal address (e.g. a prior redirect touched an internal host),
browser_back would navigate the live browser there and hand the URL back
to the model with no guard -- the exact class of gap the private-page
guard exists to close, just on the one entry point it hadn't reached yet.

Re-check happens after the navigation succeeds (not before, unlike
click/type/press) since it's the resulting page -- not the one being left
-- whose safety matters. A failed back navigation (no history) skips the
check entirely since nothing changed. Verified live: the new regression
test fails (returns the private URL instead of a blocked payload) on the
pre-fix code and passes after.
2026-07-01 20:01:55 +03:00
dsad
a4af257a6d fix(browser): extend private-network guard to browser_console 2026-07-01 05:23:17 -07:00
claudlos
0a75616514 security(browser): enforce cloud-metadata floor on all backends; CDP is non-local
browser_navigate's always-blocked cloud-metadata floor (169.254.169.254,
metadata.google.internal, ECS/Azure/GCP IMDS) was gated on
`not _is_local_backend()`, contradicting both the adjacent comment and the
is_always_blocked_url docstring ("denied regardless of backend"). A default
local headless Chromium on a cloud VM — or an off-host CDP browser — could
navigate to IMDS and read instance credentials into the model context. Make the
floor unconditional on the initial-nav and post-redirect paths.

Also: _is_local_backend() ignored a CDP override while _is_local_mode() honors
it, so an off-host CDP browser was treated as "local" and skipped the broader
private/internal SSRF check too. Treat a CDP override as non-local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-01 05:09:35 -07:00
yongyong
937e56be92 fix(browser): block bracketed sensitive eval primitives 2026-07-01 05:04:41 -07:00
yongjin
a0beb52a50 fix(browser): harden browser tool safety boundaries
Add policy gates and output redaction for browser/CDP surfaces, strengthen session ownership tracking, and block credential-like query parameters before third-party browser/web backends receive URLs.

Inspired by the agbrowse review: keep local browser magic-link flows possible while preventing cloud reader/browser escalation from receiving opaque token, code, signature, or key query parameters.
2026-07-01 05:04:41 -07:00
kshitijk4poor
83ae65487e test(browser): cover guard-inactive + camofox short-circuit paths; fix blank lines
Review follow-up on the private-page action guard:
- Add test_guard_inactive_does_not_block_or_probe: when the SSRF guard is
  inactive (local backend / allow_private_urls), click/type/press must proceed
  WITHOUT probing the page URL. This is the branch most likely to silently
  regress if the guard condition is inverted; a mutation check (flipping the
  condition) confirms the test fails as designed.
- Add test_camofox_short_circuits_before_guard: camofox mode returns from the
  dedicated camofox_* path before the guard runs; guards never consulted.
- Fix PEP8: 3 -> 2 blank lines before _blocked_private_page_action.
2026-07-01 13:56:49 +05:30
dsad
3e4c138251 fix(browser): block private-page interactions after eval navigation 2026-07-01 13:56:49 +05:30
kshitijk4poor
c626dded13 refactor(redact): consolidate CDP-URL log redaction into one chokepoint
The session-log fix (browser_tool._sanitize_url_for_logs) and the
supervisor attach-timeout fix (CDPSupervisor.start) both composed the
same three redactors (redact_sensitive_text -> _redact_url_query_params
-> _redact_url_userinfo) to mask CDP endpoint credentials. Two copies of
one policy drift: tune one site (e.g. add fragment masking) and the other
silently re-leaks.

Promote that composition to a single public helper redact_cdp_url() in
agent/redact.py -- the one place the CDP-URL redaction policy lives -- and
route both call sites through it (_sanitize_url_for_logs becomes a thin
wrapper; the supervisor imports the helper instead of re-composing the
private redactors). Add direct unit tests for the seam covering query
tokens, multiple credentials, userinfo passwords, plain-URL passthrough,
non-string/exception coercion, and None.

No behavior change at the call sites; both leak paths remain closed.
2026-07-01 13:43:58 +05:30
srojk34
265da9cadb fix(browser): redact CDP URL token in _create_cdp_session log and supervisor timeout
PR #54851 added _sanitize_url_for_logs() and wired it into the three log
sites inside _resolve_cdp_override(). A fourth site was missed:
_create_cdp_session() logs the already-resolved cdp_url unconditionally,
and CDPSupervisor.start() interpolates the raw cdp_url[:80] into the
attach-timeout TimeoutError (which _ensure_cdp_supervisor() logs with %s).
Both leak query-string credentials (e.g. ?token=secret from hosted CDP
providers) into Hermes logs.

Sanitize the URL at both remaining sites. The raw URL is preserved
unmodified in the returned session dict and used for the real connection;
only the logged/error representation is redacted.

Salvaged from #55883.

Co-authored-by: srojk34 <286497132+srojk34@users.noreply.github.com>
2026-07-01 13:43:58 +05:30
Ruzzgar
576424cc1c fix(security): redact browser CDP endpoint logs 2026-06-29 04:25:26 -07:00
teknium1
9f97915163 fix(browser): route open-timeout base through _safe_command_timeout
Wire the salvaged _safe_command_timeout() guard into the surviving
open-timeout call site. _get_open_command_timeout() feeds the
browser_navigate 'open' path; this closes the last call site that
could observe a None timeout from a torn cache (#14331), since the
original PR's max(_get_command_timeout(), 60) site no longer exists
on main (now routed through _get_open_command_timeout).
2026-06-29 02:24:57 -07:00
Sanjay Santhanam
c79e6bceae fix(browser_tool): resolve race in _get_command_timeout cache returning None (#14331)
# Conflicts:
#	tools/browser_tool.py
2026-06-29 02:24:57 -07:00
srojk34
61210097a5 fix(browser): extend private-network guard to browser_get_images
The SSRF cluster (7a6fe9bb, 48f5c425, 7ef04ae7) sealed
browser_snapshot, browser_vision, and _browser_eval against
eval-navigated private pages, but browser_get_images bypasses
_browser_eval and calls _run_browser_command("eval", ...) directly.
An eval-driven navigation to a private address followed by
browser_get_images would leak image src URLs and alt text from the
private page.

Add the same _eval_ssrf_guard_active + _current_page_private_url
recheck before returning image data, matching the pattern established
by the sibling guards.

5 new tests cover: block on private page, allow on public page, skip
for local backend, skip when private URLs allowed, no guard needed on
failed eval.
2026-06-28 14:25:10 -07:00
Teknium
9a0010fd46
fix(windows): cover remaining console-flash spawn legs (#54417) 2026-06-28 13:49:08 -07:00
Brooklyn Nicholson
70292596ef feat(browser): auto-install Chromium binary on local cold-start failure
When a local browser_navigate (or any browser command) fails fast because
Chromium isn't on disk, attempt a one-shot binary download via
`agent-browser install` and retry instead of only printing a hint.

Scope is narrow on purpose:
- binary only, never `--with-deps` (that shells apt/needs root, so missing
  system libraries stay a user action)
- gated by `security.allow_lazy_installs` (same opt-out as every lazy install)
- skipped in Docker (Chromium ships in the image)
- attempted once per process

Follow-up to #54353, which made the cold-start failure legible; this closes
the "doesn't actually install the missing browser" gap for the common case.
2026-06-28 12:25:15 -05:00
Brooklyn Nicholson
1ab5c3cdda refactor(browser): drop redundant sandbox-hint substring check 2026-06-28 12:14:47 -05:00
infinitycrew39
a10727a555 fix(browser): extend first-open timeout and surface daemon errors
Local browser_navigate cold-starts the agent-browser daemon and Chromium;
60s was too short on slow Linux hosts and timeouts discarded stderr,
leaving users with a generic failure. Use a 120s floor on first open,
inject --no-sandbox in Docker, include captured daemon output plus install
hints when commands time out, and show "Failed to open" in the desktop
tool chip when navigation returns success=false.
2026-06-28 12:14:21 -05:00
teknium1
7ef04ae7a7 fix(browser): close eval return-value SSRF bypass (sibling of #44731)
The snapshot/vision guards re-check the page URL before returning content,
but browser_console(expression=...) -> _browser_eval returns arbitrary JS
results directly, leaving two same-class bypasses open:

  1. Direct fetch: fetch('http://127.0.0.1/secret').then(r=>r.text()) reads
     a private endpoint and returns the body — the page URL stays public so
     the post-eval recheck never sees it.
  2. Navigate-then-read: location.href='http://127.0.0.1/' then a later eval
     reads document.body.innerText.

Guard _browser_eval on the same condition as navigate/snapshot/vision
(not local backend, not local sidecar, not allow_private_urls):
  - pre-scan the expression for private/always-blocked URL literals
  - re-check window.location.href after the eval at both success-return
    sites (supervisor fast-path + subprocess fallback)

Probe failures fail-open (matching the snapshot/vision guards).
2026-06-28 02:42:01 -07:00
liuhao1024
0ae6196087 fix(browser): allow local sidecar sessions to bypass SSRF guard
The private-network guard in browser_snapshot() and browser_vision()
blocked all private URLs, including those accessed via local sidecar
sessions (hybrid routing). Local sidecar sessions intentionally access
private URLs — the cloud provider never sees the URL in that case.

Add `_is_local_sidecar_key(effective_task_id)` check to both guards,
matching the existing pattern in browser_navigate().

Fixes #45101 review feedback from egilewski.
2026-06-28 02:42:01 -07:00
liuhao1024
48f5c42599 fix(browser): extend private-network guard to browser_vision
The SSRF bypass in #44731 was only patched for browser_snapshot(), but
browser_vision() exposes the same vulnerability — it takes a screenshot
and sends it to the vision model without checking if eval-driven
navigation moved the page to a private/internal URL.

Add the same current-page URL safety check to browser_vision() before
any screenshot is captured, encoded, or forwarded to the vision model.
This covers both the normal screenshot path and the Lightpanda Chrome
fallback path.

7 new tests: blocks private URL, allows public URL, skips in local
backend, skips when private URLs allowed, handles eval failure/empty/exception.
2026-06-28 02:42:01 -07:00
liuhao1024
7a6fe9bbfa fix(browser): block snapshot from eval-navigated private pages
browser_snapshot() now checks the current page URL before returning
content. When browser_console() changes location.href to a private or
internal address (e.g., http://127.0.0.1:8080/), the snapshot returns
an error instead of exposing the private page content.

This closes the SSRF bypass where an attacker could:
1. Navigate to a public page
2. Use browser_console to eval location.href = 'http://127.0.0.1:port/'
3. Use browser_snapshot to read the private page content

The fix reuses the existing _is_safe_url() and _allow_private_urls()
infrastructure, and fails open if the URL check itself fails.

Fixes #44731
2026-06-28 02:42:01 -07:00
teknium1
8c8967a50b fix: defer hermes_subprocess_env import in browser_tool
The module-level import broke tests/tools/test_managed_browserbase_and_modal.py,
which loads browser_tool.py via spec_from_file_location against a stubbed
'tools' package that does not include tools.environments.local. Move the import
into a _build_browser_env() helper called at the two agent-browser spawn sites,
matching the lazy-import pattern already used by lazy_deps.py.
2026-06-27 20:45:31 -07:00
teknium1
9c6229ce24 fix(security): centralize credential-safe subprocess env (#29157)
Subprocesses spawned outside the terminal/execute_code path (agent-browser,
copilot ACP, dep-ensure, lazy_deps uv install, TUI Node host, cli.exec)
inherited the operator's full credential environment via os.environ.copy().
The terminal path was already scrubbed by _HERMES_PROVIDER_ENV_BLOCKLIST
(#1002/#1264/#32314); these spawn sites bypassed it.

Adds hermes_subprocess_env(inherit_credentials=) in tools/environments/local.py
reusing the existing dynamic blocklist as the single source of truth:

  - Tier 1 (_ALWAYS_STRIP_KEYS): gateway bot tokens, GitHub auth, infra
    secrets -- stripped even for credential-inheriting children.
  - Tier 2 (_HERMES_PROVIDER_ENV_BLOCKLIST): provider/tool keys -- stripped
    unless inherit_credentials=True. The opt-in is grep-able for audit.

Browser worker keeps a _BROWSER_PASSTHROUGH_KEYS allowlist (BROWSERBASE/
FIRECRAWL) re-added after the strip. Model-driving children (ACP, TUI Node
host, cli.exec) use inherit_credentials=True so they still get provider keys
while losing Tier-1 secrets. Installers (dep-ensure, lazy_deps) inherit
nothing sensitive. cua_backend already routed through _sanitize_subprocess_env
on main -- left as-is. Gateway adapter utility spawns (gh pr comment, ffmpeg)
are left inheriting env: gh needs GH_TOKEN by design, ffmpeg is a trusted
system binary -- no untrusted-dependency exposure.

This is defense-in-depth (personal-assistant trust model: same-user spawns),
making the existing scrub policy uniform across the spawn surface; the main
real payoff is shrinking the blast radius if a transitive npm dep in
agent-browser is compromised.

Reconstructed on current main from the design in #31959 (Tranquil-Flow);
also credits #39003 (rodboev), #37843 (coygeek), #35769 (egilewski).

Co-authored-by: Tranquil-Flow <tranquil_flow@protonmail.com>
Co-authored-by: rodboev <rod.boev@gmail.com>
Co-authored-by: egilewski <egilewski@egilewski.com>
2026-06-27 20:45:31 -07:00
Teknium
d3d621f7c3
revert(windows): roll back terminal-popup PRs #53791 #53810 #53829 (#53853)
* Revert "fix(windows): capture is not a no-window boundary; route flashing spawns through chokepoint (#53829)"

This reverts commit 2ecca1e7d3.

* Revert "fix(windows): stop terminal-window popups from background spawns (#53810)"

This reverts commit 5db1430af9.

* Revert "fix(windows): stop subprocess console-window popups + add CI guard (#53791)"

This reverts commit ef17cd204d.
2026-06-27 15:59:00 -07:00
brooklyn!
5db1430af9
fix(windows): stop terminal-window popups from background spawns (#53810)
* fix(windows): stop terminal-window popups from background spawns

Native-Windows desktop/gateway users saw cmd/conhost windows flash on
gateway restart, image paste, the dashboard Projects tree, voice notes,
and ~5 min after closing the app (detached cron). Two root causes:

- Console-subsystem exes (taskkill, schtasks, wmic, netstat, tasklist,
  agent-browser, git, ffmpeg, powershell, git-bash) spawned via raw
  subprocess allocate a fresh console when the launching process has
  none (pythonw desktop backend / detached gateway) - even with output
  captured.
- uv venv pythonw shims re-exec console python.exe, so Python children
  get a console regardless of how they're launched.

Fixes:
- Single hidden-spawn primitive (_subprocess_compat.run/.popen) that ORs
  CREATE_NO_WINDOW on Windows, no-op on POSIX. Route every Hermes-owned
  console-exe spawn through it.
- FreeConsole() catch-all in hermes_bootstrap: any Python child that
  exclusively owns an auto-allocated console detaches it at startup
  (GetConsoleProcessList()==1 gate leaves shared interactive consoles
  untouched).
- Replace PowerShell/wmic gateway PID scans with in-process psutil.
- Skip schtasks queries on non-interactive desktop restarts.
- Prefer native agent-browser .exe over .cmd shims.
- Guard test bans raw subprocess spawns of the Windows-only console
  tools repo-wide so the popup class can't regress.

* fix(windows): scope FreeConsole to background entry points; fix merge fallout

Console detach review (per #53810 feedback): GetConsoleProcessList()==1 can't
tell a uv pythonw->python phantom console apart from a user opening the
interactive CLI/TUI in its own fresh console (double-click, shortcut, ConPTY) —
both report a single attached process with a tty. Running FreeConsole() in the
import-time bootstrap therefore risked detaching a legitimately-interactive
terminal.

- Extract FreeConsole into explicit hermes_bootstrap.detach_orphan_console();
  remove it from apply_windows_utf8_bootstrap() (import side effect).
- Call it only from known background mains: gateway run, dashboard backend
  (start_server, what the desktop spawns), cron standalone, tui_gateway entry,
  slash worker. Interactive CLI/TUI never calls it.
- Behavior-contract tests: frees only when solo owner, leaves shared console,
  no-op without console / on POSIX, and asserts it's not an import side effect.

Merge fallout from origin/main (#53791):
- local.py: 3-way merge left a dangling **_popen_kwargs (NameError crashing
  every terminal init). _subprocess_compat.popen already hides the window, so
  drop it.
- discord adapter: merge stacked an undefined windows_hide_flags() onto the
  primitive call; drop the redundant arg.
- test_gateway: scan now goes psutil-first (zero spawn); rewrite the
  case-variant test to drive that production path.

* test(claw): mock _subprocess_compat.run seam for Windows process scan

claw.py's Windows tasklist/powershell scan routes through the hidden-spawn
primitive; the tests still patched claw_mod.subprocess, so on win32 the mock
was never hit and real spawns returned nothing. Patch the actual seam.
2026-06-27 14:02:24 -07:00
teknium1
6c58878e7d fix(browser): force secret-pattern redaction on browser_type display
Force redact_sensitive_text(force=True) on the browser_type text arg so
recognized credentials (API keys, tokens, JWTs) are masked in tool
progress, previews, callbacks, and return payloads even when the global
security.redact_secrets opt-out is set — a typed credential reaching chat
history is a security boundary, not log hygiene. Normal typed text matches
no pattern and stays fully readable for debuggability.

Tests assert the API-key-shaped secret is masked across every surface and
that normal text passes through unchanged.
2026-06-25 22:02:22 -07:00
rebel
8ff426e53b fix: redact browser typed text surfaces 2026-06-25 22:02:22 -07:00
Gille
e7d2f0b93c fix(windows): suppress console flashes and harden gateway restarts 2026-06-25 14:42:38 -07:00
Teknium
3c75e11571
fix(browser): validate agent-browser is runnable, not just present (#51740)
After `hermes update`, a globally-installed agent-browser's npm postinstall
(fixUnixSymlink) re-points the global symlink (e.g. /opt/homebrew/bin/agent-browser)
at our local node_modules binary. The next update wipes node_modules, leaving a
dangling symlink that `which` still reports but exec fails on with exit 127 —
silently breaking every browser tool (#48521).

Root cause is trust-on-presence: shutil.which/Path.exists accept a name that
resolves but won't run. Add hermes_constants.agent_browser_runnable() (resolves
the path + runs --version) and gate all four resolution sites on it:
_find_agent_browser now skips a dead candidate and falls through to the next
working one (extended PATH -> local .bin -> npx), self-healing the dangling link.
dep_ensure/doctor/nous_subscription validate too; doctor warns on a broken link.

Closes #48521.
2026-06-24 00:14:49 -07:00
teknium1
624580e836 fix(browser): verify daemon identity before orphan reaper kills a PID (#14073)
The browser orphan reaper reads a daemon PID from a `.pid` file in a
world-writable, predictably-named temp dir (`/tmp/agent-browser-h_*`) it
does not write itself, then tree-kills that PID via `_terminate_host_pid`
after only a liveness check. A same-user actor could plant a fake socket
dir whose `.pid` points at an arbitrary victim process, and OS PID reuse
after the real daemon exits could land the recorded PID on an unrelated
process — either way an arbitrary same-user process (and its whole tree)
gets SIGTERMed. Local DoS.

Add `_verify_reapable_browser_daemon()`, gated before the kill: via psutil
(a hard dep, fine cross-platform for the same-user processes the reaper can
signal) require both (1) identity — `agent-browser` in the process
name/cmdline — and (2) binding — the live process references *this* session's
socket dir in its cmdline or `AGENT_BROWSER_SOCKET_DIR`. The binding check is
the real spoof defense: a planted/recycled PID won't embed our exact socket
path. Fail-closed on any ambiguity (unreadable cmdline, no match), leaving the
process and its socket dir untouched for a later sweep.

Builds on @sgaofen's fix in #14394 (cmdline identity check); rewritten to use
psutil instead of `/proc`+`ps` (cross-platform, Windows-covered) and to add
the session-socket-dir binding check for recycled-PID / spoof resistance.

Co-authored-by: sgaofen <135070653+sgaofen@users.noreply.github.com>
2026-06-21 15:23:47 -07:00
liuhao1024
6984026f12 fix(browser): enable SSRF guard when terminal runs in container
When terminal.backend is docker/modal/daytona/ssh/singularity, the
terminal runs in a sandboxed container with network isolation, but the
browser still runs on the host.  The SSRF guard was skipped because
_is_local_backend() only checked browser.cloud_provider, not the
terminal backend.

Now _is_local_backend() also checks TERMINAL_ENV — when the terminal
is containerized, the browser is treated as non-local and SSRF
protection is enabled.

Fixes #38690
2026-06-21 07:26:18 -07:00
Teknium
3e7e9b24d4 fix: harden salvaged session and browser improvements
Polish salvaged contributor work before PR review:
- read browser inactivity timeout from config with documented fallback
- skip redundant v10 trigram backfill before v11 FTS rebuild
- show delegate_task goals safely in progress previews
- show gateway status model/context without redundant token wording
- wire gateway /sessions to shared session-listing helpers
- map Ravenwolf author emails for release attribution

Co-authored-by: Wolfram Ravenwolf <github.com@wolfram.ravenwolf.de>
Co-authored-by: Amy Ravenwolf <amy@ravenwolf.de>
2026-06-15 07:46:34 -07:00
Amy Ravenwolf
2f2e3616b4 fix(config): read browser inactivity timeout from config 2026-06-15 07:46:34 -07:00
helix4u
333f01bc7f fix(tools): percent-encode non-ascii URL components 2026-06-07 11:42:26 -06:00
Teknium
2912d94370
fix: guard int(os.getenv()) casts against malformed env vars (#40598)
A non-numeric value in env vars like HERMES_STREAM_RETRIES,
HERMES_KANBAN_SPECIFY_MAX_TOKENS, GOOGLE_CHAT_MAX_BYTES, IRC_PORT, etc.
raised ValueError at import/init and crashed startup. Parse them safely,
falling back to the default.

Unified onto the existing utils.env_int(key, default) helper for core/
hermes_cli/tools modules instead of the original PR's three duplicate
local helpers; plugins keep minimal inline guards (no core-utils import).
All existing max()/min()/`or extra.get()` wrappers preserved.

Co-authored-by: annguyenNous <annguyenNous@users.noreply.github.com>
2026-06-07 06:14:24 -07:00
Teknium
92ad7cc62c
fix(browser): recover from CDP DOM-node serialization crash in browser_console (#35385)
browser_console(expression="document.body") returned the cryptic CDP error
"Object reference chain is too long" instead of a usable result.

With returnByValue=true, Chrome deep-serializes the eval result; for a live
DOM Node/NodeList/Window that serialization overruns CDP's recursion guard
and fails the whole call with a protocol-level error (not a JS exception),
which _browser_eval surfaced raw.

- browser_supervisor.evaluate_runtime: on that specific error, retry once
  with returnByValue=false so Chrome returns the node's description string —
  the same graceful path already used for document.querySelector() results.
- browser_tool._browser_eval (CLI subprocess fallback): the subprocess can't
  retry, so convert the reference-chain error into actionable guidance
  (extract a primitive / use JSON.stringify) instead of leaking it raw.

No expression rewriting — normal evals (1+41 -> 42) are untouched.
2026-05-30 07:31:25 -07:00
kurobaryo
03bdeaa876 docs: fix BROWSERBASE_SESSION_TIMEOUT unit (ms → seconds) 2026-05-29 12:06:22 -07:00
teknium1
c77a697fa4 refactor(vision): consolidate native fast-path gate into one shared helper
The fast-path decision (native routing + provider allowlist OR
supports_vision override) lived inline in vision_analyze and was copied
into browser_vision. Extract it to _should_use_native_vision_fast_path()
so both tools share one source of truth.

- vision_tools: gate logic now one helper; vision_analyze calls it in 3 lines
- browser_tool: thin envelope decoration over the shared helper, not a copy
- browser_vision typed Union[str, Dict] to match its real return shape
- tests slimmed to target the override path + text-mode-wins invariant
2026-05-29 03:58:56 -07:00
tillfalko
c3f28c651d docs(browser): update browser_vision tool description for native vision routing 2026-05-29 03:58:56 -07:00
tillfalko
f8b8dffccf fix(browser): add native image support to browser_vision and respect supports_vision 2026-05-29 03:58:56 -07:00
kshitijk4poor
66827f8947 chore: prune unused imports and duplicate import redefinitions
Remove unused imports (F401) and duplicate/shadowed import
redefinitions (F811) across the codebase using ruff's safe
autofixes. No behavioral changes -- imports only.

- ~1400 safe autofixes applied across 644 files (net -1072 lines)
- __init__.py re-exports preserved (excluded from F401 removal so
  public re-export surfaces stay intact)
- Re-exports that are imported or monkeypatched by tests but look
  unused in their defining module are kept with explicit # noqa:
  F401 (gateway/run.py load_dotenv; run_agent re-exports from
  agent.message_sanitization, agent.context_compressor,
  agent.retry_utils, agent.prompt_builder, agent.process_bootstrap,
  agent.codex_responses_adapter)
- Unsafe F841 (unused-variable) fixes deliberately skipped -- those
  can change behavior when the RHS has side effects
- ruff lints remain disabled in pyproject.toml (only PLW1514 is
  selected); this is a one-time cleanup, not a config change

Verification:
- python -m compileall: clean
- pytest --collect-only: all 27161 tests collect (zero import errors)
- core entry points import clean (run_agent, model_tools, cli,
  toolsets, hermes_state, batch_runner, gateway)
- static scan: every name any test imports directly from an edited
  module still resolves
2026-05-28 22:26:25 -07:00
Teknium
3d66787a04
fix(vision): route auxiliary.vision.provider=openai to api.openai.com, skip text-only main (#31452)
* fix(vision): route auxiliary.vision.provider=openai to api.openai.com, skip text-only main for vision

Fixes #31179. Three coupled fixes so a configured aux vision backend
actually serves vision tasks instead of silently routing images to the
user's main provider:

1. agent/auxiliary_client.py: `auxiliary.<task>.provider: openai` resolves
   to `custom` + `https://api.openai.com/v1`. "openai" was not in
   PROVIDER_REGISTRY (we have `openai-codex` for OAuth and `custom` for
   manual base_url), so the obvious config name silently failed to build a
   client. User-supplied base_url is still preserved; only the provider
   name normalises to `custom` so resolution doesn't hit the
   PROVIDER_REGISTRY-only path.

2. agent/auxiliary_client.py: the vision auto-detect chain now skips the
   user's main provider when models.dev reports `supports_vision=False`.
   Without this guard, a misconfigured aux provider would fall back to
   `auto`, which happily returned the main-provider client. The caller
   would then send image content to e.g. api.deepseek.com with model
   `gpt-4o-mini` and get a cryptic `unknown variant 'image_url',
   expected 'text'` from the provider's parser.

3. tools/vision_tools.py + tools/browser_tool.py: `check_vision_requirements`
   now mirrors the runtime fallback chain (explicit provider, then auto),
   so `vision_analyze` shows up whenever vision is actually serviceable.
   `browser_vision` gets a new `check_browser_vision_requirements` check_fn
   that AND-gates browser + vision availability, so it doesn't get
   advertised to the model when the call would fail at runtime.

Reproduction (config from the bug report):
  model.provider: deepseek
  model.default: deepseek-v4-pro
  auxiliary.vision.provider: openai
  auxiliary.vision.model: gpt-4o-mini

Before: resolve_vision_provider_client() returns None for the explicit
provider, fallback auto returns the deepseek client with model='gpt-4o-mini',
image hits api.deepseek.com → 'unknown variant image_url'. vision_analyze
hidden from tool list; browser_vision exposed but fails at call time.

After: resolves to custom + api.openai.com/v1 with model gpt-4o-mini.
vision_analyze and browser_vision both gate correctly on capability.

Tests: tests/agent/test_vision_routing_31179.py covers all three fixes
(12 cases including the user's exact scenario, base_url preservation,
text-only-main skip, capability-unknown permissive fallback, and tool
gating parity). Existing 382 tests across auxiliary/vision/image_routing
suites still pass.

* test(vision): use exact hostname check to silence CodeQL substring-sanitization alert

* fix(auxiliary): drop model name from vision-skip debug log to silence CodeQL

The new `logger.debug(...)` added in the previous commit interpolated
both `main_provider` and `vision_model` (a public model slug \u2014 not
sensitive). CodeQL's `py/clear-text-logging-sensitive-data` heuristic
re-flagged it twice because the rule mis-detects multi-value
interpolations near tainted-via-config provider strings.

Drop the model from the log args (provider alone is enough to diagnose
the skip; the same sibling branch a few lines up already logs provider
only). Behavior unchanged; CodeQL false positive cleared.
2026-05-24 15:01:28 -07:00
Yuan Li
22f3f5a75a fix(browser): use process-tree termination for daemon cleanup
os.kill(pid, SIGTERM) only signals the parent, leaving Chromium child
    processes (renderer, GPU, etc.) orphaned.  Reuse the existing
    ProcessRegistry._terminate_host_pid() helper which walks the process
    tree leaf-up via psutil, terminating children before the parent.
2026-05-23 20:30:29 -07:00
Siddharth Balyan
e3a254d65b
feat(dep_ensure): complete Windows bootstrap — dep_ensure + install.ps1 + detection (#27845)
* feat(dep_ensure): complete Windows bootstrap — dep_ensure + install.ps1 + detection

dep_ensure.py gains Windows awareness: PowerShell invocation, platform-
specific browser detection, (path, shell) tuple returns.

install.ps1 gains -Ensure/-PostInstall modes using npm -g --prefix
(aligned with install.sh) and agent-browser install for Chromium.

browser_tool.py gains node/ in candidate dirs for Windows .cmd shims.
Both install scripts bundled in pip wheel.

Tracking: #27826

* fix(install.ps1): add --ignore-scripts to npm install for camofox

@askjo/camofox-browser has a dependency (impit) whose postinstall
script runs `npx only-allow pnpm`, which fails under npm. Adding
--ignore-scripts avoids the spurious failure without affecting
functionality.

Tracking: #27826

* fix: remove duplicate install scripts from git

CI already copies scripts/install.{sh,ps1} into hermes_cli/scripts/
during wheel build. No need to commit copies — .gitignore keeps them
out, _find_install_script() falls back to scripts/ for git-clone users.

Tracking: #27826

* fix: address review — remove env_extra, fix ps1 error handling

- Remove unused env_extra parameter from ensure_dependency()
- Invoke-EnsureMode node case now uses Test-Node consistently
- Install-AgentBrowser uses throw instead of exit 1
2026-05-18 16:34:24 +05:30
kshitijk4poor
c74ff2c8ef fix(browser): self-review pass — dead-import, log levels, future-proofing
Addresses findings from two self-review passes pre-merge.

First pass (3-agent parallel review):

1. plugins/browser/browser_use/provider.py: drop the
   ``_ = managed_nous_tools_enabled`` dead-import-hider in
   _get_config_or_none(). The import was actively misleading — the
   helper IS used in _get_config() (separate method, separate import),
   not here. The "keep static analysis happy" comment was wrong about
   what the helper does in this scope.

2. agent/browser_provider.py: drop ``pragma: no cover`` from
   is_configured() / provider_name() backward-compat aliases. They ARE
   covered by ``TestLegacyAbcAliases`` — the pragma would have masked
   future regressions.

3. tools/browser_tool.py: refactor _is_legacy_provider_registry_overridden()
   to compare against a module-frozen _DEFAULT_PROVIDER_REGISTRY snapshot
   instead of hardcoded set of 3 keys. Future maintainers adding a 4th
   built-in provider now just extend _PROVIDER_REGISTRY; the override
   detection adapts automatically. Previously the hardcoded
   ``set(...) != {"browserbase", "browser-use", "firecrawl"}`` would flip
   True forever on any 4-key registry, silently routing every install
   onto the legacy fixture path.

4. tools/browser_tool.py: when explicit ``browser.cloud_provider`` is set
   but the registry has no matching plugin (typo, uninstalled plugin,
   discovery failure), emit a WARNING with actionable text instead of
   silently falling through to auto-detect. Legacy code surfaced a typed
   credentials error via direct class instantiation; this log restores
   the signal in the post-migration path.

5. agent/browser_registry.py: trim the triple-redundant _LEGACY_PREFERENCE
   documentation. Module docstring + 13-line block-comment + 5-line
   inline comment was repeating the same point. Kept the docstring and
   trimmed the block-comment to 5 lines.

6. agent/browser_registry.py: upgrade is_available()-raised logging from
   DEBUG to WARNING with exc_info=True. A provider's availability check
   throwing is unusual enough that users debugging "no cloud provider"
   need the traceback in logs.

7. tests/plugins/browser/check_parity_vs_main.py: drop dead top-level
   imports (os, shutil, tempfile — only referenced inside the
   SUBPROCESS_SCRIPT string literal that runs in a child process).

Second pass (architecture + claim-verification review):

8. tools/browser_tool.py: rewrite the inline comment in _get_cloud_provider
   auto-detect branch. Prior text claimed it "routes through the plugin
   registry's legacy preference walk so third-party plugins still get a
   chance to be selected when they're explicitly configured" — false on
   both counts. The branch uses module-level legacy class aliases
   (BrowserUseProvider / BrowserbaseProvider) directly; third-party
   plugins are intentionally reachable only via explicit
   ``browser.cloud_provider``. Corrected comment now matches behaviour
   and cross-references _LEGACY_PREFERENCE for the firecrawl gate
   rationale.

9. tools/browser_tool.py + tests/tools/test_managed_browserbase_and_modal.py:
   drop the unused ``get_active_browser_provider as
   _registry_get_active_browser_provider`` alias from the
   ``from agent.browser_registry import ...`` block. It was never
   referenced; matching test-stub line in the agent.browser_registry
   SimpleNamespace also dropped. ``get_provider`` is still imported (used
   by the explicit-config dispatch path at line 535).

10. plugins/browser/firecrawl/provider.py: align emergency_cleanup()
    with the early-guard pattern used in browserbase + browser_use
    plugins. Previously firecrawl tried the DELETE and relied on
    ``_headers()`` raising ValueError to trip a "missing credentials"
    warning; same final outcome but a different control flow that read
    like a bug to a maintainer skimming the three modules. Now: if
    is_available() is False, log+return early — identical shape to the
    other two providers.

Verification: 54/54 unit tests + 13/13 parity scenarios still pass.
2026-05-17 04:04:15 -07:00
kshitijk4poor
1bb6f03724 fix(browser): ensure plugin discovery before registry lookup; parity harness
Two changes that go together:

1. tools/browser_tool.py — add _ensure_browser_plugins_loaded() and call
   it from _get_cloud_provider() before consulting the registry. Normally
   model_tools triggers discover_plugins() as an import side-effect, but
   _get_cloud_provider() can be reached from contexts that haven't gone
   through model_tools (standalone scripts, certain unit-test paths, the
   new parity-sweep harness). Without the defensive call, the registry is
   empty and _registry_get_browser_provider() returns None — silently
   downgrading users to local mode when they explicitly configured a
   cloud provider with no credentials yet. The behavior-parity sweep
   below caught this as 4 scenario regressions (explicit-X-no-creds for
   all 3 providers, and explicit-firecrawl-with-creds).

2. tests/plugins/browser/check_parity_vs_main.py — subprocess harness
   that pins one Python invocation to origin/main and one to this PR's
   worktree via sys.path.insert(), runs _get_cloud_provider() across a
   13-scenario config matrix, and diffs the reduced shape tuple
   (is_local, provider_name, is_available). Provider_name pulls from
   provider.provider_name() which is the legacy CloudBrowserProvider
   API and remains as a backward-compat alias on the new BrowserProvider
   ABC, so the comparison is apples-to-apples regardless of class
   identity.

Final result: PARITY OK across 13 scenarios. The four observable
config/credential matrices that exercise the dispatcher all match
origin/main bit-for-bit:

  - no-config + no-env → local
  - explicit local + any env → local
  - explicit BB / BU / FC + no creds → provider returned with
    is_available()==False (so dispatcher surfaces typed credentials
    error; matches main exactly)
  - explicit BB / BU / FC + creds → provider returned with
    is_available()==True
  - no-config + BU creds → Browser Use
  - no-config + BB creds → Browserbase
  - no-config + both → Browser Use (legacy walk first hit)
  - no-config + FC only → local (firecrawl NOT in legacy walk)
  - no-config + FC + BB → Browserbase (legacy walk skips firecrawl)

Per the dev skill's "behavior-parity for refactor PRs" rule — without
this subprocess sweep, 31/31 unit tests pass while the production code
path is silently broken for users who type `browser.cloud_provider:
browserbase` and run a single browser command without prior model_tools
import. Caught + fixed before push.
2026-05-17 04:04:15 -07:00
kshitijk4poor
40fde853fa refactor(browser): dispatch _get_cloud_provider through agent.browser_registry
Switches tools.browser_tool's cloud-provider lookup from the hardcoded
_PROVIDER_REGISTRY class-instantiation pattern to the
agent.browser_registry singleton registry that plugins self-populate.

Changes:

- tools/browser_tool.py top imports: pull BrowserProvider from
  agent.browser_provider (re-exported as CloudBrowserProvider for legacy
  callers) and the three provider classes from plugins/browser/<vendor>/.
  Legacy class names (BrowserbaseProvider, BrowserUseProvider, FirecrawlProvider)
  remain on tools.browser_tool as re-export shims so existing test patches
  (monkeypatch.setattr(browser_tool, 'BrowserUseProvider', ...)) keep working.

- _get_cloud_provider() now consults agent.browser_registry.get_provider()
  for explicit-config lookups. The auto-detect fallback still uses
  BrowserUseProvider() / BrowserbaseProvider() at the module level so the
  cache-policy test fixtures (which patch those names) keep driving the
  function. Test-time _PROVIDER_REGISTRY overrides are detected by class
  identity and routed through the legacy factory-call path.

- agent/browser_provider.py: BrowserProvider grows is_configured() and
  provider_name() as thin backward-compat aliases for the legacy
  CloudBrowserProvider API. Subclasses MUST implement is_available() and
  name; the aliases delegate. This keeps ~6 caller sites in browser_tool.py
  working without churning them.

- tests/tools/test_managed_browserbase_and_modal.py: _install_fake_tools_package
  grows stubs for agent.browser_provider / agent.browser_registry /
  plugins.browser.<vendor>.provider so the test's spec-loader path
  (sys.modules-reset + reload-tool-from-disk) can satisfy tools.browser_tool's
  top-level imports.

Verified: all 23 existing tests in test_browser_cloud_*.py +
test_managed_browserbase_and_modal.py still pass post-cutover.

The legacy tools/browser_providers/ directory is NOT yet deleted; several
tests still _load_tool_module() those files via spec_from_file_location.
The deletion + test-path updates land in a later commit.
2026-05-17 04:04:15 -07:00
alt-glitch
c57709a3d6 feat: wire ensure_dependency into TUI and browser tool call sites
Before: missing node → hard exit; missing browser → FileNotFoundError.
After: both try ensure_dependency() first, which prompts interactively
and delegates installation to install.sh --ensure.

ripgrep and ffmpeg already degrade gracefully (grep fallback, skip
conversion) so they don't need wiring.

Also documents the design rationale in dep_ensure.py: detection and
prompting live in Python (portable, instant, UX-integrated); only
the actual installation delegates to install.sh (1900 lines of
battle-tested OS/package-manager logic).
2026-05-15 14:45:43 -07:00
alt-glitch
259ae846c8 feat: add ensure_dependency() wrapper + ship install.sh in wheel
Includes paired change: browser tool now searches ~/.hermes/node_modules/.bin/
for agent-browser installed via install.sh --ensure browser.
2026-05-15 14:45:43 -07:00