14207 commits

Author SHA1 Message Date
Morgan K
39bff67957 feat(gateway): add 'log' option to display.tool_progress
Salvage of #3459 by @keslerm, reimplemented against the restructured
progress-callback block in gateway/run.py (resolve_display_setting,
needs_progress_queue, thinking-relay). Duplicate PR #3458 by @dlkakbs was
submitted 4 minutes earlier with the same feature — both credited.

Co-authored-by: Dilee <uzmpsk.dilekakbas@gmail.com>

tool_progress: log keeps the chat silent and appends timestamped tool-call
lines to ~/.hermes/logs/tool_calls.log via a dedicated queue drained by an
async writer (RotatingFileHandler 5MB x 3, RedactingFormatter so secrets
never land on disk). Gateway-only by design; thinking_progress relaying and
the webhook gate are unaffected. /verbose now cycles
off -> new -> all -> verbose -> log.
2026-07-02 05:09:38 -07:00
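The on-disk half of 39bff67957 (rotation at 5 MB with 3 backups, plus redaction before anything is written) might look roughly like the sketch below. The secret pattern and the build_tool_call_logger name are assumptions made for illustration; the commit does not show the real RedactingFormatter rules, and the queue plus async writer are left out.

```python
import logging
import re
from logging.handlers import RotatingFileHandler

# Hypothetical pattern: the real redaction rules are not shown in the commit.
_SECRET_RE = re.compile(r"(sk-|ghp_)[A-Za-z0-9_\-]{8,}")


class RedactingFormatter(logging.Formatter):
    """Mask secret-looking tokens in the fully formatted line."""

    def format(self, record: logging.LogRecord) -> str:
        return _SECRET_RE.sub("[REDACTED]", super().format(record))


def build_tool_call_logger(path: str) -> logging.Logger:
    """Logger writing timestamped lines to a rotating file (5 MB, 3 backups)."""
    handler = RotatingFileHandler(
        path, maxBytes=5 * 1024 * 1024, backupCount=3, encoding="utf-8"
    )
    handler.setFormatter(RedactingFormatter("%(asctime)s %(message)s"))
    logger = logging.getLogger(f"tool_calls.{path}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep tool-call lines out of the root logger
    logger.addHandler(handler)
    return logger
```

Redacting in the formatter rather than at each call site means every line passes through the same filter, whichever code path produced it.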
Mibayy
070ac2a719 fix(status): label provider as custom when config.yaml model.base_url is set
Salvage of the surviving hunk of #3296 by @Mibayy. The PR's gateway
_handle_provider_command hunk targets code removed on main (/provider was
absorbed into /model + /status, which already read model.base_url); the
hermes status mislabel was the remaining live symptom:
_effective_provider_label() only checked the legacy OPENAI_BASE_URL env var,
so a custom endpoint configured canonically in config.yaml still displayed
as OpenRouter.
2026-07-02 04:59:02 -07:00
Teknium
44650a5ce3 chore: add AUTHOR_MAP entry for @ajmeese7 (#3219 salvage) 2026-07-02 04:48:02 -07:00
Roger Smith
c0d694a492 fix(whatsapp): resolve LID sender IDs to phone numbers in bridge message payload
WhatsApp has migrated to Linked Identity Device (LID) format for user
IDs (e.g. 244645917392975@lid instead of 18505551234@s.whatsapp.net).

The bridge already resolves LIDs to phone numbers for its own allowlist
check via buildLidMap(), but the senderId field in the message payload
sent to the gateway still contained the raw LID. This caused the
gateway's WHATSAPP_ALLOWED_USERS check to reject all messages as
unauthorized, since the LID numbers don't match the phone numbers in
the allowlist.

Fix: resolve LID → phone in the senderId, senderName, and chatName
fields of the event payload before sending to the gateway, using the
existing lidToPhone mapping.
2026-07-02 04:48:02 -07:00
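The bridge touched by c0d694a492 is not Python, but the substitution it describes can be sketched in Python for illustration. The shape of the mapping (bare LID digits to bare phone digits) and the output JID format are assumptions; only the three field names come from the commit message.

```python
from typing import Dict

LID_SUFFIX = "@lid"
PHONE_SUFFIX = "@s.whatsapp.net"
RESOLVED_FIELDS = ("senderId", "senderName", "chatName")


def resolve_lid(value: str, lid_to_phone: Dict[str, str]) -> str:
    """Return the phone-style ID for a known LID, else the value unchanged."""
    if value.endswith(LID_SUFFIX):
        phone = lid_to_phone.get(value[: -len(LID_SUFFIX)])
        if phone:
            return phone + PHONE_SUFFIX
    return value


def resolve_payload(payload: dict, lid_to_phone: Dict[str, str]) -> dict:
    """Copy the payload, resolving LIDs in the sender and chat fields only."""
    resolved = dict(payload)
    for field in RESOLVED_FIELDS:
        if isinstance(resolved.get(field), str):
            resolved[field] = resolve_lid(resolved[field], lid_to_phone)
    return resolved
```

An unknown LID passes through untouched, so the gateway allowlist still rejects it rather than matching a wrong number.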
kshitijk4poor
019950560d refactor(image-gen): reuse shared image sniffer + raster allowlist in codex backend
Replace the plugin-local _IMAGE_MAGIC_MIME table + _sniff_image_mime
body with a delegation to agent.image_routing._sniff_mime_from_bytes,
the canonical magic-byte sniffer already used across the codebase, then
gate its result to the raster formats gpt-image-2's Responses
input_image actually accepts (png/jpeg/gif/webp).

The shared sniffer also recognizes SVG/TIFF/ICO; without the allowlist
those would pass local validation and be rejected server-side with an
opaque HTTP 400. Gating locally fails them cleanly as invalid_image_input.
Adds a regression test for SVG rejection.

Follow-up on top of @CrazyBoyM's #55828.
2026-07-02 17:12:24 +05:30
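The sniff-then-gate pattern from 019950560d can be illustrated with a self-contained sketch. The sniffer here is a simplified stand-in for agent.image_routing._sniff_mime_from_bytes, whose real implementation is not shown, and validate_reference_image is a hypothetical name.

```python
from typing import Optional

# Raster formats the commit lists as accepted: png/jpeg/gif/webp.
RASTER_ALLOWLIST = frozenset({"image/png", "image/jpeg", "image/gif", "image/webp"})


def sniff_mime(data: bytes) -> Optional[str]:
    """Simplified magic-byte sniffer that also recognizes SVG and TIFF."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "image/gif"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    if data[:4] in (b"II*\x00", b"MM\x00*"):
        return "image/tiff"
    if b"<svg" in data[:256].lower():
        return "image/svg+xml"
    return None


def validate_reference_image(data: bytes) -> str:
    """Return the MIME type, or fail locally instead of with a server-side 400."""
    mime = sniff_mime(data)
    if mime not in RASTER_ALLOWLIST:
        raise ValueError("invalid_image_input")
    return mime
```

Keeping the allowlist separate from the sniffer lets other callers keep accepting SVG or TIFF while this backend rejects them.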
CrazyBoyM
460235d584 test(image-gen): cap Codex reference inputs 2026-07-02 17:12:24 +05:30
CrazyBoyM
ecffd290a3 feat(image-gen): support Codex image inputs 2026-07-02 17:12:24 +05:30
Evo
a4a562ff0c fix(browser): guard Camofox snapshot/vision/images on private pages
Follow-up to #56874, which added the Camofox private-page SSRF guard
(_camofox_current_page_private_url) but wired it only into the Camofox
eval path (_camofox_eval). The other Camofox content-read tools —
camofox_snapshot, camofox_get_images, and camofox_vision — still read the
current page's accessibility tree / images / screenshot without the
guard, so on a non-local Camofox backend they can return the content of
an intranet or cloud-metadata page (e.g. 169.254.169.254) that the
terminal itself can't reach.

Apply the same guard, gated on _eval_ssrf_guard_active (non-local
backend, not a local sidecar, allow_private_urls unset) and fail-open on
probe failure, matching the eval-path guard and the main-browser
snapshot/vision guards. camofox_back is intentionally not changed: its
target is unknown until navigation completes, and the subsequent content
read is already guarded.

Adds regression tests covering the three read tools blocking on a private
page, the public-page pass-through, and the guard-inactive no-probe path.
2026-07-02 17:07:17 +05:30
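A rough sketch of the guard behavior described in a4a562ff0c: probe the current page URL, block private targets, skip the probe when the guard is inactive, and fail open when the probe itself errors. The function names and the error string are hypothetical, and this version only classifies IP literals and localhost (no DNS resolution).

```python
import ipaddress
from typing import Callable, Optional
from urllib.parse import urlsplit


def is_private_url(url: str) -> bool:
    """True for localhost and private, loopback, or link-local IP literals."""
    host = urlsplit(url).hostname
    if not host:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # a hostname; this sketch does not resolve DNS
    return ip.is_private or ip.is_loopback or ip.is_link_local


def guard_content_read(probe_url: Callable[[], str], guard_active: bool) -> Optional[str]:
    """Return an error message when the read must be blocked, else None."""
    if not guard_active:
        return None  # inactive guard: the probe is never called
    try:
        current = probe_url()
    except Exception:
        return None  # fail open on probe failure, as the commit describes
    if is_private_url(current):
        return "blocked: current page is a private/internal address"
    return None
```

A snapshot, image, or vision tool would call guard_content_read before reading the page and return the message instead of the content when it is not None.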
kshitijk4poor
0a2d4a6eea docs(codex): clarify stale-floor docstring reflects the 10k gate
The helper docstring described the typical ~15-25k gateway payload but
read as if that were the trigger range; the floor actually engages above
10k tokens. Clarify the prose to match the gate.
2026-07-02 17:05:05 +05:30
HexLab98
ede4d12561 test(codex): cover gateway-scale stale timeout floor and TTFB gate 2026-07-02 17:05:05 +05:30
HexLab98
cb1ccc57e6 fix(codex): extend stale timeout for gateway-scale tool payloads
Lower the openai-codex stale-timeout floor from 25k to 10k estimated
tokens so Telegram/gateway sessions (~20k tools+instructions) are not
aborted at the generic 90s cutoff while Codex is still prefilling.
2026-07-02 17:05:05 +05:30
kshitij
d733eaa650
Merge pull request #57007 from kshitijk4poor/chore/author-map-crazyboym-55828
chore(release): map ai-lab@foxmail.com to CrazyBoyM
2026-07-02 17:03:19 +05:30
kshitijk4poor
be21e06ab3 chore(release): map ai-lab@foxmail.com to CrazyBoyM
Adds the AUTHOR_MAP entry for CrazyBoyM (ai-lab@foxmail.com) so the
contributor-attribution CI check passes when PR #55828's commits are
rebase-merged with authorship preserved.
2026-07-02 16:55:02 +05:30
Teknium
3f2a56d1a4
fix(cli): reliable interrupts, bounded exit, and exit feedback (#57000)
Three CLI reliability fixes:

1. Interrupt reliability: chat() only re-queued the user's interrupt
   message when the turn result carried interrupted=True. When the agent
   thread raced past its last interrupt check (or finished) before the
   interrupt landed, the message was silently dropped — and the stale
   _interrupt_requested flag left on the agent instantly aborted the
   NEXT turn. Un-acknowledged interrupt messages are now re-queued as
   the next turn and the stale flag is cleared (only when the agent
   thread actually exited). The clarify-race path also parks the message
   in _pending_input instead of dropping it.

2. Slow exit (5+ min): stdlib ThreadPoolExecutor workers are non-daemon
   and joined unconditionally by concurrent.futures' atexit hook — even
   after shutdown(wait=False). One wedged tool worker (abandoned after
   interrupt/timeout) held the process open forever. Promoted
   async_delegation's daemon executor to a shared tools/daemon_pool
   module and adopted it in tool_executor (concurrent tool batches),
   memory_manager (background sync), delegate_tool (child timeout wrapper
   + batch fan-out), and skills_hub (source fan-out). Added a 30s exit
   watchdog (HERMES_EXIT_WATCHDOG_S) armed at _run_cleanup start as a
   backstop for wedged cleanup steps.

3. Exit jank: after prompt_toolkit tears down the input/status bars the
   terminal sat silent for the whole cleanup window, looking hung. Print
   'Shutting down… (finalizing session)' immediately at exit start.

E2E: live PTY interrupt of a foreground 'sleep 120' terminal tool now
aborts in ~1s and the typed message runs as the next turn; wedged-worker
+ wedged-cleanup subprocess exits in 5.8s (watchdog) instead of hanging.
2026-07-02 04:20:43 -07:00
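The exit watchdog from item 2 of 3f2a56d1a4 could be approximated as below: a daemon timer that force-exits the process if cleanup has not finished in time. Only the 30s default and the HERMES_EXIT_WATCHDOG_S name come from the commit; the function names, the exit code, and the parsing rules are assumptions.

```python
import os
import threading
from typing import Callable, Mapping


def watchdog_seconds(environ: Mapping[str, str] = os.environ, default: float = 30.0) -> float:
    """Read the watchdog timeout, falling back to the default on bad input."""
    try:
        return float(environ.get("HERMES_EXIT_WATCHDOG_S", default))
    except (TypeError, ValueError):
        return default


def arm_exit_watchdog(
    timeout_s: float, force_exit: Callable[[int], None] = os._exit
) -> threading.Timer:
    """Start a daemon timer that hard-exits if cleanup is still running."""
    timer = threading.Timer(timeout_s, force_exit, args=(1,))  # exit code is assumed
    timer.daemon = True  # the watchdog itself must never keep the process alive
    timer.start()
    return timer
```

The timer would be armed at the start of cleanup and cancelled with timer.cancel() once cleanup completes. os._exit skips atexit hooks, which is the point here: the concurrent.futures hook is what joins the wedged worker threads.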
Tarun Ravikumar
2068754d6f feat(api-server): inline MEDIA: image tags as base64 data URLs for remote frontends
Salvage of the surviving piece of #2696 by @tarunravi. The PR's other two
changes (tool progress streaming, SSE None-sentinel fix) were independently
superseded on main by the structured hermes.tool.progress SSE events and the
rewritten queue-drain loop.

Remote OpenAI-compatible frontends can't read server-local file paths, so
MEDIA:<path> tags (browser screenshots, generated images) were dead text.
_resolve_media_to_data_urls() now inlines small (<=5MB) local images as
markdown data URLs across all four response surfaces: chat completions
(non-streaming), session chat, session chat stream final event, and the
Responses API. Non-image, missing, or oversized paths pass through
untouched.
2026-07-02 03:23:44 -07:00
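A minimal sketch of the inlining described in 2068754d6f, assuming MEDIA: tags are followed by a whitespace-free path and that the image type can be guessed from the file extension. The real _resolve_media_to_data_urls() may detect types and format its output differently.

```python
import base64
import mimetypes
import os
import re

_MEDIA_RE = re.compile(r"MEDIA:(\S+)")
_MAX_INLINE_BYTES = 5 * 1024 * 1024  # "small (<=5MB)" per the commit message


def resolve_media_to_data_urls(text: str) -> str:
    """Replace MEDIA:<path> tags for small local images with markdown data URLs."""

    def _inline(match: "re.Match[str]") -> str:
        path = match.group(1)
        mime, _ = mimetypes.guess_type(path)
        if not mime or not mime.startswith("image/"):
            return match.group(0)  # non-image: pass through untouched
        try:
            if os.path.getsize(path) > _MAX_INLINE_BYTES:
                return match.group(0)  # oversized: pass through untouched
            with open(path, "rb") as fh:
                payload = base64.b64encode(fh.read()).decode("ascii")
        except OSError:
            return match.group(0)  # missing or unreadable: pass through
        return f"![image](data:{mime};base64,{payload})"

    return _MEDIA_RE.sub(_inline, text)
```

Because every failure path returns the original tag, the function is safe to apply to every response surface, including text that contains no MEDIA: tags.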
CharmingGroot
88bd1c01e1 fix(email): harden adapter against malformed IMAP responses
Salvage of #2794 by @CharmingGroot, ported to the relocated
plugins/platforms/email/adapter.py:

- Guard raw_email = msg_data[0][1] against IndexError/TypeError and
  non-bytes payloads. UIDs are added to _seen_uids before fetch, so an
  exception mid-batch permanently skipped every remaining message in
  the batch — now the bad message is logged and skipped instead.
- Message-ID domain generation falls back to 'localhost' when
  EMAIL_ADDRESS lacks '@' (now via a shared _message_id_domain() helper
  covering all 3 send paths; the PR fixed 2 of 3).
2026-07-02 03:12:53 -07:00
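The two guards in 88bd1c01e1 can be illustrated as small helpers. The helper names are hypothetical (the commit names only _message_id_domain()), and the sketch assumes msg_data has the list-of-tuples shape that imaplib's fetch returns.

```python
from typing import Optional


def extract_raw_email(msg_data) -> Optional[bytes]:
    """Return the raw message bytes, or None for a malformed IMAP response."""
    try:
        raw = msg_data[0][1]
    except (IndexError, TypeError):
        return None
    # A bare bytes item such as b")" indexes to an int, so check the type too.
    return bytes(raw) if isinstance(raw, (bytes, bytearray)) else None


def message_id_domain(address: str) -> str:
    """Domain part of the sender address, or 'localhost' when there is none."""
    _, sep, domain = address.rpartition("@")
    return domain if sep and domain else "localhost"
```

A caller looping over a batch would log and continue when extract_raw_email returns None, so one bad message no longer causes the rest of the batch to be skipped.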
crazywriter1
c43aa6301d feat(gateway): per-channel model and system prompt overrides (Fixes #1955)
- ChannelOverride + channel_overrides; session /model > channel > global
- Thread/parent lookup; YAML bridge for discord.channel_overrides
- Guard channel_overrides when config lacks platforms (test mocks)
- Add sampiyonyus@gmail.com to AUTHOR_MAP
2026-07-02 03:08:11 -07:00
crazywriter1
0010c14e66 feat(gateway): per-channel model and system prompt overrides (Fixes #1955)
- ChannelOverride + channel_overrides on PlatformConfig
- Resolve model/runtime: session /model, then channel_overrides, then global
- Thread/parent channel lookup; bridge discord.channel_overrides from YAML
- Drop unrelated test and delegate_tool changes from PR scope
2026-07-02 03:08:11 -07:00
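The resolution order stated in 0010c14e66 (session /model, then channel_overrides with thread/parent lookup, then global) might be sketched as follows. The ChannelOverride fields and the function signature are assumptions; only the precedence order is taken from the commit.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ChannelOverride:
    model: Optional[str] = None
    system_prompt: Optional[str] = None


def resolve_model_for_channel(
    session_model: Optional[str],
    channel_id: Optional[str],
    parent_id: Optional[str],
    overrides: Dict[str, ChannelOverride],
    global_model: str,
) -> str:
    """Session /model wins, then the channel (or its parent), then global."""
    if session_model:
        return session_model
    for key in (channel_id, parent_id):  # a thread falls back to its parent
        override = overrides.get(key) if key else None
        if override and override.model:
            return override.model
    return global_model
```

Checking the thread's own ID before the parent lets a thread carry a more specific override than the channel it lives in.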
crazywriter1
ebef73f6b8 feat(gateway): per-channel model and system prompt overrides (Fixes #1955)
- config: ChannelOverride + PlatformConfig.channel_overrides

- run: _resolve_model_for_channel, _get_system_prompt_for_channel, channel provider runtime

- tests: channel overrides + config guard for bare runner; conftest asyncio fix; slack/whatsapp warning filters

Made-with: Cursor
2026-07-02 03:08:11 -07:00
Teknium
902b0b70e4 test: env-flag 'on' truthy behavior contract (#2863 follow-up) 2026-07-02 03:00:59 -07:00
aydnOktay
60039d5a3a fix(config): accept 'on' as truthy for env flags via shared env_var_enabled helper
Salvage of #2863 by @aydnOktay, reimplemented against current main using the
existing utils.env_var_enabled / TRUTHY_STRINGS helper instead of per-site
tuple edits. Covers the 7 gateway/config.py env-flag sites that still rejected
'on' (WHATSAPP_ENABLED, SIGNAL_IGNORE_STORIES, MATRIX_ENCRYPTION,
API_SERVER_ENABLED, WEBHOOK_ENABLED, MSGRAPH_WEBHOOK_ENABLED,
BLUEBUBBLES_SEND_READ_RECEIPTS) plus HERMES_DESKTOP gating in
read_terminal/close_terminal. The PR's approval.py HERMES_YOLO_MODE portion is
already on main via is_truthy_value.
2026-07-02 03:00:59 -07:00
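For illustration, a shared truthy-flag helper in the spirit of utils.env_var_enabled could look like this. The exact contents of TRUTHY_STRINGS and the helper's signature are assumptions; the commit only establishes that 'on' is accepted.

```python
import os
from typing import Mapping

# Assumed membership; the commit only confirms that "on" is truthy.
TRUTHY_STRINGS = frozenset({"1", "true", "yes", "on"})


def env_var_enabled(name: str, environ: Mapping[str, str] = os.environ) -> bool:
    """True when the variable is set to a truthy string (case-insensitive)."""
    return environ.get(name, "").strip().lower() in TRUTHY_STRINGS
```

Routing every flag site through one helper avoids the per-site tuple drift that let some flags reject 'on' in the first place.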
Teknium
6546c58641 chore: add AUTHOR_MAP entry for @VolodymyrBg (#2861 salvage) 2026-07-02 03:00:17 -07:00
VolodymyrBg
bd4007396d fix(webhook): remove unused payload from delivery state 2026-07-02 03:00:17 -07:00
VolodymyrBg
ea5d75befd fix(webhook): remove unused payload from delivery state 2026-07-02 03:00:17 -07:00
Teknium
6e369a3762
feat(delegation): unify concurrency caps — deprecate max_async_children (#56955)
delegation.max_concurrent_children is now the single cap for both a
batch's parallelism and concurrent background delegation units.

- _get_max_async_children() delegates to _get_max_concurrent_children();
  a leftover max_async_children key logs a one-time deprecation warning
- config v32→33 migration removes the stale key, folding a raised
  max_async_children into max_concurrent_children (max wins, no lost
  headroom)
- capacity error messages now point at max_concurrent_children
- pool-at-capacity sync fallback now attaches an explanatory note so
  the model/user know why the call blocked instead of dispatching async

Previously users who raised max_concurrent_children (e.g. to 15) still
hit the invisible default-3 async cap: the 4th background delegate_task
silently ran inline, blocking the turn with no signal.
2026-07-02 02:53:39 -07:00
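The "max wins" folding in the v32→33 migration of 6e369a3762 can be sketched as below. The function name is hypothetical, and the handling of a missing max_concurrent_children key (adopt the stale value) is an assumption; the commit specifies only that the stale key is removed and that the larger value survives.

```python
def migrate_delegation_caps(config: dict) -> dict:
    """Drop delegation.max_async_children, folding a larger value into the
    unified max_concurrent_children cap."""
    migrated = dict(config)
    if "delegation" not in migrated:
        return migrated
    delegation = dict(migrated["delegation"] or {})
    stale = delegation.pop("max_async_children", None)
    if isinstance(stale, int) and not isinstance(stale, bool):
        current = delegation.get("max_concurrent_children")
        # Assumption: with no unified cap set, adopt the stale value as-is.
        if not isinstance(current, int) or stale > current:
            delegation["max_concurrent_children"] = stale
    migrated["delegation"] = delegation
    return migrated
```

Taking the maximum means a user who raised either knob keeps at least that much headroom after the migration.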
Teknium
14639ded77
fix(terminal): stop stripping CLAUDE_CODE_OAUTH_TOKEN from spawned subprocesses (#56935)
CLAUDE_CODE_OAUTH_TOKEN is set and owned by the user's Claude Code
install (subscription OAuth), not a Hermes-managed inference
credential — Claude subscription auth is not a working Hermes provider
path. Blocklisting it broke agent-spawned claude CLIs: with no token in
the child env, claude fell through to the shared macOS Keychain /
~/.claude/.credentials.json store and, on auth failure, cleared it —
logging the user out of their interactive Claude sessions and the
desktop app.

Exempt it from _HERMES_PROVIDER_ENV_BLOCKLIST (it arrives via the
anthropic registry entry, so discard explicitly with rationale).
ANTHROPIC_API_KEY / ANTHROPIC_TOKEN and every other provider credential
remain stripped, and the GHSA-rhgp-j443-p4rf fail-closed passthrough
guard is unchanged for everything still on the blocklist.

Fixes #55878
2026-07-02 02:13:30 -07:00
teknium1
8b1ad38ecb chore: add AUTHOR_MAP entry for @sahibzada-allahyar (#39227 salvage) 2026-07-02 01:58:19 -07:00
teknium1
36b7e5e9cc fix(desktop): guard configured-cwd override against active sessions
Follow-up to the #39227 salvage: config refreshes fire mid-session too
(gateway events, settings saves), so applying terminal.cwd
unconditionally would yank the workspace out from under an attached
session. Gate the override on activeSessionIdRef like the sibling
reasoning/tier settings, keep branch refresh on the live cwd, and add
coverage for the active-session path. Also lint-polish the new test
file (typed config mock, prettier formatting).
2026-07-02 01:58:19 -07:00
Sahibzada Allahyar
d2de9580e1 fix(desktop): prefer configured workspace cwd 2026-07-02 01:58:19 -07:00
kshitijk4poor
b837f07dcd fix(agent): route restore custom-pool match through canonical helper
Follow-up on the salvaged #56392 guard. The cherry-picked change matched
custom:<name> pool entries against the primary by raw base_url string
equality, which (a) can't disambiguate two named custom providers sharing
one gateway base_url and (b) left a latent bare-"custom" entry bypass.

Route the match through get_custom_provider_pool_key(rt[base_url]) compared
against the entry's custom:<name> key, mirroring the sibling guard in
recover_with_credential_pool. Use CUSTOM_POOL_PREFIX instead of the literal.

Add regression tests for the custom same-endpoint (swap) and cross-endpoint
(skip) branches, plus the plain-provider fallback-pool case from #56885.
2026-07-02 13:41:53 +05:30
openhands
820a052575 fix(agent): keep primary runtime restore on matching credential pool (#56374) 2026-07-02 13:41:53 +05:30
Teknium
fb403a3a73
fix(auxiliary): retry transient blips harder + isolate client cache per model (#56889)
Two related hardening fixes for auxiliary calls (which include MoA reference
advisors — a pinned-model path where provider fallback is not a meaningful
recovery):

1. Transient-transport retries: the same-provider retry on a connection reset /
   timeout / 5xx / 408 was a single attempt, then fallback. For a pinned aux
   call a second blip silently loses the call (root of the run2 double-advisor
   'Connection error' collapse — a genuine upstream blip). Now retries N times
   with exponential backoff, N = auxiliary.transient_retries (default 2 -> 3
   total attempts, clamped [0,6]). Compression-on-timeout fast-fail carve-out
   preserved.

2. Per-model client-cache isolation: _client_cache_key excluded the model, so
   two concurrent auxiliary calls to the same provider/base_url/key but
   different models (e.g. an opus + gpt-5.5 MoA fan-out) shared one cache entry
   and could race each other's client lifecycle. Model now participates in the
   key -> distinct clients, no cross-call races. Same-model reuse unchanged.

- agent/auxiliary_client.py: _transient_retry_count() + backoff loop; model in
  _client_cache_key and both call sites.
- hermes_cli/config.py: auxiliary.transient_retries default (2).
- tests: new retry/isolation tests; updated 2 stale-expectation tests to the
  corrected behavior (per-model resolve; N-retry escalation).

Backoff base is overridable (_TRANSIENT_RETRY_BACKOFF_BASE) so tests don't sleep.
2026-07-02 01:09:37 -07:00
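Both fixes in fb403a3a73 lend themselves to a short sketch: a clamped retry count with exponential backoff, and a cache key that includes the model. The exception class, function names, and backoff formula (base times 2 to the attempt) are assumptions; the default of 2 and the [0,6] clamp come from the commit.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


class TransientTransportError(Exception):
    """Stand-in for connection reset / timeout / 5xx / 408 failures."""


def transient_retry_count(configured, default: int = 2) -> int:
    """Parse auxiliary.transient_retries and clamp it to [0, 6]."""
    try:
        value = int(configured)
    except (TypeError, ValueError):
        value = default
    return max(0, min(6, value))


def call_with_transient_retries(
    fn: Callable[[], T],
    retries: int,
    backoff_base: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run fn, retrying transient failures with exponential backoff."""
    for attempt in range(retries + 1):  # retries=2 means 3 total attempts
        try:
            return fn()
        except TransientTransportError:
            if attempt == retries:
                raise  # out of retries: let the caller escalate
            sleep(backoff_base * (2 ** attempt))
    raise AssertionError("unreachable")


def client_cache_key(provider: str, base_url: str, api_key: str, model: str) -> Tuple[str, ...]:
    """Cache key that includes the model, so concurrent calls to different
    models on one provider never share a client."""
    return (provider, base_url, api_key, model)
```

Making sleep injectable serves the same purpose as the overridable backoff base the commit mentions: tests can run the retry loop without waiting.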
kshitijk4poor
71c0622122 chore(release): map kiljadn@gmail.com to designnotdrum for #56480 salvage
Attribution audit gate: the salvaged contributor commits carry
kiljadn@gmail.com (Nick Mason / @designnotdrum). Add the mapping so
contributor_audit.py resolves the author on this PR.
2026-07-02 13:25:25 +05:30
kshitijk4poor
46273a55a8 docs(toolsets): clarify get_toolset static-view returns None for registry-derived aliases too
Follow-up on the #56480 salvage: the include_registry=False docstring said
None is returned only for registry/MCP-only toolsets; it also applies to
registry-derived aliases, which have no static TOOLSETS counterpart.
2026-07-02 13:25:25 +05:30
Nick Mason
80733413f9 fix(tools): don't drop a toolset from platform inference when a tool is registered into it
_get_platform_tools reverse-maps a platform composite to configurable
toolsets with an all-tools subset test. Because get_toolset() merges
registry-registered tools into a toolset, a tool added to a toolset
(delegate_cli -> delegation; desktop-only read_terminal -> terminal) that the
static composite never listed made the subset test fail, silently dropping the
entire toolset on api_server and other inference-based platforms. Compare the
toolset's static membership at all three reverse-map sites.

Fixes #49622.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-02 13:25:25 +05:30
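The failure mode fixed by 80733413f9 is easy to reproduce in miniature. In this sketch the toolset and tool names other than read_terminal are made up, and infer_toolsets is a hypothetical stand-in for the reverse-mapping inside _get_platform_tools.

```python
from typing import Dict, List, Set


def infer_toolsets(platform_tools: Set[str], toolsets: Dict[str, Set[str]]) -> List[str]:
    """Reverse-map a platform composite to the toolsets it fully contains."""
    return sorted(
        name for name, tools in toolsets.items() if tools and tools <= platform_tools
    )


# Static membership as authored in the composite (illustrative names).
STATIC = {"terminal": {"terminal", "process"}}
# Registry-merged view: a desktop-only tool was registered into the toolset.
MERGED = {"terminal": {"terminal", "process", "read_terminal"}}
COMPOSITE = {"terminal", "process", "web_search"}

with_static = infer_toolsets(COMPOSITE, STATIC)  # toolset kept
with_merged = infer_toolsets(COMPOSITE, MERGED)  # toolset silently dropped
```

The subset test fails against the merged view because read_terminal was never part of the composite, which is why the fix compares static membership instead.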
Nick Mason
5317993a6d fix(tools): expose static (pre-registry-merge) toolset view for platform inference
Adds include_registry=True kwarg to resolve_toolset/get_toolset. When False,
returns only the static TOOLSETS view with no registry-merged tools — the
composite-authored membership platform reverse-mapping must compare against.
Default True preserves all existing behavior; this is the enabling half of
the api_server toolset-drop fix (#49622).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-02 13:25:25 +05:30
Ray
6a58badfdc fix(browser): guard Camofox eval private pages
Extends the browser private-network eval guard to the Camofox backend.
On main, _browser_eval() returned early in Camofox mode before running the
shared private-URL literal pre-scan and before re-checking the page URL
after eval, leaving Camofox as a sibling backend that could execute
browser_console(expression=...) against private/internal targets.

- move the eval private-URL literal pre-scan before the Camofox early return
- add a Camofox current-page private-URL probe via the evaluate endpoint
- withhold Camofox eval results when the page is now private/internal

Follow-up to browser private-network hardening in #56173, #56526, #56664.

Salvage of #56764 by @rayjun (rayoo), cherry-picked to preserve authorship.
2026-07-02 13:10:30 +05:30
kshitijk4poor
f2b8a5d541 test(gateway): assert _record_gateway_session_peer fires only on the persisted split
The fake _SessionStore tracked peer_records but no test read it, leaving
#55300's peer-record behavior unasserted. Add a positive assertion on the
persist path and negative (== []) assertions on the two stale/moved-binding
skip paths, so the peer-record side effect is bound.

Mutation-verified: removing the production _record_gateway_session_peer call
makes the positive assertion fail.

Co-authored-by: João Vitor Cunha <jvsantos.cunha@gmail.com>
2026-07-02 12:49:42 +05:30
kshitijk4poor
ed6f80a20c test(gateway): align fake SessionStore with _record_gateway_session_peer
The #55300 peer-recording call now fires on the failed-turn compression
split path; the fake _SessionStore in test_compression_failure_session_sync
(carried in with #55721's test changes) lacked that method. Add a
call-tracking no-op so the combined salvage's tests pass.

Co-authored-by: João Vitor Cunha <jvsantos.cunha@gmail.com>
2026-07-02 12:49:42 +05:30
r266-tech
2a04137322 fix(gateway): preserve platform + gateway_session_key on /compress temp agent
Manual /compress built a temporary AIAgent without the originating
platform / stable gateway session key, so an external context engine
ingested the retained transcript tail as source=cli during /compress
and again as the real platform on resume (duplicate cli,telegram rows).
Pass platform=_platform_config_key(source.platform) + the in-scope
gateway_session_key, mirroring the normal gateway turn. Assigned into
runtime_kwargs (single-valued, authoritative) so they neither collide
into a duplicate-kwarg TypeError nor lose to a stale resolver value.

Fixes #50422.
2026-07-02 12:49:42 +05:30
Jake Present
00ec3b1884 fix(gateway): ignore stale compression session splits 2026-07-02 12:49:42 +05:30
João Vitor Cunha
d5b4879d4a fix(gateway): preserve peer routing across compression recovery 2026-07-02 12:49:42 +05:30
kshitijk4poor
e2ffbf0cf4 chore(release): add AUTHOR_MAP entries for compression-routing salvage
Map the two contributor emails whose commits are cherry-picked into the
compression-routing-integrity salvage so scripts/contributor_audit.py
attributes them at release time:

- jvsantos.cunha@gmail.com -> plcunha (PR #55300)
- jakepresent1@gmail.com   -> jakepresent (PR #55721)

r266-tech (PR #50517) is already mapped.
2026-07-02 12:49:42 +05:30
Teknium
543d305bbb
feat(moa): add reference_max_tokens to cap advisor output and cut turn latency (#56756)
MoA per-turn latency is dominated by advisor GENERATION: turn wall time
correlates ~0.88 with output tokens and ~-0.03 with input tokens (measured over
52 turns). Each turn waits for the slowest advisor to finish writing, and
advisors were uncapped — writing multi-thousand-token essays the aggregator
only needs the gist of.

Add an opt-in per-preset reference_max_tokens knob (mirrors reference_temperature)
that caps ADVISOR output only; the acting aggregator is never capped. Default
None = uncapped, so existing presets are byte-for-byte unchanged (no regression).
Wired through both MoA execution paths (MoAChatCompletions.create and
aggregate_moa_context).

E2E: same task, closed preset uncapped vs reference_max_tokens=600 -> 59s to 33s
(~44% faster), final answer identical/correct.

- hermes_cli/moa_config.py: _coerce_int_or_none helper + reference_max_tokens
  in _normalize_preset/_default_preset/flattened view
- agent/moa_loop.py: read preset.reference_max_tokens, pass to reference fan-out
- agent/conversation_loop.py: pass reference_max_tokens on the per-turn path
- tests + docs
2026-07-02 00:16:35 -07:00
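An opt-in cap like reference_max_tokens in 543d305bbb could be normalized and applied roughly as follows. The coercion rules (rejecting booleans, non-numeric strings, and non-positive values) and the advisor_request_kwargs helper are assumptions; the commit names _coerce_int_or_none but does not show its body.

```python
from typing import Optional


def coerce_int_or_none(value) -> Optional[int]:
    """Positive int, or None when the value is missing or unusable."""
    if value is None or isinstance(value, bool):
        return None
    try:
        number = int(value)
    except (TypeError, ValueError):
        return None
    return number if number > 0 else None


def advisor_request_kwargs(preset: dict) -> dict:
    """Extra request kwargs for advisor calls only; the aggregator is never capped."""
    kwargs = {}
    cap = coerce_int_or_none(preset.get("reference_max_tokens"))
    if cap is not None:
        kwargs["max_tokens"] = cap
    return kwargs
```

Leaving max_tokens out entirely when the cap is None, rather than sending a null, is what keeps uncapped presets identical to their previous requests.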
Ben Barclay
9be39de0f2
fix(auth): make HERMES_PORTAL_BASE_URL/NOUS_PORTAL_BASE_URL bypass the Portal host allowlist (#56864)
Ben caught that the initial approach (widening _NOUS_PORTAL_ALLOWED_HOSTS to
include the staging host) was the wrong fix -- env vars are supposed to
override the allowlist, mirroring how NOUS_INFERENCE_BASE_URL already
bypasses _ALLOWED_NOUS_INFERENCE_HOSTS via _nous_inference_env_override().

The actual bug: both resolve_nous_access_token and
resolve_nous_runtime_credentials read
`_optional_base_url(state.get("portal_base_url")) or os.getenv(...) or ...`
-- a plain `or` chain where the STORED state value wins first (short-circuits
before the env vars are even read), and then whichever value won gets run
through the same _NOUS_PORTAL_ALLOWED_HOSTS gate regardless of its source.
So a hosted agent stamped with HERMES_PORTAL_BASE_URL=<staging> in its env
AND a staging portal_base_url already persisted to auth.json would still
get silently rewritten to prod on every refresh, because the env var never
even got a chance to be consulted.

Revert the previous _NOUS_PORTAL_ALLOWED_HOSTS widening entirely --
staying prod-only preserves the allowlist's actual job (rejecting an
untrusted network-provided portal_base_url persisted to auth.json by a
compromised Portal response).

Add _nous_portal_env_override() (mirrors _nous_inference_env_override())
and restructure both call sites so the env override is checked FIRST and,
when set, wins outright and skips the allowlist gate entirely -- the
allowlist only ever runs against the fallback (stored-state-or-default)
path now.

Rewrote tests/hermes_cli/test_nous_portal_staging_allowlist.py to test the
actual fix: the helper function, and an end-to-end
resolve_nous_access_token proof that the env override wins even when state
ALSO has the staging host stored (the exact incident shape), that it wins
over a stored PROD host too, and that the allowlist's heal-to-prod
behaviour for an untrusted stored value is preserved when no override is
set.
2026-07-02 06:52:46 +00:00
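The precedence change in 9be39de0f2 (env override checked first and exempt from the allowlist, with the allowlist applied only to the stored-or-default fallback) can be sketched as below. The host names are placeholders, and the order in which the two env vars are consulted is an assumption.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlsplit

# Placeholder values; the real prod host and allowlist are not in the commit.
_DEFAULT_PORTAL_BASE_URL = "https://portal.example.com"
_PORTAL_ALLOWED_HOSTS = frozenset({"portal.example.com"})


def portal_env_override(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """First non-empty portal base URL from the environment, if any."""
    for name in ("HERMES_PORTAL_BASE_URL", "NOUS_PORTAL_BASE_URL"):
        value = (environ.get(name) or "").strip().rstrip("/")
        if value:
            return value
    return None


def resolve_portal_base_url(state: dict, environ: Mapping[str, str] = os.environ) -> str:
    """Env override wins outright; stored state must pass the allowlist."""
    override = portal_env_override(environ)
    if override:
        return override  # operator-provided: skips the allowlist gate
    stored = (state.get("portal_base_url") or "").strip().rstrip("/")
    if stored and urlsplit(stored).hostname in _PORTAL_ALLOWED_HOSTS:
        return stored
    return _DEFAULT_PORTAL_BASE_URL  # heal an untrusted stored value to prod
```

The buggy shape was a single `or` chain in which the stored value short-circuited before the environment was read; splitting the two paths also makes it explicit which source the allowlist applies to.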
kshitij
88d1d6206f
fix(streaming): handle completed responses with empty/None choices (#55933) (#56713)
* fix(streaming): handle completed responses with empty/None choices

The streaming fallback guard added in #55932 recognized a completed
response object only when its `choices` was a non-empty list. But an
adapter can return a completed response whose `choices` is `None` or an
empty list (an error / content-filter / terminal frame) — still a whole,
non-iterable response, not a token stream. Those shapes fell through to
`for chunk in stream` and crashed with

    'types.SimpleNamespace' object is not iterable

which is exactly issue #55933 (MoA `openai-codex` aggregator on
TUI/Desktop, where a stream consumer forces the streaming path).

Broaden the guard to discriminate on the PRESENCE of a `choices`
attribute (a genuine provider Stream object exposes none), disable
streaming for the session, and return the completed object so the outer
loop's normal invalid-response validation handles empty/None choices via
its retry path instead of iterating.

Based on the diagnosis in #56525 by @spiky02plateau (that PR normalized
the MoA aggregator return with a one-shot chunk iterator; the common
text/tool-call crash was already fixed at this seam by #55932, so this
extends the existing guard to cover only the remaining empty/None-choices
gap).

Fixes #55933

* refactor(streaming): simplify empty-choices guard body and parametrize tests

Post-review cleanup (no behavior change):
- Inline the single-use `response_choices` local and drop the redundant
  `if first_choice is not None else None` guard (getattr(None, ...) already
  returns the default safely).
- Collapse the two near-identical empty/None-choices regression tests into
  one `@pytest.mark.parametrize` case.

Mutation-verified: reverting the guard to the old non-empty-list condition
still makes both parametrized cases fail with the historical
'types.SimpleNamespace' object is not iterable.

---------

Co-authored-by: spiky02plateau <155588579+spiky02plateau@users.noreply.github.com>
2026-07-02 06:36:20 +05:30
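The broadened guard in 88d1d6206f discriminates on whether the object has a `choices` attribute at all, not on whether that attribute is a non-empty list. A minimal sketch, with a hypothetical function name and return shape:

```python
from types import SimpleNamespace
from typing import Any, Tuple


def split_stream_or_response(obj: Any) -> Tuple[Any, bool]:
    """Return (payload, is_completed_response).

    A completed response exposes `choices` (possibly None or empty) and must
    not be iterated; a real stream has no such attribute and yields chunks.
    """
    if hasattr(obj, "choices"):
        return obj, True  # hand back whole; caller validation handles empties
    return list(obj), False  # genuine stream: consume the chunks


# A terminal frame with choices=None is recognized as a completed response.
completed, is_response = split_stream_or_response(SimpleNamespace(choices=None))
# A plain iterator of chunks is treated as a stream.
chunks, is_response_2 = split_stream_or_response(iter(["a", "b"]))
```

Under the earlier non-empty-list check, the SimpleNamespace(choices=None) case fell through to iteration and raised the "object is not iterable" error quoted in the commit.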
kshitijk4poor
76be770091 test(moa): assert aux cap against model resolver, not frozen literal
Follow-up to the salvaged fix: the regression test asserted a frozen
max_tokens == 128_000 literal, coupling it to the Opus-4-8 model table.
Assert against _get_anthropic_max_output("claude-opus-4-8") plus > 2000
instead, so the test survives model-table churn while still catching a
regression to the old `or 2000` fallback.
2026-07-02 06:31:18 +05:30
helix4u
7951250947 fix(moa): lift hidden Anthropic aux output cap 2026-07-02 06:31:18 +05:30
kshitij
4d5d9fffd0
Merge pull request #56582 from srojk34/fix/vertex-credentials-env-leak
security(terminal): strip VERTEX_CREDENTIALS_PATH/GOOGLE_APPLICATION_CREDENTIALS from subprocess env
2026-07-02 06:08:55 +05:30
srojk34
7f64cce96d security(vertex): route credential/project/region resolution through the profile secret scope
agent/vertex_adapter.py resolved VERTEX_CREDENTIALS_PATH,
GOOGLE_APPLICATION_CREDENTIALS, VERTEX_PROJECT_ID, and VERTEX_REGION via raw
os.environ.get() instead of the profile-scoped get_secret() every other
credential lookup in hermes_cli/runtime_provider.py uses. In a multiplex
gateway serving several profiles from one process, os.environ still holds
whichever profile's .env python-dotenv loaded at boot — so a raw read here
let one profile's turn silently mint a Vertex OAuth2 token from, and get
billed against, a different profile's GCP service account. No error, no
fail-closed guard: the multiplex UnscopedSecretError protection was bypassed
entirely because these reads never went through get_secret().

- _resolve_credentials_path/_resolve_project_override/_resolve_region now
  call agent.secret_scope.get_secret(), matching the _getenv() pattern
  already used for every other provider's credentials.
- get_vertex_credentials()'s ADC fallback (google.auth.default()) reads
  GOOGLE_APPLICATION_CREDENTIALS from os.environ internally, bypassing
  get_secret() entirely — closed with a narrow guard: when multiplexing is
  active and this profile's scope has no Vertex credentials of its own, but
  os.environ still carries a value (left by a different profile's boot-time
  dotenv load), refuse ADC rather than silently authenticate as a stranger.
- Zero behavior change for single-profile installs: get_secret() falls
  through to os.environ transparently whenever multiplexing is off.

Same bug class as the already-fixed _HERMES_OAUTH_FILE/_AUTH_JSON_PATH/
HOOKS_DIR cross-profile leaks, now closed for Vertex's OAuth2 credential
path.
2026-07-02 06:07:56 +05:30