Commit graph

14011 commits

Author SHA1 Message Date
kshitijk4poor
daf4f1a7a9 fix(tools): close the same session leak on the hermes_subprocess_env spawn surface (review)
Review of the #50531 salvage found the cross-session HERMES_SESSION_* leak also
survives on the non-terminal spawn helper hermes_subprocess_env (added by #56202
after #50531 was written), which does os.environ.copy() without the guard. Of
its six callers, five re-bind the session identity explicitly (slash_worker/ACP
via --session-key argv) and are safe by accident; but tui_gateway cli.exec
(server.py) spawns a fresh CLI with NO --session-key under the engaged TUI host,
so it inherits a possibly-foreign HERMES_SESSION_* from the last-writer-wins
global and would stamp Kanban rows / telemetry with another session's id.

Route hermes_subprocess_env through the same _inject_session_context_env
chokepoint, restoring the single-uniform-policy-across-every-spawn-surface
invariant the codebase already claims for the internal-secret filter. Safe for
all six callers: bound ContextVars win (re-binders unaffected), _UNSET strips
(closes cli.exec). Adds 3 guard tests; mutation-checked.
2026-07-01 15:42:19 +05:30
PolyphonyRequiem
cc395e8050 fix(gateway): close cross-session HERMES_SESSION_* leak into subprocess env
Session vars (HERMES_SESSION_*) have a process-global os.environ mirror written
last-writer-wins as a CLI/cron fallback and never cleared. Under a concurrent
multi-session host (messaging gateway, ACP adapter, API server, TUI) that global
belongs to whichever turn wrote it last. A subprocess spawned from a task whose
session ContextVar is _UNSET (a sibling task that never bound, or one that
inherited another session's context) inherited the FOREIGN global and acted on
another session's identity.

Add a session_context_engaged() latch (set once any host calls set_session_vars)
and route both terminal spawn paths through a single _inject_session_context_env
chokepoint: once engaged, a bound ContextVar (incl. "") is authoritative and an
_UNSET var is STRIPPED rather than inheriting the possibly-foreign global. Pure
single-process CLI/one-shot (never engaged) keeps the inherited fallback.

Salvaged from #50531 (supersedes #49922). local.py hunk re-applied by intent
onto the current hermes_subprocess_env refactor.

Co-authored-by: PolyphonyRequiem <3107779+PolyphonyRequiem@users.noreply.github.com>
2026-07-01 15:42:19 +05:30
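The latch-and-strip policy described in cc395e8050 can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the single variable name `HERMES_SESSION_ID`, the module-level `_engaged` flag, and the function bodies are assumptions standing in for the real `HERMES_SESSION_*` family and `_inject_session_context_env`.

```python
import contextvars
import os

_UNSET = object()  # sentinel: this context never bound a session value

# Hypothetical single variable; the commit covers the whole HERMES_SESSION_* family.
_session_id = contextvars.ContextVar("session_id", default=_UNSET)
_engaged = False  # latch: flips once any host binds session vars


def set_session_vars(session_id: str) -> None:
    """Bind the session for the current context and engage the latch."""
    global _engaged
    _engaged = True
    _session_id.set(session_id)
    os.environ["HERMES_SESSION_ID"] = session_id  # last-writer-wins global mirror


def inject_session_context_env(env: dict) -> dict:
    """Return a copy of env with the session var resolved per the policy."""
    env = dict(env)
    if not _engaged:
        return env  # never engaged (plain CLI): keep the inherited fallback
    value = _session_id.get()
    if value is _UNSET:
        env.pop("HERMES_SESSION_ID", None)  # strip the possibly-foreign global
    else:
        env["HERMES_SESSION_ID"] = value  # a bound value (even "") is authoritative
    return env
```

A spawn helper would then pass `inject_session_context_env(os.environ.copy())` as the child environment, so a task that never bound a session cannot inherit another session's id.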
kshitijk4poor
e3819a4143 test(anthropic): add adjacency behavior test for #52145 + fix vacuous refresh-UA test (review)
Review follow-up on the anthropic_adapter batch salvage:

1. #52145 shipped no behavior test for the adjacency rewrite. Add
   test_strips_tool_use_when_result_not_immediately_adjacent (a tool_use whose
   result appears later but NOT in the immediately-following user message must
   be stripped — the exact case the old global id-match got wrong) plus an
   adjacent-pair control. Mutation-checked: reverting to a global match fails
   the non-adjacent test.

2. test_token_refresh_ua_prefix was vacuous — it bound to _refresh_oauth_token
   (a wrapper with no urllib.request.Request), so its assert never ran and it
   did NOT guard the real refresh UA site. Retarget it at
   refresh_anthropic_oauth_pure (:1048) with the header-scoped check. Mutation-
   checked: reverting :1048 to claude-cli/ now fails it.
2026-07-01 15:42:15 +05:30
kshitijk4poor
5efbd7cb05 test(anthropic): scope OAuth-UA source check to header lines, not any mention
The salvaged test_token_exchange_ua_prefix did a naive whole-function substring
check for 'claude-cli/', which false-positives on an explanatory comment that
references the old (blocked) UA. Scope it to actual User-Agent header lines —
mirroring the sibling test_no_claude_cli_in_source — so a comment documenting
why claude-cli/ is avoided doesn't trip it. Mutation-checked: an actual
claude-cli/ UA header still fails the test.
2026-07-01 15:42:15 +05:30
DhivinX
49e129e495 fix(anthropic): use claude-code/ UA prefix for OAuth to avoid 404 (#48534)
Anthropic's OAuth endpoints 404 for the claude-cli/ User-Agent prefix. Switch
all three OAuth UA sites (build_anthropic_client, refresh_anthropic_oauth_pure,
run_hermes_oauth_login_pure) to the claude-code/ prefix Anthropic expects.

Salvaged from #51948.

Co-authored-by: DhivinX <20087092+DhivinX@users.noreply.github.com>
2026-07-01 15:42:15 +05:30
fsaad1984
5881791adc fix(adapter): enforce tool_use/tool_result adjacency in _strip_orphaned_tool_blocks
_strip_orphaned_tool_blocks collected tool_result ids across ALL user messages
and kept any assistant tool_use whose id appeared anywhere, rather than
requiring the result to be in the immediately-following user message. A stale
match elsewhere in the transcript could keep a genuinely-orphaned tool_use,
which Anthropic rejects. Rewrite to adjacency-checked two-pass logic so a
tool_use is kept only when its result immediately follows.

Salvaged from #52145.

Co-authored-by: fsaad1984 <38867992+fsaad1984@users.noreply.github.com>
2026-07-01 15:42:15 +05:30
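The adjacency rule from 5881791adc can be illustrated with a simplified sketch. It assumes Anthropic-style messages (dicts with `role` and a `content` list of typed blocks) and covers only the tool_use side in a single pass; the function name is hypothetical and the commit's actual two-pass logic is not reproduced here.

```python
def strip_orphaned_tool_use(messages):
    """Drop assistant tool_use blocks whose result is not in the next user message."""
    cleaned = []
    for i, msg in enumerate(messages):
        if msg["role"] != "assistant" or not isinstance(msg["content"], list):
            cleaned.append(msg)
            continue
        nxt = messages[i + 1] if i + 1 < len(messages) else None
        adjacent_ids = set()
        if nxt and nxt["role"] == "user" and isinstance(nxt["content"], list):
            # Only results in the immediately-following user message count.
            adjacent_ids = {
                b["tool_use_id"]
                for b in nxt["content"]
                if b.get("type") == "tool_result"
            }
        kept = [
            b
            for b in msg["content"]
            if b.get("type") != "tool_use" or b["id"] in adjacent_ids
        ]
        cleaned.append({**msg, "content": kept})
    return cleaned
```

A tool_use whose id shows up only in a later user message is dropped, which is the case a transcript-wide id match would wrongly keep.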
kshitijk4poor
ede5c09f3b docs(disk-cleanup): clarify cron output-root protection is exact-match
Review follow-up: the _is_protected_cron_path docstring listed output/ next
to jobs.json/.tick.lock as 'the directory itself', which is slightly
ambiguous. Spell out that the match is EXACT-path only and must not be
'simplified' into a blanket cron/output/* guard (children stay cleanable) —
prevents a future editor from re-introducing the wholesale-delete bug this
fix closes.
2026-07-01 15:42:04 +05:30
martinramos002-bot
d173e8c3a7 fix: protect cron output root from cleanup
Only classify files below cron/output/ as disposable cron output.
The cron/output directory itself is a durable container for retained
job history and should not be tracked or deleted wholesale.

Add regression coverage for both category detection and cleanup of a
stale tracked entry pointing at the output root.
2026-07-01 15:42:04 +05:30
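The distinction drawn in d173e8c3a7 (children of cron/output/ are disposable, the directory itself is protected by exact match) might look like the sketch below. Paths are taken relative to an assumed Hermes home, and both function names are hypothetical.

```python
from pathlib import PurePosixPath

CRON_OUTPUT = PurePosixPath("cron/output")  # assumed to be relative to the Hermes home


def is_disposable_cron_output(rel_path: str) -> bool:
    """True only for paths strictly below cron/output/."""
    return CRON_OUTPUT in PurePosixPath(rel_path).parents


def is_protected_output_root(rel_path: str) -> bool:
    """Exact-path match only: children of cron/output/ stay cleanable."""
    return PurePosixPath(rel_path) == CRON_OUTPUT
```

Keeping the protection an exact-path comparison, rather than a prefix test, is what lets stale job output still be cleaned while the container directory survives.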
kshitijk4poor
7f71a48a3a fix(cron): release TERMINAL_CWD lock even when run_job body raises
Rework follow-up on the per-job TERMINAL_CWD readers-writer lock.

The lock was acquired BEFORE the try: whose finally: is the only release
site, with the env-override statements (os.environ[TERMINAL_CWD] = workdir;
logger.info) sitting in the unprotected window between acquire and try. Any
exception there — a raising log handler, an os.environ error, a thread
interrupt — propagated out of run_job WITHOUT running the finally, leaking
the lock. A leaked writer permanently deadlocks the whole scheduler (every
future cron job blocks on acquire_*); a leaked reader blocks all writers.

- Snapshot _prior_terminal_cwd before the acquire (so the finally can always
  restore env even if the body raises before the override).
- Open the try: immediately after acquire and move the env-override lines
  inside it, so the existing finally always releases the lock.
- Add a mutation-verified regression test: a workdir job whose in-window
  logger.info raises must still release the writer lock (a subsequent
  acquire_write must not block).
2026-07-01 15:39:48 +05:30
entropy-0x
abc349bd79 fix(cron): isolate per-job TERMINAL_CWD from concurrent cron jobs
A cron job with a per-job `workdir` overrides the process-global
`os.environ["TERMINAL_CWD"]` for the entire duration of its agent run and
restores it afterwards. The scheduler dispatches workdir jobs on a
single-thread sequential pool and workdir-less jobs on a separate parallel
pool, and the in-code comments claimed this made the override safe.

That only prevents two workdir jobs from overlapping each other. The two
pools run concurrently in the same process and share `os.environ`, so while
a workdir job has `TERMINAL_CWD` pointed at its project directory, any
workdir-less job firing in the same window reads that same global through the
terminal, file, and code-exec tools and runs its commands in the wrong
directory. The corruption window spans the whole workdir-job run, and a file
write or delete can land in another job's tree.

This serializes the override with a writer-preferring readers-writer lock.
Workdir jobs acquire it as writers (exclusive for their whole run); workdir-
less jobs acquire it as readers, so they still run in parallel with each
other but never alongside a workdir job's override. The guarantee is based on
run overlap rather than tick boundaries, so it also holds when a workdir job
spans ticks.
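
A writer-preferring readers-writer lock of the kind described above could be sketched as follows. This is an illustrative version built on `threading.Condition`; the class and method names are assumptions and may differ from the `_ReadWriteLock` in `cron/scheduler.py`.

```python
import threading


class ReadWriteLock:
    """Many concurrent readers or one exclusive writer; waiting writers go first."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False
        self._writers_waiting = 0

    def acquire_read(self):
        with self._cond:
            # New readers yield to any active or waiting writer.
            while self._writer or self._writers_waiting:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            self._writers_waiting += 1
            while self._writer or self._readers:
                self._cond.wait()
            self._writers_waiting -= 1
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

Workdir jobs would take the write side for their whole run and workdir-less jobs the read side, so readers overlap each other but never a writer's override.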

## What does this PR do?

Fixes a directory-isolation bug in the cron scheduler: a workdir cron job's
process-global `TERMINAL_CWD` override could be observed by a concurrently
running workdir-less cron job, causing that job's shell/file/code-exec
commands to execute in the wrong directory.

## Related Issue

N/A

## Type of Change

- [x] 🐛 Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] 🔒 Security fix
- [ ] 📝 Documentation update
- [ ] Tests (adding or improving test coverage)
- [ ] ♻️ Refactor (no behavior change)
- [ ] 🎯 New skill (bundled or hub)

## Changes Made

- `cron/scheduler.py`: add `_ReadWriteLock` (writer-preferring) and the
  module-global `_terminal_cwd_lock`.
- `cron/scheduler.py`: in `run_job`, acquire the lock as a writer for workdir
  jobs and as a reader for workdir-less jobs, spanning the `TERMINAL_CWD`
  override and its restore in the `finally` block.
- `cron/scheduler.py`: correct the stale comments in `run_job` and `tick` that
  claimed the sequential pool alone made the override safe.
- `tests/cron/test_terminal_cwd_lock.py`: new tests for reader concurrency,
  writer exclusion, and the no-cross-observation regression.

## How to Test

1. `python -m pytest tests/cron/test_terminal_cwd_lock.py -q` — the regression
   test `test_reader_never_observes_writer_override` fails without the lock and
   passes with it.
2. `python -m pytest tests/cron/test_cron_workdir.py tests/cron/test_parallel_pool.py -q`
   — confirms the existing `TERMINAL_CWD` set/restore and pool behaviour are
   unchanged.

## Checklist

### Code

- [x] I've read the Contributing Guide
- [x] My commit messages follow Conventional Commits (`fix(scope):`, etc.)
- [x] I searched for existing PRs to make sure this isn't a duplicate
- [x] My PR contains only changes related to this fix
- [x] I've run the affected `tests/cron/` suites and all tests pass
- [x] I've added tests for my changes (required for bug fixes)
- [x] I've tested on my platform: macOS 15 (Darwin 25.5)

### Documentation & Housekeeping

- [x] I've updated relevant documentation (docstrings/comments) — or N/A
- [x] I've updated `cli-config.yaml.example` if I added/changed config keys — N/A
- [x] I've updated `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture — N/A
- [x] I've considered cross-platform impact (Windows, macOS) — uses stdlib `threading` only
- [x] I've updated tool descriptions/schemas if I changed tool behavior — N/A
2026-07-01 15:39:48 +05:30
srojk34
db0fd8f290 fix(security): use caller package root for deregister opt-in policy lookup
_plugin_override_policy is keyed by the plugin package root
(e.g. hermes_plugins.allowed), but the lookup used caller_mod
(the exact leaf module string). A call from hermes_plugins.allowed.cleanup
would evaluate _plugin_override_policy.get("hermes_plugins.allowed.cleanup")
→ False and raise PermissionError even when the plugin registered opt-in
under its package root.

Switch the policy lookup to caller_root (.join of the first two segments)
so submodule callers inherit the package-level allow_tool_override grant.

Adds a focused regression test for the opted-in submodule case.
2026-07-01 15:37:58 +05:30
testingbuddies24
e07768a53f fix(gateway): strip orphan think-tag close tags in progressive stream
When a model emits an inline <think>...</think> block but the opening
tag is dropped upstream (thinking-mode toggle, truncated stream, or
incomplete upstream filtering), the bare </think> close tag leaked
through to the user in the live progressive edit. The agent-side final
scrubber (agent/think_scrubber.py) already had _strip_orphan_close_tags;
this ports the same logic into GatewayStreamConsumer so the streaming
display stays clean too.

- _filter_and_accumulate: strip orphan close tags before appending the
  'no-opening-tag' branch text to _accumulated.
- _flush_think_buffer: same on stream end for held-back partials.
- 14 regression tests (TestStripOrphanCloseTags): all 6 close-tag
  variants, multi-tag, partial-tag-untouched, trailing whitespace,
  and end-to-end through _filter_and_accumulate / _flush_think_buffer.

Only strips KNOWN close-tag names (case-insensitive) — never arbitrary
tag-shaped substrings — so comparison operators and unrelated prose are
preserved.

Salvaged from PR #43192 by @testingbuddies24.
2026-07-01 03:04:01 -07:00
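The "known names only" stripping from e07768a53f might be approximated as below. The tag names listed are a hypothetical subset (the commit mentions six variants without naming them), and the helper name mirrors the commit's wording rather than the real code.

```python
import re

# Hypothetical subset of close-tag names; the real list has six variants.
_KNOWN_CLOSE_TAGS = ("think", "thinking", "reasoning")

_ORPHAN_CLOSE = re.compile(
    r"</\s*(?:%s)\s*>" % "|".join(_KNOWN_CLOSE_TAGS),
    re.IGNORECASE,
)


def strip_orphan_close_tags(text: str) -> str:
    """Remove bare close tags for known names; leave everything else untouched."""
    return _ORPHAN_CLOSE.sub("", text)
```

Because the pattern names its tags explicitly, comparison operators and unrelated markup such as `</div>` pass through unchanged, as do partial tags still being streamed.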
amathxbt
6a6fd42111 fix(security): block subshell/brace-group wrappers at the hardline floor
Wrapping a catastrophic command in a bare subshell or brace group walked
straight past the unconditional hardline floor -- even under --yolo,
/yolo, approvals.mode=off, and cron approve mode. The command-substitution
forms were already caught; the bare paren / brace-group forms were the gap.

Rather than add the paren and brace openers to the flat _CMDPOS pattern
class (which cannot tell a real subshell opener from one sitting inside a
quoted argument, and would false-positive on ordinary prose such as a PR
title that merely mentions the trigger word), teach the existing
QUOTE-AWARE command-start tokenizer (_iter_shell_command_starts) to treat
the paren and brace openers as command starts, then emit a detection
variant that marks each real command start with a newline (already a
_CMDPOS separator). Openers inside quotes never register as starts, so
quoted arguments are left untouched while real subshell/brace bypasses now
anchor. One place covers every _CMDPOS rule (shutdown/reboot/init/
systemctl/telinit and the rm root/home/system floor).

Tests: subshell/brace bypasses added to the hardline-block, root-wipe, and
yolo-bypass sets; a regression set asserts quoted paren/brace prose is NOT
blocked (guards our own gh-pr-create workflow).
2026-07-01 03:03:05 -07:00
teknium1
6d1291f2cc chore(deps): bump aiohttp to patched 3.14.1 (from 3.14.0)
3.14.1 is the current patched release on the 3.14 line; both CVE-2026-34993
(CookieJar.load RCE) and CVE-2026-47265 (per-request cookie leak on
cross-origin redirect) are fixed as of 3.14.0, and 3.14.1 rolls up the
subsequent point fixes. Re-locked uv.lock.
2026-07-01 02:51:45 -07:00
Wing Huang
6c37b2c785 security(deps): enforce aiohttp CVE floor on all lazy messaging paths + coverage guard
The messaging extra and platform.slack pin aiohttp==3.14.0, but several
lazy messaging features listed only their SDK and let aiohttp come in
transitively. Each of those SDKs caps aiohttp loosely enough that a
vulnerable already-installed aiohttp still satisfies the range, so the
eager extras got the patched floor while the lazy paths did not:

  - discord.py (aiohttp>=3.7.4,<4)
  - mautrix / aiohttp-socks (aiohttp>=3,<4 / aiohttp>=3.10.0)  [Matrix]
  - microsoft-teams-apps (aiohttp<4)                            [Teams]

(Teams additionally shipped an explicit but *stale* aiohttp==3.13.4 in
both the pyproject `teams` extra and platform.teams.)

- tools/lazy_deps.py: add aiohttp==3.14.0 to platform.discord, platform.matrix;
  bump the stale platform.teams pin 3.13.4 -> 3.14.0.
- pyproject.toml: add aiohttp==3.14.0 to the matrix extra; bump the teams extra
  3.13.4 -> 3.14.0 (homeassistant/sms/messaging already at 3.14.0).
- tests/test_packaging_metadata.py: test_security_pins_present_in_mirrored_lazy_features
  now covers platform.discord/slack/matrix/teams. The existing agree-guard only
  compares packages pinned in BOTH sources, so it can't catch a lazy feature
  that omits a pin entirely; this guard is an explicit coverage contract
  (security package -> lazy features that must carry it) and fails with
  'platform.matrix: aiohttp=MISSING' if a floor is dropped again.
- uv.lock: regenerated, zero drift (aiohttp 3.14.0).
2026-07-01 02:51:45 -07:00
Wing Huang
828f33e6b1 fix(ci): map contributor email for attribution check
scripts/release.py AUTHOR_MAP is grepped by the Contributor Attribution
Check to resolve a commit author's email -> GitHub username. Add
huangsen365@gmail.com -> huangsen365 so this PR's commits pass the check.

(This commit originally also carried a gateway race-test flake fix; that
edit is now dropped because main independently hardened the same test with
a superior server._sessions snapshot/restore isolation, making ours
redundant.)
2026-07-01 02:51:45 -07:00
Wing Huang
6f956d7405 test(deps): guard pyproject<->lazy_deps pin consistency
Adds two checks to tests/test_packaging_metadata.py:

1. No package is exact-pinned to two different versions across
   pyproject.toml's [project.dependencies] / extras.
2. Every package pinned in BOTH the pyproject extras and the LAZY_DEPS
   allowlist in tools/lazy_deps.py uses the same version.

This is the regression guard for the drift the rest of this PR fixes: the
two pin sources are hand-maintained mirrors (lazy_deps even documents
"update both this map AND the corresponding extra"), and they have silently
diverged on aiohttp and anthropic. Run against the pre-fix tree, check (2)
fails on `anthropic: pyproject=['0.86.0'] lazy_deps=['0.87.0']`.

The lazy_deps side is parsed via AST (not imported) so the test stays free
of tools/lazy_deps.py runtime imports; only exact `==` pins are compared.
2026-07-01 02:51:45 -07:00
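Parsing the lazy-deps map via AST rather than importing it, as 6f956d7405 describes, could look like this sketch. It assumes `LAZY_DEPS` is a literal dict mapping feature names to lists of requirement strings; that shape and both helper names are assumptions.

```python
import ast
import re

_EXACT_PIN = re.compile(r"([A-Za-z0-9_.\-]+)==([^\s;]+)")


def exact_pins_from_source(source: str, name: str = "LAZY_DEPS") -> dict:
    """Collect package -> {versions} for exact '==' pins without importing the module."""
    pins = {}
    for node in ast.parse(source).body:
        if not isinstance(node, ast.Assign):
            continue
        if not any(isinstance(t, ast.Name) and t.id == name for t in node.targets):
            continue
        mapping = ast.literal_eval(node.value)  # only works for a literal dict
        for specs in mapping.values():
            for spec in specs:
                m = _EXACT_PIN.fullmatch(spec)
                if m:  # range specifiers such as '>=' are ignored
                    pins.setdefault(m.group(1).lower(), set()).add(m.group(2))
    return pins


def find_drift(a: dict, b: dict) -> dict:
    """Packages pinned in BOTH sources whose version sets differ."""
    return {pkg: (a[pkg], b[pkg]) for pkg in a.keys() & b.keys() if a[pkg] != b[pkg]}
```

A test would call `exact_pins_from_source` on the text of `tools/lazy_deps.py`, build the same mapping from the pyproject extras, and assert that `find_drift` returns an empty dict.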
Wing Huang
db57cbbaf6 security(deps): bump aiohttp to 3.14.0, anthropic to 0.87.0; pin cryptography floor
- aiohttp 3.13.4 -> 3.14.0 (messaging/slack/homeassistant/sms extras +
  lazy_deps platform.slack): picks up the fixes for CVE-2026-34993 (RCE via
  CookieJar.load deserialization) and CVE-2026-47265 (per-request cookie
  leak on cross-origin redirect). Both are fixed only in 3.14.0; there is
  no 3.13.x backport.
- anthropic 0.86.0 -> 0.87.0 (anthropic extra) — CVE-2026-34450 /
  CVE-2026-34452. lazy_deps provider.anthropic was already 0.87.0; the
  extra pin had drifted back to the vulnerable 0.86.0, so this realigns it.
- cryptography pinned explicitly at 46.0.7 in core deps — CVE-2026-39892,
  CVE-2026-34073. It only arrives transitively via PyJWT[crypto]; the
  explicit floor keeps the WeCom/Weixin crypto paths from drifting below
  the fix.

uv.lock regenerated; only aiohttp / anthropic moved (cryptography already
resolved to 46.0.7). Verified 3.14.0 satisfies discord.py 2.7.1
(aiohttp>=3.7.4,<4) and slack-sdk 3.40.1 (aiohttp>=3.7.3,<4).
2026-07-01 02:51:45 -07:00
teknium1
b48cacb97b fix(gateway,cron): guard cron model-tool path + add auto-resume loop breaker (#30719)
Completes the #30719 restart-loop defenses. Defenses 1-2 (the
_HERMES_GATEWAY guard on `hermes gateway stop|restart` + terminal_tool,
and the cron-creation lifecycle filter) already landed on main, but two
gaps remained:

- The agent's `cronjob` model tool calls cron.jobs.create_job directly,
  bypassing the hermes_cli.cron.cron_create CLI filter, so lifecycle
  commands scheduled via the model tool were only blocked at execution
  time (terminal_tool), not at creation. Moved the filter to a shared
  cron/lifecycle_guard.py enforced at create_job — the single chokepoint
  every job-creation path hits (CLI + model tool). Re-exported
  _contains_gateway_lifecycle_command from hermes_cli.cron so
  terminal_tool's import keeps working.
- No breaker for the auto-resume loop itself. Defenses 1-2 cover the
  cron/CLI/terminal paths, but any other SIGTERM source (e.g. a raw
  terminal("launchctl kickstart ai.hermes.gateway")) still triggers the
  boot->auto-resume->re-run cycle. Added gateway/restart_loop_guard.py:
  counts restart-interrupted boots in a rolling window (config
  gateway.restart_loop_guard, default 3 boots / 60s) and skips
  auto-resume for that boot once tripped. The gateway still comes up and
  serves real inbound messages; it just stops replaying the session that
  keeps killing it, putting a human back in the loop.

Also tightened the lifecycle regex over main's version: dropped
`hermes gateway start` (benign), required the gateway identifier on the
launchctl/systemctl branches (so `launchctl unload
ai.hermes.update-checker.plist` and `systemctl restart
hermes-meta.service` no longer false-positive), added the inverse
pkill token order, and fixed the binary-script bypass (decode with
errors='replace' instead of swallowing UnicodeDecodeError). The
create_job guard resolves relative script paths under HERMES_HOME/scripts
the same way the scheduler does, so a bare script name is scanned as the
file that actually runs.

Design and much of defense-2 originate from PR #33395 (@kshitijk4poor),
which itself salvaged #30728 (@SimoKiihamaki). Rebuilt against current
main since defenses 1-2 had already landed under different names.

Closes #30719.

Co-authored-by: SimoKiihamaki <simo.kiihamaki@gmail.com>
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-07-01 02:48:36 -07:00
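The rolling-window breaker added in b48cacb97b can be illustrated with an in-memory sketch. The class below is hypothetical: the real guard must persist boot timestamps across process restarts, and whether it trips on the third boot or only after it is an assumption here.

```python
import time
from collections import deque


class RestartLoopGuard:
    """Count restart-interrupted boots in a rolling window (defaults: 3 boots / 60s)."""

    def __init__(self, max_boots: int = 3, window_s: float = 60.0):
        self.max_boots = max_boots
        self.window_s = window_s
        self._boots = deque()  # in-memory only; a real guard would persist these

    def record_boot(self, now=None) -> bool:
        """Record one boot; return True when auto-resume should be skipped."""
        now = time.time() if now is None else now
        self._boots.append(now)
        # Drop boots that have aged out of the window.
        while self._boots and now - self._boots[0] > self.window_s:
            self._boots.popleft()
        return len(self._boots) >= self.max_boots
```

When `record_boot` returns True, the gateway would still start and serve inbound messages but skip replaying the interrupted session for that boot.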
Ben Barclay
c71f816956 fix(compression): clear all per-session state in on_session_end, not just _previous_summary
The original cross-session contamination fix (#38788) only cleared
_previous_summary in on_session_end(), but on_session_reset() clears
14+ per-session variables. When a session ends (cron exit, gateway
expiry, session-id rotation) and the compressor instance is reused,
the surviving stale state causes:

- _ineffective_compression_count surviving → next session skips
  compression prematurely (anti-thrashing guard misfires)
- _summary_failure_cooldown_until surviving → next session blocks
  summary generation for an unrelated transient error
- _last_compress_aborted surviving → callers think compression is
  still aborted
- _last_aux_model_failure_* surviving → stale error warnings shown
- _last_summary_dropped_count / _last_summary_fallback_used
  surviving → misleading user warnings
- _context_probed / _context_probe_persistable surviving → stale
  context-probe state

Also fix on_session_reset() which was missing _last_compress_aborted
clearing — a /new or /reset would inherit the aborted flag from the
prior conversation.

Add 6 targeted tests covering the leak vectors and a parity test
ensuring on_session_end and on_session_reset always clear the same
surface.
2026-07-01 02:48:32 -07:00
teknium1
51feecc2b1 fix(security): block shell-collapse rm -rf / spellings at the hardline floor
rm -rf //, /., /./, /.. and //* all resolve to / in the shell but slipped
past the root-filesystem hardline pattern, whose target group only matched
the literal / and /* tokens. They fell to the softer DANGEROUS_PATTERNS
'delete in root path' rule, which --yolo / approvals.mode=off / cron
approve-mode are designed to bypass — leaving the one unconditional floor
open to a full root wipe under yolo.

Broaden the root token from '/|/\\*|/ \\*' to '/[/.]*\\**' inside
_hardline_rm_path so any root-anchored path whose components collapse back
to / (repeated slashes plus ./.. segments) with an optional trailing glob
is caught. A trailing real segment (/tmp, /home, /.ssh) still fails to
match and stays with the softer rules.

Co-authored-by: kernel-t1 <214165399+kernel-t1@users.noreply.github.com>
2026-07-01 02:46:38 -07:00
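The broadened root token from 51feecc2b1 can be checked in isolation. The sketch below applies the token with `fullmatch` to a single target argument; that framing and the helper name are assumptions, since the real token sits inside a larger pattern in `_hardline_rm_path`.

```python
import re

# The token quoted in the commit: a leading slash, any run of '/' and '.',
# then an optional trailing glob.
ROOT_TOKEN = re.compile(r"/[/.]*\**")


def collapses_to_root(target: str) -> bool:
    """True when the whole target is a spelling that the shell resolves to /."""
    return ROOT_TOKEN.fullmatch(target) is not None
```

Targets with a real trailing segment, such as `/tmp` or `/.ssh`, fail the full match and stay with the softer rules.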
teknium1
d15a288812 chore(release): map arthurzhang author for PR #34718 salvage
2026-07-01 02:45:22 -07:00
Ruzzgar
e13b6ce1c6 test(redact): cover Slack App-Level (xapp-) token redaction
2026-07-01 02:45:22 -07:00
ArthurZhang
fdb9620ac4 security(agent): redact Slack App-Level (xapp-) tokens
The xapp-<num>-<hash> format used by Slack App-Level / Socket Mode
tokens was missing from both agent/redact.py prefix patterns and
gateway/run.py gateway secret patterns, so SLACK_APP_TOKEN values could
leak through to chat users even with security.redact_secrets enabled.

Adds an anchored xapp-\d+- pattern to both redaction paths.
2026-07-01 02:45:22 -07:00
skyzh
cc7d20d683 feat(raft): add gateway setup wizard
Add an interactive Raft setup flow for hermes gateway setup. The wizard follows the existing platform adapter setup pattern, persists RAFT_PROFILE to the Hermes env file, preserves an existing profile when the user declines reconfiguration, and registers the flow via setup_fn.

Add focused Raft adapter coverage for saving RAFT_PROFILE, keeping an existing profile, and registering setup_fn.

Signed-off-by: skyzh <skyzh@mail.build>
Signed-off-by: HaoHao <HaoHao@mail.build>
2026-07-01 02:45:11 -07:00
Teknium
da6d5fcd13 fix(auth): serialize Codex OAuth pool refresh under the auth-store lock (#56233)
The credential-pool Codex refresh path synced tokens from auth.json and
then POSTed the refresh_token to OpenAI's token endpoint without holding
the cross-process auth-store lock across the whole read->POST->write-back
sequence. Because Codex refresh tokens are single-use, two concurrent
Hermes processes could both adopt the same on-disk token and both POST
it; the loser got refresh_token_reused / invalid_grant.

Wrap the Codex OAuth branch of _refresh_entry in the existing shared
_auth_store_lock (reentrant, cross-process flock) using the same
extended-timeout pattern resolve_codex_runtime_credentials() already
uses. A waiting process now blocks on the lock and, once inside, the
in-lock re-sync picks up the rotated token the winner persisted and
skips its own POST. Also send User-Agent: hermes-cli/<version> on the
refresh request.

Credit @cooper-oai (#34820) for identifying the concurrent-refresh
reuse race; this ships the narrow lock-serialization fix without the
separate Codex auth-store partition.
2026-07-01 02:45:07 -07:00
heathley
a8a97c358f fix(matrix): block unsafe image redirects per-hop
Matrix outbound image downloads validated only the final URL after
following redirects, so a public URL that 302-redirects to loopback /
private-network / cloud-metadata endpoints had already connected to the
unsafe hop before the check ran.

Re-validate every redirect hop before following it:
- aiohttp path resolves redirects manually with allow_redirects=False,
  validating each Location via is_safe_url (aiohttp can't use the httpx
  response event hook).
- httpx fallback installs the shared _ssrf_redirect_guard event hook.

Regression tests cover per-hop blocking of an unsafe redirect, following
a safe redirect chain, and httpx guard wiring.
2026-07-01 02:44:57 -07:00
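Per-hop validation as described in a8a97c358f can be sketched without any HTTP library by injecting the fetch step. Everything here is illustrative: `is_safe_url` is a toy stand-in that checks only literal IP hosts (a real check would also resolve hostnames), and `fetch` is a hypothetical callable returning `(status, location)` with redirects disabled.

```python
import ipaddress
from urllib.parse import urljoin, urlsplit

_REDIRECT_CODES = (301, 302, 303, 307, 308)


def is_safe_url(url: str) -> bool:
    """Toy check: reject localhost and literal loopback/private/link-local IPs."""
    host = urlsplit(url).hostname or ""
    if host == "localhost":
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a hostname; a real check would resolve and validate it
    return not (ip.is_loopback or ip.is_private or ip.is_link_local)


def follow_redirects(url: str, fetch, max_hops: int = 5) -> str:
    """Validate every hop BEFORE requesting it; return the final URL."""
    for _ in range(max_hops + 1):
        if not is_safe_url(url):
            raise ValueError(f"unsafe redirect hop: {url}")
        status, location = fetch(url)
        if status not in _REDIRECT_CODES or not location:
            return url
        url = urljoin(url, location)  # Location may be relative
    raise ValueError("too many redirects")
```

The key property is ordering: the unsafe hop is rejected before any connection to it is attempted, rather than after the client has already followed the chain.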
Teknium
868fa9566a fix(security): block /proc/*/auxv and /proc/*/pagemap read leaks
auxv leaks AT_RANDOM (stack canary seed) + AT_BASE/AT_PHDR load
addresses — an ASLR oracle on par with maps. pagemap exposes
virtual->physical translation. Both slipped through the endswith
tuple alongside the maps family covered by the salvaged commit.

Adds regression coverage for auxv/pagemap and for the per-thread
/proc/<pid>/task/<tid>/<file> alias form (endswith catches both).

Follow-up on #32238, closes #34430.
2026-07-01 02:44:53 -07:00
AhmetArif0
64e6b98ba8 fix(security): extend /proc read block to smaps, smaps_rollup, numa_maps, mem
PR #4609 blocked /proc/*/maps to prevent ASLR layout leakage, but the
endswith("/maps") check does not match /proc/*/smaps or
/proc/*/smaps_rollup — both expose the same virtual-address layout and
bypass the guard.  /proc/*/numa_maps carries the same data with NUMA
annotations and is equally bypassed.  /proc/*/mem (raw process memory)
is added as defence-in-depth; it requires address knowledge to exploit
but is blocked for consistency.

Extends the endswith tuple in _is_blocked_device_path() to cover all
four variants and adds regression assertions for all new paths to
test_proc_sensitive_pseudo_files_blocked.

Partially addresses #4427.
2026-07-01 02:44:53 -07:00
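Why `endswith("/maps")` missed the sibling files, and how a suffix tuple closes the gap, can be seen in a short sketch. The tuple combines the names from this commit and the auxv/pagemap follow-up; the function name and the `/proc/` prefix test are assumptions, not the body of `_is_blocked_device_path()`.

```python
# Suffixes named across the two commits; each includes the leading slash,
# so "/maps" does not accidentally cover "smaps" and vice versa.
_BLOCKED_PROC_SUFFIXES = (
    "/maps",
    "/smaps",
    "/smaps_rollup",
    "/numa_maps",
    "/mem",
    "/auxv",
    "/pagemap",
)


def is_blocked_proc_path(path: str) -> bool:
    """True for sensitive /proc pseudo-files, including per-thread aliases."""
    # str.endswith accepts a tuple and matches if any suffix matches.
    return path.startswith("/proc/") and path.endswith(_BLOCKED_PROC_SUFFIXES)
```

Since the check is suffix-based, the per-thread form `/proc/<pid>/task/<tid>/<file>` is covered by the same tuple.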
Teknium
275e293f54 fix(matrix): decline dead/abandoned invites instead of retrying forever (#56222)
An invite to a room with no remaining members surfaces as "no servers
in the room have been provided" or "room not found" on join. The pending
invite was never cleared, so every gateway startup re-attempted the join
and re-emitted the warning indefinitely.

Detect that specific failure mode by narrow error-message match and call
leave_room to decline the invite; transient/network errors leave the
invite untouched for the next sync. Adds 5 tests.

Reimplements the matrix portion of #33953 onto the current plugin adapter
(gateway/platforms/matrix.py was relocated to
plugins/platforms/matrix/adapter.py since the PR was opened). The two
gateway/status.py fixes from that PR (wrapper-subcommand rejection,
psutil start-time fallback) already landed on main independently.

Reported by @Bougey; original patch authored by @KiraKatana.
2026-07-01 02:44:18 -07:00
sprmn24
88d6e833f1 fix(agent): wrap list-type untrusted content in untrusted_tool_result
_maybe_wrap_untrusted() only wrapped str-typed tool outputs. When a
high-risk tool (web_extract, browser_*) returns a multimodal content
list ([{type:text},{type:image_url}]) — which _tool_result_content_for
_active_model() produces by unwrapping the _multimodal envelope for
vision-capable providers — the text part reached the model completely
unguarded. An attacker page that ships one image bypassed the entire
untrusted-data wrapper.

Extend the wrapper to handle list content: each {type:text} part is run
through the same string-wrapping path (min-char threshold, delimiter
neutralization, one well-formed block), image/video parts pass through
untouched so the list stays valid for vision adapters. Recursing into
the existing string branch means the list path inherits the delimiter
defang and the no-forgeable-fast-path hardening from #56172 for free.

The outer list is rebuilt (not returned by identity), so callers compare
by value.
2026-07-01 02:44:09 -07:00
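The list-handling extension from 88d6e833f1 can be reduced to the sketch below. The delimiter format and function names are assumptions, and the min-char threshold and delimiter neutralization mentioned in the commit are deliberately left out.

```python
def _wrap_text(text: str) -> str:
    """Hypothetical delimiter format; the real wrapper also defangs delimiters."""
    return f"<untrusted_tool_result>\n{text}\n</untrusted_tool_result>"


def maybe_wrap_untrusted(content):
    """Wrap str content, or each text part of a multimodal content list."""
    if isinstance(content, str):
        return _wrap_text(content)
    if isinstance(content, list):
        # Rebuild the list: text parts recurse into the string branch,
        # image/video parts pass through untouched.
        return [
            {**part, "text": maybe_wrap_untrusted(part["text"])}
            if isinstance(part, dict)
            and part.get("type") == "text"
            and isinstance(part.get("text"), str)
            else part
            for part in content
        ]
    return content
```

Because a new list is always returned, callers should compare results by value rather than by identity.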
rrevenanttt
0c0b4b6989 fix(security): collapse $IFS whitespace obfuscation before approval checks
## What does this PR do?

Closes a critical bypass of the dangerous-command approval system. The
normalizer that every command passes through before pattern matching
(`_normalize_command_for_detection`) already strips ANSI, null bytes,
fullwidth Unicode, backslash escapes and empty-quote token splits — but
it did nothing about the shell `IFS` variable. In any POSIX shell `$IFS`
and `${IFS}` expand to whitespace, so a command written as
`rm${IFS}-rf${IFS}/` is executed by the live shell as `rm -rf /` while
the detection regexes — which anchor on literal `\s` between a command and
its arguments — never fire.

The impact is severe: this evades BOTH layers at once. It slips past every
entry in `DANGEROUS_PATTERNS` (so `curl${IFS}...|sh`, `sed${IFS}-i`
against `~/.hermes/config.yaml`, sudo privilege flags, etc. auto-run with
no approval prompt) AND the unconditional hardline floor that is
documented as un-bypassable "not even with --yolo" (`rm -rf /`, `mkfs`,
`dd` to a raw block device, `shutdown`/`reboot`, fork bomb). A
prompt-injected or malicious instruction could wipe the host filesystem or
power the box off while the approval system reports nothing. Confirmed at
runtime before the fix: `detect_hardline_command('rm${IFS}-rf /')` returned
`(False, None)`.

The fix mirrors the shell's own expansion: it collapses `$IFS` / `${IFS}`
(including the bash substring form `${IFS:0:1}`) to a single space inside
the existing de-obfuscation block, so the whitespace-anchored patterns
match exactly as they do for the un-obfuscated command. It is deliberately
narrow and safe — a `\b` word boundary keeps it from touching unrelated
variables like `$IFSACONFIG`, so it cannot introduce false positives on
legitimate commands.
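
A minimal sketch of the collapse step, assuming it runs as one substitution inside the normalizer. The pattern and function name are illustrative and may not match the exact expression in `tools/approval.py`.

```python
import re

# ${IFS}, ${IFS:0:1}-style substring forms, or bare $IFS at a word boundary.
_IFS_EXPANSION = re.compile(r"\$\{IFS(?::[^}]*)?\}|\$IFS\b")


def collapse_ifs(command: str) -> str:
    """Replace IFS expansions with a single space before pattern matching."""
    return _IFS_EXPANSION.sub(" ", command)
```

With this in place, `rm${IFS}-rf${IFS}/` normalizes to `rm -rf /`, while a look-alike such as `$IFSACONFIG` is left alone because no word boundary follows `IFS`.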

## Related Issue

N/A

## Type of Change

- [x] 🔒 Security fix

## Changes Made

- `tools/approval.py`: in `_normalize_command_for_detection`, substitute
  `$IFS` / `${IFS}` (and `${IFS:...}`) expansions with a literal space
  before dangerous/hardline pattern matching, alongside the existing
  backslash and empty-quote de-obfuscation.
- `tests/tools/test_approval.py`: add `TestIFSWhitespaceBypass` covering
  the brace, bare and substring IFS forms against both
  `detect_hardline_command` and `detect_dangerous_command`, plus
  regression guards that a look-alike variable (`$IFSACONFIG`) and plain
  safe commands are not flagged. Import `detect_hardline_command`.

## How to Test

1. Reproduce the hole (pre-fix): `detect_hardline_command('rm${IFS}-rf /')`
   returns `(False, None)` and `detect_dangerous_command(...)` returns
   `(False, ...)`, i.e. a host-destroying command is auto-approved.
2. With the fix applied, both now flag the command: hardline match
   "recursive delete of root filesystem" and dangerous match "delete in
   root path".
3. Run the suite: `pytest tests/tools/test_approval.py
   tests/tools/test_hardline_blocklist.py -q` — the new
   `TestIFSWhitespaceBypass` cases pass and nothing else regresses.

## Checklist

### Code

- [x] I've read the Contributing Guide
- [x] My commit messages follow Conventional Commits (`fix(scope):`, etc.)
- [x] I searched for existing PRs to make sure this isn't a duplicate
- [x] My PR contains **only** changes related to this fix (no unrelated commits)
- [x] I've run the relevant tests and they pass (two pre-existing failures
      are environmental: missing optional deps in the minimal venv, not
      caused by this change)
- [x] I've added tests for my changes
- [x] I've tested on my platform: macOS 15 (Darwin 25.5)

### Documentation & Housekeeping

- [x] I've updated relevant documentation (README, `docs/`, docstrings) — or N/A
- [x] I've updated `cli-config.yaml.example` if I added/changed config keys — or N/A
- [x] I've updated `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows — or N/A
- [x] I've considered cross-platform impact (Windows, macOS) — the change is a
      pure string transform with no platform-specific behavior; footgun gate passes
- [x] I've updated tool descriptions/schemas if I changed tool behavior — or N/A
2026-07-01 02:44:04 -07:00
mrparker0980
10a54ccc2c fix(security): anchor @file context refs to canonical read deny-list
`@file` / `@folder` context-reference expansion enforced its own narrow
deny-list (`_ensure_reference_path_allowed` in `agent/context_references.py`)
that only covered `~/.ssh` keys, a handful of shell dotfiles, `~/.hermes/.env`,
and `skills/.hub`. It never blocked the credential stores that the canonical
read guard (`agent/file_safety.get_read_block_error`) protects: provider API
keys (`~/.hermes/auth.json`), Anthropic OAuth tokens
(`~/.hermes/.anthropic_oauth.json`), MCP OAuth material (`~/.hermes/mcp-tokens/`),
webhook HMAC secrets, and project-local `.env` files.

This matters because the messaging gateway feeds **untrusted** remote text
straight into reference expansion: `gateway/run.py` calls
`preprocess_context_references_async(..., allowed_root=_msg_cwd)` where
`_msg_cwd` defaults to the operator's HOME when `TERMINAL_CWD` is unset. A chat
peer (Telegram/Discord/Slack/...) could send `@file:~/.hermes/auth.json`, pass
the `allowed_root` check (it resolves under HOME), slip past the narrow list,
and have the operator's live keys read into the agent's context — where the
model would typically echo or act on them.

Rather than duplicate and re-sync a second secret list, this routes the guard
through the existing single source of truth. A reviewer might ask "why not just
add `auth.json` to the local list?" — because the local list has already drifted
once (a prior commit had to add `.config/gh`); anchoring to
`get_read_block_error` means every future addition there protects this path too.
The narrow checks are kept as a fallback since they also cover directories that
the canonical guard does not (`.aws`, `.gnupg`, `.kube`, etc.), and the canonical
lookup is wrapped so that a failure inside it can never crash reference expansion.
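
The layering described above might look something like the sketch below. The function name, the callable parameters, and the assumed return shape of the canonical guard (an error string when blocked, `None` otherwise) are all illustrative assumptions; the real signature of `get_read_block_error` is not shown in this commit.

```python
from pathlib import Path
from typing import Callable, Optional

# Assumed shape of the canonical guard: error string when blocked, else None.
ReadGuard = Callable[[Path], Optional[str]]


def ensure_reference_allowed(
    path: Path,
    narrow_denied: Callable[[Path], bool],
    canonical_guard: ReadGuard,
) -> None:
    """Raise PermissionError when the local or the canonical list blocks path."""
    # 1. The existing narrow checks stay in place as a fallback.
    if narrow_denied(path):
        raise PermissionError(f"reference blocked by local deny-list: {path}")

    # 2. Consult the canonical guard. A failure inside it must not break
    #    reference expansion, so an exception is treated as "no verdict".
    try:
        error = canonical_guard(path)
    except Exception:
        error = None
    if error:
        raise PermissionError(error)
```

Running the narrow checks first means a broken canonical lookup degrades to the previous behavior instead of either crashing or silently allowing everything.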

## Related Issue

N/A

## Type of Change

- [x] 🔒 Security fix

## Changes Made

- `agent/context_references.py`: `_ensure_reference_path_allowed` now also
  consults `agent.file_safety.get_read_block_error` after its existing checks
  and refuses the reference when that canonical guard flags the resolved path.
  The lookup is wrapped so guard-resolution failures fall back to the explicit
  checks instead of breaking expansion.
- `tests/agent/test_context_references.py`: added
  `test_blocks_canonical_read_denylist_credential_stores`, asserting that
  `@file` references to `auth.json`, `.anthropic_oauth.json`, `mcp-tokens/*`, and
  a project-local `.env` are all refused and their secret bodies never reach the
  expanded message.
- `scripts/release.py`: added the contributor email to `AUTHOR_MAP` (release
  gate).

## How to Test

1. `scripts/run_tests.sh tests/agent/test_context_references.py` — all 15 tests
   pass, including the new credential-store case.
2. Regression proof: stash `agent/context_references.py`, run the suite with
   `-- -k canonical`, and confirm the new test fails (secrets leak into the
   message) without the fix; restore and confirm it passes.
3. `ruff check agent/context_references.py tests/agent/test_context_references.py`
   and `python scripts/check-windows-footguns.py agent/context_references.py
   tests/agent/test_context_references.py` both pass.

## Checklist

### Code

- [x] I've read the Contributing Guide
- [x] My commit messages follow Conventional Commits (`fix(scope):`, etc.)
- [x] I searched for existing PRs to make sure this isn't a duplicate
- [x] My PR contains **only** changes related to this fix (plus the AUTHOR_MAP release gate)
- [x] I've run the test suite for the touched area and all tests pass
- [x] I've added tests for my changes (required for bug fixes)
- [x] I've tested on my platform: macOS 15 (Darwin 25.5)

### Documentation & Housekeeping

- [x] I've updated relevant documentation (README, `docs/`, docstrings) — or N/A
- [x] I've updated `cli-config.yaml.example` if I added/changed config keys — or N/A
- [x] I've updated `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows — or N/A
- [x] I've considered cross-platform impact (Windows, macOS) — or N/A
- [x] I've updated tool descriptions/schemas if I changed tool behavior — or N/A
2026-07-01 02:43:49 -07:00
kshitijk4poor
53b017f03e refactor(gateway): share error-text blob between not_found classifiers
Follow-up to the #55780 dead-target not_found blast-radius fix (merged in
#56225). classify_send_error and is_chat_level_not_found each built their own
lowercased error blob, but divergently: classify_send_error appended the
exception CLASS NAME while is_chat_level_not_found did not. A caller passing
exc= to both could get inconsistent answers on the same failure.

- Extract _error_blob(exc, error_text) as the single source of truth both
  classifiers use (str(exc) when non-empty + class name; no stray leading
  space).
- Align is_chat_level_not_found's signature to (exc, error_text), matching
  classify_send_error, removing the swapped-positional footgun; update the
  sole caller and the three tests to keyword form.
- Add a regression guard asserting _error_blob keeps the class name.

Surfaced by the hermes-pr-review Phase 2c structured review of #56225.
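
A shared blob builder along the lines described above could be sketched as follows. The body is an assumption reconstructed from the bullet points (non-empty `str(exc)` plus class name, no stray leading space); the public name `error_blob` is used here only for illustration.

```python
from typing import Optional


def error_blob(exc: Optional[BaseException], error_text: Optional[str]) -> str:
    """Build one lowercased blob from error text, str(exc) and the class name."""
    parts = []
    if error_text:
        parts.append(error_text)
    if exc is not None:
        message = str(exc)
        if message:  # skip empty messages so no stray leading space appears
            parts.append(message)
        parts.append(type(exc).__name__)
    return " ".join(parts).lower()
```

Because both classifiers would call the same helper, they cannot disagree about whether the class name is part of the text being matched.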
2026-07-01 15:11:38 +05:30
Teknium
01e681aa48 docs: unify /new and /reset rows in gateway slash-commands table (#56235)
The messaging gateway table still listed /new ("Start a new
conversation") and /reset ("Reset conversation history") as two
separate commands with divergent descriptions. /reset is an alias
of /new (see COMMAND_REGISTRY in hermes_cli/commands.py) — same
handler, fresh session ID + history. Collapse them into one row
matching the registry wording and the CLI table already on line 39.

Closes #42829.
2026-07-01 02:39:39 -07:00
kshitijk4poor
8f1d22d7ed chore(release): map r266-tech contributor noreply email for #55780 salvage 2026-07-01 15:01:33 +05:30
r266-tech
46f45104c4 fix(gateway): don't mark an entire chat dead on thread/message-level not_found
#55115 added the dead-target registry so confirmed-dead delivery targets are
short-circuited. Its documented scope (gateway/dead_targets.py) is deliberately
narrow: only *whole-chat* deaths -- the `forbidden` and chat-level `not_found`
(`chat not found`) kinds -- should be recorded; "Thread/topic-level not_found is
NOT recorded here ... a deleted topic does not mean the parent chat is dead."

But the implementation doesn't honor that scope. classify_send_error collapses
chat-level "chat not found" AND thread/message-level not_found ("thread not
found", "topic_deleted", "message_id_invalid", "message to edit/reply not
found") into one "not_found" kind, _DEAD_ERROR_KINDS contains "not_found"
wholesale, and deliver()'s except handler marks the PARENT chat_id dead. So a
single deleted Telegram topic or edited-away message permanently marks the
entire chat (and every future scheduled / cron / agent delivery to it) dead --
silently. The adapter self-heal that the docstring relies on only covers the
non-private-group thread retry; named-DM-topic and message-level failures
propagate to deliver()'s except handler and wrongly kill the whole chat.

Add is_chat_level_not_found() (factoring the not_found substrings into chat-level
vs sub-chat-level constants) and gate the delivery dead-path: a "not_found" only
marks the target dead when it is chat-level. classify_send_error's public
contract is unchanged (still returns "not_found" for every shape); only the
mark_dead decision is refined, restoring the registry's documented scope.

Cross-platform: telegram/slack/discord delivery all flow through
classify_send_error -> mark_dead. Adds regression tests through the real
deliver() path plus helper/classifier units.
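
The refined decision can be illustrated with a small sketch. The function names, the substring tuples (an illustrative subset of the strings quoted above), and the `"forbidden"` detection are assumptions for the example, not the contents of `gateway/dead_targets.py`.

```python
from typing import Optional

# Illustrative subsets of the substrings quoted in the commit message.
_CHAT_LEVEL_NOT_FOUND = ("chat not found",)
_SUB_CHAT_NOT_FOUND = ("thread not found", "topic_deleted", "message_id_invalid")


def classify_kind(blob: str) -> Optional[str]:
    """Coarse classifier: every not_found shape still maps to 'not_found'."""
    text = blob.lower()
    if "forbidden" in text:
        return "forbidden"
    if any(s in text for s in _CHAT_LEVEL_NOT_FOUND + _SUB_CHAT_NOT_FOUND):
        return "not_found"
    return None


def chat_level_not_found(blob: str) -> bool:
    """True only when the failure says the whole chat is gone."""
    text = blob.lower()
    return any(s in text for s in _CHAT_LEVEL_NOT_FOUND)


def should_mark_dead(blob: str) -> bool:
    """Gate the dead-target path: sub-chat not_found never kills the chat."""
    kind = classify_kind(blob)
    if kind == "forbidden":
        return True
    return kind == "not_found" and chat_level_not_found(blob)
```

The classifier's return value is untouched; only the extra gate decides whether the parent chat is recorded as dead.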
2026-07-01 15:01:33 +05:30
Teknium
4580c03e7d test(gateway): align salvaged #54947-cluster tests with async cache helper
The three salvaged PRs (#46647, #54583, #55013) were authored against a
tree where _refresh_agent_cache_message_count was sync and _session_db was
the raw SessionDB. On current main the helper is async and awaits the
AsyncSessionDB facade, and _run_agent was split into _run_agent_inner.

- Wrap test _session_db in AsyncSessionDB so the awaited get_session works
- Make refresh-calling tests async + await the helper
- Point the placement-guard test at _run_agent_inner (recursion lives there
  post-mixin-extraction)
- Relocated production call sites now correctly await the async helper
2026-07-01 02:29:24 -07:00
teknium1
116a63d3a0 chore(release): map jcjc81 + Tranquil-Flow in AUTHOR_MAP for #54947 cluster salvage 2026-07-01 02:29:24 -07:00
Tranquil-Flow
e7562c394f fix(gateway): skip cross-process guard on session_id switch under same session_key (#54947)
The cross-process coherence guard (#45966) compares the session's
on-disk message_count against the snapshot stored next to the cached
agent, and rebuilds the agent on a mismatch.  The guard is correct
when the cache snapshot and the live count both refer to the same
DB row.  But the agent cache is keyed by session_key, which can
group multiple conversation threads (different session_ids) under
the same key — and the message_count values belong to DIFFERENT
DB rows.

When the user switches from session A to session B under the same
session_key, the cache hit returns A's cached agent.  The guard then
compares A's snapshot count (A.message_count) against B's live count
(B.message_count).  Because the two track different conversations they
match only by coincidence, so the guard invalidates the cache.  Every
session switch busts the prompt cache and forces a fresh agent build.  The
post-turn re-baseline (#46237) made it worse: it reads the live
count from the CURRENT session_entry.session_id, so each switch
overwrites the original snapshot with the new session's count,
causing the very next switch BACK to the original session to fire
the guard again.

This is the bug from #54947 (P0, sweeper:risk-session-state,
sweeper:risk-caching).

Fix:
  * Record the snapshot's session_id alongside the message_count in
    the cache tuple: (agent, sig, mc, session_id) — a 4-tuple.  The
    cache build at the AIAgent construction site stores the active
    session_id.
  * The cache-hit guard skips the cross-process count comparison
    when the active session_id differs from the snapshot's
    session_id — the comparison is meaningless across different DB
    rows, so the agent is REUSED without invalidation.  The cross-
    process guard still fires when the session_id matches and the
    live count differs (genuine cross-process write on the SAME
    session).
  * _refresh_agent_cache_message_count checks the snapshot's
    session_id: when it differs from the current session_id, the
    snapshot is intentionally left untouched (overwriting it would
    corrupt the original conversation's baseline and cause the
    switch-back to fire the guard).  The legacy 3-tuple shape (no
    session_id) is still re-baselined as before.
  * Backward-compat:
      - 2-tuple (agent, sig) — unchanged, opts out of the guard.
      - 3-tuple (agent, sig, mc) — unchanged behavior, standard
        cross-process check.
      - pending sentinel — unchanged, untouched by re-baseline.
      - new 4-tuple (agent, sig, mc, session_id) — full session_id-
        aware guard with skip on mismatch.

Tests:
  * tests/gateway/test_session_id_cache_coherence.py — 7 tests
    covering L1-L5 from LAYERS.md:
      - L1 session_id switch must REUSE
      - L2 cache tuple records snapshot's session_id
      - L3 re-baseline skips when session_id differs
      - L4 same-session_id turns still re-baseline (#46237 holds)
      - L5 legacy 2-tuples and pending sentinels untouched
      - legacy 3-tuple (no session_id) still guarded (#45966 holds)
      - 3-tuple transitions to 3-tuple (not 4-tuple) on re-baseline

No regressions in 70 existing tests in test_agent_cache.py or 137
related session tests.  Co-authored with #52197 (deferred cleanup
of evicted agents); both fixes compose cleanly.
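
The tuple-shape rules listed under "Fix" can be summarized in a short sketch. The two function names are hypothetical, the pending sentinel is omitted, and the real gateway code works on its own cache structure rather than bare tuples.

```python
from typing import Any, Tuple


def should_rebuild(entry: Tuple[Any, ...], active_session_id: str, live_count: int) -> bool:
    """Return True when the cached agent should be rebuilt."""
    if len(entry) == 2:  # (agent, sig): opts out of the guard
        return False
    if len(entry) == 3:  # (agent, sig, mc): legacy cross-process check
        return entry[2] != live_count
    _agent, _sig, snapshot_count, snapshot_session_id = entry
    if snapshot_session_id != active_session_id:
        return False  # different DB rows, so the counts are not comparable
    return snapshot_count != live_count


def rebaseline(entry: Tuple[Any, ...], active_session_id: str, live_count: int) -> Tuple[Any, ...]:
    """Refresh the snapshot count without corrupting another session's baseline."""
    if len(entry) == 2:
        return entry
    if len(entry) == 3:  # legacy shape stays a 3-tuple
        return (entry[0], entry[1], live_count)
    if entry[3] != active_session_id:
        return entry  # leave the original conversation's snapshot untouched
    return (entry[0], entry[1], live_count, entry[3])
```

Keeping the session_id inside the tuple lets both the guard and the re-baseline decide locally whether a count comparison is meaningful.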
2026-07-01 02:29:24 -07:00
Jason
aa4731598c fix(gateway): re-baseline agent cache count after first-turn session_meta
The cross-process cache-coherence guard (#45966) compares a session's
on-disk message_count against a snapshot stored next to the cached agent,
rebuilding the agent on a mismatch so a foreign writer (e.g. the dashboard
backend) can't leave the in-memory transcript stale.

On a fresh gateway conversation the post-turn re-baseline
(_refresh_agent_cache_message_count) ran BEFORE the first-turn `session_meta`
marker row was appended to the transcript. That append goes through
append_to_transcript -> append_message, which increments message_count
unconditionally. So the snapshot was left exactly one short of the live
count, and on turn 2 of every fresh conversation the guard mistook this
process's own session_meta write for a foreign write, evicting and rebuilding
the cached agent — silently busting the per-conversation prompt cache the
cache exists to protect.

Move the re-baseline to after the turn's full transcript persistence block
(including the session_meta append and the compression session_id swap). The
snapshot now matches the live count, so the guard fires only on genuinely
foreign writes. This also makes the call honor its own documented contract of
using the compaction-updated session_id.

Adds a regression test that drives the real _handle_message_with_agent
against a real SessionDB and asserts the invariant: after a fresh first turn,
snapshot == live message_count, so the next turn's guard reuses the cached
agent. Fails before this change, passes after.
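
The off-by-one can be reproduced with a toy model. Everything below is a simplified stand-in (the `Transcript` class and `first_turn` helper are hypothetical), meant only to show why the ordering of the re-baseline matters.

```python
from typing import Optional, Tuple


class Transcript:
    """Toy transcript whose append always bumps message_count."""

    def __init__(self) -> None:
        self.message_count = 0

    def append_message(self, _row: str) -> None:
        self.message_count += 1


def first_turn(rebaseline_after_meta: bool) -> Tuple[Optional[int], int]:
    """Return (snapshot, live_count) after a fresh conversation's first turn."""
    transcript = Transcript()
    transcript.append_message("user")
    transcript.append_message("assistant")
    snapshot: Optional[int] = None
    if not rebaseline_after_meta:
        snapshot = transcript.message_count  # old placement: too early
    transcript.append_message("session_meta")  # first-turn marker row
    if rebaseline_after_meta:
        snapshot = transcript.message_count  # new placement: after persistence
    return snapshot, transcript.message_count
```

With the early placement the snapshot trails the live count by one, which is exactly the mismatch the guard interprets as a foreign write on turn 2.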
2026-07-01 02:29:24 -07:00
Evo
6bc0a7ce80 test(gateway): pin in-band follow-up re-baseline boundary + placement 2026-07-01 02:29:24 -07:00
Evo
b4cacba6ae fix(gateway): re-baseline agent-cache message_count before in-band queued follow-up turn
The cross-process cache-coherence guard (#45966) re-baselines the cached
agent's message_count only on the external-turn boundary (#46237, at
_handle_message_with_agent). The in-band queued (/queue) follow-up recurses
into _run_agent mid-chain with the stale build-time snapshot, so the
follow-up's guard sees the first turn's own writes as a mismatch and rebuilds
the agent -- re-introducing the every-turn rebuild / prompt-cache destruction
#46237 set out to prevent, on the in-band path. Re-baseline before the
recursion, symmetric with the accepted external-path fix.
2026-07-01 02:29:24 -07:00
kshitijk4poor
22a137ed40 fix(agent): prefer late-completing real result over timeout message (review)
Review follow-up on the concurrent-tool deadline salvage. timed_out_indices is
snapshotted from not_done at the deadline; a worker can still finish and write
results[i] in the window before the post-execution result loop reads it. The
loop unconditionally replaced results[i] with a fabricated 'timed out' message
for any snapshotted index, discarding a genuinely-successful (just-late) result.

Gate the timeout message on 'and r is None' so a real result always wins. Add a
regression test that forces the snapshot-vs-result-loop race deterministically
(mutation-checked: reverting the guard fails it). Also document the intentional
detached-worker leak at the executor abandon site.
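
The guard amounts to a single extra condition in the result loop. A minimal sketch, assuming results are stored in a list with `None` for unfinished slots (the function name and message text are illustrative):

```python
from typing import Any, List, Optional, Set


def finalize_results(
    results: List[Optional[Any]],
    timed_out_indices: Set[int],
    timeout_message: str = "timed out",
) -> List[Any]:
    """Fill in a timeout message only where no real result arrived."""
    final = []
    for i, r in enumerate(results):
        # A worker may have written results[i] after the deadline snapshot;
        # the 'r is None' check lets that late but real result win.
        if i in timed_out_indices and r is None:
            final.append(timeout_message)
        else:
            final.append(r)
    return final
```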
2026-07-01 14:56:52 +05:30
Gustavo Mendes
c1784e9093 fix(agent): bound concurrent tool execution with a wall-clock deadline
A tool with no internal interrupt check (read_file, web_search, or a wedged
terminal backend) that never returns keeps the concurrent-tool poll loop alive
forever: the loop only breaks when all futures finish or an interrupt is
requested, and the 30s heartbeat resets the gateway idle monitor so idle-kill
never fires. The ThreadPoolExecutor was also used as a context manager, so its
__exit__ joined the hung worker with wait=True.

Add a wall-clock batch deadline (HERMES_CONCURRENT_TOOL_TIMEOUT_S, default 420s
— above the 360s web_extract timeout; 0/negative disables). When it fires:
cancel pending futures, signal an interrupt to the worker threads, abandon the
executor (shutdown wait=False, cancel_futures=True) so hung threads aren't
joined, and return a per-tool 'timed out' result for the unfinished calls while
still surfacing the finished ones. Also fixes the latent futures.index(f)
lookup (ambiguous with duplicate futures) by tracking a future->index map.
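
A simplified sketch of the deadline mechanics follows. It uses `concurrent.futures.wait` in place of the poll loop described above and leaves out the interrupt signal to worker threads; the function name is hypothetical, and `cancel_futures` requires Python 3.9 or later.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Any, Callable, List


def run_with_deadline(calls: List[Callable[[], Any]], timeout_s: float) -> List[Any]:
    """Run zero-argument callables concurrently; unfinished ones yield 'timed out'."""
    # No 'with' block: the context manager's __exit__ would join hung workers.
    executor = ThreadPoolExecutor(max_workers=max(1, len(calls)))
    # Track future -> index explicitly instead of searching a list.
    future_to_index = {executor.submit(fn): i for i, fn in enumerate(calls)}

    # Zero or negative disables the deadline, as with the described setting.
    deadline = timeout_s if timeout_s > 0 else None
    done, not_done = wait(future_to_index, timeout=deadline)

    results: List[Any] = [None] * len(calls)
    for future in done:
        results[future_to_index[future]] = future.result()
    for future in not_done:
        future.cancel()  # only succeeds for calls that have not started
        results[future_to_index[future]] = "timed out"

    # Abandon the executor so threads that are still running are not joined.
    executor.shutdown(wait=False, cancel_futures=True)
    return results
```

Finished calls still surface their real results, while a wedged worker costs at most the deadline rather than blocking the batch indefinitely. In this sketch a callable that raises would propagate from `future.result()`; production code would need to handle that case.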

Salvaged from #54562.

Co-authored-by: Gustavo Mendes <87918773+gustavosmendes@users.noreply.github.com>
2026-07-01 14:56:52 +05:30
Teknium
913e661a09 fix(cache): stop verification-loop synthetic nudges from persisting (#56194)
verify_on_stop / pre_verify append a synthetic assistant "done" plus a
synthetic user nudge to keep the agent going one more turn before it can
claim completion. Both were flagged (_verification_stop_synthetic on the
nudge only), but the flags were never registered in
_EPHEMERAL_SCAFFOLDING_FLAGS, so the central _is_ephemeral_scaffolding()
filter that guards both persistence sinks (SQLite flush + JSON snapshot)
let them through. The resumed transcript then inherited loop-only
scaffolding, invalidating the prompt-prefix cache on later turns.

- add _verification_stop_synthetic and _pre_verify_synthetic to
  _EPHEMERAL_SCAFFOLDING_FLAGS (the single chokepoint both sinks use)
- flag the blocked-attempt assistant message too, not just the nudge, so the
  whole synthetic pair is dropped together; otherwise persistence would keep a
  premature "done" with the nudge stripped, leaving two assistant messages
  adjacent

The API-payload leak claimed in the report is already handled: the
chat_completions transport strips every underscore-prefixed message key
before the wire, so the marker never reaches strict providers.
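
The two filters mentioned above can be sketched as follows. The flag names come from this commit message, but the function bodies and the helper names `messages_to_persist` and `strip_private_keys` are assumptions made for illustration.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]

# Flag names taken from the commit message; the set contents are otherwise assumed.
EPHEMERAL_SCAFFOLDING_FLAGS = frozenset(
    {"_verification_stop_synthetic", "_pre_verify_synthetic"}
)


def is_ephemeral_scaffolding(message: Message) -> bool:
    """True when any registered loop-only flag is set on the message."""
    return any(message.get(flag) for flag in EPHEMERAL_SCAFFOLDING_FLAGS)


def messages_to_persist(messages: List[Message]) -> List[Message]:
    """Single chokepoint that every persistence sink would call."""
    return [m for m in messages if not is_ephemeral_scaffolding(m)]


def strip_private_keys(message: Message) -> Message:
    """Drop underscore-prefixed keys before a message goes on the wire."""
    return {k: v for k, v in message.items() if not k.startswith("_")}
```

Flagging both halves of the synthetic pair means the filter removes them together, so the persisted transcript keeps a valid user/assistant alternation.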

Reported by patppham.
2026-07-01 02:26:06 -07:00
Teknium
522a5e93b2 chore(release): map x9x9x9x9x9x91 for #49247 salvage 2026-07-01 02:18:56 -07:00
Tim Roth
24cb80fd72 test(provider): pin api.anthropic.com host on fallback api_mode
Pins that a custom provider on the native api.anthropic.com host resolves to
anthropic_messages on the try_activate_fallback path. From #49247.
2026-07-01 02:18:56 -07:00
Teknium
18c61bb8cf fix(provider): match api.anthropic.com host on fallback api_mode detection
Widen the salvaged #32243 fix to the try_activate_fallback path: a custom
provider pointed at the native api.anthropic.com host (no /anthropic path
suffix, name != anthropic) fell through to chat_completions -> POST
/v1/chat/completions -> 404. Match the host the same way determine_api_mode()
and _detect_api_mode_for_url() now do. Absorbs #49247.
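
Host-based detection of this kind might be sketched as below. The function name and the exact rules (host equality, or a trailing `/anthropic` path segment) are assumptions inferred from the description, not the repository's detector.

```python
from urllib.parse import urlparse


def detect_api_mode(base_url: str, default: str = "chat_completions") -> str:
    """Pick the transport from the endpoint URL alone."""
    parsed = urlparse(base_url)
    host = (parsed.hostname or "").lower()
    path = parsed.path.rstrip("/").lower()
    # Native Anthropic host, with or without a path, uses the Messages API.
    if host == "api.anthropic.com":
        return "anthropic_messages"
    # Proxies that expose an Anthropic-compatible surface under /anthropic.
    if path.endswith("/anthropic"):
        return "anthropic_messages"
    return default
```

Matching on the parsed hostname rather than a substring of the URL avoids false hits on look-alike domains or query strings that merely contain the text.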
2026-07-01 02:18:56 -07:00
xxxigm
9efe01c3a0 test(runtime): pin Anthropic OAuth → /v1/messages routing across runtime branches
End-to-end regression coverage for #32243 that asserts every runtime
branch resolving an Anthropic endpoint returns
`api_mode == "anthropic_messages"`:

* `_resolve_explicit_runtime` — the path used when a Hermes
  subcommand passes an explicit `--api-key` / `--base-url`.  Pins
  that a stale persisted `model.api_mode: chat_completions` from a
  prior provider migration cannot override the anthropic pin.
* `_resolve_runtime_from_pool_entry` — the path triggered by
  `hermes auth add anthropic --type oauth` (the exact flow from the
  issue).  Same stale-api_mode regression pinned here.
* `_try_resolve_from_custom_pool` — the user-defined
  `providers:` / `custom_providers:` path that depends on the
  URL detector fix landed in the prior commit.  Asserts both the
  detector fallback fires for `api.anthropic.com` and that an
  explicit `api_mode_override` still wins (so users who DELIBERATELY
  pointed a chat_completions transport at api.anthropic.com for
  OpenAI-compat experiments aren't hijacked).

Co-locates the three contracts so a future refactor of one branch
cannot silently diverge from the others and re-introduce the
"out of extra usage" 400 on fresh OAuth Pro/Max credentials.
2026-07-01 02:18:56 -07:00