hermes-agent

Author	SHA1	Message	Date
Jonny Kovacs	fdab380a1a	fix(cron): run jobs under the profile secret scope Once profile isolation is active (multiple gateway profiles or room->profile multiplexing), get_secret() fails closed outside an installed scope. The cron ticker fires jobs from a thread with no per-turn scope, so run_job() died in resolve_runtime_provider() with UnscopedSecretError (e.g. for OPENROUTER_BASE_URL / CUSTOM_BASE_URL) before model selection - every cron job failed while interactive turns worked fine. Wrap run_job() in set_secret_scope(build_profile_secret_scope(...)) with a finally-reset, mirroring the proven per-turn pattern in gateway/run.py (_profile_runtime_scope). Single-profile installs are unaffected (the scope is just the profile's own .env). tests/cron: 611 passed, 1 pre-existing unrelated failure (TestRoutingIntents::test_all_token_case_insensitive fails identically on unmodified main in a full-suite run and passes in isolation).	2026-07-03 19:32:52 +05:30
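The scope-with-finally-reset pattern described in the commit above can be sketched as follows. This is a minimal illustration only: every helper here (`set_secret_scope`, `reset_secret_scope`, `get_secret`, `run_job_scoped`) is a hypothetical stand-in built on `contextvars`, not the project's actual implementation.

```python
import contextvars

# Hypothetical stand-ins for the helpers named in the commit message.
_secret_scope = contextvars.ContextVar("secret_scope", default=None)


class UnscopedSecretError(RuntimeError):
    """Raised when a secret is read with no scope installed (fail closed)."""


def set_secret_scope(scope):
    # Returns a token that restores the previous scope when reset.
    return _secret_scope.set(scope)


def reset_secret_scope(token):
    _secret_scope.reset(token)


def get_secret(name):
    scope = _secret_scope.get()
    if scope is None:
        raise UnscopedSecretError(name)
    return scope.get(name)


def run_job_scoped(job, profile_env, run_job):
    # Install the profile's scope for the whole run, then always reset it,
    # even if the job body raises.
    token = set_secret_scope(dict(profile_env))
    try:
        return run_job(job)
    finally:
        reset_secret_scope(token)
```

Outside `run_job_scoped` a lookup still fails closed, which is the property the commit relies on for threads that have no per-turn scope.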
Brooklyn Nicholson	5a6720b884	fix(desktop,tui-gateway,zai): stop thinking-off from reverting to medium A Z.ai desktop user reported thinking reverting to medium after one turn, burning ~200% of a week's credits in 4 days despite reasoning_effort: false in config.yaml. Four compounding bugs: - _session_info reported reasoning_effort "" for disabled reasoning, indistinguishable from unset — the desktop adopted it after the first turn, wiping its sticky "thinking off" pick so every later chat reverted to the default effort. - config.set key=reasoning always wrote agent.reasoning_effort to global config.yaml, so every desktop model-menu selection (preset.effort ?? 'medium') clobbered the user's configured value. Now session-scoped like the messaging gateway's /reasoning, landing on create_reasoning_override so lazily-built sessions keep it too. - YAML `reasoning_effort: false`/`off`/`no` (boolean False) was coerced to "" by every loader's `str(x or "")`, silently re-enabling thinking. parse_reasoning_effort now treats False/"false"/"disabled" as {"enabled": False}; loaders (tui gateway, gateway, cli, cron, delegate) pass the raw value through. The desktop config reader also crashed on the boolean (false.trim()), aborting voice/STT settings. - The zai provider profile never sent thinking on the wire, and GLM-4.5+ defaults to thinking ON server-side — so disabling reasoning was a silent no-op on direct Z.ai, the actual token burner. The profile now emits extra_body.thinking {"type": "enabled"|"disabled"} for thinking-capable GLM models, mirroring the DeepSeek profile. Also: /new (session reset) now carries reasoning_config across the rebuild like model_override; config.get reasoning prefers the session's live value and maps a config False to "none"; Settings shows "Off" instead of a blank select for hand-written false.	2026-07-02 15:23:47 -05:00
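The `str(x or "")` pitfall from the commit above is easy to reproduce. The sketch below is a simplified, hypothetical parser: the name `parse_reasoning_effort` comes from the commit message, but the return shape for enabled values and the handling of unset input are assumptions made for illustration.

```python
def coerce_lossy(value):
    # The loader pattern the commit describes: boolean False collapses to "",
    # which is indistinguishable from "unset".
    return str(value or "")


def parse_reasoning_effort(value):
    """Simplified sketch; only the {"enabled": False} shape is from the commit."""
    if value is None or value == "":
        return None  # unset: the caller applies its own default
    if value is False or str(value).strip().lower() in {"false", "disabled"}:
        return {"enabled": False}
    # Assumed shape for an explicit effort level such as "medium".
    return {"enabled": True, "effort": str(value).strip().lower()}
```

Passing the raw YAML value through to the parser, instead of stringifying it first, is what keeps an explicit `false` distinct from a missing key.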
Swissly	242c9639a8	fix(cron): prevent multi-target delivery loop crash on per-target failure The standalone thread-pool fallback in _deliver_result() runs inside the `except RuntimeError:` block (taken when asyncio.run() sees a running loop). When future.result() raised there (SMTP ConnectionError, timeout, etc.), the exception was NOT caught by the sibling `except Exception:` — it escaped _deliver_result() and crashed the whole delivery loop, silently skipping every remaining target. Multi-target delivery (e.g. deliver: 'email:a,email:b') is a documented feature, so this broke a promised contract. Wrap the fallback in its own try/except so a per-target failure is logged with exc_info and the loop continues to the next target. Fixes #47163	2026-07-01 03:48:37 -07:00
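The control-flow trap in the commit above is general Python behaviour: an exception raised inside one `except` handler is not caught by a sibling `except` clause of the same `try`. A minimal sketch, with hypothetical function names and a simplified stand-in for the thread-pool fallback:

```python
import logging

logger = logging.getLogger("cron.delivery.sketch")


def deliver_buggy(fallback):
    try:
        # Stand-in for asyncio.run() refusing to run inside a running loop.
        raise RuntimeError("event loop already running")
    except RuntimeError:
        return fallback()  # an exception raised here escapes the function
    except Exception:
        return None  # never reached for errors raised in the handler above


def deliver_fixed(fallback):
    try:
        raise RuntimeError("event loop already running")
    except RuntimeError:
        try:
            return fallback()
        except Exception:
            # Log the per-target failure and let the caller's loop continue.
            logger.error("fallback delivery failed", exc_info=True)
            return None
```

With `deliver_fixed`, a loop over several targets keeps going after one target fails, which is the contract multi-target delivery needs.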
teknium1	ac3f4aed96	docs(cron): correct stale 'no new seed code' comments for in_channel The in_channel surface DOES add a seed: _seed_cron_channel_session CREATES the flat (platform, chat_id, None) session and mirrors the brief into it, because mirror_to_session only APPENDS to an existing session and the flat channel row is otherwise absent for a chat_postMessage delivery. Correct the scheduler thread-skip comment and the test class docstring, which still described the earlier 'let the existing mirror seed it' design.	2026-07-01 03:16:13 -07:00
Ben	2c84fb42b0	fix(cron/slack): CREATE the flat session for in_channel (mirror only appends) Live testing exposed a real bug: an in_channel continuable cron delivered flat to the channel (✅) but the reply did NOT continue the job — the bot had no brief in context and confabulated the answer. Root cause: mirror_to_session only APPENDS to a session that already exists (_find_session_id → no-op when none matches); it never CREATEs one. A flat (slack, chat_id, None) row is only created when a human posts a top-level message the bot processes — a cron chat_postMessage delivery never goes through the inbound handler, so the row is absent and the brief is silently dropped. The prior impl relied on the bare mirror (F5/OQ-1 concluded "deletion only" — wrong). Fix: _seed_cron_channel_session mirrors _seed_cron_thread_session — get_or_create_session FIRST (chat_type = "dm" if is_dm else "group", thread_id=None), keyed to the ORIGIN USER'S id, then mirror. The channel session key embeds user_id (…:group:<chat>:<user>), so a system:cron id would key the seed away from the reply; the origin user's id makes seed key == inbound reply key. DM key ignores user_id but needs chat_type=dm to match the prefix. Wired into the in_channel branch after delivery; suppresses the generic mirror to avoid double-write. DM validated (per request): the seeded key equals the inbound DM reply key for a 1:1 DM; continuation works there too. Tests: - Rewrote the in_channel tests to use a real _session_store and the origin user_id; assert get_or_create_session is called with the flat, correctly- keyed source. Prove-fail: (a) reverting the create step and (b) seeding with system:cron each turn a targeted test RED; restore → GREEN. - +2 direct _seed_cron_channel_session unit tests asserting the KEY-MATCH invariant (seed key == inbound reply key) via build_session_key, for both channel and DM. 
- Rewrote tests/manual/cron_inchannel_e2e.py to drive a REAL SessionStore + real mirror_to_session + real _find_session_id + real build_session_key (no session-layer mocks — the old mocked E2E is exactly why the bug shipped). Asserts the brief lands in the transcript and the reply resolves to the same session, for BOTH channel and 1:1 DM. Full relevant sweep: 283 passed.	2026-07-01 03:16:13 -07:00
Ben	4b4349eb9a	feat(cron/slack): flat in-channel continuable cron delivery surface Add a per-platform `cron_continuable_surface` extra key (`thread` default | `in_channel`) so a continuable cron job can deliver FLAT into a Slack channel — no dedicated thread — and still be replied-to. In `in_channel` mode the scheduler skips the thread-open branch (leaves `thread_id=None`); the shipped origin-mirror then seeds the `(slack, chat_id, None)` shared-channel session — the same bucket `reply_in_thread: false` routes inbound channel replies to — so a plain channel reply continues the job in context. Design: specs/cron-inchannel-continuable (D1–D7, F5). Model B (shared-channel session), NOT anchoring to the delivery `ts` — on Slack replying to a specific message IS threading, so a `ts` anchor would only relocate the thread, never deliver true threadless continuable. - gateway/platforms/base.py: `supports_inchannel_continuable` capability flag (default False → unsupported platforms fail SAFE to `thread`). - plugins/platforms/slack/adapter.py: flag=True; `_cron_continuable_surface()` resolver (coerces to the two-value enum); `_warn_if_inchannel_without_flat_reply` connect-time warning (D5: warn, not hard-require — the misconfig fails safe). - gateway/config.py: shared-key bridge line (top-level OR nested config). - cron/scheduler.py: read the key generically from platform config, gate the `in_channel` branch on the adapter capability flag, skip thread-open. No new seed function (reuses the existing mirror — G6). Pairing (docs): `in_channel` + `reply_in_thread: false` + `require_mention: false` (or a free-response channel). Missing `reply_in_thread: false` fails safe to a threaded continuation. Gateway-side config flag — `/restart` to apply; NO Slack app reinstall. 
Tests (from inside the worktree, PYTHONPATH=$PWD): - +6 cron scheduler tests (in_channel skips thread-open; seeds flat channel session with thread_id=None; thread-mode regression; fail-safe on unsupported platform; value coercion). Prove-fail: removing the `and not in_channel_surface` guard turns the two load-bearing tests RED; restore → GREEN. - +10 slack resolver/capability/warning tests; +2 config-bridge tests. - tests/manual/cron_inchannel_e2e.py: offline E2E driving BOTH real legs (delivery seed + inbound reply keying) → both converge on (slack, C, None). - No regressions: test_slack.py 216 passed alone; broader sweep green (4 pre-existing cross-file-ordering failures reproduce identically on pristine origin/main). Docs: cron.md + slack.md + zh-Hans mirrors of both.	2026-07-01 03:16:13 -07:00
kshitijk4poor	7f71a48a3a	fix(cron): release TERMINAL_CWD lock even when run_job body raises Rework follow-up on the per-job TERMINAL_CWD readers-writer lock. The lock was acquired BEFORE the try: whose finally: is the only release site, with the env-override statements (os.environ[TERMINAL_CWD] = workdir; logger.info) sitting in the unprotected window between acquire and try. Any exception there — a raising log handler, an os.environ error, a thread interrupt — propagated out of run_job WITHOUT running the finally, leaking the lock. A leaked writer permanently deadlocks the whole scheduler (every future cron job blocks on acquire_*); a leaked reader blocks all writers. - Snapshot _prior_terminal_cwd before the acquire (so the finally can always restore env even if the body raises before the override). - Open the try: immediately after acquire and move the env-override lines inside it, so the existing finally always releases the lock. - Add a mutation-verified regression test: a workdir job whose in-window logger.info raises must still release the writer lock (a subsequent acquire_write must not block).	2026-07-01 15:39:48 +05:30
entropy-0x	abc349bd79	fix(cron): isolate per-job TERMINAL_CWD from concurrent cron jobs A cron job with a per-job `workdir` overrides the process-global `os.environ["TERMINAL_CWD"]` for the entire duration of its agent run and restores it afterwards. The scheduler dispatches workdir jobs on a single-thread sequential pool and workdir-less jobs on a separate parallel pool, and the in-code comments claimed this made the override safe. That only prevents two workdir jobs from overlapping each other. The two pools run concurrently in the same process and share `os.environ`, so while a workdir job has `TERMINAL_CWD` pointed at its project directory, any workdir-less job firing in the same window reads that same global through the terminal, file, and code-exec tools and runs its commands in the wrong directory. The corruption window spans the whole workdir-job run, and a file write or delete can land in another job's tree. This serializes the override with a writer-preferring readers-writer lock. Workdir jobs acquire it as writers (exclusive for their whole run); workdir- less jobs acquire it as readers, so they still run in parallel with each other but never alongside a workdir job's override. The guarantee is based on run overlap rather than tick boundaries, so it also holds when a workdir job spans ticks. ## What does this PR do? Fixes a directory-isolation bug in the cron scheduler: a workdir cron job's process-global `TERMINAL_CWD` override could be observed by a concurrently running workdir-less cron job, causing that job's shell/file/code-exec commands to execute in the wrong directory. 
## Related Issue N/A ## Type of Change - [x] 🐛 Bug fix (non-breaking change that fixes an issue) ## Changes Made - `cron/scheduler.py`: add `_ReadWriteLock` (writer-preferring) and the module-global `_terminal_cwd_lock`. - `cron/scheduler.py`: in `run_job`, acquire the lock as a writer for workdir jobs and as a reader for workdir-less jobs, spanning the `TERMINAL_CWD` override and its restore in the `finally` block. - `cron/scheduler.py`: correct the stale comments in `run_job` and `tick` that claimed the sequential pool alone made the override safe. - `tests/cron/test_terminal_cwd_lock.py`: new tests for reader concurrency, writer exclusion, and the no-cross-observation regression. ## How to Test 1. `python -m pytest tests/cron/test_terminal_cwd_lock.py -q` — the regression test `test_reader_never_observes_writer_override` fails without the lock and passes with it. 2. `python -m pytest tests/cron/test_cron_workdir.py tests/cron/test_parallel_pool.py -q` — confirms the existing `TERMINAL_CWD` set/restore and pool behaviour are unchanged. ## Checklist ### Code - [x] I've read the Contributing Guide - [x] My commit messages follow Conventional Commits (`fix(scope):`, etc.) 
- [x] I searched for existing PRs to make sure this isn't a duplicate - [x] My PR contains only changes related to this fix - [x] I've run the affected `tests/cron/` suites and all tests pass - [x] I've added tests for my changes (required for bug fixes) - [x] I've tested on my platform: macOS 15 (Darwin 25.5) ### Documentation & Housekeeping - [x] I've updated relevant documentation (docstrings/comments) — or N/A - [x] I've updated `cli-config.yaml.example` if I added/changed config keys — N/A - [x] I've updated `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture — N/A - [x] I've considered cross-platform impact (Windows, macOS) — uses stdlib `threading` only - [x] I've updated tool descriptions/schemas if I changed tool behavior — N/A	2026-07-01 15:39:48 +05:30
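The two TERMINAL_CWD commits above combine a writer-preferring readers-writer lock with an "open the `try:` immediately after acquire" discipline. The sketch below shows one way to build both with stdlib `threading`; the class and function names are hypothetical, `env` is a plain dict standing in for `os.environ`, and the `timeout` parameter exists only so the behaviour can be probed without blocking.

```python
import threading


class ReadWriteLock:
    """Writer-preferring: new readers wait while any writer is queued."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False
        self._writers_waiting = 0

    def acquire_read(self):
        with self._cond:
            while self._writer or self._writers_waiting:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self, timeout=None):
        with self._cond:
            self._writers_waiting += 1
            try:
                acquired = self._cond.wait_for(
                    lambda: not self._writer and self._readers == 0, timeout
                )
            finally:
                self._writers_waiting -= 1
            if acquired:
                self._writer = True
            else:
                self._cond.notify_all()  # let blocked readers re-check
            return acquired

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


def run_workdir_job(lock, env, workdir, body):
    prior = env.get("TERMINAL_CWD")  # snapshot before acquiring
    lock.acquire_write()
    try:
        # Everything after the acquire sits inside the try, so the finally
        # below is the single, always-reached release site.
        env["TERMINAL_CWD"] = workdir
        return body()
    finally:
        if prior is None:
            env.pop("TERMINAL_CWD", None)
        else:
            env["TERMINAL_CWD"] = prior
        lock.release_write()
```

A workdir-less job would wrap its body in `acquire_read()` / `release_read()` the same way, so readers run in parallel with each other but never alongside a writer's override.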
teknium1	b48cacb97b	fix(gateway,cron): guard cron model-tool path + add auto-resume loop breaker (#30719) Completes the #30719 restart-loop defenses. Defenses 1-2 (the _HERMES_GATEWAY guard on `hermes gateway stop|restart` + terminal_tool, and the cron-creation lifecycle filter) already landed on main, but two gaps remained: - The agent's `cronjob` model tool calls cron.jobs.create_job directly, bypassing the hermes_cli.cron.cron_create CLI filter, so lifecycle commands scheduled via the model tool were only blocked at execution time (terminal_tool), not at creation. Moved the filter to a shared cron/lifecycle_guard.py enforced at create_job — the single chokepoint every job-creation path hits (CLI + model tool). Re-exported _contains_gateway_lifecycle_command from hermes_cli.cron so terminal_tool's import keeps working. - No breaker for the auto-resume loop itself. Defenses 1-2 cover the cron/CLI/terminal paths, but any other SIGTERM source (e.g. a raw terminal("launchctl kickstart ai.hermes.gateway")) still triggers the boot->auto-resume->re-run cycle. Added gateway/restart_loop_guard.py: counts restart-interrupted boots in a rolling window (config gateway.restart_loop_guard, default 3 boots / 60s) and skips auto-resume for that boot once tripped. The gateway still comes up and serves real inbound messages; it just stops replaying the session that keeps killing it, putting a human back in the loop. Also tightened the lifecycle regex over main's version: dropped `hermes gateway start` (benign), required the gateway identifier on the launchctl/systemctl branches (so `launchctl unload ai.hermes.update-checker.plist` and `systemctl restart hermes-meta.service` no longer false-positive), added the inverse pkill token order, and fixed the binary-script bypass (decode with errors='replace' instead of swallowing UnicodeDecodeError). 
The create_job guard resolves relative script paths under HERMES_HOME/scripts the same way the scheduler does, so a bare script name is scanned as the file that actually runs. Design and much of defense-2 originate from PR #33395 (@kshitijk4poor), which itself salvaged #30728 (@SimoKiihamaki). Rebuilt against current main since defenses 1-2 had already landed under different names. Closes #30719. Co-authored-by: SimoKiihamaki <simo.kiihamaki@gmail.com> Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-07-01 02:48:36 -07:00
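The rolling-window breaker described in the commit above can be sketched as a small counter. This is an in-memory illustration with hypothetical names; a real guard has to persist boot timestamps across process restarts, and the exact trip condition (here: tripped once the window holds `max_boots` entries) is an assumption.

```python
import time
from collections import deque


class RestartLoopGuard:
    def __init__(self, max_boots=3, window_seconds=60.0, clock=time.monotonic):
        self._max_boots = max_boots
        self._window = window_seconds
        self._clock = clock
        self._boots = deque()

    def record_interrupted_boot(self):
        """Record one restart-interrupted boot.

        Returns True when auto-resume should be skipped for this boot.
        """
        now = self._clock()
        self._boots.append(now)
        # Drop boots that have fallen out of the rolling window.
        while self._boots and now - self._boots[0] > self._window:
            self._boots.popleft()
        return len(self._boots) >= self._max_boots
```

Injecting the clock keeps the window logic testable without sleeping.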
claudlos	1b7e781d21	security(cron): fail closed in scheduler backstop when validator errors Addresses egilewski (Codex) CR on PR #52351: the run_job() credential-exfil backstop caught every exception around _validate_cron_base_url() and set err = None, so an unexpected validator/import error let an unvetted stored provider/base_url pair reach resolve_runtime_provider() — the very sink this checkpoint exists to guard. A synthetic validator-exception probe with a legacy custom:legit + off-host base_url job slipped through (validator_exception ALLOW). Now fail closed: if the validator raises and the job carries a base_url override (the exfil precondition), refuse the run. A job with no base_url override can't exfiltrate via this path — the validator would return None — so it still runs, keeping the common no-override jobs from wedging on an unrelated error. Operator fallback providers come from config, not the job, so they are unaffected. Adds two regressions: validator-exception + base_url -> blocked; validator-exception without base_url -> still allowed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-07-01 14:23:01 +05:30
claudlos	b24708eda0	security(cron): block base_url overrides that exfiltrate provider credentials The model-facing cronjob tool accepts free-form provider + base_url. On fire, the scheduler pairs the named provider's stored credential with the job's base_url, so a prompt-injected job (e.g. provider=anthropic, base_url=https://attacker/v1) sends the real API key to an attacker endpoint. A base_url with no provider inherits the default provider's key for the same effect. Add a fail-closed guard at the tool boundary: a base_url override is allowed only for the custom/BYOK sentinel, a configured custom_providers entry, or when the override host matches the named provider's own endpoint; an override without an explicit provider is rejected. The trust boundary is the caller, so operator-configured base_urls for named providers are unaffected. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-07-01 14:23:01 +05:30
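The two security commits above describe an allow-list check on `base_url` plus a fail-closed wrapper around it. A minimal sketch under stated assumptions: the function names, the `KNOWN_PROVIDER_HOSTS` table, and the error strings are all hypothetical, and the validator returns `None` when the override is acceptable (matching the commit's remark that the validator returns None for a job with no override).

```python
from urllib.parse import urlparse

# Illustrative table only; a real implementation would derive this from
# the provider registry.
KNOWN_PROVIDER_HOSTS = {"anthropic": "api.anthropic.com"}


def validate_base_url(provider, base_url, custom_providers=()):
    """Return None if allowed, else a human-readable rejection reason."""
    if not base_url:
        return None  # no override, nothing to exfiltrate via this path
    if not provider:
        return "base_url override requires an explicit provider"
    if provider == "custom" or provider in custom_providers:
        return None  # BYOK sentinel or operator-configured custom provider
    host = (urlparse(base_url).hostname or "").lower()
    if host and host == KNOWN_PROVIDER_HOSTS.get(provider):
        return None  # override points at the provider's own endpoint
    return f"base_url host {host!r} does not match provider {provider!r}"


def backstop_allows(job, validator=validate_base_url):
    try:
        err = validator(job.get("provider"), job.get("base_url"))
    except Exception:
        # Fail closed only when the exfil precondition (an override) exists;
        # jobs without an override still run despite the validator error.
        return not job.get("base_url")
    return err is None
```

The asymmetry in the `except` branch mirrors the commit's reasoning: a broken validator must not unblock an unvetted override, but it also should not wedge the common jobs that carry no override at all.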
Teknium	84c724d692	fix(cron): commit one-shot dispatch before side effect to stop crash re-fire loop (#56177) A finite one-shot cron job whose side effect kills the tick (gateway suicide, OOM, segfault, hard-timeout) re-fired forever: mark_job_run — which increments repeat.completed and removes the job — runs AFTER the job, so an abrupt tick death never records completion and every supervisor relaunch re-dispatches the job (#38758). Commit the dispatch BEFORE the side effect: - claim_dispatch() increments repeat.completed under the cross-process jobs lock and persists it before run_job(), converting finite one-shots from at-least-once to at-most-times. - Called from run_one_job (the shared body used by BOTH the built-in ticker and the external Chronos fire_due path) before run_job. - mark_job_run skips the increment for pre-claimed one-shots (no double-count) and still removes at the limit. - get_due_jobs drops a stale one-shot already at its dispatch limit so a job claimed-but-not-cleaned-up after a crash stops appearing as due. - No-op for recurring jobs (advance_next_run) and infinite/no-repeat one-shots; a handed-in job dict absent from the store proceeds. Closes #38758	2026-07-01 01:30:36 -07:00
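The claim-before-run ordering from the commit above can be sketched with an in-memory store. This is illustrative only: the dict-based `store`, the `repeat.times` field name, and the `threading.Lock` (standing in for the cross-process jobs lock and the persistence step) are assumptions.

```python
import threading

_jobs_lock = threading.Lock()  # stand-in for the cross-process jobs lock


def claim_dispatch(store, job_id):
    """Record the dispatch BEFORE running; return False if already at limit."""
    with _jobs_lock:
        job = store.get(job_id)
        if job is None:
            return True  # job dict absent from the store: proceed
        repeat = job.get("repeat") or {}
        times = repeat.get("times")
        if not times:
            return True  # infinite / no-repeat job: nothing to claim
        completed = repeat.get("completed", 0)
        if completed >= times:
            return False  # stale one-shot already at its dispatch limit
        repeat["completed"] = completed + 1
        job["repeat"] = repeat
        return True


def run_one_job(store, job_id, run_job):
    if not claim_dispatch(store, job_id):
        return "skipped"
    return run_job(job_id)
```

Because the counter moves before the side effect, a run that kills the process leaves the job already counted, so the next launch does not dispatch it again.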
xxxigm	32bc36522e	fix(cron): use shared get_fallback_chain in job runner (#36734) Cron's job runner was the last entry point still reading fallback_providers/fallback_model as an either/or, silently dropping the legacy fallback_model when fallback_providers was set. Every other entry point (cli, gateway, oneshot, fallback_cmd, tui_gateway, auxiliary_client) already merges both keys via get_fallback_chain(). This aligns cron with them at both call sites: the auth-fallback resolution loop and the AIAgent(fallback_model=...) argument. Co-authored-by: xxxigm <tuancanhnguyen706@gmail.com>	2026-07-01 01:23:20 -07:00
teknium1	836732f54f	fix(cron): null-safe deliver in cron list + re-resolve BSM secrets per run Two live cron bugs, both surfaced by @banditburai in #35616 (whose larger watchdog/supervisor work is already superseded by the CronScheduler provider refactor on main): - #32896: `cron list` crashed on a present-but-null `deliver` field — `job.get("deliver", ["local"])` returns None for an explicit null, which then hit `", ".join(None)`. Coalesce with `or ["local"]` (same pitfall the sibling `repeat` line already guards against). - #33465: cron jobs 401'd on Bitwarden/BSM-backed secrets. The per-run env reload used a bare `load_dotenv(override=True)`, which re-applied only the .env placeholder — startup had already recorded this HERMES_HOME in env_loader._APPLIED_HOMES, so the external-secret re-pull no-oped. Route the reload through load_hermes_dotenv() and call reset_secret_source_cache() first to force the re-pull (Bitwarden's 300s value-cache keeps it off the network; override honours secrets.bitwarden.override_existing, mirroring startup). Tests: null-deliver regression guard in test_cron.py; reset-before-reload ordering guard in test_scheduler.py. Migrated 31 scheduler-reload test seams from patching dotenv.load_dotenv to the new load_hermes_dotenv / reset_secret_source_cache seam.	2026-07-01 01:05:33 -07:00
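The null-`deliver` crash in the commit above comes from a standard `dict.get` subtlety: the default is used only when the key is missing, not when the key is present with a `None` value. A short sketch with hypothetical helper names:

```python
def format_deliver_buggy(job):
    # Returns None for an explicit null, and ", ".join(None) raises TypeError.
    return ", ".join(job.get("deliver", ["local"]))


def format_deliver(job):
    # `or` coalesces both a missing key and an explicit null to the default.
    return ", ".join(job.get("deliver") or ["local"])
```

The same `or` coalescing applies to any field that a hand-edited jobs file may carry as an explicit null.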
teknium1	8d3c450126	refactor(gateway): reuse looks_like_telegram_private_chat_id helper The handoff seed path inlined its own int(chat_id) > 0 private-chat check; delivery.py already had the identical heuristic. Promote it to a public name and reuse it from both sites instead of duplicating.	2026-07-01 01:01:36 -07:00
sprmn24	da4f15cddc	fix(cron): log and redact on secrets-redaction failure If redact_sensitive_text() raises or fails to import, stdout/stderr were silently left unredacted and could leak API keys or tokens into cron job delivery messages and logs. Replace the silent exception swallow with a warning log and replace both outputs with '[REDACTED - redaction failed]' to prevent leaks. Root cause: silent exception swallow in _run_job_script() Impact: potential secrets leak in cron job output delivery	2026-06-30 03:34:21 -07:00
Teknium	643b0dc678	fix(cron): raise default pre-run script timeout from 120s to 1h (#55489) Cron pre-run scripts were capped at 120s by default, which surprised users running long data-collection scripts on crons (the whole point of crons being to offload long work). Raise _DEFAULT_SCRIPT_TIMEOUT to 3600s (1 hour). This bounds the script only — skill/agent jobs already run on a separate inactivity budget (HERMES_CRON_TIMEOUT, default 600s idle, 0=unlimited), not a wall-clock cap. Scripts dispatch to a persistent thread pool and do not hold the tick lock, so a long script doesn't starve other due jobs. Docs clarified to make the script-vs-agent timeout distinction explicit. env/config overrides (HERMES_CRON_SCRIPT_TIMEOUT, cron.script_timeout_seconds) unchanged and still take precedence.	2026-06-30 01:00:39 -07:00
Teknium	4c2961c511	fix(curator): never archive cron-referenced skills + floor use=0 pruning (#54443) The curator's inactivity prune archived any non-pinned agent-created skill whose activity was older than archive_after_days (90d). A skill loaded only by a cron job had its usage bumped solely when the job fired, so paused jobs, infrequent (quarterly/annual) schedules, and far-future one-shots aged their skills out from under them — the next run then failed to load the now-archived skill. - cron/jobs.py: add referenced_skill_names() returning skills used by ANY job (incl. paused/disabled). - curator.apply_automatic_transitions(): skip cron-referenced skills like pinned; add a use=0 grace floor so a never-used skill is not marked stale/archived until it is at least stale_after_days old. - LLM review pass: candidate list marks cron=yes; prompt forbids pruning cron-referenced skills and never-used skills under 30 days. Tested E2E against a real cron job + real usage records and with 4 new unit tests.	2026-06-28 15:10:21 -07:00
Teknium	d3d621f7c3	revert(windows): roll back terminal-popup PRs #53791 #53810 #53829 (#53853) * Revert "fix(windows): capture is not a no-window boundary; route flashing spawns through chokepoint (#53829)" This reverts commit `2ecca1e7d3`. * Revert "fix(windows): stop terminal-window popups from background spawns (#53810)" This reverts commit `5db1430af9`. * Revert "fix(windows): stop subprocess console-window popups + add CI guard (#53791)" This reverts commit `ef17cd204d`.	2026-06-27 15:59:00 -07:00
brooklyn!	5db1430af9	fix(windows): stop terminal-window popups from background spawns (#53810) * fix(windows): stop terminal-window popups from background spawns Native-Windows desktop/gateway users saw cmd/conhost windows flash on gateway restart, image paste, the dashboard Projects tree, voice notes, and ~5 min after closing the app (detached cron). Two root causes: - Console-subsystem exes (taskkill, schtasks, wmic, netstat, tasklist, agent-browser, git, ffmpeg, powershell, git-bash) spawned via raw subprocess allocate a fresh console when the launching process has none (pythonw desktop backend / detached gateway) - even with output captured. - uv venv pythonw shims re-exec console python.exe, so Python children get a console regardless of how they're launched. Fixes: - Single hidden-spawn primitive (_subprocess_compat.run/.popen) that ORs CREATE_NO_WINDOW on Windows, no-op on POSIX. Route every Hermes-owned console-exe spawn through it. - FreeConsole() catch-all in hermes_bootstrap: any Python child that exclusively owns an auto-allocated console detaches it at startup (GetConsoleProcessList()==1 gate leaves shared interactive consoles untouched). - Replace PowerShell/wmic gateway PID scans with in-process psutil. - Skip schtasks queries on non-interactive desktop restarts. - Prefer native agent-browser .exe over .cmd shims. - Guard test bans raw subprocess spawns of the Windows-only console tools repo-wide so the popup class can't regress. * fix(windows): scope FreeConsole to background entry points; fix merge fallout Console detach review (per #53810 feedback): GetConsoleProcessList()==1 can't tell a uv pythonw->python phantom console apart from a user opening the interactive CLI/TUI in its own fresh console (double-click, shortcut, ConPTY) — both report a single attached process with a tty. Running FreeConsole() in the import-time bootstrap therefore risked detaching a legitimately-interactive terminal. 
- Extract FreeConsole into explicit hermes_bootstrap.detach_orphan_console(); remove it from apply_windows_utf8_bootstrap() (import side effect). - Call it only from known background mains: gateway run, dashboard backend (start_server, what the desktop spawns), cron standalone, tui_gateway entry, slash worker. Interactive CLI/TUI never calls it. - Behavior-contract tests: frees only when solo owner, leaves shared console, no-op without console / on POSIX, and asserts it's not an import side effect. Merge fallout from origin/main (#53791): - local.py: 3-way merge left a dangling *_popen_kwargs (NameError crashing every terminal init). _subprocess_compat.popen already hides the window, so drop it. - discord adapter: merge stacked an undefined windows_hide_flags() onto the primitive call; drop the redundant arg. - test_gateway: scan now goes psutil-first (zero spawn); rewrite the case-variant test to drive that production path. test(claw): mock _subprocess_compat.run seam for Windows process scan claw.py's Windows tasklist/powershell scan routes through the hidden-spawn primitive; the tests still patched claw_mod.subprocess, so on win32 the mock was never hit and real spawns returned nothing. Patch the actual seam.	2026-06-27 14:02:24 -07:00
konsisumer	1b6ebb24c0	fix(agent): validate OpenRouter provider sort before request dispatch	2026-06-27 11:43:08 -07:00
Teknium	d73078e7b0	fix(cron): make per-profile cron isolation intentional and tested (#4707) (#53570) A profile's cron jobs now provably live in AND execute under that profile's HERMES_HOME. A job authored under profile `coder` is stored at `~/.hermes/profiles/coder/cron/jobs.json` and runs with coder's .env, config.yaml, scripts and skills — never the default root's. This was the de-facto behavior on main but only by accident: PR #50112 had re-anchored cron storage at the shared default root, and a later stale-branch squash merge (#52147) silently reverted it back to the profile home. Neither direction was guarded by a test, so it could flip again on the next stale merge. Changes: - cron/jobs.py: document the per-profile storage anchor (get_hermes_home, NOT get_default_hermes_root) and why anchoring at the root leaks config/credentials/skills across profiles — the #4707 security boundary. - cron/scheduler.py, cron/suggestions.py: same intent documented at the dynamic resolution helper and the suggestions store. - tests/cron/test_cron_profile_isolation.py: pin storage, lock-path, and execution-home resolution to the active profile so a re-anchor can't regress. Verified E2E: jobs created under two profiles land in separate per-profile stores with zero cross-profile leakage and no shared-root store; scheduler execution-home follows the active profile. Full cron suite: 576/576.	2026-06-27 03:55:01 -07:00
Versun	c655cdf2c1	feat(dashboard): expose cron job execution fields	2026-06-27 03:20:32 -07:00
kshitij	e4ff494860	fix(cron): add default retention to per-run job output (#52383) (#52646) * fix(cron): add default retention to per-run job output to bound disk usage (#52383) Per-run cron output (cron/output/<job>/<timestamp>.md) is written once per execution and was never pruned, so a frequently-scheduled job on a long-running deploy accumulates one file per run indefinitely and can fill the volume ('no space left on device'). save_job_output() now keeps the most recent N output files per job and removes older ones. N defaults to 50 and is configurable via cron.output_retention; a non-positive value disables pruning for operators who manage cleanup externally. Salvaged from #52402 by @0xDevNinja. Closes #52383 * fix(config): add cron.output_retention to DEFAULT_CONFIG Follow-up to #52383: the retention config key was functional via get()-with-default but missing from DEFAULT_CONFIG, so the deep-merge wouldn't auto-populate it for new installs. Add it explicitly. --------- Co-authored-by: 0xDevNinja <manmit0x@gmail.com>	2026-06-25 16:00:13 -07:00
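The keep-most-recent-N retention rule from the commit above can be sketched with `pathlib`. The helper name `prune_job_output` is hypothetical, and the sketch assumes the timestamped filenames sort lexicographically in chronological order; a real implementation might order by modification time instead.

```python
from pathlib import Path


def prune_job_output(job_dir: Path, keep: int = 50) -> list[Path]:
    """Delete all but the newest `keep` .md files; return the removed paths."""
    if keep <= 0:
        return []  # non-positive retention disables pruning
    # Newest first, assuming timestamp-named files sort chronologically.
    files = sorted(job_dir.glob("*.md"), key=lambda p: p.name, reverse=True)
    removed = files[keep:]
    for path in removed:
        path.unlink(missing_ok=True)
    return removed
```

Calling this once after each write bounds a job's output directory at `keep` files regardless of how often the job fires.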
teknium1	f284d85efa	fix(cron): restore [SILENT] silence + suppress empty-turn explainer on Telegram Scheduled jobs delivering to Telegram/etc. started posting a literal '⚠️ No reply: the model returned empty content…' message instead of staying silent. Two interacting causes: 1. The turn-completion explainer (#34452) replaces an empty model turn with a user-facing '⚠️ No reply…' string. In a cron context that is not a silence marker, so the scheduler delivered it — a regression from the previously-silent empty turn. run_job now detects the explainer text deterministically (via the same formatter that produced it) for abnormal-empty turn_exit_reasons and strips it to empty, so the existing empty-response suppression + soft-fail guard apply. The explainer is unchanged on CLI/gateway. 2. The cron suppression used a loose 'SILENT_MARKER in ...upper()' substring check. It leaked bracketless near-markers the model emits ('SILENT', 'NO_REPLY', 'NO REPLY' — #51438, #46917) and wrongly swallowed a real report that merely quoted '[SILENT]' mid-sentence. Replaced with _is_cron_silence_response(): suppresses a canonical token as the whole response, its own first/last line, or the documented bracketed '[SILENT] <note>' prefix — while a token buried mid-sentence in a genuine report is delivered. Preserves the intentional cron trailing/prefix tolerance (existing tests unchanged). Tests: bracketless-variant suppression, mid-sentence-quote delivery, direct matcher contract, and explainer-strip + defensive real-report delivery.	2026-06-25 13:45:09 -07:00
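The matcher contract described in the commit above (whole response, own first or last line, or a bracketed `[SILENT] <note>` prefix, but not a token quoted mid-sentence) can be sketched as below. The function name and the exact token set are assumptions drawn from the examples in the commit message, not the project's actual matcher.

```python
# Assumed token set, taken from the variants the commit message lists.
_SILENCE_TOKENS = {"[SILENT]", "SILENT", "NO_REPLY", "NO REPLY"}


def is_silence_response(text):
    stripped = (text or "").strip()
    if not stripped:
        return False  # empty output is handled by a separate guard
    upper = stripped.upper()
    if upper in _SILENCE_TOKENS:
        return True  # the token is the whole response
    lines = [line.strip().upper() for line in stripped.splitlines() if line.strip()]
    if lines[0] in _SILENCE_TOKENS or lines[-1] in _SILENCE_TOKENS:
        return True  # the token stands alone on the first or last line
    # Documented prefix form: "[SILENT] <note>".
    return upper.startswith("[SILENT] ")
```

Unlike a plain substring check, this delivers a genuine report that merely quotes the marker inside a sentence.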
Victor Kyriazakos	b177d4ee48	fix(cron): mirror continuable cron as a labelled user turn (alternation-safe) Addresses review on #51077 (kxee). The continuable-cron mirror reused gateway.mirror.mirror_to_session, which writes role=assistant — re-introducing the exact alternation violation #2313 (`37a997945`) deliberately removed: a cron brief landing as assistant after the agent's last turn yields assistant->assistant, which breaks strict-alternation providers (OpenAI/OpenRouter) per issue #2221. The mirror/mirror_source metadata is also dropped at the SQLite boundary, so the [Delivered from cron] label is lost on replay. This is an intentional, opt-in (default OFF) reversal of #2313's 'cron output does not belong in interactive history' for the reply-to-cron use case — gated behind cron.mirror_delivery / attach_to_session. Fixes: - mirror_to_session gains a role param (default 'assistant' — interactive send_message mirror unchanged, it IS the agent speaking). Cron paths pass role='user' with a '[Cron delivery: <task>]' prefix so the brief collapses via repair_message_sequence's consecutive-user merge on every provider, and stays distinguishable on replay despite the metadata drop. - thread_seeded: defer seeding + the flag until delivery into the new thread actually succeeds. Previously set pre-delivery, so an open-succeeds / deliver-fails case both stranded a seeded-but-unseen brief AND suppressed the DM-fallback mirror. - seed mirror now passes user_id='system:cron' to resolve the exact thread-keyed session row it just created. - dedupe the duplicate BasePlatformAdapter import in _deliver_result. - trim oversized docstrings to non-obvious WHY (AGENTS.md). - docs: document cron.mirror_delivery / attach_to_session in website/docs/user-guide/features/cron.md. - test: assert the cron mirror writes role='user' with the label prefix. 204 cron+mirror tests pass.	2026-06-24 20:27:05 -07:00
Victor Kyriazakos	b693bee100	feat(cron): thread-preferred continuable delivery (open a thread, mirror DM fallback) Continuable cron jobs (attach_to_session / cron.mirror_delivery, default OFF) now prefer a dedicated thread on thread-capable platforms, falling back to origin-DM mirroring where threads don't exist. - Thread-capable (Telegram topics, Discord/Slack threads): open a fresh thread for the job via the shipped adapter.create_handoff_thread, route the brief into it, and seed the thread-keyed session so the user's in-thread reply continues with full context. This is the 'continuable cron opens its own thread' interface. - DM-only (WhatsApp/Signal/SMS): create_handoff_thread returns None -> fall back to mirroring into the origin DM session (existing behaviour). Reuses existing infrastructure end-to-end — no new adapter surface, no provider-chain signature change: - adapter.create_handoff_thread (already implemented per-platform, returns None on unsupported platforms = the fallback signal) - the live SessionStore via adapter._session_store (already set on every adapter), reached without threading a new param through the frozen CronScheduler.start() contract - gateway.mirror.mirror_to_session for the seed/append - existing per-target delivery routing carries the new thread_id for free Mirrors GatewayRunner._process_handoff's open-thread-or-fallback + seed pattern, standalone for the cron delivery path. thread_seeded guards against a double-mirror after seeding. Scoped to the origin target only; fan-out/broadcast targets are never threaded or mirrored. Config docs updated (cron.mirror_delivery) + cronjob tool attach_to_session description reframed around continuable/thread-preferred. Tests: +5 (thread id returned on thread platform; None on DM platform; None without capability/loop; seed creates thread session + mirrors; seed no-op on empty). 22/22 in TestCronDeliveryMirror; 532 cron tests pass (4 failures pre-existing: croniter-not-installed + TZ).	2026-06-24 20:27:05 -07:00
Victor Kyriazakos	98f3c19282	feat(cron): pass origin user_id to delivery mirror (send_message parity) Multi-participant parity with interactive send_message, which passes HERMES_SESSION_USER_ID to gateway.mirror.mirror_to_session so the mirror lands in the exact participant's session. - cronjob_tools._origin_from_env now captures user_id from the session context at job-create time (alongside platform/chat_id/thread_id). - _maybe_mirror_cron_delivery forwards user_id to mirror_to_session. - _deliver_result threads origin.user_id through for the origin target. Effect: in a per-user-isolated group chat (group_sessions_per_user=True, the default), the mirror resolves to the member who scheduled the job instead of conservatively no-op'ing on ambiguous candidates. DMs and shared group/thread sessions are unaffected (single candidate). Default still OFF. Tests: helper forwards user_id; E2E _deliver_result forwards origin user_id. 17/17 in TestCronDeliveryMirror; 527 cron tests pass (4 failures pre-existing: croniter-not-installed + TZ, identical on baseline).	2026-06-24 20:27:05 -07:00
Victor Kyriazakos	c06ceb3232	refactor(cron): scope delivery mirror to the origin conversation The cron->session mirror now fires ONLY for the delivery target that equals the job's origin (platform+chat_id[+thread_id]). A job created from a live gateway chat stamps that chat as origin, and that session is guaranteed to exist (it is the conversation the user scheduled the job in). Fan-out / broadcast / home-channel-fallback targets are never mirrored: they are not a continuation of a conversation and may have no session at all. This makes the prior 'cold-start session seeding' concern a non-case by construction: when the mirror semantically applies the session exists; when none exists the target was never the origin, so we no-op. Adds _target_matches_origin() + origin-scoping tests (exact match, other-chat/other-platform/no-origin rejection, thread scoping, fan-out mirrors only the origin target).	2026-06-24 20:27:05 -07:00
Victor Kyriazakos	1b181724fa	feat(cron): optional mirror of cron delivery into target chat session Adds an opt-in path so a cron job's delivered output is also appended to the TARGET chat's gateway session transcript (as an assistant turn), so a user reply to a recurring delivery (daily brief, reminder) is answered with the delivery in context instead of 'what is that?' amnesia. - Reuses the shipped gateway.mirror.mirror_to_session — the same primitive interactive send_message mirroring already uses. No messaging-toolset change (cron still can't call send_message; this rides delivery). - Gated: per-job attach_to_session overrides global cron.mirror_delivery (config.yaml). Default OFF — historical isolation preserved byte-for-byte. - Mirrors the CLEAN agent output, not the cron header/footer wrapper. - Alternation/cache-safe: append lands at a turn boundary, never mid-loop, never mutates the cached system prompt. Cold-start (no target session) is a silent no-op; mirror errors never fail a successful delivery. - Surfaced on the cronjob tool (attach_to_session) + config schema. Driven by enterprise cron-as-control-plane use case. 10 new tests; full cron + cronjob-tool suites pass (600).	2026-06-24 20:27:05 -07:00
kshitij	5b065e32ed	Merge pull request #51051 from NousResearch/salvage/cron-provider-pin fix(cron): fail closed when an unpinned job provider drifts from creation snapshot (#44585)	2026-06-25 00:05:52 +05:30
uperLu	0d4cecb352	fix(cron): avoid provider package shadowing core cron	2026-06-23 23:39:22 -07:00
Teknium	d93d0aee83	fix(cron): anchor naive schedule timestamps to configured timezone (#51695) A naive ISO timestamp (e.g. 2026-06-22T20:07:00) was anchored to the server's local timezone via dt.astimezone(), but the due-check (get_due_jobs -> _hermes_now()) runs in the CONFIGURED Hermes timezone. When the two diverge (cloud host on UTC with a different timezone: set, or vice-versa) the stored instant lands hours off the user's wall-clock intent, so one-shots never become due and recurring jobs fire at the wrong time. The ticker stays healthy (heartbeat + success markers fresh) because every tick finds nothing due, matching the silent no-fire in #51021. Anchor naive timestamps to _hermes_now().tzinfo so '20:07' means 20:07 on the same clock the scheduler checks against. The legacy _ensure_aware path still treats already-stored naive values as server-local for back-compat. Fixes #51021	2026-06-23 23:29:57 -07:00
Teknium	bb7ff7dc30	revert(cron): return cron job storage to per-profile (reverts #32117 + #50993) (#51116) * Revert "fix(cron): scope job execution to its owning profile (#32091 follow-up) (#50993)" This reverts commit `660e36f097`. * Revert "fix(cron): anchor cron storage at the default root home (not the active profile)" This reverts commit `a5c09fd176`.	2026-06-22 17:53:50 -07:00
Teknium	660e36f097	fix(cron): scope job execution to its owning profile (#32091 follow-up) (#50993) The #32091 fix moved every profile's cron jobs into one shared root store, but never wired the execution-scoping half it recommended: a job still ran under whichever profile's ticker picked it up, not its owning profile. So a job created under `hermes -p donna` could execute with the root profile's .env / config.yaml / credentials. - jobs.py: create_job auto-captures the active profile (explicit profile= override available) and stores it on the job; resolve_profile_home() maps a profile name to its HERMES_HOME; legacy jobs backfill to 'default'. - scheduler.py: run_job applies the job's profile via a scoped HERMES_HOME override (env var + in-process ContextVar) before any .env/config/script load, restored in finally. tick() routes profile-mismatched jobs to the single-worker sequential pool so the env mutation can't race. - cronjob tool threads profile through (NOT exposed in the model schema, to avoid cross-profile privilege escalation); hermes cron add gains --profile. E2E verified against a temp HERMES_HOME with a real profile dir: a root-profile ticker runs a profile='donna' job with HERMES_HOME=donna during execution and restores the ticker env afterward.	2026-06-22 14:54:28 -07:00
kshitijk4poor	a4e61ddf04	fix(cron): fail closed when an unpinned job's provider drifts from creation snapshot (#44585) An unpinned cron job follows the global default provider (config.yaml model.default + resolve_runtime_provider). If that global state is changed after the job is created — e.g. a temporary switch to a paid provider like nous/claude-fable-5 — the job silently inherits it on its next tick and spends real money. This is the reported $7.73 incident: a job created under a free/default provider later inherited a temporary paid switch. Fix (ask #1 only) preserves the legitimate "unpinned job should follow model.default" use case by detecting drift rather than freezing the model: - create_job (cron/jobs.py): for UNPINNED, agent-backed jobs (no explicit provider, not no_agent), snapshot the provider that resolution WOULD pick right now into a new optional `provider_snapshot` field, resolved via the same resolve_runtime_provider() path the ticker uses. Fail-open to None on any resolution error so job creation never breaks. - run_job (cron/scheduler.py): right after runtime resolution, if the job has a provider_snapshot AND is unpinned AND the currently-resolved provider DIFFERS from the snapshot, fail closed for that run — make no paid call and deliver a loud, actionable alert naming both providers and telling the user to pin explicitly (`cronjob action=update job_id=.. provider=..`). Back-compat: jobs with no snapshot (pre-existing jobs, no_agent jobs, or any job whose creation-time resolution failed) behave exactly as before — the guard only engages when a snapshot exists. Explicitly-pinned jobs (job.provider set) are unaffected since they don't drift with global state. Tests: tests/cron/test_cron_provider_pin.py covers snapshot-matches (runs), snapshot-differs (fail closed, no agent constructed), no-snapshot back-compat, None-snapshot back-compat, explicitly-pinned (runs regardless), plus create_job snapshot capture/skip/fail-open. The fail-closed case is load-bearing (fails without the guard). Issue #44585 asks #2-4 (hard-stop a running job, gateway-stop containment, fail-closed on provider mutation) are out of scope for this change.	2026-06-23 02:45:52 +05:30
helix4u	ae7e857420	fix(cron): deliver max-iteration fallback reports	2026-06-22 13:57:59 -07:00
sherman-yang	74a5905aea	fix(cron): layer enabled MCP servers onto per-job enabled_toolsets A cron job that sets `enabled_toolsets` to a list of native toolsets (e.g. `["web", "terminal"]`) silently got ZERO MCP tools, while a job with no per-job list got every globally-enabled MCP server. `_resolve_cron_enabled_toolsets` returned the per-job list verbatim, bypassing the MCP-merge that the platform-fallback branch performs via `_get_platform_tools`. So `discover_mcp_tools()` registered the MCP tools into the registry, but `get_tool_definitions(enabled_toolsets=...)` kept only the named native toolsets — the agent then rejected every `mcp_` call as "Unknown tool". (R2 of #23997.) Fix: `_merge_mcp_into_per_job_toolsets` layers MCP membership onto a per-job allowlist with the SAME semantics as `_get_platform_tools`: * `no_mcp` sentinel present -> no MCP servers (sentinel stripped) * one or more MCP server names already listed -> treat as an allowlist * otherwise -> union in every globally-enabled MCP server To avoid duplicating the "which MCP servers are enabled" computation (it already existed inline in `_get_platform_tools`), this extracts a shared `enabled_mcp_server_names(config)` helper in `hermes_cli.tools_config` and has BOTH the gateway/CLI platform resolver and the cron per-job resolver call it — so every path agrees on MCP membership (extend, don't duplicate). Note: the issue's headline — bare MCP server names rejected, registry never includes them — was already fixed on main (commits `c10fea8d2` + `04918345e`, both before the issue was filed). This PR closes the remaining cron-specific gap (R2). The `server:*` / `mcp:server` alias-notation rejection (R1) and the quiet-mode silent-drop (R3) are tracked separately. Salvaged from #32788 by sherman-yang (credited below). Reworked to reuse the shared `enabled_mcp_server_names` helper instead of re-implementing the MCP membership set in cron/scheduler.py. Fixes #23997 Co-authored-by: sherman-yang <58446328+sherman-yang@users.noreply.github.com>	2026-06-22 15:52:58 +05:30
kshitij	b9f302441f	Merge pull request #50112 from NousResearch/salvage/f5-cron-storage-root fix(cron): anchor cron storage at the default root home (#32091)	2026-06-22 15:51:59 +05:30
mohamedorigami-jpg	a5c09fd176	fix(cron): anchor cron storage at the default root home (not the active profile) `cron/jobs.py` resolved `HERMES_DIR`/`JOBS_FILE` from `get_hermes_home()`, which follows the active profile override. So a job created from a profile-scoped agent session (`hermes -p myprofile chat`, where the in-process `cronjob` tool calls `create_job`) was written to `~/.hermes/profiles/myprofile/cron/jobs.json`, while the profile-less gateway (`hermes gateway run`) reads only `~/.hermes/cron/jobs.json`. The job was silently orphaned: `cronjob action=list` from the same profile reported it healthy (same file), but the gateway ticker never saw it and it never fired. `last_run_at` stayed null forever. (#32091) Fix: resolve the cron store from `get_default_hermes_root()` — the purpose-built "profile-level operations" root that returns `<root>` even when `HERMES_HOME` is `<root>/profiles/<name>` (and handles Docker/custom layouts). Now the creator, the gateway scheduler, and the dashboard all agree on a single jobs.json at the root, so a job created under any profile is visible to the gateway. Scope: this is the storage-location half of the fix. Making a job execute under its originating profile's config/skills (a per-job `profile` field + runtime context scoping, the #48649 sibling) is a separate, riskier change and will follow as its own PR — keeping this layer minimal and safe. Salvaged from #32117 by @mohamedorigami-jpg (authorship preserved). The comprehensive #33839 (@sweetcornna) takes the same Option-A storage approach and additionally adds the per-job profile execution scoping; this PR lands the safe storage layer first. Tests: `tests/cron/test_cron_profile_storage.py` — asserts the store anchors at `<root>/cron` under a profile HERMES_HOME (not `<profile>/cron`), and is unchanged when no profile is active. Full `tests/cron/` suite: 511 passed. Fixes #32091 Co-authored-by: mohamedorigami-jpg <mohamed.origami@gmail.com>	2026-06-21 16:45:14 +05:30
liuhao1024	6777a6bd67	fix(cron): run missed-grace jobs once instead of deferring forever When a recurring job's execution time exceeds `interval + grace`, the scheduler entered a perpetual "missed → fast-forward → skip" loop and the job effectively never ran again. A real job (`hermes-upstream-contribution`) logged 42 consecutive "missed" events over 9 hours without executing once. Timeline (5-min interval, 150s grace, ~15-min execution): 14:00 due → advance next_run_at→14:05 → run (blocks 15 min) 14:15 finishes 14:16 tick: next_run_at=14:05, elapsed 660s > grace 150s → "missed!" → fast-forward to 14:21 → continue (SKIP) → does NOT run ... repeats forever for any job whose runtime > interval+grace. The `continue` (skip execution) in `_get_due_jobs_locked` was designed to prevent burst-catchup after gateway downtime — don't run 6 missed instances of a 30-min job on restart. But it wrongly applied to a job that missed its slot because it was still running, not because the gateway was down. Fix: keep the fast-forward (so accumulated missed slots are still collapsed to a single next slot — no burst) but fall through to `due.append(job)` so the job runs ONCE now. The log message is updated to be honest about the new behavior ("Running now; next run fast-forwarded to: ..."). Behavior note: a recurring job missed during gateway downtime now also fires once immediately on restart (rather than waiting for its next natural slot). This is the intended trade-off — the same "run once, don't burst" rule now applies uniformly to both downtime-misses and long-execution-misses. Salvaged from #33318 by @liuhao1024 (authorship preserved). Also addresses the diagnosis in #33361 (@agent-trivi), which proposed the same one-line fix. Tests: updates `test_stale_past_due_skipped` → `test_stale_past_due_runs_once_and_fast_forwards` (the old test encoded the skip behavior); adds `test_long_execution_does_not_perpetually_defer` as a direct regression for the production loop; updates the F2e timezone test that relied on the old skip path. Full tests/cron/ suite: 510 passed. Fixes #33315 Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>	2026-06-21 14:11:12 +05:30
kshitij	f57ff7aef1	Merge pull request #50034 from NousResearch/salvage/cron-tz-offset-repair fix(cron): repair migrated timezone offsets to prevent double-fire	2026-06-21 13:53:28 +05:30
kshitijk4poor	4cc28aa3bb	fix(cron): route Telegram DM-topic cron delivery through DeliveryRouter (#22773) PR #22410 added three-mode Telegram topic routing to the live message path (TelegramAdapter.send via the gateway DeliveryRouter), but the cron delivery path never got it. cron/scheduler.py::_deliver_result sent through the live adapter with a bare ``{"thread_id": ...}`` and fell back to the standalone _send_telegram, neither of which addresses Bot API Direct Messages topics correctly. After Bot API 10.0 (2026-05-08), sending to a private chat with a bare ``message_thread_id`` is rejected/mis-routed, so cron deliveries to a private DM topic landed in the General topic instead of the requested lane. Fix: the cron live-adapter branch now routes the text send through the gateway's ``DeliveryRouter._deliver_to_platform`` — the same canonical path live messages use — so it inherits all three Telegram routing modes: 1. Forum/supergroup (negative chat_id) -> message_thread_id 2. Bot API DM topics (private chat_id + numeric topic id) -> direct_messages_topic_id (the case #22773 reported) 3. Hermes-created named private DM-topic lanes -> ensure_dm_topic + reply anchor For mode 2, a private-chat target with a numeric topic id is passed as ``direct_messages_topic_id`` metadata (verified end-to-end: TelegramAdapter._thread_kwargs_for_send turns it into ``{message_thread_id: None, direct_messages_topic_id: <int>}``), instead of a bare message_thread_id. Forum/supergroup and home-channel deliveries are unchanged. The standalone fallback (gateway down) is preserved. No new config knob and no duplicated routing logic — this reuses the existing DeliveryRouter rather than reimplementing topic routing in the cron path. Salvaged from #42051 (stepanov1975) and #23249 (devsart95), which both diagnosed the missing three-mode routing in the cron/standalone path; reimplemented onto the canonical DeliveryRouter that landed since those PRs were opened. Co-authored-by: Alex <9785479+stepanov1975@users.noreply.github.com> Co-authored-by: devsart95 <devsart95@gmail.com>	2026-06-21 13:35:45 +05:30
Tranquil-Flow	f1f36b3bae	fix(cron): repair migrated cron timezone offsets to prevent double-fire A recurring cron job persists `next_run_at` as an absolute timestamp with a UTC offset (e.g. `2026-05-19T21:00:00+10:00`). Cron expressions, however, describe local wall-clock intent ("run at 21:00"). When Hermes/system timezone changes after the timestamp was persisted, the stored instant is re-interpreted in the new zone: `21:00+10:00` is the instant `13:00+02:00`, which is `<= now` (13:02+02:00) — so the job fires HOURS EARLY, then `compute_next_run` advances it via croniter to `21:00+02:00` the same day, producing a SECOND fire. (#28934, recurrence of #24289.) `_get_due_jobs_locked` now detects this precise migration case before the due check: for a `cron` job whose converted instant looks due, whose stored UTC offset differs from the current zone's, AND whose stored wall-clock time is still in the future (distinguishing a migrated offset from a genuinely missed run), it recomputes `next_run_at` from the schedule and skips the early fire — preserving the local wall-clock intent. Verified against the issue's reproducer: stored `21:00+10` under runtime `+02:00` at wall-clock `13:02` is rescheduled to `21:00+02` instead of firing early + again. Salvaged from #28941 by @Tranquil-Flow (authorship preserved). Chosen over the alternative approaches (#28951 normalize-to-UTC, #28985 rebase-and-match) because UTC-normalization does not change the absolute-instant comparison and so does not fix the early fire, and this guard is the tightest: it only acts when all four conditions hold and reuses the existing `compute_next_run`. Fixes #28934	2026-06-21 13:31:31 +05:30
kshitij	02a3288de3	Merge pull request #50018 from NousResearch/salvage/f3a-delivery-confirm fix(cron): make live-adapter delivery confirmation reliable (#38922, #47056, #43014)	2026-06-21 13:29:45 +05:30
annguyenNous	07424da76f	fix(cron): keep ticker alive on BaseException + heartbeat-aware status The in-process cron ticker (cron/scheduler_provider.py) caught only `Exception` and logged at DEBUG, so a `SystemExit`/`KeyboardInterrupt` raised from a misbehaving provider SDK or agent retry path killed the ticker thread silently. The gateway PROCESS stayed up, so `hermes cron status` — which only checks `find_gateway_pids()` — kept reporting "✓ jobs will fire automatically" while no jobs ever fired (#32612, #32895). This makes ticker death survivable and detectable: - The ticker loop now catches `BaseException` and logs at ERROR with a traceback, so a single bad tick no longer tears the thread down and the failure is visible in the gateway log. - The loop records a heartbeat (`cron/ticker_heartbeat`, epoch seconds) on startup and after every tick — best-effort, never raised into the loop. Both ticker entry points (the gateway and the desktop fallback in web_server.py) funnel through `InProcessCronScheduler.start`, so one heartbeat site covers both. - `hermes cron status` now reads the heartbeat age: if the gateway is running but the heartbeat is stale (> 200s, i.e. several missed ~60s ticks), it reports the ticker as STALLED and suggests a restart instead of falsely claiming jobs will fire. A missing heartbeat (older build / never ran) is treated as "unknown", not "dead". Adds tests for BaseException survival, per-iteration heartbeat recording, heartbeat round-trip/age, staleness detection, and silent-write-failure. Salvaged from #49660 (BaseException survival on current structure), extended with the heartbeat + honest-status reporting that the earlier (pre-refactor) watchdog PRs #35616 and #33849 proposed. Fixes #32612 Fixes #32895 Co-authored-by: banditburai <promptsiren@gmail.com> Co-authored-by: sweetcornna <96944678+sweetcornna@users.noreply.github.com>	2026-06-21 13:00:50 +05:30
Luke The Dev	d54890870f	fix(cron): make live-adapter delivery confirmation reliable (#38922, #47056, #43014) Consolidates three cron-delivery defects in cron/scheduler.py::_deliver_result that all stem from how the live-adapter send result is interpreted. #38922 — duplicate message on confirmation timeout. future.result(timeout=60) raising TimeoutError bubbled to the outer except handler, which left delivered=False, so `if not delivered:` re-sent the identical message via the standalone path. future.cancel() cannot un-send a request already in flight on the wire, so a slow confirmation deterministically produced a duplicate. The send was already dispatched onto the gateway loop, so a bare timeout is now treated as delivered (assume-delivered is safer than guaranteed-duplicate) and the standalone fallback is skipped. The live-adapter media attempt is also skipped on timeout since the contended loop would re-block each 30s media budget. #47056 — silent drop when the gateway has an active session. The old check `if send_result is None or not getattr(send_result, "success", True)` let a result object missing a `success` attribute default to True = counted as a successful delivery, so the scheduler logged "delivered via live adapter" while the gateway never processed the message. Delivery is now confirmed via _confirm_adapter_delivery(): only an explicit, truthy `success` attribute counts; None or a `success`-less object falls through to the standalone path so the message actually arrives. A genuine send Exception (not a slow confirmation) still falls through to the standalone path, and is caught by run_job's outer handler — it is recorded as the job's last_error and never crashes the cron ticker. #43014 — deliver=origin fails to resolve in CLI sessions. A CLI-created job has no {platform, chat_id} origin, so deliver=origin (and auto-detect / deliver=None) was unresolvable and emitted "no delivery target resolved" on every run. An unresolvable origin with no configured home channel is now treated as local (output stays in last_output), matching the documented auto-deliver contract; a concrete unresolvable platform target still reports a real error. Salvaged from #41007 (timeout discriminator), folding in #47127's _confirm_adapter_delivery hardening and #38937 / #43063's origin→local fallback. Tests rewritten as behavior contracts (timeout => no duplicate; None / success-less result => standalone fallback; confirmed success => no fallback; CLI origin => local, explicit platform => still errors). Co-authored-by: Evi Nova <66773372+Tranquil-Flow@users.noreply.github.com> Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>	2026-06-21 12:59:21 +05:30
konsisumer	73b92264ee	fix(cron): resolve model.default + fail fast on missing model Cron jobs created without an explicit `model` are stored as `model: null`. At fire time `run_job` resolved `model = job.get("model") or os.getenv("HERMES_MODEL") or ""` and then `_model_cfg.get("default", model)`, so when config.yaml had no `model.default` (or `model: {default: null}`) an empty string flowed straight to the provider and surfaced as an opaque HTTP 400 ("Model parameter is required" / "model: String should have at least 1 character"). The operator had to inspect jobs.json to discover the job was stored with a null model. This change makes cron model resolution robust and symmetric with the CLI: - Coerce `model: null`/missing config to `{}` so a falsy default never overwrites an already-resolved env value with `None`. - Only overwrite `model` from `model.default` when the resolved value is truthy; accept a `model.model` alias key, mirroring the sibling resolvers in hermes_cli/oneshot.py, fallback_cmd.py and prompt_size.py. - Resolve AFTER the managed-scope overlay so an administrator-pinned model still wins. - Fail fast with an actionable error (caught by run_job's outer handler and recorded as the job's last_error — the cron ticker is unaffected) instead of letting an empty model reach the API. - The per-job model is re-read every tick, so a `cronjob action=update model=...` after a failed run takes effect on the next tick (no cache). Adds tests/cron/conftest.py pinning a default HERMES_MODEL so existing run_job tests don't trip the new guard, plus regression tests covering env fallback, config.default fallback, string-form config, the model alias key, null-default-no-clobber, corrupt-config graceful degradation, fail-fast, and the no-cache re-read property. Salvaged from #24005, rebased onto current main, with additional test coverage folded in from #45550 and the alias-key behavior from #43952. Fixes #43899 Fixes #23979 Fixes #22761 Co-authored-by: szzhoujiarui-sketch <szzhoujiarui@gmail.com> Co-authored-by: rayjun <rayjun0412@gmail.com>	2026-06-21 12:37:56 +05:30
teknium1	c1a0b6a5f1	style: strip trailing whitespace in cron scheduler live-adapter block Follow-up on salvaged PR #49280.	2026-06-19 16:59:38 -07:00
joaomarcos	3a6c171e9e	fix(gateway): log signal transport response and bubble cron live adapter errors	2026-06-19 16:59:38 -07:00
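
The retention rule described in commit e4ff494860 (keep the most recent N output files per job, default 50, non-positive disables pruning) could look roughly like the sketch below. This is not the code from `save_job_output()`: the function name `prune_job_output`, the constant `DEFAULT_RETENTION`, and the assumption that `<timestamp>.md` filenames sort in chronological order are all hypothetical and used only for illustration.

```python
from pathlib import Path

# Default taken from the commit message; the constant name is hypothetical.
DEFAULT_RETENTION = 50


def prune_job_output(job_dir: Path, keep: int = DEFAULT_RETENTION) -> list[Path]:
    """Delete all but the newest `keep` *.md files in job_dir.

    Returns the list of paths that were removed.
    """
    if keep <= 0:
        # A non-positive retention disables pruning entirely.
        return []
    # Assumption: filenames are sortable timestamps, so name order is age order.
    files = sorted(job_dir.glob("*.md"), key=lambda p: p.name)
    stale = files[:-keep]  # empty when there are `keep` files or fewer
    for path in stale:
        path.unlink(missing_ok=True)
    return stale
```

Calling `prune_job_output(Path("cron/output/my-job"))` after each write would bound the directory at 50 files under these assumptions; passing `keep=0` leaves cleanup to an external process.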


288 commits