fix(cron): raise default pre-run script timeout from 120s to 1h (#55489)

Cron pre-run scripts were capped at 120s by default, which surprised
users running long data-collection scripts on crons (the whole point of
crons being to offload long work). Raise _DEFAULT_SCRIPT_TIMEOUT to 3600s
(1 hour).

This bounds the script only — skill/agent jobs already run on a separate
inactivity budget (HERMES_CRON_TIMEOUT, default 600s idle, 0=unlimited),
not a wall-clock cap. Scripts dispatch to a persistent thread pool and do
not hold the tick lock, so a long script doesn't starve other due jobs.

Docs clarified to make the script-vs-agent timeout distinction explicit.

Env/config overrides (HERMES_CRON_SCRIPT_TIMEOUT,
cron.script_timeout_seconds) are unchanged and still take precedence.
Teknium 2026-06-30 01:00:39 -07:00 committed by GitHub
parent 3a83b6bc5d
commit 643b0dc678
4 changed files with 9 additions and 7 deletions

@@ -206,12 +206,14 @@ import requests, json
# Print summary to stdout — agent analyzes and reports
```
-The script timeout defaults to 120 seconds. `_get_script_timeout()` resolves the limit through a three-layer chain:
+The script timeout defaults to 3600 seconds (1 hour). `_get_script_timeout()` resolves the limit through a three-layer chain:
1. **Module-level override** — `_SCRIPT_TIMEOUT` (for tests/monkeypatching). Only used when it differs from the default.
2. **Environment variable** — `HERMES_CRON_SCRIPT_TIMEOUT`
3. **Config** — `cron.script_timeout_seconds` in `config.yaml` (read via `load_config()`)
-4. **Default** — 120 seconds
+4. **Default** — 3600 seconds (1 hour)
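
The resolution order listed above can be sketched roughly as follows. This is an illustration, not the code touched by this commit: `resolve_script_timeout`, `_parse_positive`, and the `env`/`config` parameters are hypothetical names introduced here so the example runs without the real `load_config()`. Skipping non-numeric or non-positive values is also an assumption, since the diff does not show how invalid input is handled.

```python
import os
from typing import Any, Mapping, Optional

DEFAULT_SCRIPT_TIMEOUT = 3600  # seconds; mirrors the new default in this commit


def _parse_positive(value: Any) -> Optional[float]:
    """Return value as a positive float, or None if unusable (assumed policy)."""
    try:
        seconds = float(value)
    except (TypeError, ValueError):
        return None
    return seconds if seconds > 0 else None


def resolve_script_timeout(
    module_override: float = DEFAULT_SCRIPT_TIMEOUT,
    env: Optional[Mapping[str, str]] = None,
    config: Optional[Mapping[str, Any]] = None,
) -> float:
    """Hypothetical stand-in for the documented lookup chain."""
    env = os.environ if env is None else env
    config = {} if config is None else config

    # 1. Module-level override, honored only when it differs from the default.
    if module_override != DEFAULT_SCRIPT_TIMEOUT:
        return float(module_override)

    # 2. Environment variable.
    from_env = _parse_positive(env.get("HERMES_CRON_SCRIPT_TIMEOUT"))
    if from_env is not None:
        return from_env

    # 3. cron.script_timeout_seconds from the parsed config mapping.
    cron_section = config.get("cron")
    if isinstance(cron_section, Mapping):
        from_config = _parse_positive(cron_section.get("script_timeout_seconds"))
        if from_config is not None:
            return from_config

    # 4. Built-in default.
    return float(DEFAULT_SCRIPT_TIMEOUT)
```

Passing `env` and `config` explicitly keeps the sketch testable without touching the process environment; the documented function reads the environment and calls `load_config()` itself.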
+This timeout bounds the **pre-run script only**, not the agent. Skill-based / LLM-driven jobs run on a separate *inactivity*-based budget (`HERMES_CRON_TIMEOUT`, default 600s of idle time, `0` = unlimited) — they can run for hours as long as they keep calling tools or streaming tokens, and are only killed after the configured idle period with no activity. Scripts are dispatched to a persistent thread pool (not held under the tick lock), so a long-running script does not block other due jobs from firing.
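
To make the difference between the two budgets concrete, here is a minimal sketch that contrasts a wall-clock cap with an inactivity budget. Both `run_script_with_wall_clock_cap` and `InactivityBudget` are hypothetical names for illustration only; the scheduler's actual implementation is not part of this diff, and the injectable `clock` parameter is an assumption added to keep the example deterministic.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_script_with_wall_clock_cap(argv: Sequence[str], timeout_seconds: float) -> str:
    """Run a script and fail once total elapsed time exceeds the cap."""
    # subprocess.run kills the child and raises subprocess.TimeoutExpired
    # when timeout_seconds elapses, however busy the script still is.
    result = subprocess.run(
        list(argv), capture_output=True, text=True, timeout=timeout_seconds
    )
    return result.stdout


class InactivityBudget:
    """Expire only after idle_limit_seconds pass with no recorded activity."""

    def __init__(
        self,
        idle_limit_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._idle_limit = idle_limit_seconds
        self._clock = clock
        self._last_activity = clock()

    def touch(self) -> None:
        """Record activity, e.g. a tool call or a streamed token."""
        self._last_activity = self._clock()

    def expired(self) -> bool:
        if self._idle_limit == 0:  # 0 = unlimited, as documented for HERMES_CRON_TIMEOUT
            return False
        return self._clock() - self._last_activity > self._idle_limit
```

Under the wall-clock model a script that is still making progress is killed once the cap elapses. Under the inactivity model every `touch()` resets the countdown, which is why an agent job can run for hours as long as it stays active.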
### Provider Recovery