fix(auxiliary): retry transient blips harder + isolate client cache per model (#56889)

Two related hardening fixes for auxiliary calls (which include MoA reference
advisors — a pinned-model path where provider fallback is not a meaningful
recovery):

1. Transient-transport retries: the same-provider retry on a connection reset /
   timeout / 5xx / 408 was a single attempt, then fallback. For a pinned aux
   call a second blip silently loses the call (root of the run2 double-advisor
   'Connection error' collapse — a genuine upstream blip). Now retries N times
   with exponential backoff, N = auxiliary.transient_retries (default 2 -> 3
   total attempts, clamped [0,6]). Compression-on-timeout fast-fail carve-out
   preserved.

2. Per-model client-cache isolation: _client_cache_key excluded the model, so
   two concurrent auxiliary calls to the same provider/base_url/key but
   different models (e.g. an opus + gpt-5.5 MoA fan-out) shared one cache entry
   and could race each other's client lifecycle. Model now participates in the
   key -> distinct clients, no cross-call races. Same-model reuse unchanged.

- agent/auxiliary_client.py: _transient_retry_count() + backoff loop; model in
  _client_cache_key and both call sites.
- hermes_cli/config.py: auxiliary.transient_retries default (2).
- tests: new retry/isolation tests; updated 2 stale-expectation tests to the
  corrected behavior (per-model resolve; N-retry escalation).

Backoff base is overridable (_TRANSIENT_RETRY_BACKOFF_BASE) so tests don't sleep.
This commit is contained in:
Teknium 2026-07-02 01:09:37 -07:00 committed by GitHub
parent 71c0622122
commit fb403a3a73
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
4 changed files with 178 additions and 20 deletions

View file

@ -1465,6 +1465,13 @@ DEFAULT_CONFIG = {
# Each aux task is independent — main-agent provider_routing and
# openrouter.min_coding_score do NOT propagate to aux calls by design.
"auxiliary": {
# Same-provider retries for a transient transport blip (connection
# reset / timeout / 5xx / 408) on ANY auxiliary call before falling
# back. Default 2 (→ 3 total attempts), clamped [0,6]. Matters most for
# pinned calls like MoA reference advisors, where provider fallback is
# not a meaningful recovery, so an unretried blip silently loses the
# call.
"transient_retries": 2,
"vision": {
"provider": "auto", # auto | openrouter | nous | codex | custom
"model": "", # e.g. "google/gemini-2.5-flash", "gpt-4o"