hermes-agent

Author	SHA1	Message	Date
kshitijk4poor	e1a1dac848	fix(agent): enforce marker-strip invariant with a single terminal sweep (#57491) Follow-up to the per-site strips from the review gate. The two copy-site strips are correct but positional — a copy site added after the assembly loops would re-leak _db_persisted into the child-session flush. Add a single terminal sweep (_strip_persistence_markers) run once on the fully-assembled compressed list so the invariant 'no compacted message leaves compress() carrying a persistence marker' is structural, not dependent on copy-site order. - agent/context_compressor.py: _strip_persistence_markers() called before compress() returns; helper docstring notes the sweep is the authoritative guard - tests/agent/test_context_compressor.py: structural regression — neuter the per-site helper to a leaking copy, assert the terminal sweep still strips - tests/run_agent/test_compression_persistence.py: pin the fixture assumption behind the exact-equality row-count assertion	2026-07-03 12:51:12 +05:30
nankingjing	3e204bd771	fix(agent): strip _db_persisted when assembling rotation compression transcript (#57491) Shallow messages[i].copy() during context compression propagated the _db_persisted marker from cached gateway incremental flushes into the post-rotation compressed list. _flush_messages_to_session_db then skipped every row when writing to the new child session, so gateway restarts lost the compacted transcript (severe amnesia). Strip the marker in _fresh_compaction_message_copy() and add regression tests for rotation flush + compressor assembly. Fixes #57491	2026-07-03 12:51:12 +05:30
kshitijk4poor	b795a45b8d	fix(compaction): detect and strip merge-into-tail summaries past the delimiter Follow-up to the END-MARKER reorder: moving the summary prefix after the [PRIOR CONTEXT] wrapper meant _is_context_summary_content (prefix-at-start) no longer recognized a merged-tail summary. That silently broke three consumers — the last-real-user anchor (would pick the merged summary as a real user turn, causing active-task loss), the carry-forward summary find, and the auto-focus skip. _strip_summary_prefix would also carry the wrapper + stale tail content forward as the next summary body. Extract the two delimiter strings into _MERGED_PRIOR_CONTEXT_HEADER / _MERGED_SUMMARY_DELIMITER constants (writer + detector stay in sync), teach _is_context_summary_content and _strip_summary_prefix to look past the delimiter, and add a regression test. Standalone summaries unchanged.	2026-07-01 18:23:01 +05:30
Gromykoss	a1a8a967e1	fix(compaction): place END MARKER last in merge-into-tail summaries When the compression summary is merged into the first tail message (the alternation corner case where a standalone summary role would collide with both head and tail), the old format was SUMMARY + END_MARKER + OLD_TAIL_CONTENT — so the preserved tail content appeared AFTER the end marker and the model could read it as a fresh message to respond to. Reorder so the END MARKER is always last: old tail content is wrapped in [PRIOR CONTEXT ...][END OF PRIOR CONTEXT — COMPACTION SUMMARY BELOW] delimiters, then the summary, then the END MARKER. _append_text_to_content handles both string and multimodal-list content. Salvaged from #56372 by @Gromykoss. Only the END-MARKER reorder half is carried over. The PR's second change (a post-compaction pass that strips user-role messages before the first summary marker on compression_count>=2) was dropped: on 2nd+ compactions the protected head decays to system-only (_effective_protect_first_n -> 0, #11996) so the targeted 'ghost head user' does not occur, and where the strip does fire it deletes legitimate recent tail user turns (data loss) and can leave consecutive assistant messages (role-alternation violation).	2026-07-01 18:23:01 +05:30
kshitijk4poor	6e97f5c3f8	test(compressor): tidy blank-line spacing + assert placeholder never overwrites text Review follow-up on the batch salvage: normalize the inter-class spacing to two blank lines (PEP8) between the three new test classes, and add an explicit assertion in test_sanitizer_strips_orphaned_preserves_text_content that the '(tool call removed)' placeholder does NOT overwrite existing assistant text. No production change.	2026-07-01 14:24:41 +05:30
liuhao1024	8f4d195d5f	fix(compressor): pin summary role to user when only system prompt is protected (#52160) After the first compaction protect_first_n decays, so on a later compaction the only protected head message can be the system prompt. Adapters like Anthropic and Bedrock send the system prompt as a separate parameter, so the summary becomes the first message in messages[] — and Anthropic rejects any request whose first message is not role=user (HTTP 400). Pin the summary to role=user when the head is system-only, and stop the collision-flip logic from reverting it back to assistant. Salvaged from #52167. Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>	2026-07-01 14:24:41 +05:30
srojk34	82ac7e16b8	fix(compression): preserve network/auth abort flags across cooldown re-entry (#29559) compress() eagerly reset _last_summary_auth_failure and _last_summary_network_failure at the top of every call. On a second compress() during the failure cooldown, _generate_summary() returns None from the cooldown early-return WITHOUT re-asserting those flags, so the abort guard saw False and fell through to the destructive static-fallback that drops the middle window — the data-loss #29559/#25585 describe. Stop resetting them eagerly; a successful summary already clears both, so letting them persist across calls is safe and keeps the cooldown abort protection intact. Salvaged from #52056. Co-authored-by: srojk34 <286497132+srojk34@users.noreply.github.com>	2026-07-01 14:24:41 +05:30
liuhao1024	32b23bfb08	fix(compressor): strip orphan tool_calls instead of inserting stubs (#51218) _sanitize_tool_pairs inserted stub role="tool" results for orphaned tool_calls. The pre-API repair_message_sequence() tracks known call IDs by tc.get("id") while this sanitizer keys on call_id||id; when they disagree (Codex Responses API: id != call_id) the stubs are silently dropped by the repair pass, re-exposing the original orphans. Strip the orphaned tool_calls at the source instead (preserving any text content, adding a placeholder for an otherwise-empty assistant turn) to avoid the mismatch class entirely. Salvaged from #51225. Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>	2026-07-01 14:24:41 +05:30
H2KFORGIVEN	fc2fac73bd	fix(compressor): prevent orphan user turn after compaction via turn-pair preservation When the last user message sits exactly at head_end (the first compressible index), _ensure_last_user_message_in_tail's final max(last_user_idx, head_end + 1) clamp returns head_end + 1, pushing the user into the compressed region without its assistant reply. The summariser then records it as a pending ask, and the next session re-executes the already-completed task (lights off twice, file deleted twice, message re-sent). Fix: apply Causal Coupling — a compaction boundary must never split a (user -> assistant [-> tool results]) turn-pair. Add _find_turn_pair_end and, when the clamp would orphan the user, push the cut forward to pair_end so the completed pair is summarised together and marked done. 8 new tests in TestTurnPairPreservation; 133 compressor tests pass.	2026-07-01 00:27:09 -07:00
Vladimir Smirnov	9dc6dc062f	fix(agent): handle string context compression messages	2026-06-30 04:38:43 -07:00
Rod Boev	53ef954841	fix(agent): keep cooldown and lock refresh on one authority (#54465)	2026-06-30 13:36:29 +05:30
Rod Boev	f2ccb2859f	fix(agent): persist compression backoff across resume (#54465)	2026-06-30 13:36:29 +05:30
kshitijk4poor	ac822e4d36	fix(compression): abort (preserve context) on transient network summary failure (#29559, #25585) When context compaction's summary generation fails, the compressor's default path (abort_on_summary_failure=False) drops the middle window and inserts a static 'summary unavailable' marker — destroying the compacted turns. #29559 reported the field impact: a Connection error at the compaction moment dropped 124->15 messages (110 lost) for a long browser-automation task; #25585 is the same failure mode (failed summary commits a destructive compaction anyway). compress() already has an EXCEPTION to the historical drop default: auth failures (401/403) ALWAYS abort and preserve the session, because rotating into a placeholder-summary child on a broken credential strands the user. A transient network/connection error is the same situation in reverse: it WILL recover, and retrying then is strictly better than discarding context for a momentary blip. Extend the always-abort carve-out to terminal connection/network failures: - new _last_summary_network_failure flag, set in _generate_summary's terminal failure branch when _is_connection_error(e) (reached only after any main-model fallback is exhausted), reset alongside the auth flag; - compress() aborts when it's set (returns messages unchanged, _last_compress_aborted=True), independent of abort_on_summary_failure; - a network-specific operator warning (distinct from the auth + config-flag messages). Scoped to connection errors only: a generic 500/400 still takes the historical fallback-drop path (test_non_auth_failure_still_uses_fallback_path stays green). Tests: network-failure detection + abort-despite-flag-false, both mutation-checked (removing the flag-set fails detection; removing the carve-out fails the abort).	2026-06-24 18:31:51 +05:30
kshitijk4poor	623b21bf24	fix(compress): reserve output tokens in the compaction threshold (#23767, #43547) The compaction trigger compared estimated input against context_length * threshold, but the provider reserves max_tokens of OUTPUT out of the same window. With a large max_tokens (e.g. 65536 on a custom provider) the usable input budget is materially smaller than the raw window, so sessions hit a provider 400 before compaction ever fired. _compute_threshold_tokens now subtracts the output reservation (context_length - max_tokens) before applying the percentage and the small-window 85% guard. max_tokens is stored on the compressor (threaded from agent.max_tokens at construction) and reused across update_model() switches; None = provider default = no reservation (full-window behavior, unchanged). Reimplemented on the current _compute_threshold_tokens surface (the inline threshold calc the original PR targeted was since refactored for the small-window #14690 fix); composes with that 85% guard on the effective budget. Credit: @kyssta-exe (#43651) — original design for the output-token reservation in the compaction threshold. Closes #43547.	2026-06-22 17:26:17 +05:30
kshitijk4poor	b2c84a1626	fix(agent): defer preflight compaction until real usage after a compaction (#23767, #36718) After a compaction, the post-compression path parks last_prompt_tokens=-1 and sets awaiting_real_usage_after_compression=True, but last_real_prompt_tokens still holds the stale pre-compression value (above threshold). should_defer_preflight_to_real_usage() hit the 'last_real_prompt_tokens >= threshold => False' short-circuit and let preflight fire a SECOND compaction before the provider reported real post-compaction usage. Add an early-return on the awaiting flag so deferral holds for exactly one turn; update_from_response() clears it. The flag-setting half (#36718) already landed on main via the in-place compaction path (conversation_compression.py); this adds the missing should_defer guard that consumes it. Credit: - @ashishpatel26 (#38133) — diagnosis + the should_defer early-return design - @Tranquil-Flow (#36769) — same #36718 fix, identical guard placement Closes #36718.	2026-06-22 16:33:18 +05:30
Teknium	b6a4638b6d	fix(compressor): treat empty-content summary response as failure, not an empty summary (#50297) When an OpenAI-compatible proxy (e.g. cmkey.cn, one-api Anthropic channels) returns a well-formed HTTP 200 whose summary content is null or empty/whitespace-only, _generate_summary coerced it to "" and stored a prefix-only summary — silently replacing the compacted turns with nothing. The model then lost all in-progress context after compression (#11978, #11914). _validate_llm_response already guards None / empty-choices, so those never reach the compressor; the gap was a well-formed response with empty content. Now treat empty content as a summary failure: raise so it routes through the existing main-model fallback then transient cooldown, dropping the turns without a summary rather than wiping context with an empty one. Also narrow the bare 'except RuntimeError' so only genuine 'No LLM provider configured' errors take the 600s no-provider cooldown; empty/invalid-response RuntimeErrors from a configured provider now correctly get the main-model fallback instead of being misrouted into the long no-provider cooldown. Reported by @Hung2124; area identified by @annguyenNous in #39590.	2026-06-21 11:27:07 -07:00
teknium1	3509be7124	fix(compression): auto-compression triggers at minimum context length (#14690) The compaction threshold is max(context_length * threshold_percent, MINIMUM_CONTEXT_LENGTH=64000). The floor prevents premature compression on large models, but degenerates at small windows: a model at exactly 64000 ctx gets max(32000, 64000) = 64000 — a threshold equal to the ENTIRE window. should_compress() can then never fire, because the provider rejects the request before usage reaches 100%. Auto-compression silently never triggers for any model whose context_length <= MINIMUM / threshold_percent (e.g. 64K-per-slot local models). Centralize the calc in _compute_threshold_tokens(). When the floor would meet or exceed the context window, trigger at 85% of the window (_MIN_CTX_TRIGGER_RATIO) — high enough that a minimum-context model uses most of its budget before compacting (compacting at the 50% percentage would waste half the small window), but below 100% so compaction actually fires before the provider rejects the request. This mirrors the existing gpt-5.5/Codex 85% autoraise rationale. Large-context behavior (floor at 64000) is unchanged; both call sites (__init__ and update_model) use the shared helper. Co-authored-by: soynchux <soynchuux@gmail.com> Co-authored-by: LeonSGP43 <154585401+LeonSGP43@users.noreply.github.com> Co-authored-by: Tranquil-Flow <tranquil_flow@protonmail.com>	2026-06-21 07:53:14 -07:00
kshitijk4poor	1e0b3a2bcc	fix(agent): reset stale token calibration on model switch (#23767) ContextCompressor.update_model() recomputed context_length/threshold/budgets but kept the cross-call calibration state (last_real_prompt_tokens, last_rough_tokens_when_real_prompt_fit, last_compression_rough_tokens, awaiting_real_usage_after_compression, _ineffective_compression_count) from the PREVIOUS model. Those fields encode 'the provider proved this prompt fit' / 'preflight can be deferred' decisions valid only for the model that produced them. Carried across a switch to a smaller-context model, should_defer_preflight_to_real_usage() used the old model's 'it fit' history to SKIP a preflight compression the new model actually needed — sending an oversized prompt the provider rejects (#23767). update_model() now clears that state; the new model's first response repopulates it via update_from_response(). Verified E2E: after a 200K->65,536 switch, defer no longer suppresses and should_compress fires on an over-threshold estimate.	2026-06-21 17:46:58 +05:30
teknium1	14ef6312b5	fix(compression): decay protect_first_n so early turns don't fossilize (#11996) protect_first_n keeps the first N non-system messages verbatim through compaction so the original task framing survives. But it was applied on EVERY compression pass: the same early user turns were re-copied into each child session and never summarized away, so across a long, repeatedly-compressed session those old messages became immortal and grew the protected head unboundedly (#11996, P1). Decay it: protect_first_n applies on the FIRST compaction only. Once the session has been compressed at least once (compression_count >= 1, or a handoff summary already exists), the early turns are captured in the summary, so _effective_protect_first_n() returns 0 and only the system prompt stays protected. The decay is read at compress_start computation time, before compression_count/_previous_summary are mutated at the end of compress(), so the first pass still protects correctly. Co-authored-by: truenorth-lj <liliangjya@gmail.com> Co-authored-by: davidvv <david.vv@icloud.com>	2026-06-21 00:06:58 -07:00
teknium1	1f874dfe44	fix(compression): stop fallback summary triplicating the latest user ask When LLM summarization fails, the deterministic fallback summary rendered the latest user ask (active_task = "User asked: '<ask>'") verbatim under THREE headings — Historical Task Snapshot, Historical In-Progress State, and Historical Pending User Asks. Re-presenting an already-handled ask as unresolved in-progress/pending work made the model re-answer it AND treat the resurrected ask as the active turn, burying the genuinely-new post-compaction user message (#49307: answer repetition + new-instruction loss, P1). Keep the latest ask once, under Task Snapshot, as historical context only. The In-Progress and Pending-Asks sections now say 'Unknown / None recoverable from deterministic fallback' (consistent with the Active State / Key Decisions / Resolved Questions sections) and explicitly note the ask is historical, not outstanding. The raw turn text still appears in the verbatim 'Last Dropped Turns' transcript — that's the dropped-turn record, not a re-labeled instruction. Note: the separate role=assistant standalone-summary regurgitation (#33256) is left as-is — that role choice is constrained by strict message alternation (user collides with a user-ending head) and is already mitigated by the summary end-marker; forcing the role would risk the alternation invariant. Co-authored-by: r266-tech <r2668940489@gmail.com> Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>	2026-06-20 23:19:27 -07:00
teknium1	5a53e0f0f4	fix(compression): abort on auth failure instead of rotating into a degraded session When the auxiliary summary call fails with an authentication/permission error (HTTP 401/403), context compression now ABORTS and preserves the session unchanged instead of rotating into a child session with a placeholder summary. Before: a 401 (invalid/blocked key, or a token pointed at the wrong inference host) fell through every transient-error check to 'return None', and because compression.abort_on_summary_failure defaults False, compress() took the static-fallback path and rotated the session anyway (messages N->N). The user landed on a fresh-but-broken session that kept failing the same way — paying for a full-context API call each turn with no useful compression. After: _generate_summary classifies 401/403 as a non-recoverable auth failure (_last_summary_auth_failure) and compress() aborts on it regardless of abort_on_summary_failure. A distinct auxiliary summary_model that 401s still retries once on the main model first (its dedicated creds may be the only broken thing); the abort only sticks when the main model itself auth-fails or the fallback also auth-fails. The existing _last_compress_aborted handling in conversation_compression.py already skips rotation and emits a warning, so no session rotation occurs. Tests: TestAuthFailureAborts — 401/403 flagging, compress() aborts despite flag=False, non-auth failures keep the historical fallback path, and aux-model auth failure recovers on main without aborting.	2026-06-20 11:38:21 -07:00
konsisumer	aec38855b5	fix(agent): preserve recent turns during compression	2026-06-12 16:26:58 -07:00
Tranquil-Flow	749b7219c4	fix(compression): always append END OF CONTEXT SUMMARY marker to standalone summaries regardless of role When the compression summary lands as an assistant-role message (head ends with user), the end marker was not appended. Models may regurgitate the summary text as their own visible output when there's no clear boundary signal (#33256). The end marker was already appended for user-role summaries (#11475, #14521) but the assistant-role path was missed in the original fix. This ensures ALL standalone summary messages carry the boundary marker, preventing summary text from leaking into user-visible chat output.	2026-06-12 15:05:00 -07:00
konsisumer	d5e2fbf244	fix(agent): frame compaction handoff sections as historical context	2026-06-11 13:57:13 -07:00
Teknium	3c8f1dee8d	fix(compression): don't overwrite the -1 post-compression sentinel in preflight seed (#36718) compress_context() sets last_prompt_tokens=-1 right after compression to mark "no real API usage yet". The preflight display-seed used `_preflight_tokens > (last_prompt_tokens or 0)`, and `(-1 or 0)` is -1 (truthy), so any positive rough estimate clobbered the sentinel with a schema-inflated count — re-triggering compression on the next turn. Treat any negative value as "no real data yet" and skip the seed. Salvaged from #40246 as the minimal root-cause fix. The original also added an `_awaiting_suppression_count` bounded-window state machine to should_compress() across 3 files; left out here to keep blast radius small — the sentinel guard alone fixes the re-fire. The suppression window can be added separately if the usage=None-stub edge case warrants it. Co-authored-by: davidgut1982 <davidgut1982@users.noreply.github.com>	2026-06-07 01:56:51 -07:00
helix4u	e38b0b55d1	fix(compression): avoid repeat preflight compaction from rough estimates	2026-05-29 19:05:03 -07:00
hinotoi-agent	042c1d6bb0	test: cover fallback dropped-turn handoff	2026-05-28 20:34:40 -07:00
Hinotoi Agent	6dc068ef04	fix: broaden deterministic compression fallback coverage	2026-05-28 20:34:40 -07:00
Hinotoi Agent	e785c0ad70	fix: preserve context when summary generation fails	2026-05-28 20:34:40 -07:00
helix4u	71291d83cd	test: keep tirith checks hermetic	2026-05-23 02:20:14 -07:00
Teknium	9aae59feab	fix(compress): make abort-on-summary-failure opt-in via config flag (#28117) PR #28102 made the summary-failure abort path the unconditional default, changing established behavior. Gate it behind config.yaml flag `compression.abort_on_summary_failure` (default False = historical fallback-placeholder behavior). - hermes_cli/config.py: new `compression.abort_on_summary_failure` key, default False, documented inline. - agent/agent_init.py: read the flag from compression config and pass to ContextCompressor. - agent/context_compressor.py: `__init__` accepts `abort_on_summary_failure` (default False). `compress()` failure branch gates the abort on the flag; when False, falls through to the restored legacy fallback path (static "summary unavailable" placeholder + drop middle window). - tests: restore original fallback expectations as default; add new TestAbortOnSummaryFailure class for the opt-in mode. Gateway/CLI plumbing (force=True on /compress, hygiene/handler abort detection, locale `gateway.compress.aborted` key) from PR #28102 stays intact — those paths only fire when `_last_compress_aborted` is True, which now only happens when the flag is enabled.	2026-05-18 10:28:20 -07:00
Teknium	1634397ddb	fix(compress): abort instead of dropping messages when summary LLM fails (#28102) When auxiliary compression's summary generation returns None (aux model errored, returned non-JSON, timed out, etc.) the compressor previously still dropped every middle message between compress_start..compress_end and replaced them with a static 'Summary generation was unavailable' placeholder. The session kept going but the user silently lost N turns of context for nothing. New behavior: on summary failure, compress() aborts entirely — returns the input messages unchanged and sets _last_compress_aborted=True. The existing _summary_failure_cooldown_until gate (30-60s) keeps the aux model from being burned on every turn. Auto-compress callers detect the no-op (len(after) == len(before)) and stop looping. The chat is 'frozen' at its current size until the next /compress or /new. Manual /compress (CLI + gateway) now passes force=True which clears the cooldown so users can retry immediately after an auto-abort. If the manual retry also fails, the user gets a visible warning telling them nothing was dropped and how to retry. - agent/context_compressor.py: compress() gains force= kwarg; failure branch sets _last_compress_aborted and returns messages unchanged instead of inserting placeholder. - run_agent.py: _compress_context() detects abort, surfaces warning, skips session-rotation entirely, returns messages unchanged. - cli.py + gateway/run.py: manual /compress paths pass force=True. - gateway/run.py: hygiene + /compress handlers detect _last_compress_aborted and emit the new 'Compression aborted' warning (gateway.compress.aborted) instead of the old 'N historical messages were removed' message. - locales/*.yaml: new gateway.compress.aborted key in all 16 locales. - tests: updated to assert the abort contract (messages preserved, compression_count not incremented, abort flag set, no placeholder leaked). New test_force_true_bypasses_failure_cooldown covers the manual-retry path.	2026-05-18 10:19:40 -07:00
kshitij	5fba236644	chore: ruff auto-fix PLR6201 resweep — tuple → set in membership tests (#27355) Six days after #23937 (608 fixes) the codebase had accumulated 241 new PLR6201 violations. Same mechanical `x in (...)` → `x in {...}` fix, same zero-risk profile: set lookup is O(1) vs O(n) for tuple and the two are semantically equivalent for hashable scalar membership tests. All 241 instances fixed via `ruff check --select PLR6201 --fix --unsafe-fixes`, zero remaining. Every changed value is a hashable scalar (str/int/None/enum/signal); no risk of unhashable runtime errors. No behavior change. Test plan: - 119 files changed, +244/-244 (net zero) — exactly one-line edits - `ruff check` clean afterward - Compile checks pass on the largest touched files (cli.py, run_agent.py, gateway/run.py, gateway/platforms/discord.py, model_tools.py) - Subset broad test run on tests/gateway/ tests/hermes_cli/ tests/agent/ tests/tools/: 18187 passed, 59 pre-existing failures (verified against origin/main with the same shape — identical failure count, identical category — all xdist test-order flakes unrelated to this change) Follows the same template as PR #23937 ([tracker: #23972](https://github.com/NousResearch/hermes-agent/issues/23972)).	2026-05-17 02:29:41 -07:00
teknium1	4ceab16893	fix(compression): keep default protect_first_n at 3 + align ABC Follow-up on the salvaged feat commit: - Keep the constructor / config / yaml-example default at 3 so existing gateway and CLI users see no behavioural change. PR #13754 (which this builds on) had lowered the default to 2 to chase pre-feature parity in the system-prompt-present case, at the cost of quietly halving the protected head for the gateway path (which strips the system prompt before calling compress()). With the new "system prompt is implicit" semantics, default 3 gives every caller a stable head shape. - agent/context_engine.py: bring the ABC's protect_first_n docstring in line with the new semantics so plugin context engines interpret the config key the same way the built-in compressor does. - tests: adjust the default-value test (3, not 2) and a stale comment; per-test protect_first_n=2/3/1 values added in PR #13754 stay as-is since those tests fix concrete head shapes.	2026-05-13 22:25:16 -07:00
snav	dee71a31e5	feat(compression): make protect_first_n configurable The number of head messages preserved verbatim across context compactions was previously hardcoded to 3 in AIAgent.__init__. Expose it as `compression.protect_first_n` in config, matching the existing `protect_last_n` pattern. Motivation: users who rely on rolling compaction for long-running sessions had the opening user/assistant exchange pinned as head forever, which doesn't always match how they want the session framed after many compactions. Lowering to 1 preserves the system prompt + first non-system message; lowering to 0 preserves only the system prompt and lets the entire first exchange age out naturally through the summary. Semantics: `protect_first_n` counts non-system head messages protected in addition to the system prompt, which is always implicitly protected when present. Same meaning across both code paths: protect_first_n=0 → system prompt only (or nothing if no system message) protect_first_n=2 → system prompt + first 2 non-system messages (default) This unifies the CLI path (which reads messages with the system prompt at position 0) and the gateway path (where the gateway /compress handler strips the system prompt before calling compress() — see gateway/run.py L9150-9154 on the parent fork). Previously these two paths disagreed: CLI path: protect_first_n=1 → protect system prompt only Gateway path: protect_first_n=1 → protect first USER turn forever In practice on long-running gateway sessions the old semantics pinned whatever stale aside happened to be the first user message, reinserting it into every compaction summary indefinitely. Default chosen as 2 (not 3) so that the effective protected head count remains 3 messages in the common case — assuming a system prompt is present, default protection becomes system + 2 non-system = 3 total, matching the pre-feature behaviour where `protect_first_n` was hardcoded to protect 3 messages total. Sessions without a system prompt will see a small behaviour change (2 protected head messages instead of 3), but this is the rare path and the new semantics make the system-prompt-present case the well-defined one. Changes: - agent/context_compressor.py: redefine protect_first_n as the count of non-system head messages protected beyond the implicit system-prompt guarantee; both paths converge. Constructor default updated to 2. - hermes_cli/config.py: add `compression.protect_first_n` default (2), matching the new semantics. `show_config` label tweaked to 'Protect first: N non-system head messages' for clarity. - run_agent.py: read protect_first_n from config; 0 is now valid (system prompt is always implicitly protected). - cli-config.yaml.example: document the new key and rationale. - tests/agent/test_context_compressor.py: cover default, override, the end-to-end `protect_first_n=0` and `protect_first_n=1` behaviour, the no-system-prompt (gateway) path, and the new shared-semantics regression test. Fixes #13751 Tested on Ubuntu 24.04.	2026-05-13 22:25:16 -07:00
Wesley Simplicio	35f773c459	fix(context_compressor): treat streaming premature-close as transient error Problem: When a provider or proxy drops a streaming response mid-flight (httpcore raises RemoteProtocolError: "incomplete chunked read", "peer closed connection", "response ended prematurely", etc.), _generate_summary would not classify it as a transient error. Instead of retrying on the main model, it entered the generic 60-second cooldown, leaving context growing unbounded until the cooldown expired. Issue #18458. Root cause: _is_connection_error in auxiliary_client.py did not match httpcore's streaming premature-close error substrings. context_compressor.py's _generate_summary except block never called _is_connection_error, so those errors fell through to the 60-second generic cooldown rather than triggering the retry-on-main fallback path used for timeouts. Fix: 1. auxiliary_client.py — extend _is_connection_error keyword list with: "incomplete chunked read", "peer closed connection", "response ended prematurely", "unexpected eof", "remoteprotocolerror", "localprotocolerror". Also guard the `from openai import ...` with try/except ImportError so the function works in environments without the openai package. 2. context_compressor.py — import _is_connection_error and call it in _generate_summary's except block as _is_streaming_closed. Include _is_streaming_closed in the fallback-to-main condition (alongside _is_model_not_found, _is_timeout, _is_json_decode) and use the shorter 30s transient cooldown for streaming-closed errors. Tests: 4 new regression tests in TestStreamingClosedFallback: - test_incomplete_chunked_read_falls_back_to_main - test_peer_closed_connection_falls_back_to_main - test_streaming_closed_on_main_uses_short_cooldown (stash-verified) - test_non_streaming_unknown_error_still_uses_long_cooldown Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-09 17:52:51 -07:00
kshitij	c7e8add120	fix(context): handle JSON decode errors in compression - salvage of #22248 (#22416) When an auxiliary LLM provider (or an upstream proxy) returns a non-JSON body with `Content-Type: application/json` (e.g. an HTML 502 page from a misconfigured gateway), the OpenAI SDK's `response.json()` raises a raw `json.JSONDecodeError` (or wraps it in `APIResponseValidationError` whose message contains "expecting value"). Previously this fell through to the unknown-error branch and entered a 60s cooldown without retrying on the main model, dropping the middle conversation turns instead. This change folds JSON-decode detection into the existing fast-path fallback chain: detect by `isinstance(e, JSONDecodeError)` OR substring match for "expecting value", retry once on the main model, and use a shorter 30s cooldown when already on main (the body shape tends to flip back to valid quickly when the upstream proxy recovers). The three duplicated fallback bodies (model-not-found, unknown-error, JSON-decode) are consolidated into a single `_fallback_to_main_for_compression` helper that handles the shared bookkeeping (record aux-model failure for `/usage`-style callers, clear summary_model, clear cooldown). Also adds three unit tests covering: raw `JSONDecodeError` retries on main, substring-match for wrapped exceptions, and the 30s cooldown when already on main. Salvage of #22248 by @0xharryriddle. Closes #22244. Co-authored-by: Harry Riddle <ntconguit@gmail.com>	2026-05-09 01:47:15 -07:00
LeonSGP43	fc88eec926	fix(compressor): soften summary prompt for content filters	2026-05-07 06:42:32 -07:00
wmagev	2eef395e1c	fix(compaction): mark end of context summary in role=user fallback When the head ends with assistant/tool and the tail starts with assistant, the summary is inserted as a standalone role="user" message. The body's verbatim "## Active Task" quote then gets read as fresh user input by weak/local models (#11475, #14521). The merge-into-tail path already appends an explicit end-of-summary marker for this reason. Mirror it on the standalone path so both insertion routes give the model the same "summary above, not new input" signal.	2026-05-05 04:51:29 -07:00
swithek	b7bbc62503	fix(compressor): _prune_old_tool_results boundary direction	2026-05-04 05:05:18 -07:00
Teknium	8b7b074df9	test(context_compressor): regression test for PR #17025 tail-protection off-by-one When len(messages) <= protect_tail_count and a token budget is set, the previous formula min(protect_tail_count, len(result) - 1) under-protected the tail by one, allowing the oldest message to be summarized. The test fails on the buggy formula (pruned == 1) and passes on the fix (pruned == 0, tool content preserved verbatim).	2026-04-30 20:00:01 -07:00
Stephen Schoettler	b29b709a71	fix(agent): sanitize Codex tool-call history summaries	2026-04-30 19:58:46 -07:00
Teknium	6ea5699e3f	fix(compression): notify users when configured aux model fails even if main-model fallback recovers (#16775) A misconfigured auxiliary.compression.model is a user-fixable problem that silent recovery would hide. The previous retry-on-main logic transparently swallowed aux-model failures whenever the fallback succeeded, leaving the user's broken config in place and racking up future failures. Track the aux-model failure on the compressor alongside the existing fallback-placeholder fields: - _last_aux_model_failure_model: str | None - _last_aux_model_failure_error: str | None Both are set at the moment the aux model errors (captured before summary_model is cleared for retry), regardless of whether the retry succeeds. Cleared at compress() start and on on_session_reset() so a clean run doesn't leak stale warnings. Surface at three places: - gateway hygiene auto-compress: ℹ note to the platform adapter (thread_id preserved) - gateway /compress command: ℹ line appended to the reply - CLI via _emit_warning: deduped on (model, error) so repeat compactions don't spam Distinct from the existing ⚠️ dropped-turns warning: different severity, different emoji, explicit 'context is intact' reassurance.	2026-04-27 20:08:23 -07:00
Teknium	94b26f3ec9	fix(compression): retry summary on main model for unknown errors before giving up (#16774) The existing retry-on-main path in _generate_summary only fires for errors that match the _is_model_not_found heuristic (404/503, 'model_not_found', 'does not exist', 'no available channel'). Other misconfiguration errors (400s from aggregators, provider-specific 'no route' strings, opaque rejections) fall straight through to the transient-cooldown branch, which drops N turns of context and inserts a static placeholder. Losing context is almost always worse than one extra summary attempt. Add a best-effort retry-on-main for the unknown-error branch, guarded by the same invariants as the existing fast-path retry: only when summary_model differs from main, and only once per compressor (_summary_model_fallen_back). Tests cover: 404 fast-path fallback still works, unknown 400 now falls back, same-model aux skips retry (no infinite loop), and a double-failure (aux + main) stops at 2 calls.	2026-04-27 19:25:57 -07:00
iamagenius00	dfdc4276e8	fix(compression): notify gateway users when summary generation fails When auxiliary compression's summary LLM call fails (e.g. model 404, auxiliary model misconfigured), the compressor still drops the selected turns and inserts a static fallback placeholder — the dropped context is unrecoverable. Previously the only signal of this was a WARNING in agent.log. Gateway users (Telegram/Discord/etc.) had no way to know context was lost because the existing _emit_warning path requires a status_callback, and the gateway hygiene path uses a temporary _hyg_agent with quiet_mode=True and no callback wired up. Changes: - ContextCompressor: track _last_summary_fallback_used and _last_summary_dropped_count on each compress() call. Cleared at the start of compress() and on session reset. - gateway/run.py hygiene: after auto-compress, inspect the temp agent's compressor; if fallback was used, send a visible ⚠️ warning to the user via the platform adapter (TG/Discord/etc.) including dropped count and the underlying error. - gateway/run.py /compress: append the same warning to the manual compress reply so users running /compress see the failure too. Acceptance: - Summary success: no user-visible warning (unchanged). - Summary failure on gateway hygiene: user receives a TG/Discord message with dropped count + error + remediation hint. - Summary failure on /compress: warning appended to the command reply. - CLI status_callback / _emit_warning path is untouched. - Test coverage: two new tests verify the tracking fields are set on failure and cleared on subsequent success.	2026-04-27 19:18:13 -07:00
briandevans	943465235e	fix(compressor): guard against bare-string items in multimodal content list raw_content from message["content"] can be a list that contains bare strings, not only dicts. The previous `p.get("text", "")` call raised AttributeError on string items, crashing context compression for any session that had a message with mixed content. Guard with isinstance checks: dict → .get("text"), str → len(p), fallback → len(str(p)). Adds a regression test covering the bare-string case that would have AttributeError'd on the pre-fix code. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 21:48:09 -07:00
briandevans	cfc8befe65	fix(compressor): use text char sum for multimodal token estimation in _find_tail_cut_by_tokens _find_tail_cut_by_tokens called len(content) to estimate message tokens. When content is a list of blocks (multimodal: text + image_url), len() returns block count (e.g. 2) rather than character count, so a message with 500 chars of text was counted as ~10 tokens instead of ~135. This caused the backward walk to exhaust all messages before hitting the budget ceiling; the head_end safeguard then forced cut = n - min_tail, shrinking the protected tail to the bare minimum and preventing effective compression of long multimodal conversations. Fix mirrors the existing pattern in _prune_old_tool_results (line 487): sum(len(p.get("text", "")) for p in raw_content) if isinstance(raw_content, list) else len(raw_content) Tests: 3 new cases in TestTokenBudgetTailProtection — regression guard (confirms the test fails with the bug), plain-string regression guard, and image-only block edge case. Fixes #16087. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-26 21:48:09 -07:00
vominh1919	5401a0080d	fix: recalculate token budgets on model switch in ContextCompressor update_model() recalculated threshold_tokens but left tail_token_budget and max_summary_tokens at their __init__ values. When switching from a 200K model to 32K, the tail budget stayed at ~20K tokens (62% of 32K) instead of the intended ~10%. Adds budget recalculation in update_model() and 2 regression tests.	2026-04-25 15:07:56 +05:30
Yukipukii1	1e8254e599	fix(agent): guard context compressor against structured message content	2026-04-22 14:46:51 -07:00
Honghua Yang	3128d9fcd2	fix(context_compressor): keep tool-call arguments JSON valid when shrinking Pass 3 of `_prune_old_tool_results` previously shrunk long `function.arguments` blobs by slicing the raw JSON string at byte 200 and appending the literal text `...[truncated]`. That routinely produced payloads like:: {"path": "/foo.md", "content": "# Long markdown ...[truncated] — an unterminated string with no closing brace. Strict providers (observed on MiniMax) reject this as `invalid function arguments json string` with a non-retryable 400. Because the broken call survives in the session history, every subsequent turn re-sends the same malformed payload and gets the same 400, locking the session into a re-send loop until the call falls out of the window. Fix: parse the arguments first, shrink long string leaves inside the parsed structure, and re-serialise. Non-string values (paths, ints, booleans, lists) pass through intact. Arguments that are not valid JSON to begin with (rare, some backends use non-JSON tool args) are returned unchanged rather than replaced with something neither we nor the provider can parse. Observed in the wild: a `write_file` with ~800 chars of markdown `content` triggered this on a real session against MiniMax-M2.7; every turn after compression got rejected until the session was manually reset. Tests: - 7 direct tests of `_truncate_tool_call_args_json` covering valid-JSON output, non-JSON pass-through, nested structures, non-string leaves, scalar JSON, and Unicode preservation - 1 end-to-end test through `_prune_old_tool_results` Pass 3 that reproduces the exact failure payload shape from the incident Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 12:40:56 -07:00
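
The parse, shrink, re-serialise approach described in commit 3128d9fcd2 can be sketched as follows. This is an illustrative reconstruction from the commit message, not the repository's `_truncate_tool_call_args_json`: the function names, the 200-character limit applied per string leaf, and the reuse of the `...[truncated]` suffix are assumptions made for this example.

```python
import json

# Assumed values for illustration; the commit message mentions a cut at
# "byte 200" and the literal suffix "...[truncated]" in the old behaviour.
MAX_LEAF_CHARS = 200
TRUNCATION_SUFFIX = "...[truncated]"


def shrink_string_leaves(value, max_chars=MAX_LEAF_CHARS):
    """Recursively shorten long string leaves; other values pass through."""
    if isinstance(value, str):
        if len(value) > max_chars:
            return value[:max_chars] + TRUNCATION_SUFFIX
        return value
    if isinstance(value, dict):
        return {k: shrink_string_leaves(v, max_chars) for k, v in value.items()}
    if isinstance(value, list):
        return [shrink_string_leaves(v, max_chars) for v in value]
    # ints, floats, booleans and None are returned untouched
    return value


def truncate_args_json(arguments, max_chars=MAX_LEAF_CHARS):
    """Shrink a tool-call arguments blob while keeping it valid JSON.

    Input that is not valid JSON is returned unchanged instead of being
    sliced into something no parser can read.
    """
    try:
        parsed = json.loads(arguments)
    except (TypeError, ValueError):  # json.JSONDecodeError subclasses ValueError
        return arguments
    # ensure_ascii=False keeps non-ASCII characters readable in the output
    return json.dumps(shrink_string_leaves(parsed, max_chars), ensure_ascii=False)
```

Because truncation happens on the parsed structure, the closing quote and brace are always emitted by `json.dumps`, so a strict provider can still parse the shortened arguments.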


66 commits
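
Commits cfc8befe65 and 943465235e above both concern how the compressor measures message content that may be a plain string or a list of multimodal blocks. A minimal sketch of that measurement, assuming a hypothetical helper name `content_text_chars` (the repository inlines this logic and may differ in detail):

```python
def content_text_chars(raw_content):
    """Count text characters in a message's content field.

    Handles a plain string, a list of blocks (dicts such as
    {"type": "text", "text": ...} or bare strings), and None.
    """
    if raw_content is None:
        return 0
    if isinstance(raw_content, str):
        return len(raw_content)
    if isinstance(raw_content, list):
        total = 0
        for part in raw_content:
            if isinstance(part, dict):
                # Non-text blocks (e.g. image_url) contribute no text length
                text = part.get("text", "")
                total += len(text) if isinstance(text, str) else 0
            elif isinstance(part, str):
                total += len(part)
            else:
                # Fallback for unexpected item types
                total += len(str(part))
        return total
    return len(str(raw_content))
```

Calling `len()` directly on a list of blocks would return the block count instead of the text length, which is the underestimate the first commit describes; the `isinstance` checks avoid the `AttributeError` on bare-string items that the second commit fixes.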