hermes-agent/agent
Jeff Watts a2d6f05d1b fix(moa): append reference block at end of aggregator prompt for KV-cache reuse
The MoA aggregator received the per-turn reference block merged into the most
recent `user` message. In an agentic tool loop that message is the original
task near the top of the context (everything after it is assistant/tool turns),
so injecting text that changes every iteration diverges the prompt prefix early.
The server's KV cache then cannot be reused and the entire conversation
re-prefills on every tool-loop step — full prefill each step, which dominates
latency on long contexts.

Append the reference block at the end of the prompt instead (merging into the
last message only when it is already a trailing user turn, i.e. plain chat).
This keeps the [system][task][tool-history] prefix stable and cache-reusable so
only the new block re-prefills, and gives the aggregator the references with
recency. Extracted as `_attach_reference_guidance` with unit tests.

Measured on a local llama.cpp aggregator over a long agentic task: KV-cache
reuse on follow-up steps went from ~0.3% to ~93-95% and per-step prefill on an
~80k-token context dropped from ~44s to <1s, with no change to output.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-01 01:59:00 -07:00
..
lsp feat(lsp): add PowerShellEditorServices language server (#55930) 2026-06-30 16:22:18 -07:00
pet fix(pet): snap kitty frames to whole cells 2026-06-30 15:41:44 -05:00
secret_sources fix: prevent TUI gateway stdin EOF crash across all TUI-context subprocess calls 2026-06-08 22:46:57 -07:00
transports fix(agent): guard against non-dict model_extra in tool call normalization 2026-06-30 03:27:12 -07:00
__init__.py fix(agent): preload jiter native parser 2026-05-28 00:20:11 -07:00
account_usage.py feat(billing): /credits command — balance + portal top-up handoff (#44776) 2026-06-12 08:51:10 +00:00
agent_init.py revert: back out prompt_caching.enabled toggle (#56105) for re-evaluation (#56126) 2026-07-01 00:20:32 -07:00
agent_runtime_helpers.py revert: back out prompt_caching.enabled toggle (#56105) for re-evaluation (#56126) 2026-07-01 00:20:32 -07:00
anthropic_adapter.py fix(anthropic): stop SDK auto-retry double-firing and raise Retry-After cap to 600s 2026-06-27 19:23:15 -07:00
async_utils.py
auxiliary_client.py fix(runtime): honor NOUS_INFERENCE_BASE_URL across pool/explicit/aux paths 2026-07-01 01:52:06 -07:00
azure_identity_adapter.py feat(azure-foundry): add Microsoft Entra ID auth 2026-05-18 10:14:38 -07:00
background_review.py fix(bg-review): scope stdout/stderr silencing to the worker thread (#55966) 2026-06-30 17:28:33 -07:00
bedrock_adapter.py fix(bedrock): check boto3 version >= 1.34.59 before using converse_stream 2026-06-15 05:25:17 -07:00
billing_view.py feat(billing): /billing terminal billing — interactive TUI + CLI client (#45449) 2026-06-19 01:53:32 +05:30
browser_provider.py fix(browser): self-review pass — dead-import, log levels, future-proofing 2026-05-17 04:04:15 -07:00
browser_registry.py style: restore PEP8 blank-line separation after dead-code removal 2026-05-29 04:22:27 -07:00
chat_completion_helpers.py fix(anthropic+feishu): model-gate max_tokens fallback; wire Feishu channel_prompt 2026-06-30 17:20:41 -07:00
codex_responses_adapter.py fix(xai): OAuth Responses native web_search, incomplete guard, grok-composer context 2026-06-17 17:33:32 -07:00
codex_runtime.py fix(codex): seed app-server sessions with configured cwd 2026-06-21 16:39:02 -07:00
coding_context.py feat(agent): add configurable coding_instructions 2026-06-30 00:59:59 -05:00
context_breakdown.py feat(desktop): add context usage breakdown popover 2026-06-29 09:18:10 -04:00
context_compressor.py fix(compressor): pin summary role to user when only system prompt is protected (#52160) 2026-07-01 14:24:41 +05:30
context_engine.py fix(context): clamp -1 post-compression sentinel in sibling status paths 2026-07-01 13:36:50 +05:30
context_references.py perf(context-refs): expand @-references concurrently 2026-06-30 00:19:49 -07:00
conversation_compression.py fix(agent): make compression lock-lease refresher tolerate transient DB blips 2026-06-30 13:36:29 +05:30
conversation_loop.py feat(classifier): Anthropic-specific guidance for subscription exhaustion 2026-07-01 01:36:34 -07:00
copilot_acp_client.py fix(agent): stream copilot ACP chat completions 2026-06-28 22:52:51 -07:00
credential_persistence.py fix: avoid persisting borrowed credential secrets (#31416) 2026-05-25 00:32:08 -07:00
credential_pool.py test(credential_pool): cover Anthropic env auth_type classification 2026-06-30 17:29:03 -07:00
credential_sources.py docs(auth): replace stale 'hermes login' references with 'hermes auth add' 2026-05-26 15:41:11 -07:00
credits_tracker.py feat(billing): /credits command — balance + portal top-up handoff (#44776) 2026-06-12 08:51:10 +00:00
curator.py fix(curator): never archive cron-referenced skills + floor use=0 pruning (#54443) 2026-06-28 15:10:21 -07:00
curator_backup.py fix(curator): stop the rollback safety snapshot from pruning its target 2026-06-17 05:40:05 -07:00
display.py feat(display): friendly human-phrased tool labels for built-in tools (#55166) 2026-06-29 20:31:17 -07:00
error_classifier.py fix(classifier): treat Anthropic "out of extra usage" 400 as billing 2026-07-01 01:36:34 -07:00
errors.py fix(agent,gateway,doctor): add SSL CA cert bundle fail-fast guard 2026-06-13 21:14:32 -07:00
file_safety.py fix(file): block credential paths from search results 2026-07-01 01:02:35 -07:00
gemini_native_adapter.py Merge consecutive same-role contents for native Gemini 2026-06-30 11:51:22 -07:00
gemini_schema.py
i18n.py fix(packaging): ship locales/ i18n catalogs in wheel, sdist, and Nix (#38383) 2026-06-03 12:00:27 -07:00
image_gen_provider.py feat(image-gen): add image-to-image / editing to image_generate (#48705) 2026-06-18 22:13:07 -07:00
image_gen_registry.py
image_routing.py fix(vision): detect Ollama vision models via /api/show (#54511) 2026-06-28 22:52:59 -07:00
insights.py refactor(insights): drop dead pricing/duration wrappers, call usage_pricing directly (#40618) 2026-06-07 18:33:20 -07:00
iteration_budget.py refactor(run_agent): extract OpenAI proxy, safe stdio, IterationBudget 2026-05-16 17:59:32 -07:00
jiter_preload.py fix(agent): preload jiter native parser 2026-05-28 00:20:11 -07:00
learn_prompt.py fix(learn): honor requirements mixed with sources in /learn requests (#55956) 2026-06-30 16:56:01 -07:00
learning_graph.py fix(desktop): scope memory graph cache by profile 2026-06-30 03:44:41 -05:00
learning_graph_render.py fix(journey): swap skill/memory inks so drillable rows read as clickable 2026-06-30 11:54:16 -05:00
learning_mutations.py refactor(journey): route memory mutations through MemoryStore atomic I/O 2026-06-30 15:16:21 -05:00
lmstudio_reasoning.py
manual_compression_feedback.py
markdown_tables.py
memory_manager.py fix(agent): validate context/memory tool schemas before wrapping 2026-06-25 02:17:29 +05:30
memory_provider.py fix(backup): capture memory-provider state stored outside HERMES_HOME (#50325) 2026-06-21 12:03:46 -07:00
message_content.py fix(openviking): preserve structured sync attribution 2026-06-19 15:23:41 +08:00
message_sanitization.py fix(agent): close tool-call sequence on all interrupt aborts, not just finalize_turn 2026-06-25 12:24:34 -05:00
moa_loop.py fix(moa): append reference block at end of aggregator prompt for KV-cache reuse 2026-07-01 01:59:00 -07:00
moa_trace.py feat(moa): opt-in full-turn trace persistence to JSONL (#56101) 2026-07-01 00:09:42 -07:00
model_metadata.py fix(copilot): recognize enterprise subdomains in host checks 2026-06-30 03:27:41 -07:00
models_dev.py remove Vercel AI Gateway and Vercel Sandbox (#33067) 2026-05-27 00:43:32 -07:00
moonshot_schema.py fix(moonshot): handle union type arrays in tool schemas 2026-06-13 05:51:41 -07:00
nous_rate_guard.py
onboarding.py feat(onboarding): opt-in structured profile-build path on first contact (#41114) 2026-06-07 08:36:48 -07:00
oneshot.py feat(agent): one-shot LLM helper + llm.oneshot gateway RPC (#51261) 2026-06-23 08:01:50 +00:00
plugin_llm.py
portal_tags.py
process_bootstrap.py fix(auxiliary): use env-only proxy policy for OpenAI SDK clients (#53702) 2026-06-27 21:22:49 -07:00
prompt_builder.py fix(agent): limit .hermes.md parent walk to git repos only 2026-06-28 20:46:32 -07:00
prompt_caching.py
rate_limit_tracker.py
reasoning_timeouts.py fix(agent): detect thinking-timeout for reasoning models and surface actionable guidance instead of misleading file-write advice 2026-06-25 19:00:48 -07:00
redact.py fix(browser): close remaining CDP-URL leak paths in supervisor (review) 2026-07-01 13:43:58 +05:30
replay_cleanup.py fix(tui): sanitize replay history on WebUI/TUI session resume (#29086) (#53939) 2026-06-27 20:56:49 -07:00
retry_utils.py fix: handle named custom providers and Z.AI overload retries 2026-06-25 00:17:17 -07:00
runtime_cwd.py fix(desktop): stabilize project folder sessions (#37586) 2026-06-02 20:23:09 +00:00
secret_scope.py feat(gateway): multiplex phase 2 — fail-closed profile credential isolation (Workstream A) 2026-06-19 07:34:15 -07:00
shell_hooks.py feat(agent): add pre_verify hook and verify-on-stop coding guidance 2026-06-30 00:59:29 -05:00
skill_bundles.py feat(skills): add skill bundles — alias /<name> loads multiple skills (#28373) 2026-05-18 21:38:05 -07:00
skill_commands.py fix(memory): strip skill scaffolding for all providers, not just openviking 2026-06-16 10:37:37 -07:00
skill_preprocessing.py fix(windows): hide console-window flash on backend git/gh/wmic/bash subprocess spawns 2026-06-28 05:28:45 -07:00
skill_utils.py fix(curator): protect external skills from background curation 2026-06-25 22:03:02 -07:00
ssl_guard.py fix(ssl): align guard docs and escape hatch 2026-06-13 21:14:32 -07:00
stream_diag.py feat(agent): buffer retry/fallback status, surface only on terminal failure (#33816) 2026-05-28 04:53:27 -07:00
subdirectory_hints.py fix(subdirectory_hints): prevent loading AGENTS.md outside workspace 2026-05-25 23:17:33 -07:00
system_prompt.py feat(computer_use): cross-platform cua-driver (macOS/Windows/Linux) 2026-06-22 06:42:30 -07:00
think_scrubber.py
thinking_timeout_guidance.py fix(agent): detect thinking-timeout for reasoning models and surface actionable guidance instead of misleading file-write advice 2026-06-25 19:00:48 -07:00
thread_scoped_output.py fix(bg-review): scope stdout/stderr silencing to the worker thread (#55966) 2026-06-30 17:28:33 -07:00
title_generator.py feat(titles): support language-aware title generation (#45296) 2026-06-19 17:15:52 -07:00
tool_dispatch_helpers.py fix(agent): defang untrusted-tool-result delimiter against tag injection 2026-07-01 01:54:45 -07:00
tool_executor.py feat(display): friendly human-phrased tool labels for built-in tools (#55166) 2026-06-29 20:31:17 -07:00
tool_guardrails.py fix: add recovery hints to loop guard warnings 2026-05-19 00:12:12 -07:00
tool_result_classification.py
trajectory.py
transcription_provider.py feat(stt): add register_transcription_provider() plugin hook 2026-05-25 01:41:19 -07:00
transcription_registry.py feat(stt): add register_transcription_provider() plugin hook 2026-05-25 01:41:19 -07:00
tts_provider.py feat(tts): add register_tts_provider() plugin hook (closes #30398) 2026-05-24 18:04:54 -07:00
tts_registry.py feat(tts): add register_tts_provider() plugin hook (closes #30398) 2026-05-24 18:04:54 -07:00
turn_context.py fix(memory): degrade gracefully after repeated at-capacity consolidation failures (#42405) 2026-06-30 20:01:16 +05:30
turn_finalizer.py fix(agent,gateway): surface partial-stream recovery and bound detached restart 2026-06-27 22:03:14 -07:00
turn_retry_state.py fix(agent): route content-filter stream stalls to fallback chain (#32421) 2026-06-28 01:15:21 -07:00
usage_pricing.py fix(moa): count reference (advisor) fan-out token usage + cost (#56087) 2026-06-30 23:08:37 -07:00
verification_evidence.py feat(agent): recognize focused ad-hoc verification scripts 2026-06-24 23:03:45 -05:00
verification_stop.py feat(agent): restore surface-aware "auto" default for verify_on_stop 2026-06-30 01:43:08 -05:00
verify_hooks.py feat(agent): add pre_verify hook and verify-on-stop coding guidance 2026-06-30 00:59:29 -05:00
video_gen_provider.py
video_gen_registry.py
web_search_provider.py chore(web): remove web_crawl tool + provider crawl plumbing (#33824) 2026-05-28 04:52:42 -07:00
web_search_registry.py chore(web): remove web_crawl tool + provider crawl plumbing (#33824) 2026-05-28 04:52:42 -07:00