hermes-agent/tests
sasquatch9818 020d263ef6 fix(agent): defang untrusted-tool-result delimiter against tag injection
`_maybe_wrap_untrusted` is the architectural defense against indirect
prompt injection. It wraps attacker-controllable tool output
(web_extract, web_search, browser_*, mcp_*) in
`<untrusted_tool_result>...</untrusted_tool_result>` so the model treats
it as data. The content was interpolated verbatim, so the boundary was
forgeable.

Two holes. A poisoned page that embeds `</untrusted_tool_result>` closes
the block early — everything after it reads as trusted instructions. And
the `startswith("<untrusted_tool_result")` re-entrancy guard returned
content that merely started with the opening tag completely unwrapped, so
an attacker just prefixed the tag to drop all data framing.

Fix neutralizes any embedded delimiter token (case-insensitive) before
interpolation and drops the forgeable fast-path, so content is always
sealed in exactly one well-formed block. Re-wrapping an already-wrapped
forward is harmless — it stays framed as data.

## What does this PR do?

Closes an indirect prompt-injection bypass in the untrusted-tool-result
wrapper. Attacker content can no longer break out of, or forge, the
trust boundary.

## Related Issue

N/A

## Type of Change

- [x] 🔒 Security fix

## Changes Made

- `agent/tool_dispatch_helpers.py`: add `_neutralize_delimiters` (case-insensitive defang of the `untrusted_tool_result` token); `_maybe_wrap_untrusted` now always neutralizes then wraps, and the forgeable `startswith` re-entrancy guard is removed.
- `tests/agent/test_tool_dispatch_helpers.py`: replace the double-wrap test (it encoded the bypass) with regression tests for embedded closing tag, leading opening tag, and a cased closing tag.

## How to Test

1. `scripts/run_tests.sh tests/agent/test_tool_dispatch_helpers.py` — 29 pass.
2. Embedded `</untrusted_tool_result>` mid-content: real closing delimiter appears once, at the end; payload trapped inside.
3. Content starting with the opening tag: data framing is applied, not skipped.

## Checklist

### Code

- [x] I've read the Contributing Guide
- [x] My commit messages follow Conventional Commits
- [x] I searched for existing PRs to make sure this isn't a duplicate
- [x] My PR contains only changes related to this fix
- [x] I've run the affected tests and they pass
- [x] I've added tests for my changes
- [x] I've tested on my platform: macOS 15 (Darwin 25.5)

### Documentation & Housekeeping

- [x] I've updated relevant documentation (docstrings) — or N/A
- [x] cli-config.yaml.example — N/A
- [x] CONTRIBUTING.md / AGENTS.md — N/A
- [x] Cross-platform impact — N/A (pure-Python, stdlib `re`)
- [x] Tool descriptions/schemas — N/A
2026-07-01 01:54:45 -07:00
..
acp fix: bound threat-pattern/FTS5 regex input and cover V4A Move-File edits 2026-07-01 01:05:28 -07:00
acp_adapter
agent fix(agent): defang untrusted-tool-result delimiter against tag injection 2026-07-01 01:54:45 -07:00
ci fix(ci): classify should default to no MCP 2026-06-23 10:32:27 -07:00
cli fix(cli): route /sessions and /history through prompt_toolkit-safe printing 2026-07-01 01:25:43 -07:00
computer_use feat(computer_use): disable cua-driver telemetry by default, add opt-in (#50842) 2026-06-22 09:57:16 -07:00
cron security(cron): fail closed in scheduler backstop when validator errors 2026-07-01 14:23:01 +05:30
docker fix(s6): dot-prefix gateway staging dir so svscan ignores it mid-build (#54834) 2026-06-29 21:33:00 +10:00
e2e fix(gateway): route SessionDB calls through AsyncSessionDB 2026-06-29 15:51:57 -07:00
fakes
fixtures/plugins/example-dashboard/dashboard
gateway fix(gateway): persist compressed transcript before repointing /compress session 2026-07-01 01:39:23 -07:00
hermes_cli fix(runtime): honor NOUS_INFERENCE_BASE_URL across pool/explicit/aux paths 2026-07-01 01:52:06 -07:00
hermes_state fix(state): exclude delegate/branch/tool children from resume walk + reconcile salvaged fixes 2026-06-25 16:29:09 -07:00
honcho_plugin feat(memory): Honcho OAuth connect — desktop and CLI flows + token refresh (#44335) 2026-06-22 19:16:47 -05:00
integration feat(web_extract): truncate-and-store instead of LLM summarization (#54843) 2026-06-29 10:00:49 -07:00
openviking_plugin feat(openviking): add full recall prefetch policy 2026-06-24 18:53:49 +05:30
plugins fix(memory/holographic): sanitize FTS5 queries for natural-language recall 2026-06-30 15:55:11 -07:00
providers fix(models): pass model.base_url to fetch_models in /model picker 2026-06-16 13:09:40 -07:00
run_agent fix(agent): never persist empty-response recovery scaffolding 2026-07-01 01:08:27 -07:00
scripts revert(windows): roll back terminal-popup PRs #53791 #53810 #53829 (#53853) 2026-06-27 15:59:00 -07:00
skills feat(skills): add cloudflare-temporary-deploy optional skill (#50849) 2026-06-22 12:14:30 -07:00
stress
tools fix(security): anchor rm hardline rules to command position (#56193) 2026-07-01 01:54:43 -07:00
tui_gateway fix(tui_gateway): drop emit-only session.info from _LONG_HANDLERS 2026-06-30 03:11:13 -07:00
website
__init__.py
conftest.py feat(managed-scope): add managed_scope module (resolver, loaders, key helpers) 2026-06-19 07:46:33 -07:00
run_interrupt_test.py
test_account_usage.py
test_assistant_ui_tap_compat.py test(deps): guard @assistant-ui cluster on one tap version 2026-06-15 11:55:02 -04:00
test_atomic_replace_symlinks.py fix(utils): copy fallback for atomic replace across devices (#43852) 2026-06-13 14:50:05 -07:00
test_base_url_hostname.py
test_batch_runner_checkpoint.py
test_bitwarden_secrets.py
test_cli_file_drop.py
test_cli_manual_compress.py
test_cli_skin_integration.py
test_code_skew.py fix(gateway): refuse model switch on stale checkout to avoid env_float ImportError 2026-06-24 04:16:54 +05:30
test_ctx_halving_fix.py
test_dashboard_sidecar_close_on_disconnect.py fix(dashboard): hide sidecar sessions from history (#49269) 2026-06-19 18:06:38 -04:00
test_delegate_cascade_49148.py fix(agent): stop delegate cascade from deleting the parent session 2026-06-21 12:09:16 -07:00
test_desktop_electron_pin.py fix(desktop): resolve electronDist dynamically + self-heal blocked installs (supersedes #48081/#48082) (#48091) 2026-06-17 18:48:35 -05:00
test_desktop_mac_entitlements.py
test_dispatch_session_id.py fix(dispatch): forward session_id into registry.dispatch (#28479) 2026-06-14 00:27:59 -04:00
test_empty_model_fallback.py
test_empty_session_hygiene.py fix: in-memory transcript blocks empty-session prune 2026-06-10 17:37:34 -07:00
test_env_loader_secret_sources.py
test_evidence_store.py
test_fast_safe_load.py perf(startup): parse config + plugin manifests with libyaml CSafeLoader (#54486) 2026-06-28 15:38:39 -07:00
test_gateway_streaming_nested_config.py
test_get_tool_definitions_cache_isolation.py
test_hermes_bootstrap.py revert(windows): roll back terminal-popup PRs #53791 #53810 #53829 (#53853) 2026-06-27 15:59:00 -07:00
test_hermes_constants.py fix(windows): cover remaining console-flash spawn legs (#54417) 2026-06-28 13:49:08 -07:00
test_hermes_home_profile_warning.py
test_hermes_logging.py fix(logging): suppress Windows lock timeout tracebacks 2026-06-28 22:35:56 -07:00
test_hermes_state.py fix(state): periodically merge FTS5 segments to curb write-lock contention 2026-07-01 14:09:15 +05:30
test_hermes_state_compression_locks.py
test_hermes_state_wal_fallback.py
test_honcho_client_concurrency.py
test_honcho_client_config.py
test_honcho_session_context.py
test_honcho_startup_fail_open.py
test_install_diverged_update.py test(installer): cover diverged managed-clone recovery in install scripts 2026-06-30 20:11:01 +07:00
test_install_lockfile_churn.py fix(install): discard managed lockfile churn before stashing 2026-06-25 23:49:11 -07:00
test_install_no_initial_commit.py
test_install_ps1_native_stderr_eap.py fix(install): fail fast when uv venv genuinely fails under relaxed EAP 2026-06-18 22:11:35 +05:30
test_install_ps1_python_fallback_venv.py test(installer): lock Python-fallback propagation into the venv stage (#50769) 2026-06-23 21:33:08 -07:00
test_install_ps1_uv_powershell_host.py test(install): lock uv installer to a resolved PowerShell host 2026-06-18 16:26:34 +07:00
test_install_sh_browser_install.py test(install): track run_with_timeout extraction after #39219 refactor (#54185) 2026-06-28 03:58:01 -07:00
test_install_sh_install_method_stamp.py fix(update): scope install-method stamp to the code tree, not $HERMES_HOME (#48188) 2026-06-18 14:14:41 +10:00
test_install_sh_node_global_prefix.py fix(hermes): heal broken managed Node tree instead of PATH fallback 2026-06-26 20:10:20 +05:30
test_install_sh_pythonpath_sanitization.py
test_install_sh_root_fhs_uv_python_path.py
test_install_sh_setup_wizard_tty_probe.py
test_install_sh_symlink_stomp.py
test_install_sh_termux_network_prereqs.py
test_install_unmerged_index.py fix(install): discard managed lockfile churn before stashing 2026-06-25 23:49:11 -07:00
test_ipv4_preference.py
test_lazy_session_regressions.py fix(gateway): surface retry hint instead of silently dropping turn after /stop (#31884) 2026-06-24 23:51:31 +05:30
test_lint_config.py
test_live_system_guard_self_test.py
test_mcp_serve.py
test_mini_swe_runner.py
test_minimax_model_validation.py
test_minimax_oauth.py
test_minisweagent_path.py
test_model_forces_max_completion_tokens.py
test_model_picker_scroll.py
test_model_tools.py feat(moa): expose MoA presets as selectable virtual models (#46081) 2026-06-25 13:52:06 -07:00
test_model_tools_async_bridge.py
test_ollama_num_ctx.py test(vision): cover Ollama /api/show vision capability routing (#54511) 2026-06-28 22:52:59 -07:00
test_output_cap_parsing.py fix(agent): stop over-cap max_tokens 400s from death-looping into compression (#55570) 2026-06-30 03:26:41 -07:00
test_package_json_lazy_deps.py
test_packaging_metadata.py feat(mcp-catalog): add official Unreal Engine 5.8 MCP server 2026-06-18 09:16:40 -07:00
test_plugin_skills.py
test_plugin_utils.py
test_process_loop_event_loop_warning.py
test_profile_isolation_runtime.py test(profile): two-profile regression suite + preserve skills_hub monkeypatch seam 2026-06-30 15:30:06 -07:00
test_project_metadata.py fix(memory): lazy-install supermemory + mem0 SDKs like honcho/hindsight 2026-06-29 00:25:36 -07:00
test_retry_utils.py fix: handle named custom providers and Z.AI overload retries 2026-06-25 00:17:17 -07:00
test_run_tests_parallel.py fix(tests): bare pytest flags pass through run_tests.sh without a '--' separator (#54008) 2026-06-27 22:43:26 -07:00
test_sanitize_tool_error.py
test_setup_temporary_outputs.py refactor(ci): rewrite docker tests to check built container 2026-06-26 19:15:18 -07:00
test_slash_worker_watchdog.py
test_sql_injection.py
test_stale_utils_module_import.py fix(gateway): refuse model switch on stale checkout to avoid env_float ImportError 2026-06-24 04:16:54 +05:30
test_state_db_malformed_repair.py fix(state): detect and repair FTS write corruption that silently drops gateway history (#52798) 2026-06-25 21:18:41 -07:00
test_subprocess_home_isolation.py fix: make profile subprocess HOME policy explicit 2026-06-14 03:20:21 -07:00
test_termux_all_extra_compat.py
test_timezone.py
test_toolset_distributions.py
test_toolsets.py
test_trajectory_compressor.py
test_trajectory_compressor_async.py
test_transform_llm_output_hook.py
test_transform_tool_result_hook.py
test_tui_gateway_loop_noise.py fix(tui_gateway): suppress WS peer-hangup teardown error flood (#50005) (#54126) 2026-06-28 02:35:01 -07:00
test_tui_gateway_queue_on_busy.py fix(tui_gateway): queue mid-turn prompts instead of dropping them on a busy retry 2026-06-25 12:29:49 -05:00
test_tui_gateway_server.py fix(tui_gateway): reject negative truncate_before_user_ordinal to prevent silent history loss 2026-07-01 01:52:58 -07:00
test_tui_gateway_ws.py fix(tui): start MCP discovery for websocket sessions 2026-06-28 04:14:12 -07:00
test_tui_mcp_late_refresh.py fix(tui): refresh tool snapshot when MCP discovery lands after agent build (#48403) 2026-06-18 05:41:23 -07:00
test_utils_truthy_values.py
test_web_server.py test(web_server): assert ws-ping invariant, not frozen 20.0 literal 2026-06-30 03:11:13 -07:00
test_wheel_locales_e2e.py
test_windows_subprocess_no_window_flags.py test: make windows no-window-flag assertions immune to update-check daemon 2026-06-30 01:35:55 -05:00
test_yaml_indent_consistency_31999.py fix(utils): unify YAML list indent across all config writers (#31999) 2026-06-25 23:27:44 +05:30
test_yuanbao_integration.py
test_yuanbao_markdown.py
test_yuanbao_pipeline.py feat(Yuanbao): support wechat forward msg (#43508) 2026-06-12 02:06:47 -07:00
test_yuanbao_proto.py
test_yuanbao_shutdown.py