feat(tools): progressive tool disclosure for MCP and plugin tools

Adds Tool Search, a structured-tools progressive-disclosure layer that
replaces MCP and non-core plugin tools in the model-visible tools array
with three bridge tools (tool_search / tool_describe / tool_call) when
the deferrable surface would consume more than a configurable percentage
of the active model's context window. Core Hermes tools are never deferred.

Default mode is 'auto' with a 10% context threshold, so small toolsets
pay no overhead. Set tools.tool_search.enabled to 'on' to force or 'off'
to disable.

The design addresses the OpenClaw production failure modes documented
in the openclaw-tool-search-report:

- Core tools never defer (toolsets._HERMES_CORE_TOOLS). Addresses the
  'tools silently missing from isolated cron turns' regression class
  (openclaw#84141) by construction: there is no code path that can
  drop a core tool.
- The catalog is stateless across turns: it is rebuilt from the live
  tool-defs list on every assembly. There is no session-keyed Map that
  can drift out of sync with the registry.
- tool_call unwraps the bridge call before any hook fires, so plugin
  pre/post hooks, guardrails, approval flows, and the activity feed
  all see the underlying tool name, not the bridge (addresses
  openclaw#85588 and the verbose-mode complaint on openclaw#79823).
- The unwrap happens in both the parallel and sequential paths of
  agent/tool_executor.py and also in handle_function_call, so direct
  callers (sandboxed code, eval harnesses) are covered too.
- Bridge tools cannot invoke each other (recursion guard) and cannot
  invoke core tools (those must be called directly).
- Tools mode only, no JS-sandbox code mode. This keeps the surface small.
- Token estimation uses a cheap chars/4 heuristic; precision isn't
  needed for the threshold decision.

Files:
- tools/tool_search.py: new module (BM25 retrieval, classification,
  threshold gate, bridge dispatch, unwrap helper).
- tests/tools/test_tool_search.py: 35 tests including the OpenClaw
  #84141 regression guard.
- model_tools.py: wires assembly into _compute_tool_definitions as the
  final step, adds skip_tool_search_assembly kwarg so the bridge can
  see the real catalog, dispatches the three bridge tools.
- agent/tool_executor.py: unwraps tool_call in both parallel and
  sequential parsing loops so checkpointing, guardrails, plugin hooks,
  and tool-progress callbacks all observe the underlying tool name.
- hermes_cli/config.py: DEFAULT_CONFIG['tools']['tool_search'] block.
- website/docs/user-guide/features/tool-search.md: user docs.

Validation:
- 35/35 new tests pass.
- Existing tool/registry/model_tools/config/coercion/executor tests
  (82 + 74 + small adjacents) green.
- Live E2E: 20 fake MCP tools registered, get_tool_definitions returns
  3 bridges, tool_search returns top 3 hits, tool_describe returns
  full schema, tool_call dispatches to the real underlying handler
  and the underlying result is what the model sees.
- Reserved-name recursion guard verified live.
- Core-tool refusal via tool_call verified live.
parent 73d73f1f0d
commit 369075dc95
6 changed files with 1453 additions and 1 deletions
152
website/docs/user-guide/features/tool-search.md
Normal file
@@ -0,0 +1,152 @@
---
title: Tool Search
sidebar_position: 95
---

# Tool Search

When you have many MCP servers or non-core plugin tools attached to a
session, their JSON schemas can consume a substantial fraction of the
context window on every turn, even when only a few of them are relevant
to what the user actually asked for.

**Tool Search** is Hermes' progressive-disclosure layer for that
problem. When activated, MCP and non-core plugin tools are replaced in
the model-visible tools array by three bridge tools, and the model loads
each specific tool's schema on demand.

:::info Built-in Hermes tools never defer
The tools that make up Hermes' core capability set (`terminal`,
`read_file`, `write_file`, `patch`, `search_files`, `todo`, `memory`,
`browser_*`, `web_search`, `web_extract`, `clarify`, `execute_code`,
`delegate_task`, `session_search`, `send_message`, and the rest of
`_HERMES_CORE_TOOLS`) are *always* loaded directly. Only MCP tools and
non-core plugin tools are eligible for deferral.
:::

## How it works

When Tool Search activates for a turn, the model sees three new tools in
place of the deferred ones:

```
tool_search(query, limit?)   - search the deferred-tool catalog
tool_describe(name)          - load the full schema for one tool
tool_call(name, arguments)   - invoke a deferred tool
```

A typical interaction looks like:

```
Model: tool_search("create a github issue")
  → { matches: [{ name: "mcp_github_create_issue", ... }, ...] }
Model: tool_describe("mcp_github_create_issue")
  → { parameters: { type: "object", properties: { ... } } }
Model: tool_call("mcp_github_create_issue", { title: "...", body: "..." })
  → { ok: true, issue_number: 42 }
```

When the model invokes `tool_call`, Hermes **unwraps the bridge** and
dispatches the underlying tool exactly as if the model had called it
directly. Pre-tool-call hooks, guardrails, approval prompts, and
post-tool-call hooks all run against the real tool name, not against
`tool_call`. The activity feed in the CLI and gateway also unwraps so you
see the underlying tool, not the bridge.
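
The unwrap step can be pictured with a minimal sketch. Everything named here is a hypothetical illustration rather than the actual helper in `tools/tool_search.py`: `BRIDGE_TOOLS`, `CORE_TOOLS`, and `unwrap_tool_call` are assumed names, and the three-entry core set is only a stand-in for `_HERMES_CORE_TOOLS`.

```python
import json

# Hypothetical names for illustration; the real module may differ.
BRIDGE_TOOLS = {"tool_search", "tool_describe", "tool_call"}
CORE_TOOLS = {"terminal", "read_file", "write_file"}  # stand-in for _HERMES_CORE_TOOLS


def unwrap_tool_call(name, arguments):
    """Return the (name, arguments) pair that hooks and dispatch should see."""
    if name != "tool_call":
        return name, arguments  # direct calls pass through untouched

    target = arguments.get("name")
    inner = arguments.get("arguments", {})
    if isinstance(inner, str):
        # Some providers deliver nested arguments as a JSON string.
        inner = json.loads(inner) if inner.strip() else {}

    if not isinstance(target, str) or not target:
        raise ValueError("tool_call requires a non-empty 'name'")
    if target in BRIDGE_TOOLS:
        # Recursion guard: bridges cannot invoke each other.
        raise ValueError(f"bridge tools cannot invoke each other: {target}")
    if target in CORE_TOOLS:
        raise ValueError(f"core tool {target!r} must be called directly")
    return target, inner
```

Calling `unwrap_tool_call("tool_call", {"name": "mcp_github_create_issue", "arguments": {"title": "Bug"}})` in this sketch yields the underlying name and its arguments, which is the pair that hooks, guardrails, and the activity feed would then observe.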

## When does it activate?

By default Tool Search runs in `auto` mode: it activates only when the
deferrable tool schemas would consume at least 10% of the active model's
context window. Below that, the tools-array assembly is a pure
pass-through and you pay no overhead.

This decision is re-evaluated every time the tools array is built, so:

- A session with just a few MCP tools and a long-context model never
  activates Tool Search.
- A session with many MCP servers attached (typically 15+ tools) starts
  activating it.
- Removing MCP servers mid-session returns to direct exposure on the
  next assembly.
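
The activation decision described above can be sketched as follows. This is a rough illustration under stated assumptions, not the shipped gate: `estimate_tokens` and `should_activate` are hypothetical names, the chars/4 estimate follows the heuristic mentioned in the commit message, and whether the real comparison is inclusive at exactly the threshold is an assumption.

```python
import json


def estimate_tokens(tool_defs):
    """Cheap chars/4 estimate of the serialized tool schemas."""
    return sum(len(json.dumps(d)) for d in tool_defs) // 4


def should_activate(mode, deferrable_defs, context_length, threshold_pct=10):
    """Decide whether to swap deferrable tools for the three bridge tools."""
    if mode == "off" or not deferrable_defs:
        return False  # nothing to defer, or the feature is disabled
    if mode == "on":
        return True   # forced on whenever at least one tool is deferrable
    if context_length <= 0:
        return False  # unknown context size: stay in pass-through
    # auto: compare estimated schema cost against the percentage threshold.
    # The inclusive comparison is an assumption made for this sketch.
    return estimate_tokens(deferrable_defs) * 100 >= threshold_pct * context_length
```

Because the function takes the current tool definitions as input and keeps no state, calling it on every assembly gives the behavior in the list above: the result flips as soon as servers are attached or removed.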

## Configuration

```yaml
tools:
  tool_search:
    enabled: auto            # auto (default), on, or off
    threshold_pct: 10        # percentage of context; only used in auto mode
    search_default_limit: 5
    max_search_limit: 20
```

| Key | Default | Meaning |
| --- | --- | --- |
| `enabled` | `auto` | `auto` activates above threshold; `on` always activates if there's at least one deferrable tool; `off` disables entirely. |
| `threshold_pct` | `10` | Percentage of context length at which `auto` mode kicks in. Range 0–100. |
| `search_default_limit` | `5` | Hits returned when the model calls `tool_search` without a `limit`. |
| `max_search_limit` | `20` | Hard upper bound the model can request via `limit`. Range 1–50. |

The legacy boolean shape is also accepted:

```yaml
tools:
  tool_search: true   # equivalent to {enabled: auto}
```
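
One way to picture how both config shapes could be normalized is the sketch below. `normalize_tool_search_config` is a hypothetical helper, not the code in `hermes_cli/config.py`. Several behaviors are assumptions: mapping a legacy `false` to `off`, clamping out-of-range values rather than rejecting them, and capping `search_default_limit` at `max_search_limit`. It also handles one general YAML detail: loaders that follow YAML 1.1 (PyYAML among them) parse unquoted `on` and `off` as booleans, so the sketch maps a boolean `enabled` back to the string modes.

```python
DEFAULTS = {
    "enabled": "auto",
    "threshold_pct": 10,
    "search_default_limit": 5,
    "max_search_limit": 20,
}


def normalize_tool_search_config(raw):
    """Hypothetical normalizer: accepts the dict shape or the legacy boolean."""
    if isinstance(raw, bool):
        # Documented: true == {enabled: auto}. Mapping false to "off" is assumed.
        raw = {"enabled": "auto" if raw else "off"}
    cfg = {**DEFAULTS, **(raw or {})}

    enabled = cfg["enabled"]
    if isinstance(enabled, bool):
        # YAML 1.1 loaders turn unquoted on/off into True/False.
        enabled = "on" if enabled else "off"
    enabled = str(enabled).lower()
    if enabled not in ("auto", "on", "off"):
        enabled = "auto"  # assumed fallback for unrecognized values

    # Clamp to the documented ranges: 0-100 and 1-50.
    max_limit = min(max(int(cfg["max_search_limit"]), 1), 50)
    return {
        "enabled": enabled,
        "threshold_pct": min(max(float(cfg["threshold_pct"]), 0.0), 100.0),
        "max_search_limit": max_limit,
        "search_default_limit": min(max(int(cfg["search_default_limit"]), 1), max_limit),
    }
```

If your loader does coerce `on` and `off`, quoting the value (`enabled: "on"`) sidesteps the ambiguity regardless of how the normalizer treats booleans.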

## When NOT to use it

Tool Search trades a fixed per-turn token cost (the three bridge tool
schemas, ~300 tokens) and at least one extra round trip (search →
describe → call) for the savings on the deferred schemas. It's a clear
win when you have many tools and use few per turn; it's overhead when
you have few tools total.

The `auto` default handles this for you. If you set `enabled: on`
unconditionally, expect a slight per-turn cost on small toolsets.

## Trade-offs that don't go away

These come from the prompt-cache integrity invariant. They are inherent
to any progressive-disclosure design, not specific to this implementation:

- **Extra round trips on cold tools.** The first time the model needs
  a deferred tool, it spends one or two extra model calls to find and
  load the schema. The token savings on the static side are real, but a
  portion is paid back at runtime.
- **No prefix-cache benefit on deferred schemas.** A loaded `tool_describe`
  result enters the conversation history (so it does get cached on
  subsequent turns), but it never benefits from the system-prompt cache
  prefix.
- **Model-quality dependence.** Tool Search assumes the model can write a
  reasonable search query for the tool it wants. Smaller models do this
  less well; the published Anthropic numbers (49% → 74% on Opus 4,
  without vs. with tool search) show the upside, but also that about
  26% of tasks still fail with tool search enabled, at least some of
  them through retrieval failure.
- **Toolset edits invalidate cache.** Adding or removing a tool
  mid-session changes the bridge tools' descriptions (which include the
  count of deferred tools) and the catalog, so the prompt cache is
  invalidated. This is the same trade-off as any toolset edit.

## Implementation details

- **Retrieval:** BM25 over tokenized tool name + description + parameter
  names. Falls back to a literal substring match on the tool name when
  BM25 returns no positive-score hits, which protects against
  zero-IDF degenerate cases (e.g. searching `"github"` against a
  catalog where every tool name contains "github").
- **Catalog is stateless across turns.** It rebuilds from the current
  tool-defs list on every assembly, with no session-keyed `Map`. This
  avoids the class of bug where a stored catalog drifts out of sync with
  the live tool registry.
- **No JS sandbox.** Hermes uses the simpler "structured tools" mode
  (search / describe / call as plain functions). The JS-sandbox "code
  mode" some other implementations offer is a large surface area; we
  skip it.
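
The retrieval behavior in the first two bullets can be sketched in a few lines. This is a simplified stand-in, not the code in `tools/tool_search.py`: the function names, the `k1` and `b` defaults, and the tokenizer are assumptions, and the IDF is the Robertson-Sparck Jones form floored at zero, which is one variant that produces the zero-IDF case described above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def tool_document(tool):
    """Tokens from tool name + description + parameter names."""
    params = tool.get("parameters", {}).get("properties", {})
    return tokenize(" ".join([tool["name"], tool.get("description", ""), *params]))


def search(tools, query, limit=5, k1=1.5, b=0.75):
    # The index is rebuilt from the tool list on every call: nothing is stored.
    docs = [tool_document(t) for t in tools]
    if not docs:
        return []
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n or 1.0
    df = Counter(term for d in docs for term in set(d))
    terms = set(tokenize(query))

    scored = []
    for tool, doc in zip(tools, docs):
        tf = Counter(doc)
        score = 0.0
        for term in terms:
            if term not in tf:
                continue
            # IDF floored at zero: a term found in most tools adds nothing.
            idf = max(0.0, math.log((n - df[term] + 0.5) / (df[term] + 0.5)))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scored.append((score, tool["name"]))
    if scored:
        ranked = sorted(scored, key=lambda s: (-s[0], s[1]))
        return [name for _, name in ranked[:limit]]

    # Fallback: literal substring match on the tool name.
    needle = query.strip().lower()
    return [t["name"] for t in tools if needle and needle in t["name"].lower()][:limit]
```

In a catalog where every tool mentions "github", the query `"github"` scores zero everywhere in this sketch, so the substring fallback is what returns results; a more specific query such as `"create a github issue"` is ranked by the remaining terms.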

## See also

- `tools/tool_search.py`: the implementation
- `tests/tools/test_tool_search.py`: the regression suite
- The `openclaw-tool-search-report` PDF in the original implementation
  PR for the research that shaped the design