fix(mcp-oauth): anchor 401 handler task to prevent GC mid-flight

`handle_401` spawned a dedup'd recovery coroutine via
`asyncio.create_task(_do_handle())` and discarded the returned task
reference. Python's event loop only keeps weak references to tasks, so
the coroutine could be garbage-collected before it called
`pending.set_result(...)`. Every concurrent caller awaiting that future
would then hang forever, and the `finally: entry.pending_401.pop(...)`
cleanup would never run, so subsequent 401s for the same key would latch
onto the dead future too. Same pattern the adapter-side fixes address
(#11997,
#11998, #12000, #12001, #12006).

Hold a strong reference to the task in a set on the manager
(`_inflight_tasks`) and discard it via `add_done_callback` once it
completes. Regression test covers both the
structural invariant (task tracked, then removed on completion) and a
concurrent dedup path with a forced `gc.collect()` between the handler's
await points.
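
A minimal sketch of the anchoring pattern described above, not the actual
`MCPOAuthManager` code. The `Manager` class, its `recoveries` counter, the
`demo()` helper, and the `"srv"` key are hypothetical stand-ins; only
`_inflight_tasks`, `pending_401`, `handle_401`, and `_do_handle` mirror
names from this commit, and the recovery body is replaced by two
`asyncio.sleep(0)` await points with a forced `gc.collect()` between them.

```python
import asyncio
import gc


class Manager:
    """Hypothetical stand-in for the manager; illustrative only."""

    def __init__(self) -> None:
        self.pending_401: dict[str, asyncio.Future] = {}
        # Strong references to in-flight handler tasks, so the event
        # loop's weak references are not the only thing keeping them alive.
        self._inflight_tasks: set[asyncio.Task] = set()
        self.recoveries = 0  # counts how often the recovery body ran

    async def handle_401(self, key: str) -> bool:
        pending = self.pending_401.get(key)
        if pending is None:
            # First caller for this key: create the shared future and
            # spawn the one recovery task that will resolve it.
            pending = asyncio.get_running_loop().create_future()
            self.pending_401[key] = pending

            async def _do_handle() -> None:
                try:
                    await asyncio.sleep(0)  # stand-in for real recovery work
                    gc.collect()            # force a collection mid-flight
                    await asyncio.sleep(0)
                    self.recoveries += 1
                    pending.set_result(True)
                finally:
                    self.pending_401.pop(key, None)

            task = asyncio.create_task(_do_handle())
            self._inflight_tasks.add(task)
            # Drop the strong reference once the task has finished.
            task.add_done_callback(self._inflight_tasks.discard)
        return await pending


async def demo() -> tuple[list[bool], int, int]:
    mgr = Manager()
    # Three concurrent callers share a single recovery task.
    results = await asyncio.gather(*(mgr.handle_401("srv") for _ in range(3)))
    await asyncio.sleep(0)  # give any remaining done callbacks a turn
    return list(results), mgr.recoveries, len(mgr._inflight_tasks)
```

Running `asyncio.run(demo())` exercises the dedup path: all three callers
receive the same result, the recovery body runs once, and the tracking set
is empty again after completion.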
haileymarshall 2026-04-18 18:13:04 +01:00 committed by Teknium
parent d431dfc448
commit 9f22f36625
2 changed files with 102 additions and 1 deletions

@@ -451,6 +451,10 @@ class MCPOAuthManager:
     def __init__(self) -> None:
         self._entries: dict[str, _ProviderEntry] = {}
         self._entries_lock = threading.Lock()
+        # Holds strong references to in-flight 401 handler tasks so the
+        # event loop's weak-reference bookkeeping cannot GC them mid-run
+        # and leave `await pending` waiters hanging forever.
+        self._inflight_tasks: set[asyncio.Task] = set()
     # -- Provider construction / caching -------------------------------------
@@ -677,7 +681,9 @@ class MCPOAuthManager:
                 finally:
                     entry.pending_401.pop(key, None)
-        asyncio.create_task(_do_handle())
+        task = asyncio.create_task(_do_handle())
+        self._inflight_tasks.add(task)
+        task.add_done_callback(self._inflight_tasks.discard)
         try:
             return await pending