# OpenClaw Gateway Integration Status

**Last Updated**: 2026-02-13

## ✅ Completed Tasks

### 1. OpenClaw Gateway WebSocket Client Implementation

**Status**: ✅ **COMPLETE**

**Location**: `openclaw_client/client.py`

**Changes Made**:
- ✅ Implemented the full WebSocket JSON-RPC protocol
- ✅ Added the connect handshake (`connect.challenge` → `connect` → `hello-ok`)
- ✅ Implemented `chat.send` with event listening (delta → final)
- ✅ Added session key generation (`agent::discord:dm:`)
- ✅ Implemented automatic reconnection logic
- ✅ Added per-guild client management via `PerGuildOpenClawClient`
- ✅ Preserved the existing `send_message()` interface for compatibility
- ✅ Added connection statistics and latency tracking

**Protocol Flow**:
```
WebSocket Connect → connect.challenge → connect request → hello-ok response
                                                                 ↓
chat.send (with sessionKey, idempotencyKey) → ack (with runId) → delta events → final event
```

**Configuration**:
- ✅ Updated `utils/config.py` to support the WebSocket URL format
- ✅ Added `agent_id` and `session_scope` configuration options
- ✅ Added `retry_timeout` for extended retry attempts
- ✅ Updated the `config.yaml` openclaw section with WebSocket settings
- ✅ Updated `.env.example` with the WebSocket URL format and auth token

**Dependencies**:
- ✅ Added `websockets>=12.0` to `requirements.txt`

**Testing**:
- ⚠️ Existing unit tests need updates for the WebSocket client
- ⚠️ Integration tests need a real Gateway connection

---

## 🔧 Remaining Integration Work

### 2. Connect OpenClaw Client to Discord Bot

**Status**: ⏳ **PENDING**

**What Needs to be Done**: The `OpenClawClient` is implemented but not yet wired into the Discord bot pipeline. Here's what needs to happen:

#### A. Bot Initialization (in `run.py` or `discord_bot/bot.py`)

Create and initialize the OpenClaw Gateway client on bot startup:

```python
# In run.py, after loading config:
import logging

from openclaw_client import OpenClawConfig, PerGuildOpenClawClient

logger = logging.getLogger(__name__)

# Create OpenClaw Gateway client configuration
openclaw_config = OpenClawConfig(
    base_url=config.openclaw.base_url,  # ws://192.168.50.9:18789
    auth_token=config.openclaw.token,
    timeout=config.openclaw.timeout,
    retry_timeout=config.openclaw.retry_timeout,
    agent_id=config.openclaw.agent_id,
    session_scope=config.openclaw.session_scope,
)

# Create per-guild client manager
openclaw_client = PerGuildOpenClawClient(openclaw_config)

# Connect to Gateway
logger.info("Connecting to OpenClaw Gateway...")
# Note: the connection is made lazily on the first message, or explicitly with:
# await openclaw_client.get_or_create(guild_id).connect()
```

#### B. Pipeline Orchestrator Integration

The orchestrator expects an `llm_client` callable. Create a wrapper:

```python
# In voice session or orchestrator setup:
async def llm_response_handler(agent: str, message: str, user_id: int, guild_id: int) -> str:
    """Wrapper for the OpenClaw Gateway client."""
    client = openclaw_client.get_or_create(guild_id)
    return await client.send_message(
        agent=agent,
        message=message,
        context="",  # Gateway manages context internally
        speaker=str(user_id),  # Used for session key generation
    )


# Pass to orchestrator:
orchestrator = PipelineOrchestrator(
    config=pipeline_config,
    vad=vad,
    turn_detector=turn_detector,
    transcriber=transcriber,
    transcript_manager=transcript_manager,
    relevance_classifier=relevance_classifier,
    llm_client=llm_response_handler,  # ← Use wrapper
    tts_synthesizer=tts_synthesizer,
    audio_output_callback=audio_callback,
)
```

#### C. Agent Selection Integration

The `VoiceSession` tracks `current_agent` per guild.
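The agent choice also selects the Gateway session: in the Session Key Format examples under Reference, the agent ID is the second field of the key. The sketch below is a hypothetical stand-in (`GuildAgentState`, `session_key`, and `DEFAULT_AGENT` are illustrative names, not part of the codebase), assuming the agent and the Discord user ID fill the key's two slots in that order; the real key is generated inside `openclaw_client/client.py`.

```python
from dataclasses import dataclass, field

DEFAULT_AGENT = "jarvis"  # fallback when a guild has no agent set yet


def session_key(agent_id: str, user_id: int) -> str:
    """Build a key shaped like the Reference examples, e.g.
    agent:main:discord:dm:123456789 (assumed slot order: agent, then user)."""
    return f"agent:{agent_id}:discord:dm:{user_id}"


@dataclass
class GuildAgentState:
    """Hypothetical stand-in for VoiceSession's per-guild agent tracking."""

    agents: dict[int, str] = field(default_factory=dict)

    def set_agent(self, guild_id: int, agent: str) -> None:
        # What an `/agent sage` command handler would update
        self.agents[guild_id] = agent

    def current_agent(self, guild_id: int) -> str:
        return self.agents.get(guild_id, DEFAULT_AGENT)

    def key_for(self, guild_id: int, user_id: int) -> str:
        # The guild's current agent decides which Gateway session is used
        return session_key(self.current_agent(guild_id), user_id)


if __name__ == "__main__":
    state = GuildAgentState()
    print(state.key_for(1, 987654321))  # agent:jarvis:discord:dm:987654321
    state.set_agent(1, "sage")
    print(state.key_for(1, 987654321))  # agent:sage:discord:dm:987654321
```

Keeping the fallback in one place means the LLM handler and the `/agent` command agree on the default agent.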
Ensure the session's `current_agent` is passed to the LLM handler:

```python
async def llm_response_handler(agent: str, message: str, user_id: int, guild_id: int) -> str:
    # Get the current agent from the session (the `agent` argument is ignored here)
    session = session_manager.get_session(guild_id)
    current_agent = session.current_agent if session else "jarvis"

    # Send to Gateway with the correct agent
    client = openclaw_client.get_or_create(guild_id)
    return await client.send_message(
        agent=current_agent,  # Use the session's agent setting
        message=message,
        context="",  # Gateway manages context internally
        speaker=str(user_id),  # Used for session key generation
    )
```

#### D. Cleanup on Disconnect

When the bot disconnects from Discord or leaves a guild, close that guild's Gateway connection:

```python
# In voice session cleanup:
async def cleanup_guild(guild_id: int) -> None:
    # Remove voice session
    await session_manager.remove_session(guild_id)

    # Disconnect the OpenClaw client for this guild and drop it from the manager
    client = openclaw_client.get_or_create(guild_id)
    await client.disconnect()
    openclaw_client.remove_guild(guild_id)
```

---

### 3. Download Smart Turn v3 Model

**Status**: ⏳ **PENDING**

**Current State**:
- Mock ONNX model at `models/smart_turn_v3.onnx` (164-byte placeholder)
- Mock creation script at `scripts/create_mock_turn_model.py`

**What to Do**:
```bash
# Install huggingface_hub if not already installed
pip install huggingface_hub

# Download the real model
python -c "from huggingface_hub import hf_hub_download; hf_hub_download(repo_id='pipecat-ai/smart-turn-v3', filename='model.onnx', local_dir='models/')"

# Remove mock files
rm models/smart_turn_v3.onnx
rm scripts/create_mock_turn_model.py

# Verify the model exists and is ~8MB
ls -lh models/model.onnx
```

The download lands at `models/model.onnx`, not at the mock's path `models/smart_turn_v3.onnx`. If the turn detector is configured with the mock's path, update the path or rename the downloaded file. If the download fails, check the repository's file list, since `filename` must match exactly.

---

### 4. Configure TTS to Use Existing Sage-Voice Server

**Status**: ⏳ **PENDING**

**Decision Point**: You have two TTS options:

#### Option A: Use Your Existing TTS Server (Recommended)

Your sage-voice server at `http://192.168.50.47:8004` already works and has your voice models.
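Before rewriting `server/tts.py`, it can help to confirm the server answers from the machine that will run the bot. A minimal stdlib-only sketch; `tts_server_healthy` is a hypothetical helper, and it assumes the `/health` endpoint probed with `curl` in the Troubleshooting section returns HTTP 200 when the server is up:

```python
import urllib.request


def tts_server_healthy(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200.

    Assumes the sage-voice server exposes /health (not verified here).
    """
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):
        # URLError, HTTPError (4xx/5xx), refused connections and timeouts
        # are all OSError subclasses; ValueError covers malformed URLs.
        return False


if __name__ == "__main__":
    print(tts_server_healthy("http://192.168.50.47:8004"))
```

Run it with your `TTS_URL` value: it returns `False` for refused connections, timeouts, malformed URLs, and any final status other than 200.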
**Modify `server/tts.py`** to use an HTTP client instead of the built-in TTS:

```python
# Replace the Chatterbox/Coqui implementation with an HTTP client
import httpx


class TTSSynthesizer:
    def __init__(self, tts_url: str, device: str = "cuda"):
        self.tts_url = tts_url  # http://192.168.50.47:8004
        self.device = device  # Not used by the HTTP client

    async def synthesize(
        self,
        text: str,
        voice: str,
        response_format: str = "pcm",
    ) -> bytes:
        """Call the sage-voice TTS server and return the raw audio bytes."""
        async with httpx.AsyncClient() as client:
            response = await client.post(
                f"{self.tts_url}/v1/audio/speech",
                json={
                    "input": text,
                    "voice": voice,  # jarvis or sage
                    "response_format": response_format,
                },
                timeout=10.0,
            )
            # Fail loudly instead of playing an error body as audio
            response.raise_for_status()
            return response.content
```

**Add to `.env`**:
```bash
TTS_URL=http://192.168.50.47:8004
```

#### Option B: Use Built-in TTS (More Complex)

Provide voice reference files and use Coqui XTTS:
- Place `server/voices/jarvis.wav` (10-30 seconds of clean audio)
- Place `server/voices/sage.wav` (10-30 seconds of clean audio)
- Keep the existing `server/tts.py` implementation

**Recommendation**: Go with **Option A** to reuse your proven TTS infrastructure.

---

### 5. Environment Configuration

**Status**: ⏳ **PENDING**

**Create a `.env` file** in the openclaw-voice directory:
```bash
# Copy the example
cp .env.example .env
# Edit with your actual values
```

**Required Configuration**:
```bash
# Discord Bot (from Discord Developer Portal)
DISCORD_BOT_TOKEN=

# OpenClaw Gateway (on Synology NAS)
OPENCLAW_BASE_URL=ws://192.168.50.9:18789
OPENCLAW_AUTH_TOKEN=
OPENCLAW_AGENT_ID=main

# TTS Server (your existing sage-voice server)
TTS_URL=http://192.168.50.47:8004

# FastAPI Server (openclaw-voice API endpoints)
SERVER_HOST=0.0.0.0
SERVER_PORT=8880

# Pipeline Settings (optional overrides)
PIPELINE__STT__MODEL_SIZE=medium
PIPELINE__STT__DEVICE=cuda
PIPELINE__TTS__DEVICE=cuda
```

**Where to Get Values**:
- `DISCORD_BOT_TOKEN`: Discord Developer Portal → Your Application → Bot → Token
- `OPENCLAW_AUTH_TOKEN`: Check your NAS OpenClaw Gateway config or create a new token
- `TTS_URL`: Already running at `192.168.50.47:8004`

---

### 6. Testing End-to-End Flow

**Status**: ⏳ **PENDING**

**Test Plan**:

#### A. Test OpenClaw Gateway Connection

```python
# Create test script: test_gateway_connection.py
import asyncio

from openclaw_client import create_client


async def test_connection():
    client = create_client(
        base_url="ws://192.168.50.9:18789",
        auth_token="",  # Fill in your Gateway auth token
        agent_id="main",
    )

    try:
        await client.connect()
        print("✓ Connected to Gateway")

        response = await client.send_message(
            agent="jarvis",
            message="Hello, this is a test",
            speaker="test_user",
        )
        print(f"✓ Received response: {response}")

        await client.disconnect()
        print("✓ Disconnected")
    except Exception as e:
        print(f"✗ Error: {e}")


# Guard so that pytest collecting this file does not open a connection on import
if __name__ == "__main__":
    asyncio.run(test_connection())
```

#### B. Test Discord Bot End-to-End

1. Start the openclaw-voice bot:
   ```bash
   python run.py
   ```
2. Join a Discord voice channel.
3. Use slash commands:
   ```
   /join
   /agent jarvis
   /sensitivity medium
   ```
4. Speak into the microphone. The bot should:
   - Detect voice (VAD)
   - Wait for Smart Turn completion
   - Transcribe speech (STT)
   - Check relevance
   - Send to OpenClaw Gateway
   - Generate a TTS response
   - Play audio back
5. Check logs for the latency breakdown:
   ```
   VAD: XXms
   Smart Turn: XXms
   STT: XXms
   Relevance: XXms
   Gateway: XXXXms
   TTS: XXms
   Total: ~3-7s
   ```

#### C. Test Agent Switching
```
/agent sage
[speak] "Tell me about philosophy"
[expect Sage's voice and personality]

/agent jarvis
[speak] "What's the weather?"
[expect Jarvis's voice and personality]
```

#### D. Test Relevance Filtering
```
/sensitivity low
[speak unrelated conversation]
[expect bot to stay quiet]
[speak "Hey Jarvis, ..." or "Jarvis, ..."]
[expect bot to respond]

/sensitivity high
[speak relevant question without name]
[expect bot to respond]
```

---

## 📋 Quick Start Checklist

To get openclaw-voice running with your OpenClaw Gateway:

- [x] ~~Implement OpenClaw Gateway WebSocket client~~ ✅
- [x] ~~Add websockets dependency~~ ✅
- [x] ~~Update configuration files~~ ✅
- [ ] Download Smart Turn v3 model from Hugging Face
- [ ] Create `.env` file with your credentials
- [ ] Modify `server/tts.py` to use your existing TTS server (Option A)
- [ ] Wire OpenClawClient into bot initialization (`run.py` or `discord_bot/bot.py`)
- [ ] Create LLM response handler wrapper for orchestrator
- [ ] Install dependencies: `pip install -r requirements.txt`
- [ ] Test Gateway connection standalone
- [ ] Run end-to-end test with Discord voice

---

## 🎯 Next Steps

1. **Complete Task #3**: Download the real Smart Turn model
2. **Complete Task #4**: Configure TTS (recommended: Option A, use the existing server)
3. **Complete Task #5**: Create `.env` with your credentials
4. **Complete Task #2**: Wire up the bot by integrating OpenClawClient into Discord bot initialization
5. **Complete Task #6**: Test the end-to-end flow

---

## 📚 Reference

### Session Key Format
```
agent::discord:dm:
```

Examples:
- `agent:main:discord:dm:123456789` (user 123456789 talking to the main agent)
- `agent:jarvis:discord:dm:987654321` (user 987654321 talking to the jarvis agent)

### Gateway Protocol Summary
```
1. WebSocket Connect
2. Server sends: connect.challenge (with nonce)
3. Client sends: connect request (with auth token)
4. Server sends: hello-ok response (with server info)
5. Client sends: chat.send (with sessionKey, message, idempotencyKey)
6. Server sends: ack response (with runId)
7. Server sends: delta events (streaming response)
8. Server sends: final event (complete response)
```

### File Locations
- **OpenClaw Client**: `openclaw_client/client.py`
- **Configuration**: `utils/config.py`, `config.yaml`, `.env`
- **Bot Entry**: `run.py`
- **Discord Bot**: `discord_bot/bot.py`
- **Voice Sessions**: `discord_bot/voice_session.py`
- **Pipeline**: `pipeline/orchestrator.py`
- **TTS**: `server/tts.py`

---

## 🐛 Troubleshooting

### WebSocket Connection Fails
- Verify the Gateway is running: `ssh Hyriel@192.168.50.9 'sudo /usr/local/bin/docker logs --tail 50 openclaw-gateway'`
- Check that the NAS firewall allows port 18789
- Verify the auth token is correct
- Check logs for connection errors

### Bot Doesn't Respond to Voice
- Check that VAD is detecting speech (logs should show "speech detected")
- Verify the STT model is downloaded (the first run downloads ~500MB-5GB)
- Check that the OpenClaw Gateway receives messages (NAS logs)
- Verify the TTS server is reachable: `curl http://192.168.50.47:8004/health`

### Agent Switching Doesn't Work
- Verify session management passes `current_agent` to the LLM handler
- Check that `session.current_agent` is updated by the `/agent` command
- Verify the Gateway session key uses the correct agent ID

---

**Status Summary**: 1 of 6 tasks complete (3 of 11 checklist items done)

**Estimated Time to Completion**: 2-4 hours (with testing)