openclaw-voice/DISCORD_OPTIMIZATION_TEST.md

# Discord Voice Bot - Optimization Testing Guide

**Goal:** Verify the latency improvements from the Phase 1 optimizations (roughly 2-10x, depending on query complexity)

---

## Pre-Flight Checklist

### ✅ Requirements

1. **Discord Bot Token** - Set in `.env` file
2. **OpenClaw Gateway** - Running at `http://192.168.50.9:18789` (or update `.env`)
3. **Voice Files** - `server/voices/jarvis.wav` (or `.mp3`)
4. **GPU** - CUDA-capable GPU available
5. **Discord Server** - Bot invited with Voice permissions

### ✅ Configuration Check

**Verify these settings in `config.yaml`:**

```yaml
pipeline:
  stt:
    model_size: "medium"
    device: "cuda"
    beam_size: 1  # ✅ Should be 1 (was 5)
```

**Verify `.env` file exists:**
```bash
# Check that the required keys are present in .env
# (this prints their values, so avoid running it on a shared screen)
grep -E "(DISCORD_TOKEN|OPENCLAW_BASE_URL|OPENCLAW_AUTH_TOKEN)" .env
```
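
If `grep` is not available (for example in a plain Windows `cmd` session), or you would rather not print token values, the same check can be sketched in Python. This is a minimal example and not part of the repository: `missing_env_keys` is a hypothetical helper, and it assumes plain `KEY=value` lines in `.env`.

```python
REQUIRED_KEYS = ("DISCORD_TOKEN", "OPENCLAW_BASE_URL", "OPENCLAW_AUTH_TOKEN")


def missing_env_keys(env_text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty in .env-style text."""
    values = {}
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and malformed lines
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes so KEY="" counts as empty
        values[key.strip()] = value.strip().strip('"').strip("'")
    return [key for key in required if not values.get(key)]
```

Call it with the file contents, e.g. `missing_env_keys(open(".env", encoding="utf-8").read())`. An empty list means all three keys are set; the values themselves are never printed.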

---

## Starting the Bot

### 1. Activate Environment

**Windows:**
```cmd
activate.bat
```

**If venv not found:**
```cmd
setup.bat
```

### 2. Start Bot

```cmd
python run.py
```

### 3. Expected Startup Output

**Watch for these critical logs:**

```
======================================================================
Jarvis Voice Bot Starting
======================================================================
Loading configuration...
✓ Discord token configured
✓ OpenClaw Gateway configured

Initializing TTS and STT engines...
Loading Chatterbox-Turbo on cuda...
Model loaded. Sample rate: 24000Hz
✓ TTS engine initialized (cuda)

🔥 NEW: Warming up TTS engine and caching common phrases...
Pre-generating 15 phrases for jarvis...
Cached phrase for jarvis: 'Yes, sir.'
Cached phrase for jarvis: 'Right away, sir.'
...
Warmup complete: cached 27 phrases in 8.3s (3.3 phrases/sec)
✓ TTS warmup complete (27 phrases cached)

Loading faster-whisper model: medium (device: cuda, compute: float16)
Whisper model loaded successfully: medium
✓ STT engine initialized (medium on cuda)

🔥 NEW: Query router initialized (default: sonnet)

✓ Discord bot started
✓ API server started on 0.0.0.0:8880

All services running. Press Ctrl+C to stop.
```

**🚨 If either "TTS warmup complete" or "Query router initialized" is missing from the startup output, the optimizations didn't load!**
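
If you capture the startup output to a file, the check for these two lines can be automated. The sketch below is illustrative only: `missing_startup_markers` is a hypothetical helper, and the marker strings are taken from the expected output above.

```python
REQUIRED_MARKERS = ("TTS warmup complete", "Query router initialized")


def missing_startup_markers(log_text: str, markers=REQUIRED_MARKERS) -> list[str]:
    """Return the markers that never appear in the captured startup output."""
    return [marker for marker in markers if marker not in log_text]
```

An empty list means both optimizations reported in. Any returned string names the log line to look for in the Troubleshooting section.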

---

## Discord Commands

### Join Voice Channel

In your Discord server, type:
```
/join
```

**Or specify channel:**
```
/join channel:General Voice
```

**Expected Response:**
```
✅ Joined voice channel: General Voice
🎤 Listening for voice...
```

**Server Logs:**
```
Created pipeline for user: YourName (123456789)
Voice connection established
Audio bridge ready
```

---

## Testing the Optimizations

### Test 1: Simple Query + Cache Hit (Fastest)

**Goal:** Verify TTS cache is working (should be near-instant)

**Say:** "Hey Jarvis"

**Expected Behavior:**
- Response in ~400-700ms
- Router → Haiku
- TTS → Cache hit

**Server Logs to Watch:**
```
Speech started: YourName (123456789)
Speech ended: YourName (silence: 0.32s)
Turn complete for YourName (latency: 0.051s)

Transcribed (YourName): "Hey Jarvis" (latency: 0.287s)  ✅ Faster than before!
Added to transcript: YourName said "Hey Jarvis"

Responding to YourName: "Hey Jarvis" (latency: 0.113s)

🔥 NEW: Routed to haiku (confidence: 0.90, reason: matched_simple_pattern)

🔥 NEW: First sentence from LLM in 0.124s: "Yes, sir."

🔥 NEW: Cache hit for jarvis: 'Yes, sir.' (hit rate: 100.0%)

🔥 NEW: First audio playing in 0.154s (LLM: 0.124s, TTS: 0.030s)

Streaming response complete (jarvis, haiku): "Yes, sir."
Pipeline complete for YourName: total latency 0.673s

✅ SUCCESS: <1 second total latency!
```

**What This Tests:**
- ✅ STT beam_size=1 optimization
- ✅ Smart Model Router (Haiku selection)
- ✅ TTS phrase caching
- ✅ Total latency <1s

---

### Test 2: Simple Query + Cache Miss (Still Fast)

**Goal:** Verify Haiku routing for simple queries

**Say:** "Thank you Jarvis"

**Expected Behavior:**
- Response in ~700-1200ms
- Router → Haiku
- TTS → Cache miss (generate on-the-fly)

**Server Logs to Watch:**
```
Transcribed (YourName): "Thank you Jarvis" (latency: 0.312s)

🔥 NEW: Routed to haiku (confidence: 0.90, reason: matched_simple_pattern)

🔥 NEW: First sentence from LLM in 0.183s: "You're welcome, sir."

Cache miss  ← Phrase not in cache
Generating TTS for 'jarvis': "You're welcome, sir." (0 emotion tags)
Generated 1.24s audio in 0.38s (RTF: 0.31)

🔥 NEW: First audio playing in 0.612s (LLM: 0.183s, TTS: 0.429s)

Pipeline complete for YourName: total latency 1.087s

✅ SUCCESS: Just over 1 second!
```

**What This Tests:**
- ✅ Haiku routing for greetings/thanks
- ✅ Streaming TTS (generates while LLM streams)
- ✅ Total latency ~1s
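
The `RTF` value in the `Generated ...` log line is the real-time factor: synthesis time divided by the duration of the audio produced. Anything below 1.0 means the TTS engine is generating faster than playback. A minimal sketch for recomputing it from a log line, assuming the log format shown above (`real_time_factor` is a hypothetical helper):

```python
import re
from typing import Optional

GENERATED_RE = re.compile(r"Generated (?P<audio>[\d.]+)s audio in (?P<synth>[\d.]+)s")


def real_time_factor(log_line: str) -> Optional[float]:
    """Return synthesis_time / audio_duration, or None if the line does not match."""
    match = GENERATED_RE.search(log_line)
    if match is None:
        return None
    audio = float(match.group("audio"))
    synth = float(match.group("synth"))
    return synth / audio if audio > 0 else None
```

For the Test 2 line, `real_time_factor("Generated 1.24s audio in 0.38s (RTF: 0.31)")` gives about 0.31, matching the logged value.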

---

### Test 3: Medium Query (Sonnet)

**Goal:** Verify Sonnet routing for medium complexity

**Say:** "What's the weather like today?"

**Expected Behavior:**
- Response in ~1-2s
- Router → Sonnet
- Sentence-level streaming TTS

**Server Logs to Watch:**
```
Transcribed (YourName): "What's the weather like today?" (latency: 0.341s)

🔥 NEW: Routed to sonnet (confidence: 0.80, reason: matched_medium_pattern)

🔥 NEW: First sentence from LLM in 0.423s: "Let me check the weather for you."

Extracted sentence #0: "Let me check the weather for you."
Cache miss
Generating TTS for 'jarvis': "Let me check the weather for you."
Generated 1.89s audio in 0.52s (RTF: 0.27)

🔥 NEW: First audio playing in 0.987s (LLM: 0.423s, TTS: 0.564s)

Extracted sentence #1: "Currently, it's partly cloudy with a temperature..."
Played sentence #0 (1.89s audio)
Generating TTS for sentence #1...
Played sentence #1 (2.34s audio)

Streaming response complete (jarvis, sonnet): "Let me check... Currently..."
Pipeline complete for YourName: total latency 2.134s

✅ SUCCESS: Under 2.5 seconds target!
```

**What This Tests:**
- ✅ Sonnet routing for information queries
- ✅ Sentence-level streaming (first audio while rest generates)
- ✅ Total latency <2.5s
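
Sentence-level streaming means the first sentence is handed to TTS as soon as it is complete, while the LLM is still producing the rest. The bot's own extractor is not shown in this guide, so the following is only a sketch of the idea under a simple assumption: a sentence ends at `.`, `!`, or `?` followed by whitespace. `stream_sentences` is a hypothetical name.

```python
import re
from typing import Iterable, Iterator

# Split at whitespace that directly follows sentence-ending punctuation.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def stream_sentences(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of text chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = SENTENCE_END.split(buffer)
        # Everything except the last part is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]  # keep the unfinished tail for the next chunk
    if buffer.strip():
        yield buffer.strip()  # flush whatever remains when the stream ends
```

This naive rule splits early on abbreviations such as "Dr." and waits for the next chunk before emitting a sentence that ends exactly at a chunk boundary. A production extractor would likely need to handle both cases.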

---

### Test 4: Complex Query (Opus)

**Goal:** Verify Opus routing for complex analysis

**Say:** "Analyze the pros and cons of using Pipecat versus a custom voice pipeline"

**Expected Behavior:**
- Response in ~1.5-3s
- Router → Opus
- Multiple sentences streaming

**Server Logs to Watch:**
```
Transcribed (YourName): "Analyze the pros and cons of using Pipecat..." (latency: 0.387s)

🔥 NEW: Routed to opus (confidence: 0.85, reason: matched_complex_pattern)

🔥 NEW: First sentence from LLM in 0.892s: "That's an excellent question, sir."

Cache miss
Generating TTS...

🔥 NEW: First audio playing in 1.476s (LLM: 0.892s, TTS: 0.584s)

Extracted sentence #1: "Pipecat offers several advantages including..."
Extracted sentence #2: "On the other hand, a custom pipeline gives you..."
Extracted sentence #3: "In terms of performance, Pipecat claims..."

Streaming response complete (jarvis, opus): "That's an excellent... [full response]"
Pipeline complete for YourName: total latency 2.876s

✅ SUCCESS: Under 3 seconds for complex query!
```

**What This Tests:**
- ✅ Opus routing for analysis/complex queries
- ✅ Multi-sentence streaming
- ✅ Total latency <3s (acceptable for complex queries)

---

### Test 5: Barge-In (Interruption)

**Goal:** Verify barge-in support still works

**Say:** "Hey Jarvis, tell me a really long story about—"
**Then interrupt:** "Never mind"

**Expected Behavior:**
- Bot stops current response
- Processes new query immediately

**Server Logs:**
```
Responding to YourName: "Hey Jarvis, tell me..."
First audio playing in 1.123s
Playing sentence #0...

🔥 Barge-in detected: YourName spoke during response
Pipeline cancelled for YourName
Speech started: YourName (123456789)

Transcribed (YourName): "Never mind" (latency: 0.298s)
Routed to haiku (confidence: 0.90)
```

**What This Tests:**
- ✅ Barge-in detection works with streaming
- ✅ Pipeline cancellation
- ✅ Immediate processing of new query
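
How the bot implements barge-in internally is not covered here. As a rough illustration of the cancellation pattern the logs suggest, the sketch below assumes an `asyncio`-based pipeline (an assumption) and uses `asyncio.sleep` as a stand-in for audio playback. `play_response` and `demo_barge_in` are hypothetical names.

```python
import asyncio


async def play_response(sentences, played):
    """Pretend to play sentences one by one; each sleep stands in for playback."""
    for sentence in sentences:
        await asyncio.sleep(0.1)
        played.append(sentence)


async def demo_barge_in():
    played = []
    playback = asyncio.create_task(
        play_response(["one", "two", "three", "four"], played)
    )
    await asyncio.sleep(0.25)  # the user starts speaking partway through
    playback.cancel()          # barge-in: stop the current response
    try:
        await playback
    except asyncio.CancelledError:
        pass                   # cancellation is the expected outcome here
    return played, playback.cancelled()
```

Running `asyncio.run(demo_barge_in())` returns the sentences played before the interruption and `True` for the cancelled flag. The remaining sentences are never played, which is the behavior Test 5 checks for.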

---

## Performance Monitoring

### Real-Time Stats

**In Discord, type:**
```
/status
```

**Expected Response:**
```
📊 Jarvis Voice Bot Status

🎯 Active Agent: Jarvis
🔊 Sensitivity: medium
👥 Active Users: 1
💬 Total Utterances: 12
🤖 Total Responses: 8
🚫 Cancellations: 1

⚡ Performance (Average):
├─ STT: 0.31s  ✅ (was ~1-2s)
├─ Routing: 0.01s  🆕
├─ Relevance: 0.11s
├─ LLM (first sentence): 0.38s  🆕
├─ TTS (first chunk): 0.29s  🆕
├─ Time to First Audio: 0.89s  ⭐ KEY METRIC!
└─ Total: 1.87s  ✅ (was ~4-11s)

🧠 Model Usage:
├─ Haiku: 67% (8 queries)  ← Fast responses
├─ Sonnet: 25% (3 queries)  ← Medium complexity
└─ Opus: 8% (1 query)  ← Deep reasoning

💾 TTS Cache:
├─ Size: 27 phrases
├─ Hits: 5 (42%)  ← 42% instant responses!
└─ Misses: 7 (58%)
```

**🎯 Target Metrics:**
- **Time to First Audio:** <1.5s (was 4-11s)
- **Total Latency:** <2.5s (was 4-11s)
- **STT:** <500ms (was 1-2s)
- **Cache Hit Rate:** 30-50% (higher over time)
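
These targets can also be checked mechanically against the averages the bot reports. The sketch below is illustrative: `check_targets` is a hypothetical helper, and the metric names are assumed to match the keys returned by the `/stats` endpoint described in this guide.

```python
# Upper limits in seconds, taken from the target metrics above.
TARGETS = {
    "avg_time_to_first_audio_latency": 1.5,
    "avg_total_latency": 2.5,
    "avg_stt_latency": 0.5,
}


def check_targets(stats: dict, targets: dict = TARGETS) -> dict:
    """Map each metric to True (within target), False (too slow), or None (not reported)."""
    results = {}
    for metric, limit in targets.items():
        value = stats.get(metric)
        results[metric] = None if value is None else value < limit
    return results
```

A `False` entry points at the stage to investigate first in the Troubleshooting section.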

### API Stats Endpoint

**From another terminal:**
```bash
curl http://localhost:8880/stats | python -m json.tool
```

**Response:**
```json
{
  "active_users": 1,
  "current_agent": "jarvis",
  "total_utterances": 12,
  "total_responses": 8,
  "avg_time_to_first_audio_latency": 0.893,
  "avg_llm_first_sentence_latency": 0.382,
  "avg_tts_first_chunk_latency": 0.294,
  "avg_stt_latency": 0.314,
  "avg_total_latency": 1.872,

  "router_stats": {
    "total_routes": 12,
    "routes_by_model": {
      "haiku": 8,
      "sonnet": 3,
      "opus": 1
    },
    "distribution": {
      "haiku": 0.667,
      "sonnet": 0.250,
      "opus": 0.083
    }
  }
}
```

⭐ In this sample, `avg_time_to_first_audio_latency` is under 1s and `avg_total_latency` is under 2s.
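
If you prefer Python over `curl`, a minimal sketch using only the standard library is shown here. `fetch_stats` and `model_distribution` are hypothetical helpers; the URL and the `router_stats.routes_by_model` layout are assumed to match the endpoint and response described in this section.

```python
import json
from urllib.request import urlopen


def fetch_stats(url: str = "http://localhost:8880/stats", timeout: float = 5.0) -> dict:
    """Fetch the stats JSON from the bot's API server."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def model_distribution(stats: dict) -> dict:
    """Recompute each model's share of routed queries from the raw counts."""
    counts = stats.get("router_stats", {}).get("routes_by_model", {})
    total = sum(counts.values())
    if not total:
        return {}
    return {model: round(n / total, 3) for model, n in counts.items()}
```

With the bot running, `model_distribution(fetch_stats())` should agree with the `distribution` field in the response. Recomputing it from the counts is a quick sanity check that the router statistics are internally consistent.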

---

## Optimization Verification Checklist

After running all 5 tests, verify:

- [ ] **STT is faster:** Latency ~300ms (was 1-2s)
- [ ] **Router is working:** See "Routed to haiku/sonnet/opus" in logs
- [ ] **Cache is hitting:** See "Cache hit" for common phrases
- [ ] **Streaming is working:** See "First sentence from LLM" and "First audio playing"
- [ ] **Time to first audio:** <1.5s average
- [ ] **Total latency:** <2.5s for most queries
- [ ] **Model distribution:** ~60-70% Haiku, ~20-30% Sonnet, ~10% Opus

---

## Troubleshooting

### Problem: No "TTS warmup complete" log

**Cause:** The TTS synthesizer's warmup is not being called at startup

**Fix:**
```bash
# Check run.py has warmup call
grep "warmup" run.py
```

Should see:
```python
await tts_synthesizer.warmup()
```

**Restart bot after confirming.**

---

### Problem: No "Routed to" logs

**Cause:** Router not integrated into orchestrator

**Fix:**
```bash
# Check orchestrator has router
grep "query_router" pipeline/orchestrator.py
```

**Verify orchestrator initialization includes router.**

---

### Problem: Still slow (>3s latency)

**Check each stage:**

1. **STT slow (>1s)?**
   - Verify `beam_size: 1` in config
   - Check GPU is being used: `nvidia-smi`

2. **LLM slow (>2s first sentence)?**
   - Check OpenClaw Gateway is responding
   - Verify model routing is working (should use Haiku for simple queries)
   - Test Gateway directly:
     ```bash
     curl http://192.168.50.9:18789/health
     ```

3. **TTS slow (>1s)?**
   - Check GPU utilization
   - Verify Chatterbox-Turbo is loaded (not Coqui)
   - Check that the cache is enabled in `tts.py`

4. **Cache not hitting?**
   - Check exact LLM responses in logs
   - Add common variations to `TTSSynthesizer.COMMON_PHRASES`

---

### Problem: Router always uses Sonnet

**Cause:** Queries don't match patterns

**Debug:**
```python
# Test router manually
from pipeline.query_router import QueryRouter

router = QueryRouter()
print(router.route("Hey Jarvis"))
# Should show: model='haiku', reason='matched_simple_pattern'
```

**Fix:** Add custom patterns to `pipeline/query_router.py`
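
The internals of `QueryRouter` are not shown in this guide, so the following is only an illustrative sketch of how a regex-based router could be organized, to show where a custom pattern would go. `Route`, `SIMPLE_PATTERNS`, `COMPLEX_PATTERNS`, and `route_query` are hypothetical names, the patterns are examples, and the fallback confidence and `no_pattern_matched` reason are invented for illustration. The other confidence values and reason strings mirror the log lines in the tests above.

```python
import re
from typing import NamedTuple


class Route(NamedTuple):
    model: str
    confidence: float
    reason: str


# Example patterns only; the real router's patterns may differ.
SIMPLE_PATTERNS = [r"^(hey|hi|hello)\b", r"\bthank(s| you)\b", r"^never ?mind\b"]
COMPLEX_PATTERNS = [r"\banaly[sz]e\b", r"\bpros and cons\b", r"\bcompare\b"]


def route_query(text: str, default: str = "sonnet") -> Route:
    """Pick a model tier by matching the query against pattern lists."""
    lowered = text.lower().strip()
    # Check complex patterns first so "Hey Jarvis, analyze ..." is not sent to Haiku.
    if any(re.search(p, lowered) for p in COMPLEX_PATTERNS):
        return Route("opus", 0.85, "matched_complex_pattern")
    if any(re.search(p, lowered) for p in SIMPLE_PATTERNS):
        return Route("haiku", 0.90, "matched_simple_pattern")
    return Route(default, 0.50, "no_pattern_matched")
```

In a design like this, a query that always lands on the default tier simply matches none of the lists, so the fix is to add a pattern to the appropriate list and re-run the manual test.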

---

### Problem: Cache hit rate is 0%

**Cause:** Phrase normalization mismatch

**Debug:** Check logs for exact LLM responses. Example:

```
LLM response: "Yes sir."  ← Missing comma!
Cache key: "yes, sir"     ← Has comma
```

**Fix:** Add the variation to `TTSSynthesizer.COMMON_PHRASES`, or update the normalization so that punctuation differences map to the same cache key.
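
One way to make the cache key tolerant of punctuation differences is sketched here. This is an assumption about how normalization could work, not the bot's actual code, and `normalize_phrase` is a hypothetical name.

```python
import re
import string

# Translation table that deletes all ASCII punctuation characters.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize_phrase(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse runs of whitespace."""
    stripped = text.lower().translate(_PUNCT_TABLE)
    return re.sub(r"\s+", " ", stripped).strip()
```

With this rule, "Yes sir." and "Yes, sir." both normalize to `yes sir` and hit the same cache entry. The trade-off is that "Yes, sir!" would reuse audio generated for "Yes, sir.", so intonation may not match the punctuation exactly.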

---

## Expected Results Summary

| Test | Before | After | Improvement |
|------|--------|-------|-------------|
| **Simple (cached)** | 4-7s | 0.4-0.7s | **6-10x faster** ✅ |
| **Simple (uncached)** | 4-7s | 0.7-1.2s | **4-6x faster** ✅ |
| **Medium** | 5-9s | 1-2s | **3-5x faster** ✅ |
| **Complex** | 6-11s | 1.5-3s | **2-4x faster** ✅ |

**🎯 Simple and medium queries should be under 2.5 seconds; complex (Opus) queries should stay under 3 seconds.**

---

## Next Steps

### If Everything Works:

1. **Test with multiple users** in voice channel
2. **Monitor cache hit rate** over time (should increase as common responses are cached)
3. **Tune router patterns** for your specific use cases
4. **Add more cached phrases** based on actual usage logs

### If You Want Even Lower Latency (<1s):

See `OPTIMIZATION_SUMMARY.md` for Phase 2 options:
- Kani-TTS-2 evaluation (faster TTS engine)
- Full Pipecat integration (500-800ms target)

---

## Recording Your Results

Create a results log:

```bash
# Run test session
echo "=== Optimization Test Results ===" > test_results.txt
echo "Date: $(date)" >> test_results.txt
echo "" >> test_results.txt

# Test each scenario and record
echo "Simple Query (cached): Hey Jarvis" >> test_results.txt
# ... copy latency from logs

echo "Simple Query (uncached): Thank you" >> test_results.txt
# ... copy latency from logs

# etc.
```
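
Instead of copying latencies by hand, you can pull them out of a saved server log. The sketch below assumes the `Pipeline complete for ...: total latency ...s` line format shown in the tests above; `summarize_latencies` is a hypothetical helper.

```python
import re
from statistics import mean

TOTAL_RE = re.compile(r"Pipeline complete for .+?: total latency (?P<secs>[\d.]+)s")


def summarize_latencies(log_text: str) -> dict:
    """Collect every total-latency value in the log and report count, min, max, mean."""
    values = [float(m.group("secs")) for m in TOTAL_RE.finditer(log_text)]
    if not values:
        return {"count": 0}
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": round(mean(values), 3),
    }
```

Paste the resulting numbers into `test_results.txt` alongside the query you spoke for each test.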

**Share your results!** Compare before/after latencies against the Expected Results Summary table to verify the improvement (roughly 2-10x, depending on query type).

---

*Testing the optimizations is the fun part — enjoy the speed boost!* 🚀