openclaw-voice/QUICK_START.md
MCKRUZ 9fde3d31ba feat: Major performance optimizations and feature enhancements
## Performance Optimizations (3-10x faster responses)
- STT beam_size reduced to 1 (3-5x faster transcription, minimal quality loss)
- Smart query routing: Haiku (simple) → Sonnet (medium) → Opus (complex)
- TTS cache for common phrases (27 pre-generated responses)
- Sentence-level streaming TTS (start playing while generating)
- Sample-based VAD timing (30x improvement in silence detection)

## TTS Engine Upgrade
- Migrated from Chatterbox to Chatterbox-Turbo
- Zero-shot voice cloning (no fine-tuning required)
- Native paralinguistic tag support ([laugh], [sigh], [chuckle], etc.)
- Emotion presets with temperature control
- Improved marker conversion (*action*, (action), ~action~)

## Discord Bot Enhancements
- Multi-agent support (Jarvis, Sage)
- Improved voice receiving with discord-ext-voice-recv
- Enhanced /join, /leave, /status commands
- Per-agent personality configuration
- Better audio sink/receiver implementation

## OpenClaw Integration
- WebSocket support for Gateway communication
- Query complexity routing (auto-select model)
- Improved error handling and retries
- Session management per Discord guild
- Better latency tracking

## Pipeline Improvements
- Sentence splitter for streaming optimization
- Query router for intelligent model selection
- Enhanced VAD receiver with sample-based timing
- Improved audio buffering and format conversion
- Better transcript management

## Documentation
- Added QUICK_START.md (5-minute test guide)
- Added OPTIMIZATION_SUMMARY.md (performance analysis)
- Added DISCORD_OPTIMIZATION_TEST.md (testing guide)
- Added USAGE_GUIDE.md (comprehensive usage)
- Updated README.md with optimization details

## Utilities & Scripts
- Added get_invite_link.py (Discord bot invite)
- Added sync_commands.py, sync_to_guild.py (command sync)
- Added test_gateway.py, test_stt.py (testing utilities)
- Added openclaw_wrapper.py (wrapper script)
- Removed create_mock_turn_model.py (no longer needed)

## Configuration Updates
- STT model: medium → small (faster, acceptable quality)
- TTS engine: chatterbox → coqui (Turbo integration)
- Beam size: 5 → 1 (latency optimization)
- Added emotion_exaggeration per agent
- Updated .gitignore for project files

Total: ~2105 insertions, ~462 deletions across 35 files
Performance: ~5.5s total latency (down from 22-35s)
Target: ~3.5s (achieved in simple queries with cache)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-16 19:29:57 -05:00

203 lines
3.6 KiB
Markdown

# Quick Start - Test Optimizations Now
**5-Minute Setup to Test 3-10x Faster Voice Chat**
---
## Step 1: Check Environment (30 seconds)
```cmd
# 1. Check .env exists
dir .env
# 2. Make sure it has these:
# DISCORD_TOKEN=...
# OPENCLAW_BASE_URL=ws://192.168.50.9:18789
# OPENCLAW_AUTH_TOKEN=...
```
**Missing .env?** Copy from example:
```cmd
copy .env.example .env
notepad .env
```
---
## Step 2: Start the Bot (1 minute)
```cmd
# Activate environment
activate.bat
# Start bot
python run.py
```
**Watch for:**
```
✓ TTS warmup complete (27 phrases cached) ← NEW!
Query router initialized (default: sonnet) ← NEW!
✓ Discord bot started
```
**If errors:** Check `DISCORD_OPTIMIZATION_TEST.md` troubleshooting section.
---
## Step 3: Join Voice in Discord (10 seconds)
In your Discord server:
```
/join
```
Should see:
```
✅ Joined voice channel
🎤 Listening for voice...
```
---
## Step 4: Test It! (2 minutes)
### Test 1: Simple Query (Should be INSTANT)
**Say:** "Hey Jarvis"
**Expected:** Response in ~500ms
**Log Check:**
```
Routed to haiku ✅
Cache hit for jarvis: 'Yes, sir.' ✅
First audio playing in 0.154s ✅ FAST!
```
---
### Test 2: Medium Query
**Say:** "What's on my calendar today?"
**Expected:** Response in ~1-2s
**Log Check:**
```
Routed to sonnet ✅
First sentence from LLM in 0.4s ✅
First audio playing in 0.9s ✅ <1 second!
```
---
### Test 3: Complex Query
**Say:** "Analyze the pros and cons of Pipecat"
**Expected:** Response in ~1.5-3s
**Log Check:**
```
Routed to opus ✅
First audio playing in 1.5s ✅ Still fast!
```
---
## Step 5: Check Stats (30 seconds)
In Discord:
```
/status
```
**Look for:**
```
⚡ Time to First Audio: 0.89s ⭐ (was 4-11s!)
💾 TTS Cache Hits: 42% ✅
🧠 Haiku: 67% ✅ (fast model being used!)
```
---
## Success Criteria
**Time to first audio:** <1.5s average (was 4-11s)
**Simple queries:** <1s (instant with cache)
**Medium queries:** 1-2s
**Complex queries:** <3s
**Cache hits:** 30%+ (increases over time)
**Haiku usage:** 60-70% (most queries are simple)
---
## Troubleshooting
**Bot won't start?**
```cmd
# Check logs
tail -f jarvis-bot.log
```
**No response?**
```cmd
# Check OpenClaw Gateway is running
curl http://192.168.50.9:18789/health
```
**Still slow?**
- Check `beam_size: 1` in config.yaml (line 123)
- Verify GPU is available: `nvidia-smi`
- See full guide: `DISCORD_OPTIMIZATION_TEST.md`
---
## Quick Reference
**Useful Commands:**
```
/join - Join voice
/leave - Leave voice
/status - Show performance stats
/agent jarvis - Switch to Jarvis
/agent sage - Switch to Sage
```
**Log Files:**
```
jarvis-bot.log - Main log
latency.log - Performance metrics (if enabled)
```
**Config Files:**
```
config.yaml - Main configuration
.env - Environment variables
server/voices/ - Voice reference files
```
---
## What You Just Tested
**STT Optimization** - beam_size: 1 (3-5x faster)
**Smart Model Router** - Haiku/Sonnet/Opus routing
**Streaming TTS** - Sentence-level playback
**TTS Cache** - 27 pre-generated phrases
**Total Improvement:** 3-10x faster voice responses!
---
## Next Steps
1. **Test with friends** - Multiple users in voice channel
2. **Monitor performance** - Use `/status` and `curl http://localhost:8880/stats`
3. **Tune for your use** - Add more cached phrases in `server/tts.py`
4. **Phase 2 optimization** - See `OPTIMIZATION_SUMMARY.md` for Kani-TTS-2 or Pipecat
---
*That's it! You're now running an optimized voice bot that's 3-10x faster!* 🚀