openclaw-voice/QUICK_START.md
MCKRUZ 9fde3d31ba feat: Major performance optimizations and feature enhancements
## Performance Optimizations (3-10x faster responses)
- STT beam_size reduced to 1 (3-5x faster transcription, minimal quality loss)
- Smart query routing: Haiku (simple) → Sonnet (medium) → Opus (complex)
- TTS cache for common phrases (27 pre-generated responses)
- Sentence-level streaming TTS (start playing while generating)
- Sample-based VAD timing (30x improvement in silence detection)

## TTS Engine Upgrade
- Migrated from Chatterbox to Chatterbox-Turbo
- Zero-shot voice cloning (no fine-tuning required)
- Native paralinguistic tag support ([laugh], [sigh], [chuckle], etc.)
- Emotion presets with temperature control
- Improved marker conversion (*action*, (action), ~action~)

## Discord Bot Enhancements
- Multi-agent support (Jarvis, Sage)
- Improved voice receiving with discord-ext-voice-recv
- Enhanced /join, /leave, /status commands
- Per-agent personality configuration
- Better audio sink/receiver implementation

## OpenClaw Integration
- WebSocket support for Gateway communication
- Query complexity routing (auto-select model)
- Improved error handling and retries
- Session management per Discord guild
- Better latency tracking

## Pipeline Improvements
- Sentence splitter for streaming optimization
- Query router for intelligent model selection
- Enhanced VAD receiver with sample-based timing
- Improved audio buffering and format conversion
- Better transcript management

## Documentation
- Added QUICK_START.md (5-minute test guide)
- Added OPTIMIZATION_SUMMARY.md (performance analysis)
- Added DISCORD_OPTIMIZATION_TEST.md (testing guide)
- Added USAGE_GUIDE.md (comprehensive usage)
- Updated README.md with optimization details

## Utilities & Scripts
- Added get_invite_link.py (Discord bot invite)
- Added sync_commands.py, sync_to_guild.py (command sync)
- Added test_gateway.py, test_stt.py (testing utilities)
- Added openclaw_wrapper.py (wrapper script)
- Removed create_mock_turn_model.py (no longer needed)

## Configuration Updates
- STT model: medium → small (faster, acceptable quality)
- TTS engine: chatterbox → coqui (Turbo integration)
- Beam size: 5 → 1 (latency optimization)
- Added emotion_exaggeration per agent
- Updated .gitignore for project files

Total: ~2105 insertions, ~462 deletions across 35 files
Performance: ~5.5s total latency (down from 22-35s)
Target: ~3.5s (achieved in simple queries with cache)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-16 19:29:57 -05:00


Quick Start - Test Optimizations Now

5-Minute Setup to Test 3-10x Faster Voice Chat


Step 1: Check Environment (30 seconds)

# 1. Check .env exists
dir .env

# 2. Make sure it has these:
# DISCORD_TOKEN=...
# OPENCLAW_BASE_URL=ws://192.168.50.9:18789
# OPENCLAW_AUTH_TOKEN=...

Missing .env? Copy from example:

copy .env.example .env
notepad .env
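
If you would rather script the Step 1 check than eyeball the file, the following is a minimal sketch. It assumes .env uses plain KEY=VALUE lines and that the three keys listed above are the only required ones; the helper names are illustrative and not part of the project.

```python
from pathlib import Path

# Keys listed in Step 1; adjust if your .env.example differs (assumption).
REQUIRED_KEYS = ("DISCORD_TOKEN", "OPENCLAW_BASE_URL", "OPENCLAW_AUTH_TOKEN")


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str, required=REQUIRED_KEYS) -> list:
    """Return the required keys that are absent or have an empty value."""
    env = parse_env(text)
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    path = Path(".env")
    if not path.exists():
        print("No .env found; copy .env.example to .env first.")
    else:
        missing = missing_keys(path.read_text(encoding="utf-8"))
        if missing:
            print("Missing or empty: " + ", ".join(missing))
        else:
            print(".env looks complete.")
```

Run it from the project root before starting the bot. It only checks that the keys are present, not that the token or URL is valid.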

Step 2: Start the Bot (1 minute)

# Activate environment
activate.bat

# Start bot
python run.py

Watch for:

✓ TTS warmup complete (27 phrases cached)  ← NEW!
Query router initialized (default: sonnet)  ← NEW!
✓ Discord bot started

If you see errors, check the troubleshooting section of DISCORD_OPTIMIZATION_TEST.md.


Step 3: Join Voice in Discord (10 seconds)

In your Discord server:

/join

You should see:

✅ Joined voice channel
🎤 Listening for voice...

Step 4: Test It! (2 minutes)

Test 1: Simple Query (Should be INSTANT)

Say: "Hey Jarvis"

Expected: Response in ~500ms

Log Check:

Routed to haiku  ✅
Cache hit for jarvis: 'Yes, sir.'  ✅
First audio playing in 0.154s  ✅ FAST!

Test 2: Medium Query

Say: "What's on my calendar today?"

Expected: Response in ~1-2s

Log Check:

Routed to sonnet  ✅
First sentence from LLM in 0.4s  ✅
First audio playing in 0.9s  ✅ <1 second!

Test 3: Complex Query

Say: "Analyze the pros and cons of Pipecat"

Expected: Response in ~1.5-3s

Log Check:

Routed to opus  ✅
First audio playing in 1.5s  ✅ Still fast!
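
Rather than reading the log by eye for each test, you can pull the routing decision and first-audio time out of it. This is a rough sketch that assumes the log messages contain exactly the phrases shown in the Log Check examples above ("Routed to ..." and "First audio playing in ...s"); real log lines may carry timestamps or prefixes, which the regex search tolerates.

```python
import re

# Patterns based on the example log lines in this guide (assumed wording).
ROUTE_RE = re.compile(r"Routed to (\w+)")
FIRST_AUDIO_RE = re.compile(r"First audio playing in (\d+(?:\.\d+)?)s")


def summarize(lines):
    """Pair each 'Routed to X' line with the next 'First audio playing' time.

    Returns a list of (model, seconds) tuples in log order.
    """
    results = []
    current_model = None
    for line in lines:
        route = ROUTE_RE.search(line)
        if route:
            current_model = route.group(1)
            continue
        audio = FIRST_AUDIO_RE.search(line)
        if audio and current_model is not None:
            results.append((current_model, float(audio.group(1))))
            current_model = None  # wait for the next routing decision
    return results


# Usage, assuming the main log file named in Quick Reference:
# with open("jarvis-bot.log", encoding="utf-8") as fh:
#     for model, seconds in summarize(fh):
#         print(f"{model}: {seconds:.3f}s to first audio")
```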

Step 5: Check Stats (30 seconds)

In Discord:

/status

Look for:

⚡ Time to First Audio: 0.89s  ⭐ (was 4-11s!)
💾 TTS Cache Hits: 42%  ✅
🧠 Haiku: 67%  ✅ (fast model being used!)

Success Criteria

  • Time to first audio: <1.5s average (was 4-11s)
  • Simple queries: <1s (instant with cache)
  • Medium queries: 1-2s
  • Complex queries: <3s
  • Cache hits: 30%+ (increases over time)
  • Haiku usage: 60-70% (most queries are simple)
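
The latency and cache criteria can be turned into a quick pass/fail check. The sketch below is illustrative only: it assumes you have collected (model, seconds-to-first-audio) pairs yourself, maps Haiku/Sonnet/Opus to simple/medium/complex as the router does, and treats the per-class limits as inclusive upper bounds for simplicity. It does not check the Haiku usage share.

```python
from statistics import mean

# Upper bounds in seconds per routed model, taken from the criteria above.
LIMITS = {"haiku": 1.0, "sonnet": 2.0, "opus": 3.0}
MAX_AVERAGE = 1.5         # average time to first audio must stay below this
MIN_CACHE_HIT_RATE = 0.30  # "30%+" cache hits


def check(samples, cache_hit_rate):
    """Return a list of human-readable failures (empty list means all passed).

    samples: list of (model, seconds_to_first_audio) tuples.
    cache_hit_rate: fraction between 0 and 1.
    """
    if not samples:
        return ["no samples collected"]
    failures = []
    average = mean(seconds for _, seconds in samples)
    if average >= MAX_AVERAGE:
        failures.append(f"average first audio {average:.2f}s is not under {MAX_AVERAGE}s")
    for model, seconds in samples:
        limit = LIMITS.get(model)
        if limit is not None and seconds > limit:
            failures.append(f"{model} query took {seconds:.2f}s (limit {limit}s)")
    if cache_hit_rate < MIN_CACHE_HIT_RATE:
        failures.append(f"cache hit rate {cache_hit_rate:.0%} is below 30%")
    return failures
```

For example, the numbers from the three tests in Step 4 with a 42% cache hit rate would produce an empty failure list.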


Troubleshooting

Bot won't start?

# Check logs (Git Bash, WSL, Linux, or macOS)
tail -f jarvis-bot.log

# PowerShell equivalent
Get-Content jarvis-bot.log -Wait -Tail 20

No response?

# Check OpenClaw Gateway is running
curl http://192.168.50.9:18789/health
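
If curl is not available, the same health check can be done from Python's standard library. This is a small sketch; it assumes only that the /health endpoint shown above answers with an HTTP 2xx status when the Gateway is up, and the function name is made up for illustration.

```python
import urllib.error
import urllib.request


def gateway_is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with an HTTP 2xx status, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # URLError covers HTTP errors and refused connections, OSError covers
        # timeouts, ValueError covers malformed URLs.
        return False


# Usage, with the host and port from the curl example above:
# print(gateway_is_healthy("http://192.168.50.9:18789/health"))
```

Replace the address with the host and port from your own OPENCLAW_BASE_URL.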

Still slow?

  • Confirm that beam_size: 1 is set in config.yaml (line 123)
  • Verify GPU is available: nvidia-smi
  • See full guide: DISCORD_OPTIMIZATION_TEST.md

Quick Reference

Useful Commands:

/join          - Join voice
/leave         - Leave voice
/status        - Show performance stats
/agent jarvis  - Switch to Jarvis
/agent sage    - Switch to Sage

Log Files:

jarvis-bot.log           - Main log
latency.log              - Performance metrics (if enabled)

Config Files:

config.yaml              - Main configuration
.env                     - Environment variables
server/voices/           - Voice reference files

What You Just Tested

  • STT Optimization - beam_size: 1 (3-5x faster)
  • Smart Model Router - Haiku/Sonnet/Opus routing
  • Streaming TTS - Sentence-level playback
  • TTS Cache - 27 pre-generated phrases

Total Improvement: 3-10x faster voice responses!
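
To see why sentence-level streaming cuts time to first audio, here is a minimal sketch of the idea: emit each sentence as soon as it is complete so TTS can start before the LLM finishes. It is not the project's sentence splitter; the function name and the simple punctuation rule are assumptions, and abbreviations such as "Dr." would be split incorrectly.

```python
import re
from typing import Iterable, Iterator

# A sentence is assumed to end at ., ! or ? followed by whitespace.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def stream_sentences(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of text chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _BOUNDARY.split(buffer)
        # Everything except the last part is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    # Flush whatever remains once the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

Each yielded sentence can be handed to TTS immediately, so playback of the first sentence overlaps with generation of the rest.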


Next Steps

  1. Test with friends - Multiple users in voice channel
  2. Monitor performance - Use /status and curl http://localhost:8880/stats
  3. Tune for your use case - Add more cached phrases in server/tts.py
  4. Phase 2 optimization - See OPTIMIZATION_SUMMARY.md for Kani-TTS-2 or Pipecat
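
Before adding phrases, it helps to have a mental model of a per-agent phrase cache. The sketch below is hypothetical and does not reflect the actual structure of server/tts.py: the class name, the synthesize callback signature, and the text normalization are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, Tuple


class PhraseCache:
    """Pre-generated audio for common phrases, keyed by (agent, normalized text)."""

    def __init__(self, synthesize: Callable[[str, str], bytes]):
        # synthesize(agent, text) -> audio bytes; stands in for the real TTS call.
        self._synthesize = synthesize
        self._audio: Dict[Tuple[str, str], bytes] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(agent: str, text: str) -> Tuple[str, str]:
        # Lowercase and collapse whitespace so trivial variations still hit.
        return agent, " ".join(text.lower().split())

    def warm_up(self, agent: str, phrases: Iterable[str]) -> None:
        """Synthesize each phrase once at startup and store the audio."""
        for phrase in phrases:
            self._audio[self._key(agent, phrase)] = self._synthesize(agent, phrase)

    def get(self, agent: str, text: str) -> bytes:
        """Return cached audio if present, otherwise synthesize on demand."""
        key = self._key(agent, text)
        if key in self._audio:
            self.hits += 1
            return self._audio[key]
        self.misses += 1
        return self._synthesize(agent, text)

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Short acknowledgements that an agent says often (for example Jarvis's "Yes, sir.") are the best candidates, since a hit skips synthesis entirely.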

That's it! You're now running an optimized voice bot that's 3-10x faster! 🚀