openclaw-voice/requirements.txt
MCKRUZ 9fde3d31ba feat: Major performance optimizations and feature enhancements
## Performance Optimizations (3-10x faster responses)
- STT beam_size reduced to 1 (3-5x faster transcription, minimal quality loss)
- Smart query routing: Haiku (simple) → Sonnet (medium) → Opus (complex)
- TTS cache for common phrases (27 pre-generated responses)
- Sentence-level streaming TTS (start playing while generating)
- Sample-based VAD timing (30x improvement in silence detection)
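Sample-based VAD timing measures silence by counting audio samples instead of reading a wall clock, so uneven packet arrival cannot stretch or shrink the end-of-speech window. The sketch below is a minimal illustration. It assumes 16 kHz mono audio, a per-frame speech/no-speech decision from a VAD such as Silero, and a hypothetical `SilenceTracker` class with an illustrative 700 ms threshold. None of these names or values are taken from the project.

```python
from dataclasses import dataclass


@dataclass
class SilenceTracker:
    """Hypothetical end-of-utterance detector driven by sample counts, not wall-clock time."""

    sample_rate: int = 16_000   # assumed mono sample rate of the incoming audio
    silence_ms: int = 700       # illustrative amount of silence that ends an utterance
    heard_speech: bool = False
    silent_samples: int = 0

    @property
    def threshold_samples(self) -> int:
        # Convert the millisecond threshold into an exact sample count.
        return self.sample_rate * self.silence_ms // 1000

    def push(self, frame_len: int, is_speech: bool) -> bool:
        """Feed one frame's length and VAD verdict; return True when the utterance should close."""
        if is_speech:
            self.heard_speech = True
            self.silent_samples = 0      # any speech resets the silence run
            return False
        if not self.heard_speech:
            return False                 # ignore leading silence before speech starts
        self.silent_samples += frame_len
        return self.silent_samples >= self.threshold_samples
```

With 512-sample frames, the 700 ms threshold works out to 11,200 samples, so the utterance closes on the 22nd consecutive silent frame after speech.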

## TTS Engine Upgrade
- Migrated from Chatterbox to Chatterbox-Turbo
- Zero-shot voice cloning (no fine-tuning required)
- Native paralinguistic tag support ([laugh], [sigh], [chuckle], etc.)
- Emotion presets with temperature control
- Improved marker conversion (*action*, (action), ~action~)
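The marker conversion can be sketched as one regular expression with an alternative for each style (`*action*`, `(action)`, `~action~`). This is a hedged illustration, not the project's code. The tag set is an assumed subset (the notes list `[laugh]`, `[sigh]`, `[chuckle]` "etc."), the plural handling is an assumption, and `convert_markers` is a hypothetical name. Unrecognised markers are left untouched so ordinary parentheticals survive.

```python
from __future__ import annotations

import re

# Assumed subset of supported paralinguistic tags.
SUPPORTED_TAGS = {"laugh", "sigh", "chuckle"}

# One alternative per marker style: *action*, (action), ~action~
_MARKER = re.compile(r"\*([A-Za-z]+)\*|\(([A-Za-z]+)\)|~([A-Za-z]+)~")


def _normalize(word: str) -> str | None:
    """Map e.g. 'Laughs' -> 'laugh'; return None if the word is not a supported tag."""
    w = word.lower()
    if w in SUPPORTED_TAGS:
        return w
    if w.endswith("s") and w[:-1] in SUPPORTED_TAGS:
        return w[:-1]
    return None


def convert_markers(text: str) -> str:
    """Rewrite recognised action markers as [tag] and leave everything else as-is."""

    def repl(m: re.Match[str]) -> str:
        # Exactly one of the three groups matched; pick it.
        word = next(g for g in m.groups() if g is not None)
        tag = _normalize(word)
        return f"[{tag}]" if tag else m.group(0)

    return _MARKER.sub(repl, text)
```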

## Discord Bot Enhancements
- Multi-agent support (Jarvis, Sage)
- Improved voice receiving with discord-ext-voice-recv
- Enhanced /join, /leave, /status commands
- Per-agent personality configuration
- Better audio sink/receiver implementation

## OpenClaw Integration
- WebSocket support for Gateway communication
- Query complexity routing (auto-select model)
- Improved error handling and retries
- Session management per Discord guild
- Better latency tracking
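A query-complexity router can be as simple as a length-and-keyword heuristic that picks a model tier before the request is sent. The sketch below is hypothetical: `route_query`, the hint lists, and the word-count cut-offs are illustrative assumptions rather than the project's routing rules, and it returns tier labels instead of real model IDs.

```python
import re

# Hypothetical hints; the project's actual routing rules are not shown in these notes.
COMPLEX_HINTS = ("analyze", "compare", "design", "debug", "explain why", "step by step")
MEDIUM_HINTS = ("how", "why", "summarize", "write", "plan")


def route_query(text: str) -> str:
    """Return a model tier ('haiku', 'sonnet' or 'opus') for a transcribed query."""
    q = text.lower().strip()
    words = len(q.split())
    # Check the most expensive tier first so short but hard requests are not under-served.
    if words > 40 or any(h in q for h in COMPLEX_HINTS):
        return "opus"
    # Whole-word match so that e.g. "show" does not trigger the "how" hint.
    if words > 12 or any(re.search(rf"\b{h}\b", q) for h in MEDIUM_HINTS):
        return "sonnet"
    return "haiku"  # short, simple requests such as greetings or the time
```

Checking the expensive tier first means a short but reasoning-heavy request such as "compare these two plans" still reaches the Opus tier.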

## Pipeline Improvements
- Sentence splitter for streaming optimization
- Query router for intelligent model selection
- Enhanced VAD receiver with sample-based timing
- Improved audio buffering and format conversion
- Better transcript management
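A sentence splitter for streaming TTS buffers incoming text and releases each sentence as soon as its terminating punctuation arrives, so synthesis can begin before the full reply has been generated. A minimal sketch under stated assumptions: `stream_sentences` and its `min_chars` merge threshold are hypothetical, and abbreviations such as "Dr." are not handled.

```python
import re
from collections.abc import Iterable, Iterator

# A sentence ends at ., ! or ? (optionally followed by closing quotes/brackets) plus whitespace.
_SENTENCE_END = re.compile(r"[.!?][\"')\]]*\s+")


def stream_sentences(chunks: Iterable[str], min_chars: int = 20) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of text chunks.

    min_chars (an assumed tuning knob) keeps very short fragments such as "Ok."
    from being sent to TTS alone; they are merged with the text that follows.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        start = 0
        for m in _SENTENCE_END.finditer(buffer):
            if m.end() - start >= min_chars:
                yield buffer[start:m.end()].strip()
                start = m.end()
        buffer = buffer[start:]  # keep only text not yet emitted
    if buffer.strip():
        yield buffer.strip()     # flush whatever remains when the stream ends
```

For example, the chunks `"Hello there, how "`, `"are you today? I am "`, `"fine. Ok. "`, `"Bye now"` yield `"Hello there, how are you today?"` and then `"I am fine. Ok. Bye now"`, because the two short sentences never reach `min_chars` before the stream ends.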

## Documentation
- Added QUICK_START.md (5-minute test guide)
- Added OPTIMIZATION_SUMMARY.md (performance analysis)
- Added DISCORD_OPTIMIZATION_TEST.md (testing guide)
- Added USAGE_GUIDE.md (comprehensive usage)
- Updated README.md with optimization details

## Utilities & Scripts
- Added get_invite_link.py (Discord bot invite)
- Added sync_commands.py, sync_to_guild.py (command sync)
- Added test_gateway.py, test_stt.py (testing utilities)
- Added openclaw_wrapper.py (wrapper script)
- Removed create_mock_turn_model.py (no longer needed)

## Configuration Updates
- STT model: medium → small (faster, acceptable quality)
- TTS engine: chatterbox → coqui (Turbo integration)
- Beam size: 5 → 1 (latency optimization)
- Added emotion_exaggeration per agent
- Updated .gitignore for project files
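The configuration changes above could be captured in a typed structure along these lines. This sketch uses standard-library dataclasses to stay self-contained (the project itself lists pydantic and PyYAML for configuration). Class and field names other than `beam_size` and `emotion_exaggeration` are hypothetical, and the 0.5 default is illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class SttConfig:
    model: str = "small"   # was "medium": faster, with acceptable quality
    beam_size: int = 1     # was 5: latency optimization

    def __post_init__(self) -> None:
        if self.beam_size < 1:
            raise ValueError("beam_size must be >= 1")


@dataclass(frozen=True)
class AgentConfig:
    name: str
    emotion_exaggeration: float = 0.5  # illustrative default, overridable per agent

    def __post_init__(self) -> None:
        if self.emotion_exaggeration < 0:
            raise ValueError("emotion_exaggeration must be non-negative")


@dataclass(frozen=True)
class VoiceConfig:
    stt: SttConfig = field(default_factory=SttConfig)
    tts_engine: str = "coqui"          # was "chatterbox"
    agents: tuple[AgentConfig, ...] = ()
```

The `beam_size` value would then be passed to faster-whisper's `WhisperModel.transcribe(audio, beam_size=...)`. A beam of 1 decodes greedily, which avoids the extra work of tracking several beam hypotheses.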

Total: ~2105 insertions, ~462 deletions across 35 files
Performance: ~5.5s total latency (down from 22-35s)
Target: ~3.5s (achieved in simple queries with cache)
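As a quick check on these totals, the end-to-end speedup is the ratio of old to new latency:

$$
\frac{22\ \text{s}}{5.5\ \text{s}} = 4.0\times,
\qquad
\frac{35\ \text{s}}{5.5\ \text{s}} \approx 6.4\times,
\qquad
\frac{22\ \text{s}}{3.5\ \text{s}} \approx 6.3\times,
\qquad
\frac{35\ \text{s}}{3.5\ \text{s}} = 10\times
$$

So the measured ~5.5 s total corresponds to a 4.0-6.4x end-to-end speedup, and the ~3.5 s target to 6.3-10x.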

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-16 19:29:57 -05:00

# Jarvis Voice Bot - Python Dependencies
# Python 3.12+ required
# ============================================================================
# Discord Integration
# ============================================================================
discord.py[voice]>=2.3.2
PyNaCl>=1.5.0 # Voice support for discord.py
# ============================================================================
# Audio Processing
# ============================================================================
numpy>=1.24.0
soundfile>=0.12.1
scipy>=1.11.0
librosa>=0.10.1
opuslib>=3.0.1 # Opus codec for Discord audio
resampy>=0.4.2 # High-quality audio resampling
# ============================================================================
# Machine Learning - Speech & Audio
# ============================================================================
torch>=2.1.0
torchaudio>=2.1.0
faster-whisper>=1.0.0 # GPU-accelerated STT
silero-vad>=4.0.0 # Voice activity detection
onnxruntime>=1.16.0 # Smart Turn model inference
# ============================================================================
# Text-to-Speech
# ============================================================================
# Note: Chatterbox TTS availability needs verification; an alternative may be needed.
# Alternatives: coqui-tts (XTTS v2), piper-tts, StyleTTS2
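# Caution: Coqui `TTS` 0.22.x declares Python <3.12, which conflicts with the
# 3.12+ requirement above; the `coqui-tts` fork listed above supports Python 3.12.
TTS>=0.22.0 # Coqui TTS (fallback option)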
# ============================================================================
# API Server
# ============================================================================
fastapi>=0.104.0
uvicorn[standard]>=0.24.0
python-multipart>=0.0.6 # File upload support
aiofiles>=23.2.0 # Async file operations
# ============================================================================
# HTTP Clients & WebSocket
# ============================================================================
httpx>=0.25.0 # Async HTTP client for OpenClaw API
aiohttp>=3.9.0 # Alternative async HTTP
websockets>=12.0 # WebSocket client for OpenClaw Gateway
# ============================================================================
# Configuration & Environment
# ============================================================================
pyyaml>=6.0.1
python-dotenv>=1.0.0
pydantic>=2.5.0 # Type-safe configuration
# ============================================================================
# Utilities
# ============================================================================
python-dateutil>=2.8.2
tenacity>=8.2.3 # Retry logic
# ============================================================================
# Development & Testing
# ============================================================================
pytest>=7.4.0
pytest-asyncio>=0.21.0
pytest-cov>=4.1.0
# httpx (listed under HTTP Clients above) is also required by FastAPI's TestClient
black>=23.11.0 # Code formatting
ruff>=0.1.6 # Linting
# ============================================================================
# Windows-Specific (Optional)
# ============================================================================
# pywin32>=306 # Windows API access if needed