## Performance Optimizations (3-10x faster responses)

- STT beam_size reduced to 1 (3-5x faster transcription, minimal quality loss)
- Smart query routing: Haiku (simple) → Sonnet (medium) → Opus (complex)
- TTS cache for common phrases (27 pre-generated responses)
- Sentence-level streaming TTS (start playing while generating)
- Sample-based VAD timing (30x improvement in silence detection)

## TTS Engine Upgrade

- Migrated from Chatterbox to Chatterbox-Turbo
- Zero-shot voice cloning (no fine-tuning required)
- Native paralinguistic tag support ([laugh], [sigh], [chuckle], etc.)
- Emotion presets with temperature control
- Improved marker conversion (*action*, (action), ~action~)

## Discord Bot Enhancements

- Multi-agent support (Jarvis, Sage)
- Improved voice receiving with discord-ext-voice-recv
- Enhanced /join, /leave, /status commands
- Per-agent personality configuration
- Better audio sink/receiver implementation

## OpenClaw Integration

- WebSocket support for Gateway communication
- Query complexity routing (auto-select model)
- Improved error handling and retries
- Session management per Discord guild
- Better latency tracking

## Pipeline Improvements

- Sentence splitter for streaming optimization
- Query router for intelligent model selection
- Enhanced VAD receiver with sample-based timing
- Improved audio buffering and format conversion
- Better transcript management

## Documentation

- Added QUICK_START.md (5-minute test guide)
- Added OPTIMIZATION_SUMMARY.md (performance analysis)
- Added DISCORD_OPTIMIZATION_TEST.md (testing guide)
- Added USAGE_GUIDE.md (comprehensive usage)
- Updated README.md with optimization details

## Utilities & Scripts

- Added get_invite_link.py (Discord bot invite)
- Added sync_commands.py, sync_to_guild.py (command sync)
- Added test_gateway.py, test_stt.py (testing utilities)
- Added openclaw_wrapper.py (wrapper script)
- Removed create_mock_turn_model.py (no longer needed)

## Configuration Updates

- STT model: medium → small (faster, acceptable quality)
- TTS engine: chatterbox → coqui (Turbo integration)
- Beam size: 5 → 1 (latency optimization)
- Added emotion_exaggeration per agent
- Updated .gitignore for project files

Total: ~2105 insertions, ~462 deletions across 35 files
Performance: ~5.5s total latency (down from 22-35s)
Target: ~3.5s (achieved in simple queries with cache)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
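
The sentence-level streaming TTS item in the performance notes can be illustrated with a short sketch. This is not the project's actual sentence splitter: `stream_sentences` is a hypothetical name, and the assumption is that the language model's reply arrives as an iterable of text chunks. Each sentence is yielded as soon as its end is seen, so synthesis of the first sentence could begin while later text is still being generated.

```python
import re
from typing import Iterable, Iterator

# Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
# This simple rule also splits after abbreviations such as "Dr.".
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def stream_sentences(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences from a stream of text chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Everything except the last part is a finished sentence;
        # the last part may still be growing, so it stays buffered.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    # Flush whatever remains once the stream ends.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    reply = ["Hello the", "re. How are", " you? I am", " fine"]
    for sentence in stream_sentences(reply):
        print(sentence)  # each sentence could be handed to TTS here
```

A sentence is held back until the whitespace after its terminator arrives, which avoids cutting a token such as "3.5" in half when a chunk boundary falls right after the dot.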
77 lines
3 KiB
Text
# Jarvis Voice Bot - Python Dependencies
# Python 3.12+ required

# ============================================================================
# Discord Integration
# ============================================================================
discord.py[voice]>=2.3.2
PyNaCl>=1.5.0 # Voice support for discord.py

# ============================================================================
# Audio Processing
# ============================================================================
numpy>=1.24.0
soundfile>=0.12.1
scipy>=1.11.0
librosa>=0.10.1
opuslib>=3.0.1 # Opus codec for Discord audio
resampy>=0.4.2 # High-quality audio resampling

# ============================================================================
# Machine Learning - Speech & Audio
# ============================================================================
torch>=2.1.0
torchaudio>=2.1.0
faster-whisper>=1.0.0 # GPU-accelerated STT
silero-vad>=4.0.0 # Voice activity detection
onnxruntime>=1.16.0 # Smart Turn model inference

# ============================================================================
# Text-to-Speech
# ============================================================================
# Note: Chatterbox TTS needs verification - may need alternative
# Alternatives: coqui-tts (XTTS v2), piper-tts, StyleTTS2
TTS>=0.22.0 # Coqui TTS (fallback option)

# ============================================================================
# API Server
# ============================================================================
fastapi>=0.104.0
uvicorn[standard]>=0.24.0
python-multipart>=0.0.6 # File upload support
aiofiles>=23.2.0 # Async file operations

# ============================================================================
# HTTP Clients & WebSocket
# ============================================================================
httpx>=0.25.0 # Async HTTP client for OpenClaw API
aiohttp>=3.9.0 # Alternative async HTTP
websockets>=12.0 # WebSocket client for OpenClaw Gateway

# ============================================================================
# Configuration & Environment
# ============================================================================
pyyaml>=6.0.1
python-dotenv>=1.0.0
pydantic>=2.5.0 # Type-safe configuration

# ============================================================================
# Utilities
# ============================================================================
python-dateutil>=2.8.2
tenacity>=8.2.3 # Retry logic

# ============================================================================
# Development & Testing
# ============================================================================
pytest>=7.4.0
pytest-asyncio>=0.21.0
pytest-cov>=4.1.0
httpx>=0.25.0 # Required for TestClient (already listed above)
black>=23.11.0 # Code formatting
ruff>=0.1.6 # Linting

# ============================================================================
# Windows-Specific (Optional)
# ============================================================================
# pywin32>=306 # Windows API access if needed
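
To see which of the packages listed above are absent from an environment, a small standard-library sketch along these lines may help. `parse_requirement` and `missing_packages` are hypothetical helper names, not part of this repository. The parser only handles the simple `name[extras]>=version # comment` form used in this file, not the full requirement grammar, and it does not compare installed versions against the minimums.

```python
import re
from importlib import metadata

# Matches "name", optional "[extras]", then ">=" and a minimum version.
_REQ = re.compile(r"^([A-Za-z0-9_.\-]+)(?:\[[^\]]*\])?\s*>=\s*([0-9][0-9A-Za-z.]*)")


def parse_requirement(line):
    """Return (name, minimum_version), or None for blank and comment lines."""
    line = line.split("#", 1)[0].strip()  # drop trailing or full-line comments
    if not line:
        return None
    match = _REQ.match(line)
    return (match.group(1), match.group(2)) if match else None


def missing_packages(lines):
    """Return the names in `lines` that are not installed in this environment."""
    missing = []
    for line in lines:
        parsed = parse_requirement(line)
        if parsed is None:
            continue
        name, _minimum = parsed
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    sample = [
        "discord.py[voice]>=2.3.2",
        "PyNaCl>=1.5.0 # Voice support for discord.py",
        "# pywin32>=306 # Windows API access if needed",
    ]
    print(missing_packages(sample))
```

Comparing installed versions against the minimums would need a version parser such as the third-party `packaging` library, which is why this sketch stops at presence checks.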