MCKRUZ 2f17d4847d docs: Add Kani-TTS-2 evaluation and RTX 5090 compatibility analysis

## Kani-TTS-2 Research
- Evaluated Kani-TTS-2 as potential TTS upgrade (3-4x faster, RTF 0.2)
- Documented benefits: zero-shot voice cloning, Apache 2.0 license, 3GB VRAM
- Identified Windows compatibility issues (pynini compilation failures)
- Created test script for future evaluation when Windows support improves

## RTX 5090 Critical Finding
- Discovered RTX 5090 (Blackwell sm_120) not supported by PyTorch
- Tested stable (2.6.0) and nightly (2.7.0.dev) - both lack sm_120 support
- Documented impact: GPU acceleration unavailable for STT/TTS
- Performance degradation: 3.5s target → 10-15s actual (CPU-only)

## Files Added
- KANI_TTS_EVALUATION.md - Comprehensive Kani-TTS-2 analysis
- RTX_5090_BLOCKER.md - GPU compatibility report with solutions
- test_kani_tts.py - Benchmark script for future testing
- fix_pytorch_cuda.bat - GPU setup script (for when support lands)

## Recommendations
- Wait 1-3 months for PyTorch sm_120 support
- Monitor PyTorch releases weekly
- Alternative: Cloud GPU (RTX 4090) or different local GPU
- Current: CPU-only mode functional but slow

## Next Steps
- Monitor: https://github.com/pytorch/pytorch/releases
- Test when available: pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128 (sm_120 needs a CUDA 12.8+ build; cu124 wheels cannot include it)
- Re-evaluate Kani-TTS-2 after GPU support

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2026-02-16 19:53:52 -05:00

6.4 KiB


RTX 5090 Compatibility Blocker

Date: February 16, 2026
GPU: NVIDIA GeForce RTX 5090 (32GB VRAM, Blackwell sm_120)
Status: ❌ BLOCKED - No PyTorch Support

Critical Finding

The RTX 5090 is too new for the PyTorch builds tested here. Both the stable and the nightly cu124 wheels fail with:

RuntimeError: CUDA error: no kernel image is available for execution on the device

NVIDIA GeForce RTX 5090 with CUDA capability sm_120 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_61 sm_70 sm_75 sm_80 sm_86 sm_90.

Tested Versions:

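❌ PyTorch 2.6.0+cu124 (Stable) - No sm_120 support
❌ PyTorch 2.7.0.dev20250310+cu124 (Nightly) - No sm_120 support

Both builds were compiled against CUDA 12.4, which predates Blackwell. sm_120 kernels require CUDA Toolkit 12.8 or newer, so a cu124 wheel cannot include them regardless of the PyTorch version.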
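The error message lists the architectures compiled into the installed wheel, and the same comparison can be made programmatically before loading any model. The sketch below is a simplified check, not PyTorch's own logic: the helper names `parse_sm` and `wheel_supports` are hypothetical, and it only considers `sm_XX` cubin entries (a cubin runs on devices of the same major version with an equal or newer minor revision), ignoring any PTX (`compute_XX`) entries.

```python
import re


def parse_sm(arch):
    """Turn 'sm_86' into (8, 6); return None for entries such as 'compute_90'."""
    match = re.fullmatch(r"sm_(\d+)(\d)[a-z]?", arch)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def wheel_supports(arch_list, capability):
    """True if any compiled cubin in arch_list can run on the given (major, minor)."""
    major, minor = capability
    for arch in arch_list:
        parsed = parse_sm(arch)
        if parsed is None:
            continue
        # Simplification: same major version, compiled minor <= device minor.
        if parsed[0] == major and parsed[1] <= minor:
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)
            archs = torch.cuda.get_arch_list()
            print(f"device capability: sm_{cap[0]}{cap[1]}")
            print(f"wheel architectures: {archs}")
            print(f"supported: {wheel_supports(archs, cap)}")
        else:
            print("CUDA is not available to this PyTorch build")
```

With the architecture list from the error message above, `wheel_supports` returns False for the RTX 5090's capability `(12, 0)` and True for an RTX 4090's `(8, 9)`.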

Impact on Your Voice Bot

Currently Affected

No component can currently use GPU acceleration; STT and TTS fall back to CPU:

Component	Current Status	Impact
faster-whisper STT	CPU-only	3-5x slower (550ms → ~2s)
Coqui XTTS v2 TTS	CPU-only	2-3x slower (1.6s → ~4-5s)
Kani-TTS-2 testing	Blocked	Cannot evaluate
Total latency	~10-15s	vs target 3.5s ❌

What Still Works

✅ Discord bot (voice receiving/sending)
✅ OpenClaw Gateway (LLM inference)
✅ VAD (Silero, CPU-based)
✅ Smart Turn v3 (ONNX, CPU-based)
⚠️ STT/TTS (fallback to CPU, very slow)

Solutions

Option 1: Wait for PyTorch Support (Recommended)

Timeline: 1-3 months (estimated)

Reason: The RTX 5090 was released in January 2025, and PyTorch typically adds support for a new GPU architecture within 2-4 months.

Monitor: https://github.com/pytorch/pytorch/releases

Action:

Check weekly for PyTorch updates
Subscribe to PyTorch announcements
Test with: pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128 (a CUDA 12.8+ build; cu124 wheels cannot include sm_120 kernels)
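The weekly check could be scripted rather than done by hand. The following is a minimal sketch that asks the public GitHub REST API for the latest PyTorch release tag and compares it with an installed version string. The function names are hypothetical, and a newer release only signals that a re-test is worthwhile; it does not by itself confirm sm_120 support.

```python
import json
import re
import urllib.request

RELEASES_URL = "https://api.github.com/repos/pytorch/pytorch/releases/latest"


def parse_version(tag):
    """'v2.7.0' or '2.6.0+cu124' -> (2, 7, 0) or (2, 6, 0)."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError(f"unrecognised version string: {tag!r}")
    return tuple(int(part) for part in match.groups())


def is_newer(latest_tag, installed_version):
    """True if the release tag is a higher version than the installed one."""
    return parse_version(latest_tag) > parse_version(installed_version)


def latest_release_tag(url=RELEASES_URL, timeout=10):
    """Fetch the tag name of the latest GitHub release (needs network access)."""
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "pytorch-release-check",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["tag_name"]


if __name__ == "__main__":
    installed = "2.6.0+cu124"  # assumption: replace with torch.__version__
    tag = latest_release_tag()
    if is_newer(tag, installed):
        print(f"New release {tag}: re-run the GPU verification steps")
    else:
        print(f"No release newer than {installed} (latest is {tag})")
```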

Option 2: Build PyTorch from Source (Advanced)

Difficulty: High
Time: 4-8 hours
Risk: May not work if CUDA Toolkit doesn't support sm_120

Steps:

Install CUDA Toolkit 12.8 or newer (the first toolkit release with Blackwell sm_120 support)

Clone PyTorch:

git clone --recursive https://github.com/pytorch/pytorch
cd pytorch

Build with sm_120:

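# Bash syntax; in Windows cmd use "set" instead of "export"
# Listing only 12.0 is enough for the RTX 5090 and shortens the build
export TORCH_CUDA_ARCH_LIST="5.0;6.0;7.0;7.5;8.0;8.6;9.0;12.0"
python setup.py install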

Test

Resources:

Building PyTorch from Source

Option 3: Use Different GPU

If available, use an older GPU for development:

GPU	CUDA Capability	PyTorch Support	Recommendation
RTX 4090	sm_89	✅ Full support	✅ Ideal for development
RTX 4080	sm_89	✅ Full support	✅ Good alternative
RTX 4070 Ti	sm_89	✅ Full support	✅ Sufficient for voice bot
RTX 3090	sm_86	✅ Full support	✅ Works well

Action:

Check whether you have access to an RTX 40-series or 30-series GPU
Use it for development until RTX 5090 support lands

Option 4: Run in Cloud with Supported GPU

Platforms:

RunPod - RTX 4090 @ $0.79/hr
Vast.ai - RTX 4090 @ $0.40-0.60/hr
Google Colab Pro - A100/V100 @ $10/month

Pros:

Immediate GPU access
Supported hardware
Test optimizations quickly

Cons:

Ongoing cost
Need to upload code/data
Network latency for Discord bot
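To weigh the ongoing cost against simply waiting, a rough estimate helps. The sketch below uses the hourly rates listed above; the usage figure of 10 GPU-hours per week and the 13-week horizon (roughly the upper end of the 1-3 month wait) are assumptions to adjust, not measurements.

```python
def cloud_cost(hourly_rate, hours_per_week, weeks):
    """Total rental cost in dollars for a fixed weekly usage pattern."""
    return hourly_rate * hours_per_week * weeks


if __name__ == "__main__":
    hours_per_week = 10  # assumption: development sessions only
    weeks = 13           # assumption: about 3 months of waiting
    rates = {
        "RunPod RTX 4090": 0.79,
        "Vast.ai RTX 4090 (low)": 0.40,
        "Vast.ai RTX 4090 (high)": 0.60,
    }
    for name, rate in rates.items():
        total = cloud_cost(rate, hours_per_week, weeks)
        print(f"{name}: ${total:.2f}")
```

Under these assumptions the total stays in the range of tens of dollars to roughly a hundred dollars, so the decision depends mostly on how many GPU-hours the testing actually needs.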

Option 5: CPU-Only (Temporary Workaround)

Use case: Testing logic while waiting for GPU support

Current setup (already done):

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu  # CPU-only build

Performance:

STT: ~2-3s (vs 0.3s target)
TTS: ~4-5s (vs 0.9s target)
Total: ~10-15s (vs 3.5s target)

Acceptable for:

Testing conversation flow
Debugging bot logic
Development (not production)

Recommended Action Plan

Immediate (This Week)

✅ Roll back to CPU PyTorch for development:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

✅ Focus on non-GPU optimizations:
- Query routing (Haiku vs Sonnet vs Opus)
- TTS caching
- Sentence-level streaming
- Response filtering
✅ Test bot functionality with CPU (slow but works)
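Of the non-GPU optimizations listed, TTS caching is the easiest to illustrate: repeated phrases (greetings, confirmations, error messages) skip the multi-second CPU synthesis entirely. The sketch below is a small in-memory LRU cache. The class name `TTSCache` and the `synthesize(text, voice)` callable are hypothetical stand-ins, since the bot's actual TTS interface is not shown here.

```python
import hashlib
from collections import OrderedDict


class TTSCache:
    """In-memory LRU cache keyed on (voice, text) for synthesized audio bytes."""

    def __init__(self, synthesize, max_entries=256):
        self._synthesize = synthesize  # hypothetical callable: (text, voice) -> bytes
        self._max_entries = max_entries
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(text, voice):
        # Surrounding whitespace does not change the spoken result, so strip it.
        payload = f"{voice}\x00{text.strip()}".encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def get(self, text, voice="default"):
        key = self._key(text, voice)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        audio = self._synthesize(text, voice)
        self._entries[key] = audio
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict least recently used
        return audio
```

A caller would wrap the real synthesis function once, for example `cache = TTSCache(my_tts_function)`, and call `cache.get(sentence)` for each sentence. Combined with sentence-level streaming, only the sentences not yet seen pay the CPU synthesis cost.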

Short-term (Next 2-4 Weeks)

🔄 Monitor PyTorch releases for sm_120 support
🧪 Evaluate cloud GPU options:
- Test on RunPod/Vast.ai with RTX 4090
- Measure actual performance gains
- Compare cost vs waiting
📊 Benchmark CPU baseline to quantify GPU improvement later

Long-term (Next 1-3 Months)

⏳ Wait for PyTorch sm_120 support
🚀 Deploy with GPU when support lands
🔍 Re-evaluate Kani-TTS-2 once GPU works

Current Bot Configuration

For now, use CPU-only mode:

# config.yaml
pipeline:
  stt:
    model_size: "small"  # Smaller = faster on CPU
    device: "cpu"        # Force CPU
    beam_size: 1         # Faster decoding

  tts:
    device: "cpu"        # Force CPU

.env overrides:

PIPELINE__STT__DEVICE=cpu
PIPELINE__STT__MODEL_SIZE=small
PIPELINE__TTS__DEVICE=cpu
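The double-underscore names map onto the nested YAML keys: `PIPELINE__STT__DEVICE` addresses `pipeline.stt.device`. The bot's actual settings loader is not shown here, so the following is only a sketch of that convention using the standard library; `apply_env_overrides` is a hypothetical helper, and it leaves values as strings without type coercion.

```python
import copy
import os


def apply_env_overrides(config, environ=None, delimiter="__"):
    """Return a copy of config with NESTED__KEY=value variables applied.

    Only leaf keys that already exist in the config are overridden, so
    unrelated environment variables are ignored.
    """
    environ = os.environ if environ is None else environ
    result = copy.deepcopy(config)
    for name, value in environ.items():
        path = name.lower().split(delimiter)
        if len(path) < 2:
            continue  # not a nested override
        node = result
        for key in path[:-1]:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        else:
            leaf = path[-1]
            if isinstance(node, dict) and leaf in node and not isinstance(node[leaf], dict):
                node[leaf] = value
    return result


if __name__ == "__main__":
    base = {
        "pipeline": {
            "stt": {"model_size": "medium", "device": "cuda", "beam_size": 5},
            "tts": {"device": "cuda"},
        }
    }
    env = {
        "PIPELINE__STT__DEVICE": "cpu",
        "PIPELINE__STT__MODEL_SIZE": "small",
        "PIPELINE__TTS__DEVICE": "cpu",
    }
    print(apply_env_overrides(base, env))
```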

When PyTorch Supports sm_120

Test with:

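# Uninstall current
pip uninstall torch torchaudio torchvision -y

# Install latest (sm_120 kernels need a CUDA 12.8+ build, so use the cu128 index rather than cu124)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128

# Verify (the architecture list should include sm_120)
python -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0)); print(torch.cuda.get_arch_list())"

# Test computation (.item() forces the kernel to run before 'GPU OK' is printed)
python -c "import torch; x=torch.rand(100,100,device='cuda'); print((x @ x).sum().item()); print('GPU OK')"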

Then update config:

pipeline:
  stt:
    device: "cuda"
    model_size: "medium"  # Can use larger model on GPU
    beam_size: 5          # Better quality

  tts:
    device: "cuda"
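Rather than switching the config by hand, the device could be chosen at startup. This matters because `torch.cuda.is_available()` can report True on the RTX 5090 even when the wheel has no sm_120 kernels, so the sketch below also runs a small computation and falls back to CPU if it raises. `pick_device` is a hypothetical helper, and the assumption that the failure surfaces as a `RuntimeError` is based on the error shown earlier in this report.

```python
def pick_device():
    """Return 'cuda' only if a small GPU computation actually succeeds."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if not torch.cuda.is_available():
        return "cpu"
    try:
        x = torch.rand(8, 8, device="cuda")
        # .item() copies the result to the host, forcing the kernels to run.
        (x @ x).sum().item()
    except RuntimeError:
        # e.g. "no kernel image is available for execution on the device"
        return "cpu"
    return "cuda"


if __name__ == "__main__":
    print(f"Using device: {pick_device()}")
```

The returned string could feed the `device` fields for both STT and TTS, so the same configuration works before and after sm_120 support arrives.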

Expected improvement:

STT: ~2s → ~0.35s (6x faster)
TTS: ~4-5s → ~0.9s (5x faster)
Total: ~10-15s → ~4s (3x faster, near 3.5s target!)
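The speedup factors follow directly from dividing the CPU latency by the expected GPU latency. A short check, using the midpoints of the ranges above (4.5s for TTS, 12.5s for the total) as an assumption:

```python
def speedup(before_s, after_s):
    """How many times faster the 'after' latency is than the 'before' latency."""
    return before_s / after_s


if __name__ == "__main__":
    stages = {
        "STT": (2.0, 0.35),
        "TTS": (4.5, 0.9),     # midpoint of 4-5s
        "Total": (12.5, 4.0),  # midpoint of 10-15s
    }
    for name, (cpu_s, gpu_s) in stages.items():
        print(f"{name}: {speedup(cpu_s, gpu_s):.1f}x faster")
```

This gives roughly 5.7x for STT, 5x for TTS, and 3.1x overall, consistent with the rounded figures listed.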


Summary: RTX 5090 support is coming, but not here yet. Use CPU mode for development now, monitor for PyTorch updates, or use cloud GPU for testing in the meantime.
