openclaw-voice/RTX_5090_BLOCKER.md
MCKRUZ 2f17d4847d docs: Add Kani-TTS-2 evaluation and RTX 5090 compatibility analysis
## Kani-TTS-2 Research
- Evaluated Kani-TTS-2 as potential TTS upgrade (3-4x faster, RTF 0.2)
- Documented benefits: zero-shot voice cloning, Apache 2.0 license, 3GB VRAM
- Identified Windows compatibility issues (pynini compilation failures)
- Created test script for future evaluation when Windows support improves

## RTX 5090 Critical Finding
- Discovered RTX 5090 (Blackwell sm_120) not supported by PyTorch
- Tested stable (2.6.0) and nightly (2.7.0.dev) - both lack sm_120 support
- Documented impact: GPU acceleration unavailable for STT/TTS
- Performance degradation: 3.5s target → 10-15s actual (CPU-only)

## Files Added
- KANI_TTS_EVALUATION.md - Comprehensive Kani-TTS-2 analysis
- RTX_5090_BLOCKER.md - GPU compatibility report with solutions
- test_kani_tts.py - Benchmark script for future testing
- fix_pytorch_cuda.bat - GPU setup script (for when support lands)

## Recommendations
- Wait 1-3 months for PyTorch sm_120 support
- Monitor PyTorch releases weekly
- Alternative: Cloud GPU (RTX 4090) or different local GPU
- Current: CPU-only mode functional but slow

## Next Steps
- Monitor: https://github.com/pytorch/pytorch/releases
- Test when available: pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128
- Re-evaluate Kani-TTS-2 after GPU support

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-16 19:53:52 -05:00

RTX 5090 Compatibility Blocker

Date: February 16, 2026
GPU: NVIDIA GeForce RTX 5090 (32GB VRAM, Blackwell sm_120)
Status: BLOCKED - No PyTorch Support


Critical Finding

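The RTX 5090 is too new for the PyTorch builds tested here. Both are CUDA 12.4 (cu124) wheels, and kernels for sm_120 require a CUDA 12.8 or newer toolchain, so the stable and nightly releases alike fail with: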

RuntimeError: CUDA error: no kernel image is available for execution on the device

NVIDIA GeForce RTX 5090 with CUDA capability sm_120 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_61 sm_70 sm_75 sm_80 sm_86 sm_90.

Tested Versions:

  • PyTorch 2.6.0+cu124 (Stable) - No sm_120 support
  • PyTorch 2.7.0.dev20250310+cu124 (Nightly) - No sm_120 support
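To confirm this diagnosis on a given machine, compare the architectures the installed build was compiled for (`torch.cuda.get_arch_list()`) with the GPU's compute capability (`torch.cuda.get_device_capability()`). The sketch below encodes a simplified version of CUDA's compatibility rules, as an assumption: a cubin (`sm_XY`) runs on GPUs with the same major version and an equal or newer minor version, PTX (`compute_XY`) can be JIT-compiled for equal or newer GPUs, and arch-specific entries such as `sm_90a` are treated as exact matches only. The helper names are illustrative, not part of PyTorch.

```python
import re


def parse_arch(entry):
    """Split 'sm_86', 'sm_120', 'sm_90a' or 'compute_90' into
    (kind, major, minor, suffix); return None if the entry is unrecognized."""
    match = re.fullmatch(r"(sm|compute)_(\d+)(\d)([a-z]?)", entry)
    if match is None:
        return None
    kind, major, minor, suffix = match.groups()
    return kind, int(major), int(minor), suffix


def build_supports(arch_list, capability):
    """True if a build compiled for `arch_list` has code that can run on a GPU
    with compute capability `capability`, given as a (major, minor) tuple."""
    major, minor = capability
    for entry in arch_list:
        parsed = parse_arch(entry)
        if parsed is None:
            continue
        kind, a_major, a_minor, suffix = parsed
        if suffix:
            # Arch-specific builds are not forward compatible: exact match only.
            if (a_major, a_minor) == (major, minor):
                return True
        elif kind == "sm":
            # Cubin runs on the same major version with an equal or newer minor.
            if a_major == major and a_minor <= minor:
                return True
        elif (a_major, a_minor) <= (major, minor):
            # PTX can be JIT-compiled by the driver for equal or newer GPUs.
            return True
    return False


def check_local_gpu():
    """Print whether the installed torch build has usable code for GPU 0."""
    try:
        import torch
    except ImportError:
        print("torch is not installed")
        return
    if not torch.cuda.is_available():
        print("CUDA is not available (CPU-only build, or no GPU/driver)")
        return
    capability = torch.cuda.get_device_capability(0)
    arch_list = torch.cuda.get_arch_list()
    print(f"torch {torch.__version__}, built with CUDA {torch.version.cuda}")
    print(f"GPU capability: sm_{capability[0]}{capability[1]}")
    print(f"Compiled for:   {' '.join(arch_list)}")
    if build_supports(arch_list, capability):
        print("Supported")
    else:
        print("NOT supported: install a build that includes this architecture")


if __name__ == "__main__":
    check_local_gpu()
```

Fed the architecture list from the error message above and the RTX 5090's capability `(12, 0)`, `build_supports` returns `False`; adding `sm_120` to the list makes it `True`.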

Impact on Your Voice Bot

Currently Affected

All GPU-accelerated components are non-functional:

| Component | Current Status | Impact |
|---|---|---|
| faster-whisper STT | CPU-only | 3-5x slower (550ms → ~2s) |
| Coqui XTTS v2 TTS | CPU-only | 2-3x slower (1.6s → ~4-5s) |
| Kani-TTS-2 testing | Blocked | Cannot evaluate |
| Total latency | ~10-15s | vs target 3.5s |

What Still Works

  • Discord bot (voice receiving/sending)
  • OpenClaw Gateway (LLM inference)
  • VAD (Silero, CPU-based)
  • Smart Turn v3 (ONNX, CPU-based)
  • ⚠️ STT/TTS (fallback to CPU, very slow)

Solutions

Option 1: Wait for PyTorch sm_120 Support

Timeline: 1-3 months (estimated)

Reason: The RTX 5090 was released in January 2025, and PyTorch typically adds support for new GPUs within 2-4 months.

Monitor: https://github.com/pytorch/pytorch/releases

Action:

  • Check weekly for PyTorch updates
  • Subscribe to PyTorch announcements
  • Test with: pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128 (a cu124 build cannot include sm_120 kernels)
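The weekly check can be scripted against PyPI's JSON API (`https://pypi.org/pypi/torch/json`, whose `info.version` field holds the latest release). This is a stdlib-only sketch with a deliberately simple version comparison, assumed to be good enough for torch's `X.Y.Z`, `.devN` and `+cuXXX` forms. A new version number only signals a release: sm_120 kernels still have to be confirmed with `torch.cuda.get_arch_list()` after installing a CUDA 12.8 build.

```python
import json
import urllib.request

PYPI_URL = "https://pypi.org/pypi/torch/json"


def version_key(version):
    """Comparable key for versions like '2.6.0+cu124' or '2.7.0.dev20250310'.

    The local tag ('+cu124') is ignored, and dev/rc builds sort below the
    final release with the same number."""
    release = version.split("+", 1)[0]
    numbers = []
    for piece in release.split("."):
        if not piece.isdigit():
            break  # stop at 'dev20250310', '0rc1', ...
        numbers.append(int(piece))
    numbers += [0] * (3 - len(numbers))  # '2.7' compares like '2.7.0'
    is_final = 0 if ("dev" in release or "rc" in release) else 1
    return tuple(numbers), is_final


def fetch_latest_version(url=PYPI_URL, timeout=10):
    """Latest torch version string published on PyPI."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)["info"]["version"]


def check_for_update(installed_version):
    """True/False if a newer release exists, or None if PyPI could not be read."""
    try:
        latest = fetch_latest_version()
    except (OSError, ValueError, KeyError) as exc:  # network, bad JSON, missing key
        print(f"Could not check PyPI: {exc}")
        return None
    newer = version_key(latest) > version_key(installed_version)
    status = "update available" if newer else "up to date"
    print(f"installed {installed_version}, latest on PyPI {latest}: {status}")
    return newer
```

Usage, for example from a weekly scheduled task: `import torch; check_for_update(torch.__version__)`.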

Option 2: Build PyTorch from Source (Advanced)

Difficulty: High
Time: 4-8 hours
Risk: May not work if CUDA Toolkit doesn't support sm_120

Steps:

  1. Install CUDA Toolkit 12.8 or newer (12.8 is the first release that can compile for sm_120)
  2. Clone PyTorch:
    git clone --recursive https://github.com/pytorch/pytorch
    cd pytorch
    
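  3. Install the build requirements, then build with sm_120 (bash syntax shown; in Windows cmd use set TORCH_CUDA_ARCH_LIST=5.0;6.0;7.0;7.5;8.0;8.6;9.0;12.0 without quotes):
    pip install -r requirements.txt
    export TORCH_CUDA_ARCH_LIST="5.0;6.0;7.0;7.5;8.0;8.6;9.0;12.0"
    python setup.py install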
    
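  4. Test: confirm that torch.cuda.get_arch_list() includes sm_120 and that a small CUDA operation runs (see "When PyTorch Supports sm_120" below)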
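The `export` line in step 3 is bash syntax, and the equivalent differs between Windows shells. One way to sidestep that is to drive the build from Python, which sets `TORCH_CUDA_ARCH_LIST` the same way on every OS. This is a sketch that assumes it is run from the directory containing the `pytorch` checkout; the function names are hypothetical. Listing only `12.0` (the RTX 5090) should shorten compile time, at the cost of a build that only runs on that GPU generation.

```python
import os
import subprocess
import sys

# The architecture list from step 3 (Blackwell consumer GPUs are 12.0).
DEFAULT_ARCHS = ("5.0", "6.0", "7.0", "7.5", "8.0", "8.6", "9.0", "12.0")


def make_build_env(archs, base_env=None):
    """Copy the environment and set TORCH_CUDA_ARCH_LIST (semicolon-separated)."""
    env = dict(os.environ if base_env is None else base_env)
    env["TORCH_CUDA_ARCH_LIST"] = ";".join(archs)
    return env


def build_pytorch(source_dir, archs=DEFAULT_ARCHS):
    """Run the build step from step 3 with the arch list set, on any OS."""
    subprocess.run(
        [sys.executable, "setup.py", "install"],
        cwd=source_dir,
        env=make_build_env(archs),
        check=True,  # raise CalledProcessError if the build fails
    )
```

Usage: `build_pytorch("pytorch", archs=("12.0",))` for an RTX 5090-only build.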

Option 3: Use Different GPU

If available, use an older GPU for development:

| GPU | CUDA Capability | PyTorch Support | Recommendation |
|---|---|---|---|
| RTX 4090 | sm_89 | Full support | Ideal for development |
| RTX 4080 | sm_89 | Full support | Good alternative |
| RTX 4070 Ti | sm_89 | Full support | Sufficient for voice bot |
| RTX 3090 | sm_86 | Full support | Works well |

Action:

  • Check if you have access to RTX 40-series or 30-series GPU
  • Use for development until RTX 5090 support lands

Option 4: Run in Cloud with Supported GPU

Platforms (prices as quoted at the time of writing):

  • RunPod - RTX 4090 @ $0.79/hr
  • Vast.ai - RTX 4090 @ $0.40-0.60/hr
  • Google Colab Pro - A100/V100 @ $10/month

Pros:

  • Immediate GPU access
  • Supported hardware
  • Test optimizations quickly

Cons:

  • Ongoing cost
  • Need to upload code/data
  • Network latency for Discord bot
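To weigh the ongoing cost against waiting, the hourly rates above can be turned into a monthly figure once a usage pattern is assumed. The 4 hours per day below is a hypothetical development schedule, and the rates are the ones quoted in this document, which may have changed.

```python
# Hourly rates quoted above as (low, high); check current pricing before relying on them.
HOURLY_RATES = {
    "RunPod RTX 4090": (0.79, 0.79),
    "Vast.ai RTX 4090": (0.40, 0.60),
}


def monthly_cost(rate_per_hour, hours_per_day, days=30):
    """Estimated monthly spend, in dollars, for an on-demand GPU."""
    return round(rate_per_hour * hours_per_day * days, 2)


if __name__ == "__main__":
    hours_per_day = 4  # hypothetical development schedule; adjust as needed
    for name, (low, high) in HOURLY_RATES.items():
        print(f"{name}: ${monthly_cost(low, hours_per_day):.2f}"
              f" to ${monthly_cost(high, hours_per_day):.2f} per month")
    print("Google Colab Pro: $10.00 per month (flat fee quoted above)")
```

At 4 hours per day this comes to $94.80 per month on RunPod and $48.00 to $72.00 per month on Vast.ai.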

Option 5: CPU-Only (Temporary Workaround)

Use case: Testing logic while waiting for GPU support

Current setup (already done):

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu  # CPU-only build on every OS

Performance:

  • STT: ~2-3s (vs 0.3s target)
  • TTS: ~4-5s (vs 0.9s target)
  • Total: ~10-15s (vs 3.5s target)

Acceptable for:

  • Testing conversation flow
  • Debugging bot logic
  • Development (not production)

Immediate (This Week)

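  1. Roll back to CPU PyTorch for development (uninstall first, otherwise pip may keep the existing cu124 build):

    pip uninstall torch torchvision torchaudio -y
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu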
    
  2. Focus on non-GPU optimizations:

    • Query routing (Haiku vs Sonnet vs Opus)
    • TTS caching
    • Sentence-level streaming
    • Response filtering
  3. Test bot functionality with CPU (slow but works)
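Two of the non-GPU optimizations, sentence-level streaming and TTS caching, combine naturally: split the LLM's streamed text into sentences so TTS can start on the first one, and memoize synthesis so repeated phrases are rendered only once. In this sketch `synthesize(text, voice)` is a hypothetical stand-in for the bot's XTTS call, and the regex splitter is a simplification (it also splits after abbreviations such as "e.g.").

```python
import re
from functools import lru_cache

# A sentence ends at ., ! or ? followed by whitespace. Requiring the whitespace
# keeps "3.5s" intact and makes a sentence at the end of a chunk wait for more text.
_BOUNDARY = re.compile(r"[.!?](\s+)")


def sentences_from_stream(chunks):
    """Yield complete sentences as soon as they appear in a stream of text chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        match = _BOUNDARY.search(buffer)
        while match is not None:
            sentence = buffer[: match.start(1)].strip()
            buffer = buffer[match.end():]
            if sentence:
                yield sentence
            match = _BOUNDARY.search(buffer)
    if buffer.strip():
        yield buffer.strip()  # flush trailing text that has no final punctuation


def make_cached_tts(synthesize, maxsize=256):
    """Wrap a TTS function so each (text, voice) pair is synthesized only once."""

    @lru_cache(maxsize=maxsize)
    def cached(text, voice):
        return synthesize(text, voice)

    return cached


def speak_streamed(chunks, tts, voice="default"):
    """Yield audio per sentence while the LLM response is still streaming."""
    for sentence in sentences_from_stream(chunks):
        yield tts(sentence, voice)
```

The cache lives in process memory only, so it mainly pays off for short, frequently repeated replies (greetings, acknowledgements).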

Short-term (Next 2-4 Weeks)

  1. 🔄 Monitor PyTorch releases for sm_120 support

  2. 🧪 Evaluate cloud GPU options:

    • Test on RunPod/Vast.ai with RTX 4090
    • Measure actual performance gains
    • Compare cost vs waiting
  3. 📊 Benchmark CPU baseline to quantify GPU improvement later
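A small timing harness keeps the CPU baseline and the later GPU numbers comparable. This sketch reports the median and worst case over several runs after a few warm-up calls, which absorb one-time costs such as model loading. When timing GPU code later, the measured function should end with `torch.cuda.synchronize()` (or return a CPU-side result), because CUDA kernels launch asynchronously.

```python
import statistics
import time


def benchmark(fn, *args, runs=10, warmup=2):
    """Call fn(*args) repeatedly and return latency statistics in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        fn(*args)  # warm-up calls are not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "runs": runs,
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }
```

Usage, with hypothetical names for the bot's entry points and a sample file: `benchmark(transcribe, "sample.wav")` and `benchmark(synthesize, "Hello there.", "default")`.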

Long-term (Next 1-3 Months)

  1. Wait for PyTorch sm_120 support

  2. 🚀 Deploy with GPU when support lands

  3. 🔍 Re-evaluate Kani-TTS-2 once GPU works


Current Bot Configuration

For now, use CPU-only mode:

# config.yaml
pipeline:
  stt:
    model_size: "small"  # Smaller = faster on CPU
    device: "cpu"        # Force CPU
    beam_size: 1         # Faster decoding

  tts:
    device: "cpu"        # Force CPU

.env overrides:

PIPELINE__STT__DEVICE=cpu
PIPELINE__STT__MODEL_SIZE=small
PIPELINE__TTS__DEVICE=cpu
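The double-underscore names follow the nested-settings convention used, for example, by pydantic-settings with `env_nested_delimiter="__"`. The bot's actual settings loader is not shown here, so the following is a hypothetical stdlib sketch of how such variables map onto the `config.yaml` keys. Unlike a real loader it keeps values as strings.

```python
import copy


def apply_env_overrides(config, environ, delimiter="__"):
    """Return a copy of `config` with NAME__SUB__KEY=value variables applied.

    Only variables whose first segment matches a top-level config key are
    used, so unrelated OS variables are ignored. Values stay strings."""
    result = copy.deepcopy(config)
    for name, value in environ.items():
        if delimiter not in name:
            continue
        path = [part.lower() for part in name.split(delimiter)]
        if path[0] not in result:
            continue  # not one of our settings
        node = result
        for part in path[:-1]:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                break  # path runs into a scalar value; skip rather than clobber it
            node = child
        else:
            node[path[-1]] = value
    return result
```

Usage, assuming the YAML is loaded with PyYAML: `apply_env_overrides(yaml.safe_load(open("config.yaml")), os.environ)`.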

When PyTorch Supports sm_120

Test with:

# Uninstall current
pip uninstall torch torchaudio torchvision -y

# Install latest (CUDA 12.8 build; cu124 wheels cannot include sm_120 kernels)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128

# Verify
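python -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0)); print(torch.cuda.get_arch_list())"

# Test computation (.item() forces a sync, so a missing-kernel error surfaces here)
python -c "import torch; x=torch.rand(100,100,device='cuda'); print((x @ x).sum().item()); print('GPU OK')"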

Then update config:

pipeline:
  stt:
    device: "cuda"
    model_size: "medium"  # Can use larger model on GPU
    beam_size: 5          # Better quality

  tts:
    device: "cuda"
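Hard-coding `device: "cuda"` fails at the first kernel launch if the build turns out to lack sm_120. `torch.cuda.is_available()` can still return `True` in that situation (the incompatibility shows up as the warning and `RuntimeError` quoted earlier in this document), so a more defensive startup check runs a tiny operation before choosing the device. This is a sketch; how its result would be wired into the bot's config is an assumption. It covers the PyTorch-based TTS side only: faster-whisper runs on CTranslate2 rather than PyTorch, so STT GPU support needs its own check.

```python
def pick_device(preferred="cuda"):
    """Return 'cuda' only if a real CUDA kernel runs; otherwise return 'cpu'."""
    if preferred != "cuda":
        return "cpu"
    try:
        import torch
    except ImportError:
        return "cpu"
    if not torch.cuda.is_available():
        return "cpu"
    try:
        x = torch.ones(8, 8, device="cuda")
        (x @ x).sum().item()  # .item() synchronizes, so launch errors surface here
    except RuntimeError as exc:
        print(f"CUDA present but unusable, falling back to CPU: {exc}")
        return "cpu"
    return "cuda"
```

Usage: call `pick_device()` once at startup and pass the result wherever the TTS device is configured.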

Expected improvement:

  • STT: ~2s → ~0.35s (6x faster)
  • TTS: ~4-5s → ~0.9s (5x faster)
  • Total: ~10-15s → ~4s (3x faster, near 3.5s target!)


Summary: RTX 5090 support is coming, but not here yet. Use CPU mode for development now, monitor for PyTorch updates, or use cloud GPU for testing in the meantime.