# Hermes Agent Debugging Guide

This guide helps diagnose why Hermes Agent may not be running after a Terraform deployment.

## Quick Diagnostic Checklist

### 1. Service Status

```bash
# Check systemd service status
systemctl status hermes.service

# View service logs
journalctl -u hermes.service -f

# Check if container exists
docker ps -a | grep hermes

# View container logs
docker logs hermes
```

### 2. Docker Health

```bash
# Verify Docker is running
systemctl status docker

# List containers
docker ps -a

# Check Docker events (watch real-time)
docker events

# Check docker socket permissions
ls -la /var/run/docker.sock
```

### 3. Directory and File Permissions

```bash
# Check .hermes directory
ls -la ~/.hermes/
ls -la ~/.hermes/.env
ls -la ~/docker-compose.yml

# Check file contents (the .env file contains secrets; avoid sharing this output)
cat ~/.hermes/.env
cat ~/.hermes/config.yaml
cat ~/docker-compose.yml
```

## Common Issues and Fixes

### Issue 1: "Hermes container not running"

**Symptoms:**
- `docker ps` shows no hermes container
- `.hermes` folder exists but the Docker container won't start

**Diagnosis:**
```bash
# Check service status
systemctl status hermes.service

# Check recent logs
journalctl -u hermes.service -n 50

# Check docker logs more verbosely
docker logs hermes 2>&1 | tail -50
```

**Root Causes:**

1. **Docker image not pulled properly** → Pull manually:
   ```bash
   docker pull nousresearch/hermes-agent:latest
   ```

2. **Missing .env file** → Check if it exists and has content:
   ```bash
   ls -la ~/.hermes/.env
   cat ~/.hermes/.env
   ```

3. **Directory permission issues** → Fix permissions:
   ```bash
   sudo chown -R "$(whoami):$(whoami)" ~/.hermes
   chmod 755 ~/.hermes
   chmod 600 ~/.hermes/.env
   ```

4. **Docker compose file not found** → Verify location:
   ```bash
   ls -la ~/docker-compose.yml
   cat ~/docker-compose.yml
   ```

5. **Port 18789 already in use** → Check:
   ```bash
   lsof -i :18789
   ```
   If occupied, either:
   - Kill the process using it
   - Change the port in docker-compose.yml

### Issue 2: "Container starts but immediately exits"

**Symptoms:**
- `docker ps` is empty but `docker ps -a` shows the container with "Exited" status
- Container stops within seconds of starting

**Diagnosis:**
```bash
# View the exit code
docker ps -a | grep hermes

# Get more detailed error logs
docker logs hermes
```

**Common Fixes:**

1. **Invalid YAML in config.yaml** → Validate syntax:
   ```bash
   # Requires PyYAML; Python does not expand "~" on its own, so expand it explicitly
   python3 -c "import os, yaml; yaml.safe_load(open(os.path.expanduser('~/.hermes/config.yaml')))"
   ```

2. **Missing API keys** → Check:
   ```bash
   grep -E "OPENROUTER|DISCORD_BOT|BRAVE" ~/.hermes/.env
   ```

3. **Invalid gateway token** → Verify:
   ```bash
   # Empty output means the variable is not set in the current shell
   echo "$HERMES_GATEWAY_TOKEN"
   ```

### Issue 3: "Docker daemon won't start"

**Symptoms:**
- `systemctl status docker` shows failed/inactive
- `docker ps` returns "Cannot connect to Docker daemon"

**Fixes:**
```bash
# Start Docker
sudo systemctl start docker

# Enable on boot
sudo systemctl enable docker

# Check Docker health
docker ps
```

### Issue 4: "Discord bot shows offline"

**Symptoms:**
- Hermes is running (`docker ps` shows the container)
- The Discord bot doesn't show "online" status in your server

**Diagnosis:**
```bash
# Check if Discord configuration is loaded
grep -i discord ~/.hermes/.env
grep -i discord ~/.hermes/config.yaml

# View container logs for Discord errors (2>&1 also captures stderr output)
docker logs hermes 2>&1 | grep -i discord
```

**Root Causes:**

1. **Invalid bot token** → Verify in .env:
   ```bash
   grep DISCORD_BOT_TOKEN ~/.hermes/.env
   ```

2. **Wrong server ID** → Check config:
   ```bash
   grep -A 5 "discord_server_id" ~/.hermes/config.yaml
   ```

3. **User IDs not in server** → Verify in allowlist:
   ```bash
   grep -A 10 "users:" ~/.hermes/config.yaml
   ```

4. **Gateway not running** → Check port:
   ```bash
   lsof -i :18789
   ```

5. **Bot not in server** → Manual fix:
   1. Go to the Discord Developer Portal
   2. Select your bot
   3. Copy the OAuth2 URL with scopes: `bot`, `applications.commands`
   4. Open the URL to invite the bot to your server

### Issue 5: "Container gets killed after startup"

**Symptoms:**
- Service shows active but container keeps restarting
- `docker logs` shows memory or resource errors

**Fixes:**
```bash
# Check Docker stats
docker stats hermes

# Check docker-compose.yml resource limits
grep -A 5 "deploy:" ~/docker-compose.yml

# Increase memory limit if needed
# Edit ~/docker-compose.yml and increase memory value
nano ~/docker-compose.yml
```

## Verification Steps

Once you believe Hermes is running, verify with:

```bash
# Health check script (if it exists)
bash /usr/local/bin/hermes-health-check.sh

# Manual health checks
echo "1. Service status:"
systemctl is-active hermes.service

echo "2. Container running:"
docker ps | grep hermes

echo "3. Port listening:"
netstat -tlnp | grep 18789
```

## Manual Start/Stop

If the systemd service isn't working:

```bash
# Manual start
cd ~/
docker compose -f docker-compose.yml up -d

# Manual stop
cd ~/
docker compose -f docker-compose.yml down

# Manual logs
cd ~/
docker compose -f docker-compose.yml logs -f
```

## Rebuilding from Scratch

If nothing else works:

```bash
# Stop everything
sudo systemctl stop hermes.service
docker compose -f ~/docker-compose.yml down

# Remove container and image
docker rm hermes 2>/dev/null || true
docker rmi nousresearch/hermes-agent:latest 2>/dev/null || true

# Pull fresh image
docker pull nousresearch/hermes-agent:latest

# Start service again
sudo systemctl start hermes.service

# Monitor startup
journalctl -u hermes.service -f
```

## Debug Mode

For more verbose logging:

```bash
# Watch service logs with timestamps
journalctl -u hermes.service -f --all

# Watch docker logs continuously
docker logs -f --tail=50 hermes

# Run docker compose in foreground (stops automated service)
cd ~/
docker compose -f docker-compose.yml up
```

## Testing Discord Connectivity

Once Hermes is running:

```bash
# Send a test message to your Discord bot
# The bot should respond in the channel or via DM

# Check if bot is responding to mentions (type this in Discord, not in the shell):
#   @hermes help

# Or check logs for Discord activity
docker logs hermes 2>&1 | tail -100
```

## Terraform Logs

Check cloud-init logs on the server for deployment issues:

```bash
# View cloud-init output
sudo cloud-init status
sudo cat /var/log/cloud-init-output.log

# Check for specific errors
sudo grep -i error /var/log/cloud-init-output.log
sudo grep -i docker /var/log/cloud-init.log
```

## Getting Help

If stuck, provide:

1. Output of `systemctl status hermes.service`
2. Output of `docker ps -a`
3. Last 50 lines of `docker logs hermes`
4. Contents of `~/.hermes/.env` (redact secrets)
5. Contents of `~/.hermes/config.yaml`
6. Output of `cloud-init status`
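
The items in this list can be gathered in one pass. The following is a minimal sketch, not something shipped with Hermes: the function names `redact_env`, `run_section`, and `collect_hermes_report` and the `***REDACTED***` marker are assumptions made for illustration. It assumes the file locations used throughout this guide and masks every value in `.env`, since that file holds tokens and API keys.

```shell
#!/usr/bin/env bash
# Sketch: collect the "Getting Help" items into a single report.
# All function names here are illustrative and not part of Hermes.

# Replace the value of every KEY=value line with a placeholder.
# Comments and blank lines pass through unchanged.
redact_env() {
  sed -E 's/^([[:space:]]*(export[[:space:]]+)?[A-Za-z_][A-Za-z0-9_]*)=.*/\1=***REDACTED***/'
}

# Print a section header, then run the given command.
# A failing or missing command is noted instead of aborting the report.
run_section() {
  local title="$1"
  shift
  printf '\n===== %s =====\n' "$title"
  "$@" 2>&1 || printf '(command failed or is unavailable: %s)\n' "$*"
}

# Gather the six items from the list above, in the same order.
collect_hermes_report() {
  run_section "systemctl status hermes.service" systemctl status hermes.service --no-pager
  run_section "docker ps -a" docker ps -a
  run_section "docker logs hermes (last 50 lines)" docker logs --tail 50 hermes

  printf '\n===== ~/.hermes/.env (redacted) =====\n'
  if [ -r "$HOME/.hermes/.env" ]; then
    redact_env < "$HOME/.hermes/.env"
  else
    printf '(file missing or unreadable)\n'
  fi

  run_section "~/.hermes/config.yaml" cat "$HOME/.hermes/config.yaml"
  run_section "cloud-init status" cloud-init status
}
```

After saving the block to a file and sourcing it, `collect_hermes_report > hermes-report.txt` writes the report. Only `.env` is redacted automatically, so review the `config.yaml` section and the log output for secrets before sharing the file.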