openboatmobile-ai/hermes/docs/DEBUGGING.md
Mermaid Man ea73745147 refactor: restructure into hermes/ and openclaw/ directories
- Split cloudinit.tf into cloudinit-hermes.tf and cloudinit-openclaw.tf
- Split variables.tf into variables-common.tf, variables-hermes.tf, variables-openclaw.tf
- Move templates into hermes/templates/ and openclaw/templates/
- Move models/ into openclaw/models/
- Move hermes-openclaw.json to openclaw/openclaw-reference.json
- Move hermes docs to hermes/docs/
- OpenClaw cloudinit now uses variables instead of hardcoded values
- All 48 variable references verified against definitions
2026-04-24 19:45:03 +00:00


# Hermes Agent Debugging Guide
This guide helps diagnose why Hermes Agent is not running after a Terraform deployment.
## Quick Diagnostic Checklist
### 1. Service Status
```bash
# Check systemd service status
systemctl status hermes.service
# View service logs
journalctl -u hermes.service -f
# Check if container exists
docker ps -a | grep hermes
# View container logs
docker logs hermes
```
### 2. Docker Health
```bash
# Verify Docker is running
systemctl status docker
# List containers
docker ps -a
# Check Docker events (watch real-time)
docker events
# Check docker socket permissions
ls -la /var/run/docker.sock
```
### 3. Directory and File Permissions
```bash
# Check .hermes directory
ls -la ~/.hermes/
ls -la ~/.hermes/.env
ls -la ~/docker-compose.yml
# Check file contents
cat ~/.hermes/.env
cat ~/.hermes/config.yaml
cat ~/docker-compose.yml
```
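The three paths above can be checked in one pass. The following is a minimal sketch, assuming the file layout used throughout this guide; `check_hermes_files` is a hypothetical helper name, not something shipped with Hermes.

```bash
# Sketch: report which of the files this guide relies on are missing or empty.
# check_hermes_files is a hypothetical helper; the path list mirrors this guide.
check_hermes_files() {
  local home_dir="${1:-$HOME}"   # base directory to inspect (defaults to $HOME)
  local missing=0
  local f
  for f in ".hermes/.env" ".hermes/config.yaml" "docker-compose.yml"; do
    if [ ! -s "$home_dir/$f" ]; then
      echo "MISSING or empty: $home_dir/$f"
      missing=$((missing + 1))
    else
      echo "ok: $home_dir/$f"
    fi
  done
  [ "$missing" -eq 0 ]   # exit status is 0 only when every file is present
}
```

Run `check_hermes_files` with no argument to inspect your own home directory, or pass another base directory as the first argument.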
## Common Issues and Fixes
### Issue 1: "Hermes container not running"
**Symptoms:**
- `docker ps` shows no hermes container
- `.hermes` folder exists but docker container won't start
**Diagnosis:**
```bash
# Check service status
systemctl status hermes.service
# Check recent logs
journalctl -u hermes.service -n 50
# Check docker logs more verbosely
docker logs hermes 2>&1 | tail -50
```
**Root Causes:**
1. **Docker image not pulled properly** → Pull manually:
```bash
docker pull nousresearch/hermes-agent:latest
```
2. **Missing .env file** → Check if it exists and has content:
```bash
ls -la ~/.hermes/.env
cat ~/.hermes/.env
```
3. **Directory permission issues** → Fix permissions:
```bash
sudo chown -R "$(id -un)":"$(id -gn)" ~/.hermes
chmod 755 ~/.hermes
chmod 600 ~/.hermes/.env
```
4. **Docker compose file not found** → Verify location:
```bash
ls -la ~/docker-compose.yml
cat ~/docker-compose.yml
```
5. **Port 18789 already in use** → Check:
```bash
lsof -i :18789
```
If occupied, either:
- Kill the process using it
- Change the port in docker-compose.yml
### Issue 2: "Container starts but immediately exits"
**Symptoms:**
- `docker ps` is empty but `docker ps -a` shows the container with "Exited" status
- Container stops within seconds of starting
**Diagnosis:**
```bash
# View the exit code
docker ps -a | grep hermes
# Get more detailed error logs
docker logs hermes
```
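The exit code shown in the "Exited (N)" status often narrows the cause before you read any logs. The sketch below maps a few conventional codes to likely meanings; `explain_exit_code` is a hypothetical helper, and the interpretations are general shell and Docker conventions rather than anything specific to Hermes.

```bash
# Sketch: translate a container exit code into a likely cause.
# explain_exit_code is a hypothetical helper name.
explain_exit_code() {
  case "$1" in
    0)   echo "clean exit: the main process finished on its own" ;;
    126) echo "container command could not be invoked (check permissions)" ;;
    127) echo "container command not found (check the image or entrypoint)" ;;
    137) echo "killed by SIGKILL (128+9)" ;;
    143) echo "terminated by SIGTERM (128+15), usually docker stop or a service restart" ;;
    *)   echo "application error: read docker logs for details" ;;
  esac
}

# Example, assuming the container is named hermes as elsewhere in this guide:
#   explain_exit_code "$(docker inspect --format '{{.State.ExitCode}}' hermes)"
```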
**Common Fixes:**
1. **Invalid YAML in config.yaml** → Validate syntax:
```bash
python3 -c "import os, yaml; yaml.safe_load(open(os.path.expanduser('~/.hermes/config.yaml')))"
```
2. **Missing API keys** → Check:
```bash
grep -E "OPENROUTER|DISCORD_BOT|BRAVE" ~/.hermes/.env
```
3. **Invalid gateway token** → Verify:
```bash
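# Prints a value only if the variable is exported in this shell; otherwise look for it in ~/.hermes/.env
echo "$HERMES_GATEWAY_TOKEN"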
```
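The `grep` for API keys above shows which lines exist, but not which keys are absent or empty. A minimal sketch of the inverse check follows; `missing_env_keys` is a hypothetical helper, and apart from `DISCORD_BOT_TOKEN` the exact key names your deployment needs are assumptions you should replace with your own.

```bash
# Sketch: print every requested key that is absent, commented out, or empty
# in an env file. missing_env_keys is a hypothetical helper name.
missing_env_keys() {
  local env_file="$1"; shift
  local key
  for key in "$@"; do
    # Require KEY= followed by a real character (an empty "" or '' does not count)
    if ! grep -Eq "^${key}=[\"']?[^\"'[:space:]]" "$env_file"; then
      echo "$key"
    fi
  done
}

# Example (add the other key names your .env is expected to contain):
#   missing_env_keys ~/.hermes/.env DISCORD_BOT_TOKEN
```

No output means every listed key has a value.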
### Issue 3: "Docker daemon won't start"
**Symptoms:**
- `systemctl status docker` shows failed/inactive
- `docker ps` returns "Cannot connect to Docker daemon"
**Fixes:**
```bash
# Start Docker
sudo systemctl start docker
# Enable on boot
sudo systemctl enable docker
# Check Docker health
docker ps
```
### Issue 4: "Discord bot shows offline"
**Symptoms:**
- Hermes is running (`docker ps` shows the container)
- The Discord bot does not show as online in your server
**Diagnosis:**
```bash
# Check if Discord configuration is loaded
grep -i discord ~/.hermes/.env
grep -i discord ~/.hermes/config.yaml
# View container logs for Discord errors
docker logs hermes 2>&1 | grep -i discord
```
**Root Causes:**
1. **Invalid bot token** → Verify in .env:
```bash
grep DISCORD_BOT_TOKEN ~/.hermes/.env
```
2. **Wrong server ID** → Check config:
```bash
grep -A 5 "discord_server_id" ~/.hermes/config.yaml
```
3. **User IDs not in server** → Verify in allowlist:
```bash
grep -A 10 "users:" ~/.hermes/config.yaml
```
4. **Gateway not running** → Check port:
```bash
lsof -i :18789
```
5. **Bot not in server** → Manual fix:
1. Go to Discord Developer Portal
2. Select your bot
3. Generate an OAuth2 URL with the scopes `bot` and `applications.commands`
4. Open the URL in a browser to invite the bot to your server
### Issue 5: "Container gets killed after startup"
**Symptoms:**
- Service shows active but container keeps restarting
- `docker logs` shows memory or resource errors
**Fixes:**
```bash
# Check Docker stats
docker stats hermes
# Check docker-compose.yml resource limits
grep -A 5 "deploy:" ~/docker-compose.yml
# Increase memory limit if needed
# Edit ~/docker-compose.yml and increase memory value
nano ~/docker-compose.yml
```
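Before raising the memory limit, it can help to confirm that the kernel's out-of-memory killer was actually responsible. Docker records this in the container state. The sketch below interprets those fields; `oom_verdict` is a hypothetical helper, and the container name `hermes` is taken from the commands in this guide.

```bash
# Sketch: interpret the OOMKilled flag and exit code reported by docker inspect.
# oom_verdict is a hypothetical helper name.
oom_verdict() {
  local oom_killed="$1"   # "true" or "false", as printed by docker inspect
  local exit_code="$2"
  if [ "$oom_killed" = "true" ]; then
    echo "OOM: the container was killed for exceeding its memory limit"
  elif [ "$exit_code" = "137" ]; then
    echo "SIGKILL without the OOM flag: check for docker kill or host memory pressure"
  else
    echo "not an OOM kill (exit code $exit_code): check docker logs"
  fi
}

# Example:
#   oom_verdict $(docker inspect --format '{{.State.OOMKilled}} {{.State.ExitCode}}' hermes)
```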
## Verification Steps
Once you believe Hermes is running, verify with:
```bash
# Health check script (if it exists)
bash /usr/local/bin/hermes-health-check.sh
# Manual health checks
echo "1. Service status:"
systemctl is-active hermes.service
echo "2. Container running:"
docker ps | grep hermes
echo "3. Port listening:"
sudo ss -tlnp | grep 18789
```
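If the health check script is not present, the manual checks can be wrapped so that each one prints a clear PASS or FAIL. This is only a sketch: `run_check` and the `failures` counter are hypothetical names, and the example commands assume the service name, container name, and port used in this guide.

```bash
# Sketch: run a command quietly and report PASS/FAIL, counting failures.
# run_check and failures are hypothetical names.
failures=0
run_check() {
  local label="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $label"
  else
    echo "FAIL  $label"
    failures=$((failures + 1))
  fi
}

# Example usage:
#   run_check "service active"    systemctl is-active --quiet hermes.service
#   run_check "container running" sh -c 'docker ps --format "{{.Names}}" | grep -qx hermes'
#   run_check "port 18789 open"   sh -c 'ss -tln | grep -q ":18789 "'
#   echo "$failures check(s) failed"
```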
## Manual Start/Stop
If the systemd service isn't working:
```bash
# Manual start
cd ~/
docker compose -f docker-compose.yml up -d
# Manual stop
cd ~/
docker compose -f docker-compose.yml down
# Manual logs
cd ~/
docker compose -f docker-compose.yml logs -f
```
## Rebuilding from Scratch
If nothing else works:
```bash
# Stop everything
sudo systemctl stop hermes.service
docker compose -f ~/docker-compose.yml down
# Remove container and image
docker rm hermes 2>/dev/null || true
docker rmi nousresearch/hermes-agent:latest 2>/dev/null || true
# Pull fresh image
docker pull nousresearch/hermes-agent:latest
# Start service again
sudo systemctl start hermes.service
# Monitor startup
journalctl -u hermes.service -f
```
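Instead of watching the journal by eye after a rebuild, you can poll until the container appears or a timeout expires. A minimal sketch, where `wait_until` is a hypothetical helper and the container name `hermes` is assumed from the rest of this guide:

```bash
# Sketch: retry a command once per second until it succeeds or the timeout
# (in seconds) is reached. wait_until is a hypothetical helper name.
wait_until() {
  local timeout="$1"; shift
  local waited=0
  until "$@" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1   # gave up: the command never succeeded
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Example: wait up to 60 seconds for the container to be listed as running
#   wait_until 60 sh -c 'docker ps --format "{{.Names}}" | grep -qx hermes' \
#     && echo "hermes is up" || echo "hermes did not start in time"
```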
## Debug Mode
For more verbose logging:
```bash
# Follow service logs with microsecond timestamps
journalctl -u hermes.service -f -o short-precise
# Watch docker logs continuously
docker logs -f --tail=50 hermes
# Run docker compose in the foreground (stop the systemd service first so the two do not conflict)
sudo systemctl stop hermes.service
cd ~/
docker compose -f docker-compose.yml up
```
## Testing Discord Connectivity
Once Hermes is running:
```bash
# These are Discord actions, not shell commands:
#   1. Mention the bot in a channel or DM, e.g. "@hermes help"
#   2. The bot should reply in the channel or via DM
# Then check the logs for Discord activity
docker logs hermes 2>&1 | tail -100
```
## Terraform Logs
Check cloud-init logs on the server for deployment issues:
```bash
# View cloud-init output
sudo cloud-init status
sudo cat /var/log/cloud-init-output.log
# Check for specific errors
sudo grep -i error /var/log/cloud-init-output.log
sudo grep -i docker /var/log/cloud-init.log
```
## Getting Help
If stuck, provide:
1. Output of `systemctl status hermes.service`
2. Output of `docker ps -a`
3. Last 50 lines of `docker logs hermes`
4. Contents of `~/.hermes/.env` (redact secrets)
5. Contents of `~/.hermes/config.yaml`
6. Output of `cloud-init status`
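
To share `~/.hermes/.env` without leaking tokens, the values can be masked while the key names are kept. The following is a sketch under the assumption that the file uses plain `KEY=value` lines (optionally prefixed with `export`); `redact_env` is a hypothetical helper name, and you should still read the output before posting it.

```bash
# Sketch: print an env file with every non-empty value replaced by <redacted>.
# Comments, blank lines, and empty keys are passed through unchanged.
# redact_env is a hypothetical helper name.
redact_env() {
  sed -E 's/^((export[[:space:]]+)?[A-Za-z_][A-Za-z0-9_]*)=.+$/\1=<redacted>/' "$1"
}

# Example:
#   redact_env ~/.hermes/.env
```

Keeping empty keys visible is deliberate: a key with no value is often the very problem being diagnosed.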