# Hermes Agent Debugging Guide

This guide helps diagnose why Hermes Agent may not be running after Terraform deployment.

## Quick Diagnostic Checklist

### 1. Service Status

```bash
# Check systemd service status
systemctl status hermes.service

# View service logs
journalctl -u hermes.service -f

# Check if container exists
docker ps -a | grep hermes

# View container logs
docker logs hermes
```

### 2. Docker Health

```bash
# Verify Docker is running
systemctl status docker

# List containers
docker ps -a

# Check Docker events (watch real-time)
docker events

# Check docker socket permissions
ls -la /var/run/docker.sock
```

### 3. Directory and File Permissions

```bash
# Check .hermes directory
ls -la ~/.hermes/
ls -la ~/.hermes/.env
ls -la ~/docker-compose.yml

# Check file contents
cat ~/.hermes/.env
cat ~/.hermes/config.yaml
cat ~/docker-compose.yml
```
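The permission checks above can also be scripted. The following is a minimal sketch, assuming GNU `stat` (the `-c '%a'` flag from Linux coreutils); the function name `check_mode` is illustrative and not part of Hermes, and the expected modes (755 for the directory, 600 for `.env`) are the ones used in the permission fix under Issue 1.

```bash
#!/usr/bin/env bash
# check_mode PATH EXPECTED_MODE
# Prints OK / MISMATCH / MISSING for PATH.
# Returns 0 on a match, 1 on a mismatch, 2 if PATH does not exist.
# Assumption: GNU stat (Linux). check_mode is a hypothetical helper name.
check_mode() {
  local path="$1" expected="$2" actual
  if [ ! -e "$path" ]; then
    echo "MISSING  $path"
    return 2
  fi
  actual="$(stat -c '%a' "$path")"
  if [ "$actual" = "$expected" ]; then
    echo "OK       $path ($actual)"
  else
    echo "MISMATCH $path (have $actual, want $expected)"
    return 1
  fi
}
```

For example, `check_mode ~/.hermes 755` and `check_mode ~/.hermes/.env 600` report whether the directory and the secrets file carry the expected modes.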

## Common Issues and Fixes

### Issue 1: "Hermes container not running"

**Symptoms:**
- `docker ps` shows no hermes container
- `.hermes` folder exists but docker container won't start

**Diagnosis:**
```bash
# Check service status
systemctl status hermes.service

# Check recent logs
journalctl -u hermes.service -n 50

# Check docker logs more verbosely
docker logs hermes 2>&1 | tail -50
```

**Root Causes:**
1. **Docker image not pulled properly** → Pull manually:
   ```bash
   docker pull nousresearch/hermes-agent:latest
   ```

2. **Missing .env file** → Check if it exists and has content:
   ```bash
   ls -la ~/.hermes/.env
   cat ~/.hermes/.env
   ```

3. **Directory permission issues** → Fix permissions:
   ```bash
   sudo chown -R $(whoami):$(whoami) ~/.hermes
   chmod 755 ~/.hermes
   chmod 600 ~/.hermes/.env
   ```

4. **Docker compose file not found** → Verify location:
   ```bash
   ls -la ~/docker-compose.yml
   cat ~/docker-compose.yml
   ```

5. **Port 18789 already in use** → Check:
   ```bash
   lsof -i :18789
   ```
   If occupied, either:
   - Kill the process using it
   - Change the port in docker-compose.yml

### Issue 2: "Container starts but immediately exits"

**Symptoms:**
- `docker ps` is empty but `docker ps -a` shows the container with "Exited" status
- Container stops within seconds of starting

**Diagnosis:**
```bash
# View the exit code (shown in the STATUS column, e.g. "Exited (1)")
docker ps -a | grep hermes

# Get more detailed error logs
docker logs hermes
```
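The exit code in the "Exited" status often narrows down the cause before you read any logs. The sketch below maps the conventional values to a short explanation; the function name `explain_exit_code` is hypothetical, and the meanings are the general Docker and shell conventions (125 to 127 for launch failures, 128 plus the signal number for signal deaths), not anything specific to Hermes.

```bash
#!/usr/bin/env bash
# explain_exit_code CODE
# Prints the conventional meaning of a container exit code.
# explain_exit_code is an illustrative helper, not a Docker or Hermes command.
explain_exit_code() {
  local code="$1"
  case "$code" in
    0)   echo "exited normally" ;;
    125) echo "docker itself failed to run the container" ;;
    126) echo "container command could not be invoked" ;;
    127) echo "container command not found" ;;
    137) echo "killed by SIGKILL (128+9), often the OOM killer or 'docker kill'" ;;
    143) echo "terminated by SIGTERM (128+15), e.g. 'docker stop'" ;;
    *)
      # Codes above 128 mean "terminated by signal (code - 128)"
      if [ "$code" -gt 128 ] 2>/dev/null; then
        echo "terminated by signal $((code - 128))"
      else
        echo "application error (see 'docker logs')"
      fi
      ;;
  esac
}
```

To feed it the real value, something like `explain_exit_code "$(docker inspect --format '{{.State.ExitCode}}' hermes)"` should work, assuming the container is named `hermes` as elsewhere in this guide.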
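
**Common Fixes:**
1. **Invalid YAML in config.yaml** → Validate syntax:
   ```bash
   # Requires PyYAML. Python's open() does not expand "~", so expand it explicitly.
   python3 -c "import os, yaml; yaml.safe_load(open(os.path.expanduser('~/.hermes/config.yaml')))"
   ```

2. **Missing API keys** → Check:
   ```bash
   grep -E "OPENROUTER|DISCORD_BOT|BRAVE" ~/.hermes/.env
   ```

3. **Invalid gateway token** → Verify:
   ```bash
   # Empty unless the variable was exported in your current shell
   echo "$HERMES_GATEWAY_TOKEN"
   # If the token is defined in .env instead, check it there
   grep HERMES_GATEWAY_TOKEN ~/.hermes/.env
   ```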
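The `grep` in fix 2 shows which keys are present but does not flag keys that are missing or empty. A minimal sketch of such a check follows; `missing_env_keys` is a hypothetical helper, and the key names you pass in are your own. Of the names used in the example below, only `DISCORD_BOT_TOKEN` and `HERMES_GATEWAY_TOKEN` appear in this guide; `OPENROUTER_API_KEY` is an assumed name for illustration.

```bash
#!/usr/bin/env bash
# missing_env_keys FILE KEY...
# Prints each KEY that is absent from FILE or has an empty value.
# Returns 0 when every key is set, 1 otherwise.
# Limitation: a quoted empty value such as KEY="" still counts as set.
missing_env_keys() {
  local file="$1" key missing=0
  shift
  for key in "$@"; do
    # Require KEY= followed by at least one character at the start of a line
    if ! grep -Eq "^${key}=.+" "$file" 2>/dev/null; then
      echo "$key"
      missing=1
    fi
  done
  return "$missing"
}
```

A call such as `missing_env_keys ~/.hermes/.env DISCORD_BOT_TOKEN HERMES_GATEWAY_TOKEN OPENROUTER_API_KEY` prints only the names that still need a value, so the secrets themselves never reach the terminal.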

### Issue 3: "Docker daemon won't start"

**Symptoms:**
- `systemctl status docker` shows failed/inactive
- `docker ps` returns "Cannot connect to Docker daemon"

**Fixes:**
```bash
# Start Docker
sudo systemctl start docker

# Enable on boot
sudo systemctl enable docker

# Check Docker health
docker ps
```

### Issue 4: "Discord bot shows offline"

**Symptoms:**
- Hermes is running (`docker ps` shows the container)
- But the Discord bot doesn't show "online" status in your server

**Diagnosis:**
```bash
# Check if Discord configuration is loaded
grep -i discord ~/.hermes/.env
grep -i discord ~/.hermes/config.yaml

# View container logs for Discord errors (2>&1 so stderr is searched too)
docker logs hermes 2>&1 | grep -i discord
```

**Root Causes:**
1. **Invalid bot token** → Verify in .env:
   ```bash
   grep DISCORD_BOT_TOKEN ~/.hermes/.env
   ```

2. **Wrong server ID** → Check config:
   ```bash
   grep -A 5 "discord_server_id" ~/.hermes/config.yaml
   ```

3. **User IDs not in server** → Verify in allowlist:
   ```bash
   grep -A 10 "users:" ~/.hermes/config.yaml
   ```

4. **Gateway not running** → Check port:
   ```bash
   lsof -i :18789
   ```

5. **Bot not in server** → Manual fix:
   1. Go to the Discord Developer Portal
   2. Select your bot
   3. Copy the OAuth2 URL with scopes: `bot`, `applications.commands`
   4. Open the URL to invite the bot to your server

### Issue 5: "Container gets killed after startup"

**Symptoms:**
- Service shows active but container keeps restarting
- `docker logs` shows memory or resource errors

**Fixes:**
```bash
# Check Docker stats
docker stats hermes

# Check docker-compose.yml resource limits
grep -A 5 "deploy:" ~/docker-compose.yml

# Increase memory limit if needed
# Edit ~/docker-compose.yml and increase memory value
nano ~/docker-compose.yml
```

## Verification Steps

Once you believe Hermes is running, verify with:

```bash
# Health check script (if it exists)
bash /usr/local/bin/hermes-health-check.sh

# Manual health checks
echo "1. Service status:"
systemctl is-active hermes.service

echo "2. Container running:"
docker ps | grep hermes

echo "3. Port listening:"
# netstat comes from the net-tools package; "ss -tlnp" is the iproute2 equivalent
netstat -tlnp | grep 18789
```
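The manual checks above can be wrapped in a small runner that prints one PASS/FAIL line per check and counts the failures. This is only a sketch: `run_check` and the `failures` counter are hypothetical names, and the commands in the commented usage lines assume the service and container names used throughout this guide.

```bash
#!/usr/bin/env bash
# run_check LABEL COMMAND [ARGS...]
# Runs COMMAND with its output suppressed, prints PASS or FAIL with LABEL,
# and increments the global "failures" counter on failure.
# Call it directly (not inside $(...)) so the counter is updated.
failures=0
run_check() {
  local label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $label"
  else
    echo "FAIL  $label"
    failures=$((failures + 1))
  fi
}

# Example usage (commented out so sourcing this file has no side effects):
#   run_check "service active"    systemctl is-active --quiet hermes.service
#   run_check "container running" sh -c 'docker ps --format "{{.Names}}" | grep -qx hermes'
#   echo "$failures check(s) failed"
```

Pipelines need to be wrapped in `sh -c '...'` as in the second example, because `run_check` executes a single command and its arguments.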

## Manual Start/Stop

If the systemd service isn't working:

```bash
# Manual start
cd ~/
docker compose -f docker-compose.yml up -d

# Manual stop
cd ~/
docker compose -f docker-compose.yml down

# Manual logs
cd ~/
docker compose -f docker-compose.yml logs -f
```

## Rebuilding from Scratch

If nothing else works:

```bash
# Stop everything (managing system services requires root)
sudo systemctl stop hermes.service
docker compose -f ~/docker-compose.yml down

# Remove container and image
docker rm hermes 2>/dev/null || true
docker rmi nousresearch/hermes-agent:latest 2>/dev/null || true

# Pull fresh image
docker pull nousresearch/hermes-agent:latest

# Start service again
sudo systemctl start hermes.service

# Monitor startup
journalctl -u hermes.service -f
```

## Debug Mode

For more verbose logging:

```bash
# Watch service logs with ISO 8601 timestamps
journalctl -u hermes.service -f -o short-iso

# Watch docker logs continuously
docker logs -f --tail=50 hermes

# Run docker compose in the foreground
# (stop the systemd service first so the two do not conflict)
sudo systemctl stop hermes.service
cd ~/
docker compose -f docker-compose.yml up
```

## Testing Discord Connectivity

Once Hermes is running:

```bash
# Send a test message to your Discord bot
# The bot should respond in the channel or via DM

# Check if the bot responds to mentions by typing this in Discord (not in the shell):
#   @hermes help

# Or check logs for Discord activity
docker logs hermes 2>&1 | tail -100
```

## Terraform Logs

Check cloud-init logs on the server for deployment issues:

```bash
# View cloud-init output
sudo cloud-init status
sudo cat /var/log/cloud-init-output.log

# Check for specific errors (the logs may be readable by root only)
sudo grep -i error /var/log/cloud-init-output.log
sudo grep -i docker /var/log/cloud-init.log
```

## Getting Help

If stuck, provide:
1. Output of `systemctl status hermes.service`
2. Output of `docker ps -a`
3. Last 50 lines of `docker logs hermes`
4. Contents of `~/.hermes/.env` (redact secrets)
5. Contents of `~/.hermes/config.yaml`
6. Output of `cloud-init status`

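Item 4 asks for the `.env` contents with secrets redacted, which is easy to get wrong by hand. The sketch below blanks every value while keeping the key names, so a reader can still see which settings are defined. `redact_env` is a hypothetical helper, and it assumes the usual one `KEY=value` pair per line layout.

```bash
#!/usr/bin/env bash
# redact_env FILE
# Prints FILE with every KEY=value line rewritten as KEY=<redacted>.
# Comments and blank lines pass through unchanged.
# Assumption: one KEY=value pair per line, with no leading whitespace.
redact_env() {
  sed -E 's/^([A-Za-z_][A-Za-z0-9_]*)=.*/\1=<redacted>/' "$1"
}
```

For example, `redact_env ~/.hermes/.env > hermes-env-redacted.txt` produces a file that is safer to attach to a support request. Review it before sharing, since values stored in any other format are not touched.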