openboatmobile-ai/docs/HERMES_DEBUGGING.md
CeeLo Greenheart a593af9b27 Initial commit - Clean public release
Sanitized for public release:
- Removed all API keys, tokens, and secrets
- Removed personal Discord IDs from hermes-openclaw.json
- Updated git URLs to be generic placeholders
- All sensitive data uses environment variable interpolation
2026-04-22 19:13:28 +00:00

Hermes Agent Debugging Guide

This guide helps diagnose why Hermes Agent may not be running after a Terraform deployment.

Quick Diagnostic Checklist

1. Service Status

# Check systemd service status
systemctl status hermes.service

# View service logs
journalctl -u hermes.service -f

# Check if container exists
docker ps -a | grep hermes

# View container logs
docker logs hermes

2. Docker Health

# Verify Docker is running
systemctl status docker

# List containers
docker ps -a

# Check Docker events (watch real-time)
docker events

# Check docker socket permissions
ls -la /var/run/docker.sock

3. Directory and File Permissions

# Check .hermes directory
ls -la ~/.hermes/
ls -la ~/.hermes/.env
ls -la ~/docker-compose.yml

# Check file contents
cat ~/.hermes/.env
cat ~/.hermes/config.yaml
cat ~/docker-compose.yml

Common Issues and Fixes

Issue 1: "Hermes container not running"

Symptoms:

  • docker ps shows no hermes container
  • .hermes folder exists but docker container won't start

Diagnosis:

# Check service status
systemctl status hermes.service

# Check recent logs
journalctl -u hermes.service -n 50

# Check docker logs more verbosely
docker logs hermes 2>&1 | tail -50

Root Causes:

  1. Docker image not pulled properly → Pull manually:

    docker pull nousresearch/hermes-agent:latest
    
  2. Missing .env file → Check if it exists and has content:

    ls -la ~/.hermes/.env
    cat ~/.hermes/.env
    
  3. Directory permission issues → Fix permissions:

    sudo chown -R $(whoami):$(whoami) ~/.hermes
    chmod 755 ~/.hermes
    chmod 600 ~/.hermes/.env
    
  4. Docker compose file not found → Verify location:

    ls -la ~/docker-compose.yml
    cat ~/docker-compose.yml
    
  5. Port 18789 already in use → Check:

    sudo lsof -i :18789
    

    If occupied, either:

    • Kill the process using it
    • Change the port in docker-compose.yml
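The file checks in the root causes above can be collapsed into one helper. This is a minimal sketch: `check_file` is a hypothetical name, not part of Hermes, and it only distinguishes missing, empty, and present files.

```shell
# check_file PATH: report whether PATH is missing, empty, or present.
# Hypothetical helper for illustration; not shipped with Hermes.
check_file() {
    if [ ! -e "$1" ]; then
        echo "MISSING $1"
        return 1
    elif [ ! -s "$1" ]; then
        echo "EMPTY $1"
        return 1
    fi
    echo "OK $1"
}

# Example: check the three files this guide refers to.
# for f in ~/.hermes/.env ~/.hermes/config.yaml ~/docker-compose.yml; do
#     check_file "$f"
# done
```

A MISSING or EMPTY result for any of the three files points back to root causes 2 and 4.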

Issue 2: "Container starts but immediately exits"

Symptoms:

  • docker ps is empty but docker ps -a shows the container with "Exited" status
  • Container stops within seconds of starting

Diagnosis:

# View the exit code
docker ps -a | grep hermes

# Get more detailed error logs
docker logs hermes
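The exit code in the "Exited (N)" status often narrows the cause before you read the logs. The sketch below maps common codes to hints; `explain_exit_code` is a hypothetical helper, and the meanings follow general Docker and shell conventions rather than anything specific to Hermes.

```shell
# explain_exit_code CODE: print a short hint for a container exit code.
# Hypothetical helper; the meanings are general conventions, not Hermes-specific.
explain_exit_code() {
    case "$1" in
        0)   echo "exited normally; the main process finished" ;;
        125) echo "docker run itself failed (bad flag or daemon error)" ;;
        126) echo "container command found but not executable" ;;
        127) echo "container command not found" ;;
        137) echo "killed by SIGKILL; often the out-of-memory killer" ;;
        143) echo "stopped by SIGTERM" ;;
        *)   echo "application error; read docker logs hermes" ;;
    esac
}

# Example (requires Docker and an existing container named hermes):
# code=$(docker inspect --format '{{.State.ExitCode}}' hermes)
# echo "exit code $code: $(explain_exit_code "$code")"
```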

Common Fixes:

  1. Invalid YAML in config.yaml → Validate syntax:

    python3 -c "import os, yaml; yaml.safe_load(open(os.path.expanduser('~/.hermes/config.yaml')))"
    
  2. Missing API keys → Check:

    grep -E "OPENROUTER|DISCORD_BOT|BRAVE" ~/.hermes/.env
    
  3. Invalid gateway token → Verify:

    echo $HERMES_GATEWAY_TOKEN   # empty unless exported in the current shell
    grep HERMES_GATEWAY_TOKEN ~/.hermes/.env
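The key checks above can be automated so that nothing secret is printed. This is a sketch under assumptions: `missing_env_keys` is a hypothetical helper, the .env file is assumed to use plain `KEY=value` lines, and the key names in the example are the two that appear in this guide (that `HERMES_GATEWAY_TOKEN` lives in the .env file is itself an assumption).

```shell
# missing_env_keys FILE KEY...: print each KEY that FILE does not set to a
# value of at least one character. Returns 1 if any key is missing.
# Hypothetical helper; a quoted empty value such as KEY="" is not detected.
missing_env_keys() {
    file=$1
    shift
    rc=0
    for key in "$@"; do
        # Match KEY=<something> at the start of a line.
        if ! grep -Eq "^${key}=.+" "$file"; then
            echo "$key"
            rc=1
        fi
    done
    return "$rc"
}

# Example:
# missing_env_keys ~/.hermes/.env DISCORD_BOT_TOKEN HERMES_GATEWAY_TOKEN
```

The helper prints only key names, never values, so its output is safe to paste into a bug report.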
    

Issue 3: "Docker daemon won't start"

Symptoms:

  • systemctl status docker shows failed/inactive
  • docker ps returns "Cannot connect to Docker daemon"

Fixes:

# Start Docker
sudo systemctl start docker

# Enable on boot
sudo systemctl enable docker

# Check Docker health
docker ps

Issue 4: "Discord bot shows offline"

Symptoms:

  • Hermes is running (docker ps shows container)
  • But Discord bot doesn't show "online" status in your server

Diagnosis:

# Check if Discord configuration is loaded
grep -i discord ~/.hermes/.env
grep -i discord ~/.hermes/config.yaml

# View container logs for Discord errors
docker logs hermes 2>&1 | grep -i discord

Root Causes:

  1. Invalid bot token → Verify in .env:

    grep DISCORD_BOT_TOKEN ~/.hermes/.env
    
  2. Wrong server ID → Check config:

    grep -A 5 "discord_server_id" ~/.hermes/config.yaml
    
  3. User IDs not in server → Verify in allowlist:

    grep -A 10 "users:" ~/.hermes/config.yaml
    
  4. Gateway not running → Check port:

    sudo lsof -i :18789
    
  5. Bot not in server → Manual fix:

    1. Go to the Discord Developer Portal
    2. Select your bot's application
    3. Under OAuth2, generate an invite URL with the scopes bot and applications.commands
    4. Open the URL in a browser and invite the bot to your server
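To separate a bad token (root cause 1) from the other causes, you can ask Discord directly whether it accepts the token. The sketch below calls the public `GET /users/@me` endpoint with a `Bot` authorization header. The function names are hypothetical, and the token is assumed to be stored unquoted as `DISCORD_BOT_TOKEN=...` in the .env file.

```shell
# describe_discord_status CODE: interpret the HTTP status returned by
# GET /users/@me when called with a bot token. Hypothetical helper.
describe_discord_status() {
    case "$1" in
        200) echo "token accepted" ;;
        401) echo "token rejected; regenerate it in the Developer Portal" ;;
        429) echo "rate limited; wait and retry" ;;
        *)   echo "unexpected status $1; check network access" ;;
    esac
}

# check_discord_token FILE: read DISCORD_BOT_TOKEN from FILE and query Discord.
# Assumes an unquoted KEY=value line; the token is never printed.
check_discord_token() {
    token=$(grep '^DISCORD_BOT_TOKEN=' "$1" | head -n 1 | cut -d= -f2-)
    code=$(curl -s -o /dev/null -w '%{http_code}' \
        -H "Authorization: Bot $token" \
        https://discord.com/api/v10/users/@me)
    describe_discord_status "$code"
}

# Example:
# check_discord_token ~/.hermes/.env
```

If the token is accepted but the bot still shows offline, the server ID, allowlist, or invite steps are the more likely culprits.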

Issue 5: "Container gets killed after startup"

Symptoms:

  • Service shows active but container keeps restarting
  • docker logs shows memory or resource errors

Fixes:

# Check Docker stats
docker stats hermes

# Check docker-compose.yml resource limits
grep -A 5 "deploy:" ~/docker-compose.yml

# Increase memory limit if needed
# Edit ~/docker-compose.yml and increase memory value
nano ~/docker-compose.yml
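Before raising the memory limit, it helps to confirm that the kernel actually killed the container for exceeding it. The sketch below reads the `OOMKilled` flag from `docker inspect`. `oom_hint` is a hypothetical helper, and the flag may be reset when the container restarts, so it is most useful shortly after a crash.

```shell
# oom_hint NAME: report whether Docker recorded an out-of-memory kill
# for container NAME. Hypothetical helper; returns 2 if inspect fails.
oom_hint() {
    state=$(docker inspect --format '{{.State.OOMKilled}}' "$1") || return 2
    if [ "$state" = "true" ]; then
        echo "$1 was OOM-killed; raise its memory limit"
    else
        echo "$1 was not OOM-killed; look for other causes in docker logs"
    fi
}

# Example:
# oom_hint hermes
# docker stats --no-stream hermes   # one-off snapshot instead of a live view
```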

Verification Steps

Once you believe Hermes is running, verify with:

# Health check script (if it exists)
bash /usr/local/bin/hermes-health-check.sh

# Manual health checks
echo "1. Service status:"
systemctl is-active hermes.service

echo "2. Container running:"
docker ps | grep hermes

echo "3. Port listening:"
ss -tlnp | grep 18789   # or: netstat -tlnp | grep 18789 (requires net-tools)
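The manual checks above can be wrapped so that each prints a single PASS or FAIL line, which is easier to scan and to paste into a report. This is a minimal sketch; `run_check` is a hypothetical helper, and the example commands assume `ss` is available on the host.

```shell
# run_check LABEL COMMAND...: run COMMAND quietly and print PASS or FAIL.
# Hypothetical helper for illustration.
run_check() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS $label"
    else
        echo "FAIL $label"
        return 1
    fi
}

# Example:
# run_check "service active" systemctl is-active --quiet hermes.service
# run_check "container running" sh -c 'docker ps | grep -q hermes'
# run_check "port 18789 listening" sh -c 'ss -tln | grep -q ":18789 "'
```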

Manual Start/Stop

If the systemd service isn't working:

# Manual start
cd ~/
docker compose -f docker-compose.yml up -d

# Manual stop
cd ~/
docker compose -f docker-compose.yml down

# Manual logs
cd ~/
docker compose -f docker-compose.yml logs -f

Rebuilding from Scratch

If nothing else works:

# Stop everything
sudo systemctl stop hermes.service
docker compose -f ~/docker-compose.yml down

# Remove container and image
docker rm hermes 2>/dev/null || true
docker rmi nousresearch/hermes-agent:latest 2>/dev/null || true

# Pull fresh image
docker pull nousresearch/hermes-agent:latest

# Start service again
sudo systemctl start hermes.service

# Monitor startup
journalctl -u hermes.service -f

Debug Mode

For more verbose logging:

# Follow service logs, showing long fields in full
journalctl -u hermes.service -f --all

# Watch docker logs continuously
docker logs -f --tail=50 hermes

# Run docker compose in the foreground (stop the systemd service first to avoid a conflict)
cd ~/
docker compose -f docker-compose.yml up

Testing Discord Connectivity

Once Hermes is running:

# Send a test message to your Discord bot
# The bot should respond in the channel or via DM

# Check if the bot responds to mentions by typing this in a Discord channel (not in the shell):
#   @hermes help

# Or check logs for Discord activity
docker logs hermes 2>&1 | tail -100

Terraform Logs

Check cloud-init logs on the server for deployment issues:

# View cloud-init output
sudo cloud-init status
sudo cat /var/log/cloud-init-output.log

# Check for specific errors
sudo grep -i error /var/log/cloud-init-output.log
grep -i docker /var/log/cloud-init.log
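A single pass with line numbers makes it easier to find the first failure in a long cloud-init log. The sketch below reads from standard input so it can sit behind `sudo cat`. `scan_log` is a hypothetical helper, and the pattern list is an assumption about what failures look like, not an exhaustive set.

```shell
# scan_log: read a log on stdin and print numbered lines that look like
# failures. Hypothetical helper; the pattern list is illustrative only.
scan_log() {
    grep -n -i -E 'error|fail|traceback|permission denied'
}

# Example:
# sudo cat /var/log/cloud-init-output.log | scan_log | head -20
```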

Getting Help

If stuck, provide:

  1. Output of systemctl status hermes.service
  2. Output of docker ps -a
  3. Last 50 lines of docker logs hermes
  4. Contents of ~/.hermes/.env (redact secrets)
  5. Contents of ~/.hermes/config.yaml
  6. Output of cloud-init status
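For item 4, redacting by hand is error-prone. The sketch below masks every non-empty value while keeping key names, comments, and empty assignments visible. `redact_env` is a hypothetical helper and assumes `KEY=value` lines, optionally prefixed with `export`.

```shell
# redact_env: read KEY=VALUE lines on stdin and mask every non-empty value.
# Hypothetical helper; comments and empty assignments pass through unchanged.
redact_env() {
    sed -E 's/^((export[[:space:]]+)?[A-Za-z_][A-Za-z0-9_]*)=.+$/\1=<redacted>/'
}

# Example:
# redact_env < ~/.hermes/.env
```

Empty assignments are left as they are on purpose, since a key with no value is often the problem being diagnosed.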