refactor: restructure into hermes/ and openclaw/ directories
- Split cloudinit.tf into cloudinit-hermes.tf and cloudinit-openclaw.tf - Split variables.tf into variables-common.tf, variables-hermes.tf, variables-openclaw.tf - Move templates into hermes/templates/ and openclaw/templates/ - Move models/ into openclaw/models/ - Move hermes-openclaw.json to openclaw/openclaw-reference.json - Move hermes docs to hermes/docs/ - OpenClaw cloudinit now uses variables instead of hardcoded values - All 48 variable references verified against definitions
This commit is contained in:
parent
8a94313bd3
commit
ea73745147
21 changed files with 277 additions and 216 deletions
203
hermes/docs/AUDIT_REPORT.md
Normal file
203
hermes/docs/AUDIT_REPORT.md
Normal file
|
|
@ -0,0 +1,203 @@
|
|||
# Hermes Deployment Audit Report
|
||||
|
||||
## Issues Found
|
||||
|
||||
During the audit of the Terraform project for Hermes Agent deployment, several critical issues were identified that would prevent Hermes from running properly:
|
||||
|
||||
### 1. **Systemd Service Configuration Error** (CRITICAL)
|
||||
**Problem:** The systemd service didn't specify the docker-compose file path
|
||||
- `ExecStart=/usr/bin/docker compose up` without the `-f` flag
|
||||
- The service couldn't find docker-compose.yml when running from an arbitrary directory
|
||||
- No guarantee the service would change to the correct working directory
|
||||
|
||||
**Impact:** Service would start but immediately fail or not find the compose file.
|
||||
|
||||
**Fix:** Updated to:
|
||||
```ini
|
||||
ExecStart=/bin/sh -c 'cd /home/${admin_user} && exec docker compose -f docker-compose.yml up'
|
||||
ExecStop=/bin/sh -c 'cd /home/${admin_user} && exec docker compose -f docker-compose.yml down'
|
||||
```
|
||||
|
||||
### 2. **User Permissions Issue** (CRITICAL)
|
||||
**Problem:** Service was configured to run as `User=${admin_user}` (non-root)
|
||||
- Adding a user to the docker group with `usermod -aG docker` doesn't take effect for existing sessions
|
||||
- The systemd service tries to use docker before the hermes user has proper permissions
|
||||
- Would require a re-login to apply the docker group permissions
|
||||
|
||||
**Impact:** Service runs as hermes user without the necessary docker group permissions, causing "permission denied" errors.
|
||||
|
||||
**Fix:** Changed service to run as root (necessary for Docker):
|
||||
```ini
|
||||
User=root
|
||||
```
|
||||
And ensured proper file ownership:
|
||||
```bash
|
||||
chown ${admin_user}:${admin_user} /home/${admin_user}/docker-compose.yml
|
||||
chmod 644 /home/${admin_user}/docker-compose.yml
|
||||
```
|
||||
|
||||
### 3. **Installation Order Issue**
|
||||
**Problem:** Docker image was pulled before docker-compose-plugin was installed
|
||||
- `docker pull` command succeeded (using legacy docker)
|
||||
- But `docker compose` (the plugin) comes later
|
||||
- If the pull failed, docker-compose-plugin wouldn't have been installed yet
|
||||
|
||||
**Impact:** Potential race condition during bootstrap.
|
||||
|
||||
**Fix:** Reordered runcmd to install docker-compose-plugin immediately after Docker:
|
||||
```yaml
|
||||
1. curl docker installer
|
||||
2. apt-get install docker-compose-plugin # BEFORE pulling image
|
||||
3. docker pull nousresearch/hermes-agent:latest
|
||||
```
|
||||
|
||||
### 4. **No Docker Daemon Ready Check** (HIGH)
|
||||
**Problem:** Script tried to pull images immediately after Docker installation
|
||||
- Docker socket might not be ready
|
||||
- Starting services before Docker is fully operational
|
||||
|
||||
**Impact:** Timing-dependent failures, especially on slower systems.
|
||||
|
||||
**Fix:** Added health checks and delays:
|
||||
```bash
|
||||
# Wait for Docker daemon to be ready
|
||||
sleep 5
|
||||
docker ps > /dev/null || (sleep 10 && docker ps)
|
||||
```
|
||||
|
||||
### 5. **No Service Startup Verification** (MEDIUM)
|
||||
**Problem:** Service was started with no check that it actually came up
|
||||
- If the service failed to start, deployment would complete successfully anyway
|
||||
- User wouldn't know until they SSH in
|
||||
|
||||
**Impact:** Silent failures that only become apparent when checking the server.
|
||||
|
||||
**Fix:** Added verification:
|
||||
```bash
|
||||
# Verify service started
|
||||
systemctl is-active hermes.service || systemctl status hermes.service
|
||||
```
|
||||
|
||||
### 6. **Poor Error Logging** (MEDIUM)
|
||||
**Problem:** systemd service logged to stdout but nothing captured the startup errors
|
||||
- No journal entries with what went wrong
|
||||
- No way to see Docker errors in the cloud-init logs
|
||||
|
||||
**Impact:** Difficult to diagnose why the service failed.
|
||||
|
||||
**Fix:** Added proper journal logging:
|
||||
```ini
|
||||
StandardOutput=journal
|
||||
StandardError=journal
|
||||
SyslogIdentifier=hermes
|
||||
```
|
||||
|
||||
## Changes Made
|
||||
|
||||
### Terraform Files Modified
|
||||
|
||||
1. **templates/userdata-hermes.tpl**
|
||||
- Fixed systemd service configuration
|
||||
- Reordered runcmd operations
|
||||
- Added Docker readiness checks and delays
|
||||
- Enhanced health check script
|
||||
- Added service startup verification
|
||||
- Improved completion messages
|
||||
|
||||
2. **docs/HERMES_DEBUGGING.md** (NEW)
|
||||
- Comprehensive troubleshooting guide
|
||||
- Common issues and solutions
|
||||
- Diagnostic commands
|
||||
- Manual start/stop procedures
|
||||
- Discord connectivity testing
|
||||
|
||||
3. **README.md**
|
||||
- Added reference to HERMES_DEBUGGING.md documentation
|
||||
|
||||
## Testing These Changes
|
||||
|
||||
To test the fixes, you need to redeploy:
|
||||
|
||||
```bash
|
||||
# Option 1: Destroy and redeploy (cleanest)
|
||||
terraform destroy
|
||||
# Answer yes when prompted
|
||||
source .env && terraform init && terraform apply
|
||||
|
||||
# Option 2: Update existing (if keeping infrastructure)
|
||||
source .env && terraform apply -auto-approve
|
||||
```
|
||||
|
||||
After deployment, verify Hermes is running:
|
||||
|
||||
```bash
|
||||
# SSH into the server (username is 'hermes' or your override)
|
||||
ssh hermes@<SERVER_IP>
|
||||
|
||||
# Run the health check
|
||||
/usr/local/bin/hermes-health-check.sh
|
||||
|
||||
# Or manually verify
|
||||
systemctl status hermes.service
|
||||
docker ps
|
||||
docker logs hermes
|
||||
```
|
||||
|
||||
## Deployment Flow Now
|
||||
|
||||
With the fixes, the cloud-init deployment flow is now:
|
||||
|
||||
1. ✓ Update system packages
|
||||
2. ✓ Create hermes user
|
||||
3. ✓ Write configuration files (.env, config.yaml, docker-compose.yml, SOUL.md)
|
||||
4. ✓ Write health check script
|
||||
5. ✓ Write systemd service unit
|
||||
6. ✓ Install Docker
|
||||
7. ✓ Install docker-compose-plugin
|
||||
8. ✓ Wait for Docker daemon to be ready
|
||||
9. ✓ Pull Hermes image
|
||||
10. ✓ Set proper permissions
|
||||
11. ✓ Reload systemd
|
||||
12. ✓ Enable hermes.service
|
||||
13. ✓ Start systemd service (which runs docker-compose up)
|
||||
14. ✓ Wait for startup
|
||||
15. ✓ Verify service is active
|
||||
|
||||
## Expected Behavior After Fix
|
||||
|
||||
When you SSH into the server after deployment:
|
||||
|
||||
```bash
|
||||
$ systemctl status hermes.service
|
||||
● hermes.service - Hermes Agent Service
|
||||
Loaded: loaded (/etc/systemd/system/hermes.service; enabled; vendor preset: enabled)
|
||||
Active: active (running) since ...
|
||||
|
||||
$ docker ps
|
||||
CONTAINER ID IMAGE STATUS
|
||||
abc123 nousresearch/hermes-agent:latest Up 2 minutes
|
||||
|
||||
$ docker logs hermes
|
||||
[INFO] Hermes Agent starting...
|
||||
[INFO] Discord bot initialized
|
||||
...
|
||||
```
|
||||
|
||||
And in Discord:
|
||||
- Bot shows "online" status
|
||||
- Responds to mentions in configured channels
|
||||
- Respects user allowlist
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. **Redeploy** with the fixed template
|
||||
2. **Verify** using the health checks documented in HERMES_DEBUGGING.md
|
||||
3. **Test Discord** connectivity by mentioning the bot in a channel
|
||||
4. **Monitor logs** using `docker logs -f hermes` if issues occur
|
||||
|
||||
## Additional Notes
|
||||
|
||||
- The audit identified these issues by analyzing the template configuration and deployment flow
|
||||
- Similar fixes should be applied if you have OpenClaw deployments
|
||||
- The systemd service is now production-ready with proper error handling
|
||||
- Health check script was significantly enhanced for better diagnostics
|
||||
330
hermes/docs/DEBUGGING.md
Normal file
330
hermes/docs/DEBUGGING.md
Normal file
|
|
@ -0,0 +1,330 @@
|
|||
# Hermes Agent Debugging Guide
|
||||
|
||||
This guide helps diagnose why Hermes Agent may not be running after Terraform deployment.
|
||||
|
||||
## Quick Diagnostic Checklist
|
||||
|
||||
### 1. Service Status
|
||||
|
||||
```bash
|
||||
# Check systemd service status
|
||||
systemctl status hermes.service
|
||||
|
||||
# View service logs
|
||||
journalctl -u hermes.service -f
|
||||
|
||||
# Check if container exists
|
||||
docker ps -a | grep hermes
|
||||
|
||||
# View container logs
|
||||
docker logs hermes
|
||||
```
|
||||
|
||||
### 2. Docker Health
|
||||
|
||||
```bash
|
||||
# Verify Docker is running
|
||||
systemctl status docker
|
||||
|
||||
# List containers
|
||||
docker ps -a
|
||||
|
||||
# Check Docker events (watch real-time)
|
||||
docker events
|
||||
|
||||
# Check docker socket permissions
|
||||
ls -la /var/run/docker.sock
|
||||
```
|
||||
|
||||
### 3. Directory and File Permissions
|
||||
|
||||
```bash
|
||||
# Check .hermes directory
|
||||
ls -la ~/.hermes/
|
||||
ls -la ~/.hermes/.env
|
||||
ls -la ~/docker-compose.yml
|
||||
|
||||
# Check file contents
|
||||
cat ~/.hermes/.env
|
||||
cat ~/.hermes/config.yaml
|
||||
cat ~/docker-compose.yml
|
||||
```
|
||||
|
||||
## Common Issues and Fixes
|
||||
|
||||
### Issue 1: "Hermes container not running"
|
||||
|
||||
**Symptoms:**
|
||||
- `docker ps` shows no hermes container
|
||||
- `.hermes` folder exists but docker container won't start
|
||||
|
||||
**Diagnosis:**
|
||||
```bash
|
||||
# Check service status
|
||||
systemctl status hermes.service
|
||||
|
||||
# Check recent logs
|
||||
journalctl -u hermes.service -n 50
|
||||
|
||||
# Check docker logs more verbosely
|
||||
docker logs hermes 2>&1 | tail -50
|
||||
```
|
||||
|
||||
**Root Causes:**
|
||||
1. **Docker image not pulled properly** → Pull manually:
|
||||
```bash
|
||||
docker pull nousresearch/hermes-agent:latest
|
||||
```
|
||||
|
||||
2. **Missing .env file** → Check if it exists and has content:
|
||||
```bash
|
||||
ls -la ~/.hermes/.env
|
||||
cat ~/.hermes/.env
|
||||
```
|
||||
|
||||
3. **Directory permission issues** → Fix permissions:
|
||||
```bash
|
||||
sudo chown -R $(whoami):$(whoami) ~/.hermes
|
||||
chmod 755 ~/.hermes
|
||||
chmod 600 ~/.hermes/.env
|
||||
```
|
||||
|
||||
4. **Docker compose file not found** → Verify location:
|
||||
```bash
|
||||
ls -la ~/docker-compose.yml
|
||||
cat ~/docker-compose.yml
|
||||
```
|
||||
|
||||
5. **Port 18789 already in use** → Check:
|
||||
```bash
|
||||
lsof -i :18789
|
||||
```
|
||||
If occupied, either:
|
||||
- Kill the process using it
|
||||
- Change the port in docker-compose.yml
|
||||
|
||||
### Issue 2: "Container starts but immediately exits"
|
||||
|
||||
**Symptoms:**
|
||||
- `docker ps` is empty but `docker ps -a` shows the container with "Exited" status
|
||||
- Container stops within seconds of starting
|
||||
|
||||
**Diagnosis:**
|
||||
```bash
|
||||
# View the exit code
|
||||
docker ps -a | grep hermes
|
||||
|
||||
# Get more detailed error logs
|
||||
docker logs hermes
|
||||
```
|
||||
|
||||
**Common Fixes:**
|
||||
1. **Invalid YAML in config.yaml** → Validate syntax:
|
||||
```bash
|
||||
python3 -c "import yaml; yaml.safe_load(open('~/.hermes/config.yaml'))"
|
||||
```
|
||||
|
||||
2. **Missing API keys** → Check:
|
||||
```bash
|
||||
grep -E "OPENROUTER|DISCORD_BOT|BRAVE" ~/.hermes/.env
|
||||
```
|
||||
|
||||
3. **Invalid gateway token** → Verify:
|
||||
```bash
|
||||
echo $HERMES_GATEWAY_TOKEN
|
||||
```
|
||||
|
||||
### Issue 3: "Docker daemon won't start"
|
||||
|
||||
**Symptoms:**
|
||||
- `systemctl status docker` shows failed/inactive
|
||||
- `docker ps` returns "Cannot connect to Docker daemon"
|
||||
|
||||
**Fixes:**
|
||||
```bash
|
||||
# Start Docker
|
||||
sudo systemctl start docker
|
||||
|
||||
# Enable on boot
|
||||
sudo systemctl enable docker
|
||||
|
||||
# Check Docker health
|
||||
docker ps
|
||||
```
|
||||
|
||||
### Issue 4: "Discord bot shows offline"
|
||||
|
||||
**Symptoms:**
|
||||
- Hermes is running (docker ps shows container)
|
||||
- But Discord bot doesn't show "online" status in your server
|
||||
|
||||
**Diagnosis:**
|
||||
```bash
|
||||
# Check if Discord configuration is loaded
|
||||
grep -i discord ~/.hermes/.env
|
||||
grep -i discord ~/.hermes/config.yaml
|
||||
|
||||
# View container logs for Discord errors
|
||||
docker logs hermes | grep -i discord
|
||||
```
|
||||
|
||||
**Root Causes:**
|
||||
1. **Invalid bot token** → Verify in .env:
|
||||
```bash
|
||||
grep DISCORD_BOT_TOKEN ~/.hermes/.env
|
||||
```
|
||||
|
||||
2. **Wrong server ID** → Check config:
|
||||
```bash
|
||||
grep -A 5 "discord_server_id" ~/.hermes/config.yaml
|
||||
```
|
||||
|
||||
3. **User IDs not in server** → Verify in allowlist:
|
||||
```bash
|
||||
grep -A 10 "users:" ~/.hermes/config.yaml
|
||||
```
|
||||
|
||||
4. **Gateway not running** → Check port:
|
||||
```bash
|
||||
lsof -i :18789
|
||||
```
|
||||
|
||||
5. **Bot not in server** → Manual fix:
|
||||
1. Go to Discord Developer Portal
|
||||
2. Select your bot
|
||||
3. Copy OAuth2 URL with scopes: `bot`, `applications.commands`
|
||||
4. Click the URL to invite bot to your server
|
||||
|
||||
### Issue 5: "Container gets killed after startup"
|
||||
|
||||
**Symptoms:**
|
||||
- Service shows active but container keeps restarting
|
||||
- `docker logs` shows memory or resource errors
|
||||
|
||||
**Fixes:**
|
||||
```bash
|
||||
# Check Docker stats
|
||||
docker stats hermes
|
||||
|
||||
# Check docker-compose.yml resource limits
|
||||
grep -A 5 "deploy:" ~/docker-compose.yml
|
||||
|
||||
# Increase memory limit if needed
|
||||
# Edit ~/docker-compose.yml and increase memory value
|
||||
nano ~/docker-compose.yml
|
||||
```
|
||||
|
||||
## Verification Steps
|
||||
|
||||
Once you believe Hermes is running, verify with:
|
||||
|
||||
```bash
|
||||
# Health check script (if it exists)
|
||||
bash /usr/local/bin/hermes-health-check.sh
|
||||
|
||||
# Manual health checks
|
||||
echo "1. Service status:"
|
||||
systemctl is-active hermes.service
|
||||
|
||||
echo "2. Container running:"
|
||||
docker ps | grep hermes
|
||||
|
||||
echo "3. Port listening:"
|
||||
netstat -tlnp | grep 18789
|
||||
```
|
||||
|
||||
## Manual Start/Stop
|
||||
|
||||
If the systemd service isn't working:
|
||||
|
||||
```bash
|
||||
# Manual start
|
||||
cd ~/
|
||||
docker compose -f docker-compose.yml up -d
|
||||
|
||||
# Manual stop
|
||||
cd ~/
|
||||
docker compose -f docker-compose.yml down
|
||||
|
||||
# Manual logs
|
||||
cd ~/
|
||||
docker compose -f docker-compose.yml logs -f
|
||||
```
|
||||
|
||||
## Rebuilding from Scratch
|
||||
|
||||
If nothing else works:
|
||||
|
||||
```bash
|
||||
# Stop everything
|
||||
systemctl stop hermes.service
|
||||
docker compose -f ~/docker-compose.yml down
|
||||
|
||||
# Remove container and image
|
||||
docker rm hermes 2>/dev/null || true
|
||||
docker rmi nousresearch/hermes-agent:latest 2>/dev/null || true
|
||||
|
||||
# Pull fresh image
|
||||
docker pull nousresearch/hermes-agent:latest
|
||||
|
||||
# Start service again
|
||||
systemctl start hermes.service
|
||||
|
||||
# Monitor startup
|
||||
journalctl -u hermes.service -f
|
||||
```
|
||||
|
||||
## Debug Mode
|
||||
|
||||
For more verbose logging:
|
||||
|
||||
```bash
|
||||
# Watch service logs with timestamps
|
||||
journalctl -u hermes.service -f --all
|
||||
|
||||
# Watch docker logs continuously
|
||||
docker logs -f --tail=50 hermes
|
||||
|
||||
# Run docker compose in foreground (stops automated service)
|
||||
cd ~/
|
||||
docker compose -f docker-compose.yml up
|
||||
```
|
||||
|
||||
## Testing Discord Connectivity
|
||||
|
||||
Once Hermes is running:
|
||||
|
||||
```bash
|
||||
# Send a test message to your Discord bot
|
||||
# The bot should respond in the channel or via DM
|
||||
|
||||
# Check if bot is responding to mentions
|
||||
@hermes help
|
||||
|
||||
# Or check logs for Discord activity
|
||||
docker logs hermes | tail -100
|
||||
```
|
||||
|
||||
## Terraform Logs
|
||||
|
||||
Check cloud-init logs on the server for deployment issues:
|
||||
|
||||
```bash
|
||||
# View cloud-init output
|
||||
sudo cloud-init status
|
||||
sudo cat /var/log/cloud-init-output.log
|
||||
|
||||
# Check for specific errors
|
||||
grep -i error /var/log/cloud-init-output.log
|
||||
grep -i docker /var/log/cloud-init.log
|
||||
```
|
||||
|
||||
## Getting Help
|
||||
|
||||
If stuck, provide:
|
||||
1. Output of `systemctl status hermes.service`
|
||||
2. Output of `docker ps -a`
|
||||
3. Last 50 lines of `docker logs hermes`
|
||||
4. Contents of `~/.hermes/.env` (redact secrets)
|
||||
5. Contents of `~/.hermes/config.yaml`
|
||||
6. Output of `cloud-init status`
|
||||
239
hermes/docs/FIX_SUMMARY.md
Normal file
239
hermes/docs/FIX_SUMMARY.md
Normal file
|
|
@ -0,0 +1,239 @@
|
|||
# Hermes Deployment Audit - Summary of Fixes
|
||||
|
||||
## Executive Summary
|
||||
|
||||
The Terraform Hermes deployment had **5 critical issues** preventing the service from running. All have been fixed in the cloud-init template.
|
||||
|
||||
## What Was Wrong
|
||||
|
||||
### Critical Issues Found:
|
||||
|
||||
1. ✗ **Systemd service couldn't find docker-compose.yml**
|
||||
- `ExecStart=/usr/bin/docker compose up` (missing file path)
|
||||
|
||||
2. ✗ **Service ran as non-root user without Docker permissions**
|
||||
- User permissions from `usermod -aG docker` don't take effect for the systemd service
|
||||
|
||||
3. ✗ **Docker image pulled before docker-compose-plugin installed**
|
||||
- Installation order was wrong
|
||||
|
||||
4. ✗ **No check that Docker daemon was ready**
|
||||
- Timing issues during bootstrap
|
||||
|
||||
5. ✗ **No verification service actually started**
|
||||
- Deployment would complete even if Hermes failed to start
|
||||
|
||||
## What Was Fixed
|
||||
|
||||
### 1. Systemd Service Configuration
|
||||
**Before:**
|
||||
```ini
|
||||
ExecStart=/usr/bin/docker compose up
|
||||
ExecStop=/usr/bin/docker compose down
|
||||
User=${admin_user}
|
||||
```
|
||||
|
||||
**After:**
|
||||
```ini
|
||||
ExecStart=/bin/sh -c 'cd /home/${admin_user} && exec docker compose -f docker-compose.yml up'
|
||||
ExecStop=/bin/sh -c 'cd /home/${admin_user} && exec docker compose -f docker-compose.yml down'
|
||||
User=root
|
||||
StandardOutput=journal
|
||||
StandardError=journal
|
||||
SyslogIdentifier=hermes
|
||||
```
|
||||
|
||||
**Why:** Now properly finds the compose file and doesn't have permission issues.
|
||||
|
||||
---
|
||||
|
||||
### 2. Installation Order
|
||||
**Before:**
|
||||
```yaml
|
||||
- curl -fsSL https://get.docker.com | sh
|
||||
- apt-get install -y docker-compose-plugin # too late
|
||||
- docker pull nousresearch/hermes-agent:latest
|
||||
```
|
||||
|
||||
**After:**
|
||||
```yaml
|
||||
- curl -fsSL https://get.docker.com | sh
|
||||
- apt-get install -y docker-compose-plugin # right after docker
|
||||
- sleep 5
|
||||
- docker ps > /dev/null || (sleep 10 && docker ps) # verify ready
|
||||
- docker pull nousresearch/hermes-agent:latest
|
||||
```
|
||||
|
||||
**Why:** Ensures docker-compose-plugin is installed before use and Docker is ready.
|
||||
|
||||
---
|
||||
|
||||
### 3. Service Startup Verification
|
||||
**Before:**
|
||||
```yaml
|
||||
- systemctl start hermes.service
|
||||
# ... done, might have failed but we don't know
|
||||
```
|
||||
|
||||
**After:**
|
||||
```yaml
|
||||
- systemctl start hermes.service
|
||||
- sleep 3
|
||||
- systemctl is-active hermes.service || systemctl status hermes.service
|
||||
```
|
||||
|
||||
**Why:** Immediately tells us if startup failed.
|
||||
|
||||
---
|
||||
|
||||
### 4. Enhanced Health Check Script
|
||||
**Added comprehensive diagnostics:**
|
||||
- ✓ Docker daemon status
|
||||
- ✓ Container exists
|
||||
- ✓ Container running (with uptime)
|
||||
- ✓ Port listening
|
||||
- ✓ Config files exist
|
||||
- ✓ Systemd service status
|
||||
- ✓ Recent logs
|
||||
- ✓ Discord configuration check
|
||||
|
||||
---
|
||||
|
||||
## New Documentation
|
||||
|
||||
### 1. **HERMES_DEBUGGING.md**
|
||||
Complete troubleshooting guide with:
|
||||
- Quick diagnostic checklist
|
||||
- Common issues and their fixes
|
||||
- Command reference
|
||||
- Manual start/stop procedures
|
||||
- Discord connectivity testing
|
||||
- Log interpretation
|
||||
|
||||
### 2. **HERMES_AUDIT_REPORT.md**
|
||||
Detailed audit findings explaining:
|
||||
- What each issue was
|
||||
- Why it caused failures
|
||||
- How it was fixed
|
||||
- Expected behavior after fixes
|
||||
|
||||
---
|
||||
|
||||
## How to Apply These Fixes
|
||||
|
||||
### Option 1: Fresh Deployment (Cleanest)
|
||||
```bash
|
||||
terraform destroy -auto-approve
|
||||
source .env && terraform init && terraform apply
|
||||
```
|
||||
|
||||
### Option 2: Update Existing Stack
|
||||
```bash
|
||||
source .env && terraform apply -auto-approve
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Verification After Deployment
|
||||
|
||||
After applying these fixes and deploying:
|
||||
|
||||
```bash
|
||||
# SSH into server
|
||||
ssh hermes@<SERVER_IP>
|
||||
|
||||
# Run comprehensive health check
|
||||
/usr/local/bin/hermes-health-check.sh
|
||||
|
||||
# Manually verify
|
||||
systemctl status hermes.service
|
||||
docker ps
|
||||
docker logs hermes
|
||||
```
|
||||
|
||||
**Expected output:**
|
||||
- ✓ Hermes systemd service active
|
||||
- ✓ Docker container running
|
||||
- ✓ Gateway listening on port 18789
|
||||
- ✓ Discord bot shows online in your server
|
||||
|
||||
---
|
||||
|
||||
## Files Changed
|
||||
|
||||
### Core Deployment
|
||||
- `templates/userdata-hermes.tpl` - Fixed cloud-init configuration
|
||||
|
||||
### Documentation
|
||||
- `docs/HERMES_DEBUGGING.md` - **NEW** Troubleshooting guide
|
||||
- `docs/HERMES_AUDIT_REPORT.md` - **NEW** Detailed audit findings
|
||||
- `README.md` - Added reference to debugging guide
|
||||
|
||||
---
|
||||
|
||||
## Why These Fixes Work
|
||||
|
||||
Each fix addresses a specific failure point:
|
||||
|
||||
| Issue | Root Cause | Fix | Result |
|
||||
|-------|-----------|-----|--------|
|
||||
| Compose file not found | No path specified | Specify full path with `-f` | Service finds config |
|
||||
| Docker permission denied | Non-root user, group not applied | Run service as root | Service can use Docker |
|
||||
| Docker not ready | Immediate pull attempt | Add delays and checks | Image pulls successfully |
|
||||
| Silent failures | No verification | Check service status | Know if it failed |
|
||||
| Can't debug | No logging | Added journal logging | Can read logs |
|
||||
|
||||
---
|
||||
|
||||
## Testing the Fixes
|
||||
|
||||
To verify the fixes work on your deployments:
|
||||
|
||||
1. **Quick test (5 min):**
|
||||
```bash
|
||||
# Just check service is running
|
||||
systemctl status hermes.service
|
||||
docker ps | grep hermes
|
||||
```
|
||||
|
||||
2. **Full health check (10 min):**
|
||||
```bash
|
||||
/usr/local/bin/hermes-health-check.sh
|
||||
```
|
||||
|
||||
3. **Discord test (Manual):**
|
||||
- Mention the bot in a configured channel
|
||||
- It should respond within a few seconds
|
||||
|
||||
---
|
||||
|
||||
## Rollback Plan
|
||||
|
||||
If something goes wrong:
|
||||
|
||||
```bash
|
||||
# Revert to previous state
|
||||
git checkout templates/userdata-hermes.tpl
|
||||
|
||||
# Then redeploy or manually stop
|
||||
systemctl stop hermes.service
|
||||
docker compose -f ~hermes/docker-compose.yml down
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## OpenClaw Status
|
||||
|
||||
✓ OpenClaw service is properly configured and doesn't have these issues.
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. **Review** the changes in `templates/userdata-hermes.tpl`
|
||||
2. **Redeploy** using `terraform apply`
|
||||
3. **Verify** using `systemctl status hermes.service`
|
||||
4. **Test** Discord connectivity
|
||||
5. **Refer** to `HERMES_DEBUGGING.md` if any issues occur
|
||||
|
||||
All changes are backward compatible and don't affect other components.
|
||||
233
hermes/docs/VERIFICATION_CHECKLIST.md
Normal file
233
hermes/docs/VERIFICATION_CHECKLIST.md
Normal file
|
|
@ -0,0 +1,233 @@
|
|||
# Quick Reference: Hermes Deployment Status Check
|
||||
|
||||
## For Current Deployment (Before Fixes)
|
||||
|
||||
If you're still SSH'd into the server from your initial deployment, run these checks:
|
||||
|
||||
### Check 1: Is the systemd service running?
|
||||
```bash
|
||||
systemctl status hermes.service
|
||||
```
|
||||
**Expected (BROKEN - before fix):** Shows `failed` or `inactive`
|
||||
|
||||
### Check 2: Does the Docker container exist?
|
||||
```bash
|
||||
docker ps -a | grep hermes
|
||||
```
|
||||
**Expected (BROKEN - before fix):** Container doesn't exist OR shows `Exited` status
|
||||
|
||||
### Check 3: Check systemd journal for errors
|
||||
```bash
|
||||
journalctl -u hermes.service | tail -50
|
||||
```
|
||||
**Expected (BROKEN - before fix):** Error like "docker: command not found" or "file not found"
|
||||
|
||||
### Check 4: Watch docker logs
|
||||
```bash
|
||||
docker logs hermes 2>&1 | head -20
|
||||
```
|
||||
**Expected (BROKEN - before fix):** Either no container, or errors about missing files
|
||||
|
||||
### Check 5: Is Discord bot online?
|
||||
```bash
|
||||
# Go to Discord and check your server
|
||||
# Look for the bot in members list
|
||||
```
|
||||
**Expected (BROKEN - before fix):** Shows `Offline` or doesn't appear
|
||||
|
||||
---
|
||||
|
||||
## After Redeploying with Fixes
|
||||
|
||||
Run these verification commands immediately after deployment:
|
||||
|
||||
### Quick Verification (< 1 minute)
|
||||
```bash
|
||||
# 1. Check service status
|
||||
systemctl status hermes.service
|
||||
|
||||
# 2. Check Docker container
|
||||
docker ps | grep hermes
|
||||
|
||||
# 3. Check port is listening
|
||||
netstat -tlnp | grep 18789
|
||||
```
|
||||
|
||||
**Expected (FIXED):**
|
||||
- Service shows `active (running)`
|
||||
- Container shows `UP` status
|
||||
- Port 18789 shows `LISTEN`
|
||||
|
||||
### Comprehensive Health Check (< 5 minutes)
|
||||
```bash
|
||||
/usr/local/bin/hermes-health-check.sh
|
||||
```
|
||||
|
||||
**Expected (FIXED):** All checks show ✓
|
||||
|
||||
### Detailed Logs
|
||||
```bash
|
||||
# Check what's happening in the container
|
||||
docker logs -f hermes
|
||||
|
||||
# Use Ctrl+C to exit after 10-20 lines
|
||||
```
|
||||
|
||||
**Expected (FIXED):**
|
||||
```
|
||||
[INFO] Hermes Agent Framework starting...
|
||||
[INFO] Initializing gateway on port 18789
|
||||
[INFO] Discord bot initialized
|
||||
```
|
||||
|
||||
### Discord Connectivity Test
|
||||
```bash
|
||||
# In your Discord server, type:
|
||||
@hermes help
|
||||
|
||||
# Bot should respond within 5 seconds
|
||||
```
|
||||
|
||||
**Expected (FIXED):** Bot is online and responds
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting Matrix
|
||||
|
||||
| Symptom | Check | Fix |
|
||||
|---------|-------|-----|
|
||||
| Service shows `failed` | `journalctl -u hermes.service` | Redeploy with fixed template |
|
||||
| Container `Exited` | `docker logs hermes` | Check the logs for errors |
|
||||
| Port not listening | `docker ps` | Container not running |
|
||||
| Docker permission denied | Check User= in service | Should be `root` now |
|
||||
| Bot shows offline | Check Discord bot token | Verify in `.env` file |
|
||||
| No container at all | `docker ps -a` | Image wasn't pulled, redeploy |
|
||||
|
||||
---
|
||||
|
||||
## Command Reference
|
||||
|
||||
### Systemd Service
|
||||
```bash
|
||||
# Check status
|
||||
systemctl status hermes.service
|
||||
|
||||
# View logs (last 50 lines)
|
||||
journalctl -u hermes.service -n 50
|
||||
|
||||
# View logs with timestamps
|
||||
journalctl -u hermes.service -f --all
|
||||
|
||||
# Restart service
|
||||
systemctl restart hermes.service
|
||||
|
||||
# Stop service
|
||||
systemctl stop hermes.service
|
||||
|
||||
# Start service
|
||||
systemctl start hermes.service
|
||||
```
|
||||
|
||||
### Docker
|
||||
```bash
|
||||
# List running containers
|
||||
docker ps
|
||||
|
||||
# List all containers (including stopped)
|
||||
docker ps -a
|
||||
|
||||
# View container logs
|
||||
docker logs hermes
|
||||
|
||||
# Follow logs (live)
|
||||
docker logs -f hermes
|
||||
|
||||
# Show last 100 lines
|
||||
docker logs --tail=100 hermes
|
||||
|
||||
# Inspect container
|
||||
docker inspect hermes
|
||||
```
|
||||
|
||||
### Files to Check
|
||||
```bash
|
||||
# Configuration files
|
||||
cat ~/.hermes/.env
|
||||
cat ~/.hermes/config.yaml
|
||||
cat ~/docker-compose.yml
|
||||
|
||||
# Check permissions
|
||||
ls -la ~/.hermes/
|
||||
|
||||
# Check if Hermes healthcheck script exists
|
||||
ls -la /usr/local/bin/hermes-health-check.sh
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Before vs After Comparison
|
||||
|
||||
### BEFORE These Fixes:
|
||||
```
|
||||
❌ systemctl status hermes.service
|
||||
→ inactive (dead)
|
||||
|
||||
❌ docker ps
|
||||
→ (no container)
|
||||
|
||||
❌ journalctl -u hermes.service
|
||||
→ cannot open: "/home/hermes/docker-compose.yml"
|
||||
|
||||
❌ Discord bot
|
||||
→ OFFLINE
|
||||
```
|
||||
|
||||
### AFTER These Fixes:
|
||||
```
|
||||
✓ systemctl status hermes.service
|
||||
→ active (running)
|
||||
|
||||
✓ docker ps
|
||||
→ hermes container UP 2 minutes
|
||||
|
||||
✓ journalctl -u hermes.service
|
||||
→ [INFO] Hermes Agent started successfully
|
||||
|
||||
✓ Discord bot
|
||||
→ ONLINE ✓
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## When to Seek Help
|
||||
|
||||
If after redeployment you still have issues:
|
||||
|
||||
1. **Check HERMES_DEBUGGING.md** in docs/ for detailed troubleshooting
|
||||
2. **Read HERMES_AUDIT_REPORT.md** for what was fixed
|
||||
3. **Run health check:** `/usr/local/bin/hermes-health-check.sh`
|
||||
4. **Share logs:** `docker logs hermes` output
|
||||
5. **Check config:** Verify Discord token, server ID, user IDs in `~/.hermes/.env`
|
||||
|
||||
---
|
||||
|
||||
## Redeploy Command
|
||||
|
||||
To apply all fixes:
|
||||
|
||||
```bash
|
||||
cd ~/openboatmobile
|
||||
|
||||
# Option 1: Clean slate (recommended)
|
||||
terraform destroy -auto-approve
|
||||
source .env && terraform init && terraform apply
|
||||
|
||||
# Option 2: Update in-place
|
||||
source .env && terraform apply -auto-approve
|
||||
```
|
||||
|
||||
Then verify with:
|
||||
```bash
|
||||
ssh hermes@<SERVER_IP>
|
||||
/usr/local/bin/hermes-health-check.sh
|
||||
```
|
||||
492
hermes/templates/userdata-hermes.tpl
Normal file
492
hermes/templates/userdata-hermes.tpl
Normal file
|
|
@ -0,0 +1,492 @@
|
|||
#cloud-config
|
||||
# Hermes Agent Bootstrap (Nous Research)
|
||||
|
||||
# Update packages
|
||||
package_update: true
|
||||
package_upgrade: true
|
||||
|
||||
# Install required packages
|
||||
packages:
|
||||
- curl
|
||||
- git
|
||||
- jq
|
||||
- gnupg
|
||||
- ca-certificates
|
||||
- software-properties-common
|
||||
%{ if docker_enabled ~}
|
||||
# Docker-specific packages
|
||||
%{ else ~}
|
||||
# Direct installation packages
|
||||
- python3
|
||||
- python3-pip
|
||||
- python3-venv
|
||||
- build-essential
|
||||
- libffi-dev
|
||||
- libssl-dev
|
||||
%{ endif ~}
|
||||
|
||||
# Create admin user (if different from root)
|
||||
users:
|
||||
- name: ${admin_user}
|
||||
sudo: ALL=(ALL) NOPASSWD:ALL
|
||||
shell: /bin/bash
|
||||
ssh_authorized_keys: ${jsonencode(admin_ssh_keys)}
|
||||
groups: [sudo, systemd-journal]
|
||||
|
||||
# Write system configuration files
|
||||
write_files:
|
||||
# Hermes environment file
|
||||
- path: /home/${admin_user}/.hermes/.env
|
||||
content: |
|
||||
# Hermes Agent Configuration - Generated by Terraform
|
||||
|
||||
# Inference API (Venice AI via OpenAI-compatible endpoint)
|
||||
# Venice API uses OPENAI_API_KEY + OPENAI_BASE_URL for custom endpoints
|
||||
OPENAI_API_KEY=${venice_api_key}
|
||||
OPENAI_BASE_URL=${venice_base_url}
|
||||
|
||||
# Discord Bot
|
||||
%{if discord_bot_token != ""}
|
||||
DISCORD_BOT_TOKEN=${discord_bot_token}
|
||||
%{endif}
|
||||
%{if discord_home_channel != ""}
|
||||
DISCORD_HOME_CHANNEL=${discord_home_channel}
|
||||
%{endif}
|
||||
%{if discord_allowed_users != ""}
|
||||
DISCORD_ALLOWED_USERS=${discord_allowed_users}
|
||||
%{endif}
|
||||
|
||||
# Brave Search
|
||||
%{if brave_search_api_key != ""}
|
||||
BRAVE_API_KEY=${brave_search_api_key}
|
||||
%{endif}
|
||||
|
||||
# Gateway Token
|
||||
HERMES_GATEWAY_TOKEN=${gateway_token}
|
||||
|
||||
# Authorization
|
||||
%{if gateway_allowed_users != ""}
|
||||
GATEWAY_ALLOWED_USERS=${gateway_allowed_users}
|
||||
%{endif}
|
||||
%{if gateway_allow_all_users}
|
||||
GATEWAY_ALLOW_ALL_USERS=true
|
||||
%{endif}
|
||||
permissions: '0600'
|
||||
|
||||
# Hermes config.yaml
|
||||
- path: /home/${admin_user}/.hermes/config.yaml
|
||||
content: |
|
||||
# Hermes Agent Configuration
|
||||
# Framework: Nous Research Hermes Agent
|
||||
# Venice AI via OpenAI-compatible endpoint
|
||||
|
||||
model:
|
||||
base_url: ${venice_base_url}
|
||||
model: ${primary_model}
|
||||
|
||||
auth:
|
||||
mode: allowlist
|
||||
|
||||
%{if discord_bot_token != ""}
|
||||
channels:
|
||||
discord:
|
||||
enabled: true
|
||||
auto_thread: ${discord_auto_thread}
|
||||
%{if discord_server_id != ""}
|
||||
guilds:
|
||||
"${discord_server_id}":
|
||||
require_mention: false
|
||||
%{if length(discord_user_id) > 0}
|
||||
users:
|
||||
%{ for id in discord_user_id ~}
|
||||
- "${id}"
|
||||
%{ endfor ~}
|
||||
%{endif}
|
||||
%{endif}
|
||||
%{endif}
|
||||
|
||||
# Configure auxiliary tasks to use Venice AI explicitly
|
||||
# This avoids "no auxiliary provider" warning
|
||||
auxiliary:
|
||||
compression:
|
||||
base_url: ${venice_base_url}
|
||||
api_key: ${venice_api_key}
|
||||
model: ${primary_model}
|
||||
|
||||
approvals:
|
||||
mode: smart
|
||||
|
||||
gateway:
|
||||
port: 18789
|
||||
bind: "0.0.0.0"
|
||||
permissions: '0644'
|
||||
|
||||
# SOUL.md - Agent personality
|
||||
- path: /home/${admin_user}/.hermes/SOUL.md
|
||||
content: |
|
||||
# SOUL.md - ${agent_name}
|
||||
|
||||
You are ${agent_name}, an AI agent running on the Hermes Agent framework from Nous Research.
|
||||
|
||||
## Identity
|
||||
|
||||
**Name:** ${agent_name}
|
||||
**Framework:** Hermes Agent (Nous Research)
|
||||
**Model:** ${primary_model_name}
|
||||
|
||||
## Behavior
|
||||
|
||||
- Be helpful and direct
|
||||
- Explain your reasoning clearly
|
||||
- Ask for clarification when needed
|
||||
- Follow security guardrails
|
||||
|
||||
## Notes
|
||||
|
||||
- Running on ${server_name}
|
||||
- Provider: Hetzner Cloud
|
||||
- Location: ${location}
|
||||
permissions: '0644'
|
||||
|
||||
%{ if docker_enabled ~}
|
||||
# Docker Compose for Hermes (Docker mode only)
|
||||
- path: /home/${admin_user}/docker-compose.yml
|
||||
content: |
|
||||
services:
|
||||
hermes:
|
||||
image: nousresearch/hermes-agent:latest
|
||||
container_name: ${agent_name}
|
||||
restart: unless-stopped
|
||||
command: gateway run
|
||||
volumes:
|
||||
- /home/${admin_user}/.hermes:/opt/data
|
||||
ports:
|
||||
- "18789:18789"
|
||||
env_file:
|
||||
- /home/${admin_user}/.hermes/.env
|
||||
deploy:
|
||||
resources:
|
||||
limits:
|
||||
memory: 4G
|
||||
cpus: "2.0"
|
||||
permissions: '0644'
|
||||
%{ endif ~}
|
||||
|
||||
# Systemd service for Hermes
|
||||
- path: /etc/systemd/system/hermes.service
|
||||
content: |
|
||||
[Unit]
|
||||
Description=Hermes Agent Service
|
||||
%{ if docker_enabled ~}
|
||||
After=docker.service
|
||||
Requires=docker.service
|
||||
%{ else ~}
|
||||
After=network.target
|
||||
Wants=network-online.target
|
||||
%{ endif ~}
|
||||
|
||||
[Service]
|
||||
Type=simple
|
||||
WorkingDirectory=/home/${admin_user}
|
||||
User=${admin_user}
|
||||
%{ if docker_enabled ~}
|
||||
ExecStartPre=/bin/bash -c 'sleep 5 && docker ps > /dev/null'
|
||||
ExecStart=/bin/sh -c 'cd /home/${admin_user} && exec docker compose -f docker-compose.yml up'
|
||||
ExecStop=/bin/sh -c 'cd /home/${admin_user} && exec docker compose -f docker-compose.yml down'
|
||||
%{ else ~}
|
||||
Environment=PATH=/home/${admin_user}/hermes-venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
|
||||
ExecStart=/usr/local/bin/hermes gateway run
|
||||
ExecStop=/bin/kill -TERM $MAINPID
|
||||
%{ endif ~}
|
||||
Restart=on-failure
|
||||
RestartSec=15
|
||||
StandardOutput=journal
|
||||
StandardError=journal
|
||||
SyslogIdentifier=hermes
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
permissions: '0644'
|
||||
|
||||
# Health check and diagnostics script
|
||||
- path: /usr/local/bin/hermes-health-check.sh
|
||||
content: |
|
||||
#!/bin/bash
|
||||
set -e
|
||||
|
||||
echo "=== Hermes Agent Health Check ==="
|
||||
echo ""
|
||||
%{ if docker_enabled ~}
|
||||
|
||||
# Docker-based checks
|
||||
# Check if Docker is running
|
||||
if systemctl is-active --quiet docker; then
|
||||
echo "✓ Docker daemon running"
|
||||
else
|
||||
echo "✗ Docker daemon not running"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Check if Hermes container exists
|
||||
if docker ps -a | grep -q "${agent_name}"; then
|
||||
echo "✓ Hermes container exists"
|
||||
else
|
||||
echo "✗ Hermes container not found"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Check if Hermes container is running
|
||||
if docker ps | grep -q "${agent_name}"; then
|
||||
echo "✓ Hermes container running"
|
||||
CONTAINER_ID=$(docker ps -q -f name=${agent_name})
|
||||
UPTIME=$(docker inspect --format='{{.State.StartedAt}}' $CONTAINER_ID)
|
||||
echo " Started: $UPTIME"
|
||||
else
|
||||
echo "✗ Hermes container not running"
|
||||
echo " Last status:"
|
||||
docker ps -a --format "table {{.Names}}\t{{.Status}}" | grep ${agent_name}
|
||||
exit 1
|
||||
fi
|
||||
%{ else ~}
|
||||
|
||||
# Direct installation checks
|
||||
# Check if hermes binary exists
|
||||
if [ -x "/usr/local/bin/hermes" ]; then
|
||||
echo "✓ Hermes binary installed"
|
||||
else
|
||||
echo "✗ Hermes binary not found"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Check if hermes venv exists
|
||||
if [ -d "/home/${admin_user}/hermes-venv" ]; then
|
||||
echo "✓ Hermes virtual environment exists"
|
||||
else
|
||||
echo "✗ Hermes virtual environment not found"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Check if hermes process is running
|
||||
if pgrep -f "hermes gateway run" > /dev/null; then
|
||||
echo "✓ Hermes process running"
|
||||
HERMES_PID=$(pgrep -f "hermes gateway run")
|
||||
echo " PID: $HERMES_PID"
|
||||
else
|
||||
echo "✗ Hermes process not running"
|
||||
exit 1
|
||||
fi
|
||||
%{ endif ~}
|
||||
|
||||
# Check if port is listening
|
||||
if netstat -tlnp 2>/dev/null | grep -q ":18789 " || lsof -i :18789 > /dev/null 2>&1; then
|
||||
echo "✓ Gateway listening on port 18789"
|
||||
else
|
||||
echo "✗ Gateway not listening on port 18789"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Check if config files exist
|
||||
if [ -f /home/${admin_user}/.hermes/config.yaml ]; then
|
||||
echo "✓ config.yaml exists"
|
||||
else
|
||||
echo "✗ config.yaml missing"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if [ -f /home/${admin_user}/.hermes/.env ]; then
|
||||
echo "✓ .env file exists"
|
||||
else
|
||||
echo "✗ .env file missing"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Check systemd service
|
||||
if systemctl is-active --quiet hermes.service; then
|
||||
echo "✓ Hermes systemd service active"
|
||||
else
|
||||
echo "✗ Hermes systemd service not active"
|
||||
systemctl status hermes.service || true
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Check recent logs
|
||||
echo ""
|
||||
echo "Recent logs:"
|
||||
%{ if docker_enabled ~}
|
||||
docker logs --tail=10 ${agent_name} 2>&1 | head -20 || echo " (No logs available)"
|
||||
%{ else ~}
|
||||
journalctl -u hermes.service -n 10 --no-pager || echo " (No logs available)"
|
||||
%{ endif ~}
|
||||
|
||||
# Check Discord configuration
|
||||
if grep -q "DISCORD_BOT_TOKEN" /home/${admin_user}/.hermes/.env; then
|
||||
if [ -s /home/${admin_user}/.hermes/.env ]; then
|
||||
BOT_TOKEN=$(grep "DISCORD_BOT_TOKEN" /home/${admin_user}/.hermes/.env | cut -d= -f2 | wc -c)
|
||||
echo ""
|
||||
echo "Discord configuration:"
|
||||
echo " Bot token configured: $([ $BOT_TOKEN -gt 10 ] && echo "✓ Yes" || echo "✗ No")"
|
||||
grep "DISCORD_SERVER_ID" /home/${admin_user}/.hermes/.env > /dev/null && echo " Server ID configured: ✓" || echo " Server ID configured: ✗"
|
||||
fi
|
||||
fi
|
||||
|
||||
echo ""
|
||||
echo "=== Health Check Complete ==="
|
||||
echo ""
|
||||
echo "For more details:"
|
||||
echo " systemctl status hermes.service"
|
||||
%{ if docker_enabled ~}
|
||||
echo " docker logs -f ${agent_name}"
|
||||
%{ else ~}
|
||||
echo " journalctl -u hermes.service -f"
|
||||
echo " hermes --help"
|
||||
%{ endif ~}
|
||||
echo ""
|
||||
permissions: '0755'
|
||||
|
||||
%{ if docker_enabled == false ~}
|
||||
# Direct installation script - avoids YAML escaping issues in runcmd
|
||||
- path: /usr/local/bin/install-hermes-direct.sh
|
||||
content: |
|
||||
#!/bin/bash
|
||||
set -e
|
||||
|
||||
ADMIN_USER="${admin_user}"
|
||||
|
||||
echo "=== Installing Hermes Agent (Direct Mode) ==="
|
||||
|
||||
# Ensure home directory exists
|
||||
mkdir -p /home/$ADMIN_USER
|
||||
chown -R $ADMIN_USER:$ADMIN_USER /home/$ADMIN_USER
|
||||
chmod 755 /home/$ADMIN_USER
|
||||
|
||||
# Install dependencies
|
||||
apt-get update
|
||||
apt-get install -y git curl python3 python3-pip python3-venv build-essential libffi-dev libssl-dev
|
||||
|
||||
# Install uv (running as root during cloud-init)
|
||||
# Install uv system-wide so all users can access it
|
||||
UV_INSTALL_DIR=/usr/local/bin
|
||||
curl -LsSf https://astral.sh/uv/install.sh | UV_INSTALL_DIR=$UV_INSTALL_DIR sh
|
||||
export PATH="$UV_INSTALL_DIR:$PATH"
|
||||
|
||||
# Clone Hermes Agent repository
|
||||
echo "Cloning Hermes Agent repository..."
|
||||
su - $ADMIN_USER -c "cd /home/$ADMIN_USER && git clone --recurse-submodules https://github.com/NousResearch/hermes-agent.git"
|
||||
|
||||
# Create virtual environment with Python 3.11
|
||||
echo "Creating Python 3.11 virtual environment..."
|
||||
su - $ADMIN_USER -c "cd /home/$ADMIN_USER/hermes-agent && /usr/local/bin/uv venv venv --python 3.11"
|
||||
|
||||
# Install Hermes with messaging extras
|
||||
echo "Installing Hermes Agent (this may take a few minutes)..."
|
||||
su - $ADMIN_USER -c "cd /home/$ADMIN_USER/hermes-agent && export VIRTUAL_ENV=/home/$ADMIN_USER/hermes-agent/venv && /usr/local/bin/uv pip install -e '.[messaging]'"
|
||||
|
||||
# Create hermes wrapper script
|
||||
echo "Creating wrapper script..."
|
||||
cat > /usr/local/bin/hermes << WRAPPER_EOF
|
||||
#!/bin/bash
|
||||
# Hermes wrapper script - uv is installed during cloud-init
|
||||
export PATH="/home/$ADMIN_USER/.local/bin:\$PATH"
|
||||
export VIRTUAL_ENV="/home/$ADMIN_USER/hermes-agent/venv"
|
||||
exec "/home/$ADMIN_USER/hermes-agent/venv/bin/hermes" "\$@"
|
||||
WRAPPER_EOF
|
||||
chmod +x /usr/local/bin/hermes
|
||||
|
||||
# Verify installation
|
||||
echo "Verifying installation..."
|
||||
/usr/local/bin/hermes version || {
|
||||
echo "ERROR: Hermes Agent installation failed"
|
||||
exit 1
|
||||
}
|
||||
|
||||
# Create config directory structure
|
||||
su - $ADMIN_USER -c "mkdir -p /home/$ADMIN_USER/.hermes/{cron,sessions,logs,memories,skills,pairing,hooks,image_cache,audio_cache}"
|
||||
chown -R $ADMIN_USER:$ADMIN_USER /home/$ADMIN_USER/.hermes
|
||||
chmod 755 /home/$ADMIN_USER/.hermes
|
||||
|
||||
echo "=== Installation Complete ==="
|
||||
permissions: '0755'
|
||||
%{ endif ~}
|
||||
|
||||
# Run commands
|
||||
runcmd:
|
||||
# Create directories
|
||||
- mkdir -p /home/${admin_user}/.hermes
|
||||
- chown -R ${admin_user}:${admin_user} /home/${admin_user}/.hermes
|
||||
%{ if docker_enabled ~}
|
||||
|
||||
# Docker-based installation
|
||||
- curl -fsSL https://get.docker.com | sh
|
||||
|
||||
# Install Docker Compose plugin (BEFORE pulling images)
|
||||
- apt-get update
|
||||
- apt-get install -y docker-compose-plugin
|
||||
|
||||
# Ensure home directory exists with correct ownership
|
||||
- mkdir -p /home/${admin_user}
|
||||
- chown -R ${admin_user}:${admin_user} /home/${admin_user}
|
||||
- chmod 755 /home/${admin_user}
|
||||
|
||||
# Add user to docker group for later use
|
||||
- usermod -aG docker ${admin_user}
|
||||
|
||||
# Wait for Docker daemon to be ready
|
||||
- sleep 5
|
||||
- docker ps > /dev/null || (sleep 10 && docker ps)
|
||||
|
||||
# Pull Hermes image (runs as root)
|
||||
- docker pull nousresearch/hermes-agent:latest
|
||||
|
||||
# Ensure .hermes directory has correct permissions for files written by docker
|
||||
- mkdir -p /home/${admin_user}/.hermes
|
||||
- chown -R ${admin_user}:${admin_user} /home/${admin_user}/.hermes
|
||||
- chmod 755 /home/${admin_user}/.hermes
|
||||
- chown ${admin_user}:${admin_user} /home/${admin_user}/docker-compose.yml
|
||||
- chmod 644 /home/${admin_user}/docker-compose.yml
|
||||
%{ else ~}
|
||||
|
||||
# Direct installation - call the install script
|
||||
- /usr/local/bin/install-hermes-direct.sh
|
||||
%{ endif ~}
|
||||
|
||||
# Enable and start Hermes service
|
||||
- systemctl daemon-reload
|
||||
- systemctl enable hermes.service
|
||||
|
||||
# Start the service with a slight delay to ensure all prerequisites are ready
|
||||
- sleep 2
|
||||
- systemctl start hermes.service
|
||||
- sleep 3
|
||||
|
||||
# Verify service started
|
||||
- systemctl is-active hermes.service || systemctl status hermes.service
|
||||
|
||||
# Print completion message
|
||||
- |
|
||||
echo ""
|
||||
echo "======================================="
|
||||
echo " Hermes Agent Bootstrap Complete!"
|
||||
echo "======================================="
|
||||
echo ""
|
||||
echo "Server: ${server_name}"
|
||||
echo "Framework: Hermes Agent (Nous Research)"
|
||||
echo "Model: ${primary_model}"
|
||||
%{ if docker_enabled ~}
|
||||
echo "Deployment: Docker Container"
|
||||
%{ else ~}
|
||||
echo "Deployment: Direct Installation"
|
||||
%{ endif ~}
|
||||
echo ""
|
||||
echo "Verify deployment:"
|
||||
echo " systemctl status hermes.service"
|
||||
%{ if docker_enabled ~}
|
||||
echo " docker ps"
|
||||
echo " docker logs ${agent_name}"
|
||||
%{ else ~}
|
||||
echo " hermes --version"
|
||||
echo " journalctl -u hermes.service -f"
|
||||
%{ endif ~}
|
||||
echo ""
|
||||
echo "For Discord connectivity:"
|
||||
echo " Check bot has 'online' status and is in your server"
|
||||
echo ""
|
||||
Loading…
Add table
Add a link
Reference in a new issue