# Quick Reference: Hermes Deployment Status Check

## For Current Deployment (Before Fixes)

If you're still SSH'd into the server from your initial deployment, run these checks:

### Check 1: Is the systemd service running?
```bash
systemctl status hermes.service
```
**Expected (BROKEN - before fix):** Shows `failed` or `inactive`

### Check 2: Does the Docker container exist?
```bash
docker ps -a | grep hermes
```
**Expected (BROKEN - before fix):** Container doesn't exist OR shows `Exited` status

### Check 3: Check systemd journal for errors
```bash
journalctl -u hermes.service -n 50 --no-pager
```
**Expected (BROKEN - before fix):** An error such as `docker: command not found` or a missing-file message

### Check 4: Check the Docker logs
```bash
docker logs hermes 2>&1 | head -20
```
**Expected (BROKEN - before fix):** Either no container, or errors about missing files

### Check 5: Is the Discord bot online?
```bash
# Go to Discord and check your server
# Look for the bot in the members list
```
**Expected (BROKEN - before fix):** Shows `Offline` or doesn't appear

---

## After Redeploying with Fixes

Run these verification commands immediately after deployment:

### Quick Verification (< 1 minute)
```bash
# 1. Check service status
systemctl status hermes.service

# 2. Check Docker container
docker ps | grep hermes

# 3. Check port is listening
netstat -tlnp | grep 18789
```

**Expected (FIXED):**
- Service shows `active (running)`
- Container shows `Up` status
- Port 18789 shows `LISTEN`

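`netstat` ships with net-tools, which some newer distributions do not install by default; `ss` from iproute2 reports the same listening sockets. The sketch below wraps the port check in a small helper that reads `ss -tln` output. The function name `port_listening` is an assumption made for illustration and is not part of the Hermes deployment.

```bash
# Hypothetical helper (not part of the deployment): succeeds if a TCP
# listener on the given port appears in `ss -tln` output read from stdin.
port_listening() {
  local port="$1"
  # With -t only, column 1 is the state and column 4 is Local Address:Port,
  # e.g. 0.0.0.0:18789 or [::]:18789. The port is the last ":"-separated field.
  awk -v p="$port" '
    $1 == "LISTEN" { n = split($4, a, ":"); if (a[n] == p) found = 1 }
    END { exit !found }
  '
}

# Usage on the server:
#   ss -tln | port_listening 18789 && echo "gateway port is listening"
```

Parsing the last colon-separated field keeps the check working for both IPv4 and IPv6 listeners, which a plain `grep 18789` does not distinguish from an unrelated match elsewhere on the line.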
### Comprehensive Health Check (< 5 minutes)
```bash
/usr/local/bin/hermes-health-check.sh
```

**Expected (FIXED):** All checks show ✓

### Detailed Logs
```bash
# Check what's happening in the container
docker logs -f hermes

# Use Ctrl+C to exit after 10-20 lines
```

**Expected (FIXED):**
```
[INFO] Hermes Agent Framework starting...
[INFO] Initializing gateway on port 18789
[INFO] Discord bot initialized
```

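Rather than reading the log by eye, the expected startup messages can be checked mechanically. The following is a minimal sketch that assumes the three messages from the sample output appear verbatim in the container log; the function name `startup_ok` is hypothetical.

```bash
# Hypothetical helper: succeeds only if every expected startup message
# (copied from the sample output in this guide) appears in the log text
# read from stdin. Prints the first missing message otherwise.
startup_ok() {
  local logs msg
  logs="$(cat)"
  for msg in \
    "Hermes Agent Framework starting" \
    "Initializing gateway on port 18789" \
    "Discord bot initialized"; do
    if ! grep -qF -- "$msg" <<<"$logs"; then
      echo "missing: $msg"
      return 1
    fi
  done
  echo "all startup messages found"
}

# Usage on the server:
#   docker logs --tail=100 hermes 2>&1 | startup_ok
```

Using `--tail` instead of `-f` lets the command return on its own, so it can be used in scripts without pressing Ctrl+C.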
### Discord Connectivity Test
```
# In your Discord server, type:
@hermes help

# Bot should respond within 5 seconds
```

**Expected (FIXED):** Bot is online and responds

---

## Troubleshooting Matrix

| Symptom | Check | Fix |
|---------|-------|-----|
| Service shows `failed` | `journalctl -u hermes.service` | Redeploy with fixed template |
| Container `Exited` | `docker logs hermes` | Check the logs for errors |
| Port not listening | `docker ps` | Container is not running; restart the service |
| Docker permission denied | Check `User=` in the service unit | Should now be `root` |
| Bot shows offline | Check Discord bot token | Verify in `~/.hermes/.env` |
| No container at all | `docker ps -a` | Image wasn't pulled, redeploy |

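The first rows of the matrix can be sketched as a small triage helper that takes the service and container states as arguments and prints the matching next step. This is an illustration only: the function name `triage` is hypothetical, and it assumes `docker inspect` prints nothing on stdout when the container does not exist.

```bash
# Hypothetical helper mirroring the matrix rows above.
# $1: output of `systemctl is-active hermes.service`
# $2: output of `docker inspect -f '{{.State.Status}}' hermes`
#     (assumed to be empty when the container does not exist)
triage() {
  local service="$1" container="$2"
  if [ "$service" = "failed" ]; then
    echo "service failed: read journalctl -u hermes.service, then redeploy with the fixed template"
  elif [ -z "$container" ]; then
    echo "no container: image was not pulled, redeploy"
  elif [ "$container" = "exited" ]; then
    echo "container exited: read docker logs hermes"
  elif [ "$service" = "active" ] && [ "$container" = "running" ]; then
    echo "service and container look healthy: check the port and the Discord token next"
  else
    echo "unrecognized state: service=$service container=$container"
  fi
}

# Usage on the server:
#   triage "$(systemctl is-active hermes.service)" \
#          "$(docker inspect -f '{{.State.Status}}' hermes 2>/dev/null)"
```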
---

## Command Reference

### Systemd Service
```bash
# Check status
systemctl status hermes.service

# View logs (last 50 lines)
journalctl -u hermes.service -n 50

# Follow logs live (--all prints full, untruncated fields)
journalctl -u hermes.service -f --all

# Restart service
systemctl restart hermes.service

# Stop service
systemctl stop hermes.service

# Start service
systemctl start hermes.service
```

### Docker
```bash
# List running containers
docker ps

# List all containers (including stopped)
docker ps -a

# View container logs
docker logs hermes

# Follow logs (live)
docker logs -f hermes

# Show last 100 lines
docker logs --tail=100 hermes

# Inspect container
docker inspect hermes
```

### Files to Check
```bash
# Configuration files (.env contains secrets; don't paste its output publicly)
cat ~/.hermes/.env
cat ~/.hermes/config.yaml
cat ~/docker-compose.yml

# Check permissions
ls -la ~/.hermes/

# Check if the Hermes health check script exists
ls -la /usr/local/bin/hermes-health-check.sh
```

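Because `cat ~/.hermes/.env` prints every secret in full, it can be safer to confirm only which keys are set. The sketch below does that for any `KEY=value` file; the function name `show_env_keys` is an assumption, and lines using an `export` prefix would be shown with that prefix as part of the key.

```bash
# Hypothetical helper: print KEY=<set> or KEY=<empty> for each assignment
# in an env file, so values never reach the terminal or a shared log.
show_env_keys() {
  local file="$1" key value
  # Drop comment and blank lines, then split each line on the first "=" only.
  grep -Ev '^[[:space:]]*(#|$)' "$file" | while IFS='=' read -r key value; do
    if [ -n "$value" ]; then
      echo "$key=<set>"
    else
      echo "$key=<empty>"
    fi
  done
}

# Usage on the server:
#   show_env_keys ~/.hermes/.env
```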
---

## Before vs After Comparison

### BEFORE These Fixes:
```
❌ systemctl status hermes.service
→ inactive (dead)

❌ docker ps
→ (no container)

❌ journalctl -u hermes.service
→ cannot open: "/home/hermes/docker-compose.yml"

❌ Discord bot
→ OFFLINE
```

### AFTER These Fixes:
```
✓ systemctl status hermes.service
→ active (running)

✓ docker ps
→ hermes container Up 2 minutes

✓ journalctl -u hermes.service
→ [INFO] Hermes Agent started successfully

✓ Discord bot
→ ONLINE ✓
```

---

## When to Seek Help

If you still have issues after redeploying:

1. **Check HERMES_DEBUGGING.md** in docs/ for detailed troubleshooting
2. **Read HERMES_AUDIT_REPORT.md** for what was fixed
3. **Run health check:** `/usr/local/bin/hermes-health-check.sh`
4. **Share logs:** the output of `docker logs hermes`
5. **Check config:** Verify Discord token, server ID, user IDs in `~/.hermes/.env`

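Logs shared for help can contain tokens, so it is worth filtering them first. The sketch below is a rough heuristic, not a guarantee: it assumes that any run of 24 or more token-like characters is a secret, and both that threshold and the function name `redact_secrets` are assumptions made for illustration. Review the result by hand before posting it.

```bash
# Hypothetical helper: replace long token-like strings with a placeholder.
# The 24-character threshold is a guess, not a Discord specification, so
# it may miss short secrets or hide long harmless identifiers.
redact_secrets() {
  sed -E 's/[A-Za-z0-9_.-]{24,}/[REDACTED]/g'
}

# Usage on the server:
#   docker logs --tail=100 hermes 2>&1 | redact_secrets > hermes-logs.txt
```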
---

## Redeploy Command

To apply all fixes, run one of the two options below:

```bash
cd ~/openboatmobile

# Option 1: Clean slate (recommended)
source .env && terraform destroy -auto-approve
source .env && terraform init && terraform apply

# Option 2: Update in-place
source .env && terraform apply -auto-approve
```

Then verify with:
```bash
ssh hermes@<SERVER_IP>
/usr/local/bin/hermes-health-check.sh
```
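A freshly created server may need a few minutes before the service is up, so a single health check right after `terraform apply` can fail spuriously. The sketch below retries a command a fixed number of times. It assumes the health check script exits non-zero when a check fails, which this guide does not confirm, and the function name `retry` is hypothetical.

```bash
# Hypothetical helper: run a command until it succeeds or attempts run out.
# $1: maximum attempts, $2: seconds to wait between attempts, rest: the command.
retry() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Skip the wait after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Usage from your workstation (assumes a non-zero exit code on failure):
#   retry 10 15 ssh hermes@<SERVER_IP> /usr/local/bin/hermes-health-check.sh
```

With 10 attempts and a 15-second delay, the loop gives up after roughly two and a half minutes of waiting, plus the time each attempt itself takes.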