April 2026 · Jezza Hehn
[DRAFT — pending Jez's review and edits before publication]
"Just self-host your AI" is the privacy community's answer to everything. It's not wrong, but it's incomplete. I've been running my entire AI agent infrastructure on a VPS for three months. Here's what I actually learned about the tradeoffs.
Prompt privacy. When I send a prompt to a local model, that prompt stays on my hardware. No API call, no third-party logging, no training data contribution. For a community manager dealing with user reports, that matters. I don't want user complaints being processed through OpenAI's servers.
Control over the model. I can swap models without changing my workflow. Currently using Venice AI for inference because running a capable model locally would require hardware I can't afford. But the architecture is designed so I could switch to local inference (Ollama, llama.cpp) without rewriting anything. The agent runtime doesn't care where the model lives.
No usage restrictions. Commercial API providers have content policies. Some tasks I need (security scanning, content analysis for moderation) might trip content filters on managed APIs. Self-hosted means I decide what's acceptable.
Cost predictability. API costs scale with usage and they scale fast. A busy week of agent work can burn through tokens that add up. With a self-hosted model, the cost is the hardware, period. Once you've paid for the GPU, inference is effectively free.
It's not automatically more secure. A self-hosted model on an unpatched VPS with default credentials is less private than a well-managed API. Security is a stack: firewalls, SSH keys, updates, access controls. The model hosting method is one layer, not the whole picture.
Quality takes hardware. The best open-source models (Llama 3, Qwen, DeepSeek) run acceptably on consumer GPUs but need significant VRAM for good performance at usable speeds. I'm currently using API inference because my VPS doesn't have a GPU, and running a competent model on CPU is too slow for interactive use. "Self-hosted" and "high quality" are sometimes in tension.
You still need internet for most tasks. My agents do web research, check email, interact with APIs. The model itself could run offline, but the agent's capabilities depend on connectivity. "Air-gapped AI" is possible for specific use cases (document processing, local code analysis) but not for the kind of general-purpose agent work I do.
Maintenance is real work. Updates, dependency management, monitoring, backups. If your self-hosted setup goes down at 2 AM, there's no support team to call. That's you.
I run OpenClaw on a $24/month DigitalOcean droplet (4 vCPU, 8GB RAM). Inference comes from Venice AI's API because I can't fit a good model in 8GB. My data flow looks like this:
Is this perfect privacy? No. Venice could theoretically log inference requests. But it's a significant improvement over sending everything through OpenAI, Google, or Anthropic's consumer-facing services. And the architecture is ready for full local inference whenever I can afford the hardware.
If you're a small business considering self-hosted AI, start with API-based inference and self-hosted agent infrastructure. You get most of the control benefits without the hardware cost. Migrate to local inference when your budget allows and your use case demands it. The agent framework (OpenClaw, LangChain, whatever you choose) abstracts the model layer enough that switching is straightforward.
Don't self-host just because someone on a forum told you to. Self-host because you have a specific privacy or compliance requirement that managed APIs can't meet. Know what you're getting, and more importantly, know what you're not.