Steve Lawton c73e74386b feat(vertex): add Google Vertex AI provider for Gemini (OAuth2)

Adds Vertex AI as a first-class provider for Gemini models via Vertex's
OpenAI-compatible endpoint. Vertex authenticates with short-lived OAuth2
access tokens (service-account JSON or ADC), not a static API key — the
missing piece behind the recurring requests (#13484, #12639, #56259).

- agent/vertex_adapter.py: OAuth2 token minting + refresh-on-expiry
  (5-min margin), ADC->service-account fallback, global vs regional
  endpoint URLs. Config precedence: env var > config.yaml > default.
- plugins/model-providers/vertex/: provider profile (auth_type=vertex),
  reuses Gemini's extra_body.google.thinking_config translation.
- runtime_provider: vertex short-circuit BEFORE the credential pool so a
  credentials-file path is never mistaken for a static API key; mints a
  fresh token + computes base_url per resolve.
- run_agent + conversation_loop: _try_refresh_vertex_client_credentials()
  re-mints the token and rebuilds the client on a mid-session 401, so a
  long-lived gateway agent survives token expiry (~1h).
- auxiliary_client: vertex auth_type branch for side-LLM tasks.
- config.yaml: vertex.project_id / vertex.region (non-secret, bridged to
  env); credential path stays in .env (VERTEX_CREDENTIALS_PATH).
- setup wizard + model picker: dedicated _model_flow_vertex; curated
  google/gemini-* model list; --provider choices.
- pricing/metadata: Vertex prices off the gemini docs snapshot; endpoint
  host auto-maps to the vertex provider (no probe spam).
- lazy_deps + pyproject [vertex] extra: google-auth, opt-in only.
- docs: guides/google-vertex.md + providers page; tests for adapter +
  runtime resolution.

Salvages and modernizes #8427 by @slawt onto current main: rewired from
the legacy PROVIDER_REGISTRY path to the provider-profile architecture,
moved non-secret config out of .env into config.yaml, and added the
per-turn 401 token-refresh the original lacked.

2026-07-01 05:25:33 -07:00

6.5 KiB


| sidebar_position | title | description |
|---|---|---|
| 15 | Google Vertex AI | Use Hermes Agent with Gemini on Google Cloud Vertex AI: OAuth2 service account or ADC, GCP billing and quotas, no static API key |

Google Vertex AI

Hermes Agent supports Gemini models on Google Cloud Vertex AI through Vertex's OpenAI-compatible endpoint. The Google AI Studio provider uses a static API key against `generativelanguage.googleapis.com`. Vertex instead bills usage to your Google Cloud project, so GCP credits apply, and runs under that project's Vertex AI quotas and rate limits. Choose it when you want Gemini usage to draw on your Google Cloud account rather than an AI Studio key.

:::info Vertex authenticates with OAuth2, not an API key
Vertex has no static API key for the standard endpoint. Every request needs a short-lived OAuth2 access token (≈1 hour TTL) minted from either a service-account JSON key or Application Default Credentials (ADC). Hermes mints and auto-refreshes these tokens for you, so you never paste a token by hand. This is also why pasting a temporary token into a custom provider's `api_key` field does not work: it expires mid-session.
:::
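To make the refresh-on-expiry idea concrete, here is a minimal Python sketch of a token cache. It is an illustration, not Hermes's implementation. `TokenCache` and its `mint` callable are hypothetical names. `mint` stands in for whatever actually calls google-auth and is assumed to return a `(token, expiry)` pair with a timezone-aware UTC expiry. The 5-minute margin matches the one described under How authentication works.

```python
"""Hypothetical sketch of an OAuth2 access-token cache with an expiry margin."""
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional, Tuple

# A minter returns (access_token, expiry) with a timezone-aware UTC expiry.
Minter = Callable[[], Tuple[str, datetime]]


def _utc_now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class TokenCache:
    mint: Minter
    # Re-mint once the token is within this window of expiry.
    margin: timedelta = timedelta(minutes=5)
    # Injectable clock so the expiry logic can be tested without waiting.
    clock: Callable[[], datetime] = _utc_now
    _token: Optional[str] = field(default=None, init=False)
    _expiry: Optional[datetime] = field(default=None, init=False)

    def get(self) -> str:
        """Return the cached token, minting a new one if missing or near expiry."""
        stale = (
            self._token is None
            or self._expiry is None
            or self.clock() >= self._expiry - self.margin
        )
        if stale:
            self._token, self._expiry = self.mint()
        return self._token

    def invalidate(self) -> None:
        """Forget the cached token so the next get() mints a fresh one."""
        self._token = None
        self._expiry = None
```

Keeping `invalidate()` separate from `get()` gives a 401 handler a way to discard a token that the server rejected even though its recorded expiry had not yet been reached.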

Prerequisites

- A Google Cloud project with the Vertex AI API enabled and billing active.
- Credentials, one of:
  - a JSON key for a service account that has been granted the `roles/aiplatform.user` role, or
  - Application Default Credentials via `gcloud auth application-default login` (or the metadata server when running on a GCP VM).
- `google-auth`, installed automatically the first time you select Vertex (lazy install), or explicitly with `pip install 'hermes-agent[vertex]'`.

Quick Start

```bash
# Option A: service-account JSON (recommended for servers / gateways)
echo "VERTEX_CREDENTIALS_PATH=/path/to/service-account.json" >> ~/.hermes/.env

# Option B: Application Default Credentials (good for local dev)
gcloud auth application-default login

# Select Vertex as your provider
hermes model
# → Choose "More providers..." → "Google Vertex AI"
# → Enter your GCP project ID (or leave blank to use the one in your credentials)
# → Choose a region (default: global)
# → Select a Gemini model

# Start chatting
hermes chat
```

Configuration

Vertex splits its settings by sensitivity:

- The credential path is a pointer to a secret and lives in `~/.hermes/.env`.
- Project ID and region are non-secret routing settings and live in `~/.hermes/config.yaml`.

`~/.hermes/.env`:

```bash
# One of these (checked in this order); omit both to use ADC:
VERTEX_CREDENTIALS_PATH=/path/to/service-account.json
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
```
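The lookup order in the `.env` comment can be sketched in Python as follows. This is an illustration under assumptions: `resolve_credential_source` and `mint_token` are hypothetical helpers, not Hermes functions. `mint_token` uses real google-auth calls (`google.auth.load_credentials_from_file`, `google.auth.default`, and `Credentials.refresh`) and requests the `cloud-platform` scope this guide mentions.

```python
"""Hypothetical sketch: choose a credential source, then mint a token with google-auth."""
from __future__ import annotations

from datetime import datetime, timezone
from typing import Mapping, Optional, Tuple

# Checked first to last; the first non-empty value wins.
CREDENTIAL_ENV_VARS = ("VERTEX_CREDENTIALS_PATH", "GOOGLE_APPLICATION_CREDENTIALS")
SCOPES = ["https://www.googleapis.com/auth/cloud-platform"]


def resolve_credential_source(env: Mapping[str, str]) -> Tuple[str, Optional[str]]:
    """Return ("key_file", path) for an explicit JSON key, else ("adc", None)."""
    for name in CREDENTIAL_ENV_VARS:
        path = env.get(name, "").strip()
        if path:
            return "key_file", path
    # Neither variable is set: fall back to Application Default Credentials.
    return "adc", None


def mint_token(source: str, path: Optional[str]) -> Tuple[str, Optional[datetime]]:
    """Mint an access token; requires the google-auth and requests packages."""
    # Imported lazily so resolve_credential_source works without google-auth.
    import google.auth
    from google.auth.transport.requests import Request

    if source == "key_file":
        creds, _project = google.auth.load_credentials_from_file(path, scopes=SCOPES)
    else:
        creds, _project = google.auth.default(scopes=SCOPES)
    creds.refresh(Request())  # performs the OAuth2 token exchange over HTTP
    expiry = creds.expiry
    if expiry is not None and expiry.tzinfo is None:
        expiry = expiry.replace(tzinfo=timezone.utc)  # treat a naive expiry as UTC
    return creds.token, expiry
```

Typical use would be `mint_token(*resolve_credential_source(os.environ))`. Separating the two functions keeps the ordering rule testable without network access or credentials.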

`~/.hermes/config.yaml`:

```yaml
model:
  default: google/gemini-3-flash-preview
  provider: vertex

vertex:
  project_id: my-gcp-project   # blank → use the project embedded in the credentials
  region: global               # "global" is required for the Gemini 3.x previews
```

:::tip Environment variables win over config.yaml
`VERTEX_PROJECT_ID` and `VERTEX_REGION` override the `vertex.project_id` / `vertex.region` values in `config.yaml`. Use them for per-shell overrides; keep the durable settings in `config.yaml`.
:::
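The precedence rule (environment variable, then `config.yaml`, then a default) can be sketched as a small Python function. The names are hypothetical. The defaults mirror this guide: `region` falls back to `global`, and an empty `project_id` becomes `None`, meaning "use the project embedded in the credentials". The `config` argument is assumed to be `config.yaml` already parsed into a dict.

```python
"""Hypothetical sketch of Vertex setting precedence: env var > config.yaml > default."""
from __future__ import annotations

from typing import Any, Dict, Mapping, Optional

ENV_NAMES = {"project_id": "VERTEX_PROJECT_ID", "region": "VERTEX_REGION"}
# None for project_id means "use the project embedded in the credentials".
DEFAULTS: Dict[str, Optional[str]] = {"project_id": None, "region": "global"}


def resolve_vertex_setting(
    key: str, env: Mapping[str, str], config: Mapping[str, Any]
) -> Optional[str]:
    """Resolve one `vertex:` setting; `config` is config.yaml parsed into a dict."""
    env_value = env.get(ENV_NAMES[key], "").strip()
    if env_value:
        return env_value  # environment variable wins
    section = config.get("vertex") or {}
    cfg_value = section.get(key)
    if cfg_value is not None and str(cfg_value).strip():
        return str(cfg_value).strip()  # then config.yaml
    return DEFAULTS[key]  # then the built-in default
```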

How authentication works

1. Hermes resolves credentials in this order: `VERTEX_CREDENTIALS_PATH` → `GOOGLE_APPLICATION_CREDENTIALS` → ADC.
2. It mints an OAuth2 access token (`cloud-platform` scope) and caches it, refreshing when the token is within 5 minutes of expiry.
3. The token is handed to a standard OpenAI client pointed at the Vertex endpoint:

   ```
   https://aiplatform.googleapis.com/v1beta1/projects/{project}/locations/{region}/endpoints/openapi
   ```

   Regional locations use a `{region}-aiplatform.googleapis.com` host instead.
4. If a session runs longer than the token lifetime and a request returns 401, Hermes re-mints the token and retries automatically. On a long-running gateway, if ADC's refresh token has itself expired, Hermes falls back to the service-account JSON when one is configured.
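The mid-session 401 handling amounts to "retry once with a fresh token". Below is a minimal Python sketch under stated assumptions. `call_with_token_refresh` and `Unauthorized` are hypothetical names. `request(token)` stands in for building an OpenAI client with the token as its `api_key` and sending one request. With the OpenAI Python SDK, the exception raised on a 401 is `openai.AuthenticationError`, which would take the place of `Unauthorized`.

```python
"""Hypothetical sketch: retry a request once with a fresh token after a 401."""
from __future__ import annotations

from typing import Callable, TypeVar

T = TypeVar("T")


class Unauthorized(Exception):
    """Placeholder for the client's 401 error (openai.AuthenticationError in the OpenAI SDK)."""


def call_with_token_refresh(
    request: Callable[[str], T],
    get_token: Callable[[], str],
    invalidate: Callable[[], None],
) -> T:
    """Run request(token); on a 401, drop the token, mint a new one, retry once."""
    try:
        return request(get_token())
    except Unauthorized:
        invalidate()  # the cached token is stale even if it looked unexpired
        return request(get_token())  # a second 401 propagates to the caller
```

Retrying only once keeps a genuinely broken credential (for example, a revoked service account) from looping; the second failure surfaces to the caller.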

Available Models

Vertex requires the `google/` vendor prefix on model IDs. The `hermes model` picker offers:

| Model | ID |
|---|---|
| Gemini 3.1 Pro Preview | `google/gemini-3.1-pro-preview` |
| Gemini 3 Pro Preview | `google/gemini-3-pro-preview` |
| Gemini 3 Flash Preview | `google/gemini-3-flash-preview` |
| Gemini 3.1 Flash Lite Preview | `google/gemini-3.1-flash-lite-preview` |
| Gemini 2.5 Pro | `google/gemini-2.5-pro` |
| Gemini 2.5 Flash | `google/gemini-2.5-flash` |
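If you script against Vertex directly, a helper like the one below keeps model IDs in the `google/` form shown in the table. It is a hypothetical sketch, not part of Hermes, and it deliberately rejects other vendor prefixes rather than guessing.

```python
"""Hypothetical helper: normalize a Gemini model ID to the google/ form Vertex expects."""


def to_vertex_model_id(model: str) -> str:
    model = model.strip()
    if model.startswith("google/"):
        return model  # already in Vertex form
    if "/" in model:
        # Some other vendor or path prefix: refuse instead of guessing.
        raise ValueError(f"expected a google/ model ID, got {model!r}")
    return f"google/{model}"
```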

:::note global region for Gemini 3.x
The Gemini 3.x preview models are served through the global endpoint. Regional endpoints (`us-central1`, etc.) may return 404 for them. Leave `region: global` unless you have a specific reason to pin a region.
:::
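The difference between the global and regional endpoints is only the host name, as this Python sketch shows. `vertex_base_url` is a hypothetical helper built from the URL pattern given under How authentication works. The resulting string is the kind of `base_url` you would pass to `openai.OpenAI(base_url=..., api_key=token)`.

```python
"""Hypothetical helper: build the Vertex OpenAI-compatible base URL."""


def vertex_base_url(project: str, region: str = "global") -> str:
    if not project:
        raise ValueError("a GCP project ID is required")
    # The global location uses the bare host; regional ones prefix the region.
    if region == "global":
        host = "aiplatform.googleapis.com"
    else:
        host = f"{region}-aiplatform.googleapis.com"
    return f"https://{host}/v1beta1/projects/{project}/locations/{region}/endpoints/openapi"
```

A regional URL combined with a Gemini 3.x preview model is exactly the combination that can return 404.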

Switching Models Mid-Session

```
/model google/gemini-3-pro-preview
/model google/gemini-3-flash-preview
```

`/model` switches among already-configured providers and models; it does not collect new credentials. Configure Vertex with `hermes model` first.

Reasoning / Thinking

Vertex exposes Gemini's thinking budget through the OpenAI-compatible surface. Hermes automatically maps its reasoning-effort setting onto `extra_body.google.thinking_config`, so `reasoning_effort` works the same way it does on other Gemini surfaces.
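The shape of the resulting request body can be sketched in Python. Only the `extra_body.google.thinking_config` path comes from this guide. `build_chat_body` and the effort-to-`thinking_budget` numbers are hypothetical placeholders, not Hermes's actual mapping.

```python
"""Hypothetical sketch of a chat request body carrying a Gemini thinking budget."""
from __future__ import annotations

from typing import Any, Dict, List, Optional

# Placeholder budgets for illustration; not Hermes's real effort mapping.
HYPOTHETICAL_THINKING_BUDGETS = {"low": 1024, "medium": 8192, "high": 24576}


def build_chat_body(
    model: str,
    messages: List[Dict[str, str]],
    reasoning_effort: Optional[str] = None,
) -> Dict[str, Any]:
    body: Dict[str, Any] = {"model": model, "messages": messages}
    if reasoning_effort is not None:
        budget = HYPOTHETICAL_THINKING_BUDGETS[reasoning_effort]
        # Gemini-specific options ride under extra_body.google in the JSON body.
        body["extra_body"] = {"google": {"thinking_config": {"thinking_budget": budget}}}
    return body
```

The OpenAI Python SDK's `extra_body=` argument merges its keys into the top level of the JSON body. To produce this nested path through the SDK, you would therefore pass `extra_body={"extra_body": body["extra_body"]}`.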

Diagnostics

```bash
hermes doctor
```

The doctor reports whether Vertex credentials can be resolved (service-account path or ADC) and whether the provider is configured.

Troubleshooting

"Vertex AI credentials could not be resolved"

Hermes found neither a service-account JSON nor working ADC. Either set `VERTEX_CREDENTIALS_PATH` in `~/.hermes/.env`, or run `gcloud auth application-default login`. If your project isn't embedded in the credentials, set `vertex.project_id` in `config.yaml`.
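To check whether a service-account key already carries a project, you can read its `project_id` field. The hypothetical Python helper below does this and is not a Hermes function. For ADC, google-auth's `google.auth.default()` returns the detected project as the second element of its result, and that value may be `None`.

```python
"""Hypothetical helper: read the project embedded in a service-account key."""
from __future__ import annotations

import json
from pathlib import Path
from typing import Optional


def project_from_key_file(path: str) -> Optional[str]:
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if data.get("type") != "service_account":
        return None  # not a service-account key; this sketch only reads those
    return data.get("project_id") or None
```

If this returns `None`, setting `vertex.project_id` is the fix described above.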

`google-auth` not installed

Install the extra with `pip install 'hermes-agent[vertex]'`. Hermes also lazy-installs it the first time you select the Vertex provider.

404 on Gemini 3.x models

You are probably on a regional endpoint. Set `region: global` in the `vertex:` section of `config.yaml` (or unset `VERTEX_REGION` if it points at a region).

403 / permission denied

The service account (or your ADC identity) needs the `roles/aiplatform.user` role on the project, and the Vertex AI API must be enabled for that project.

See Also

- Google Gemini (AI Studio): static-API-key Gemini without GCP
- AWS Bedrock: another native cloud-provider integration
- AI Providers
- Configuration
