- Chat model — generates responses in conversations
- Embedding model — powers semantic memory recall
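The split above can be illustrated with a toy sketch of semantic recall: embed texts as vectors, then return the stored memory closest to a query by cosine similarity. The `toy_embed` function below is a stand-in assumption, not a real embedding model; real models return dense vectors of a fixed dimension.

```python
import math
from collections import Counter

def toy_embed(text):
    """Stand-in for a real embedding model: a fixed-dimension
    bag-of-letters vector over a-z. Real embedding models return
    dense vectors of a fixed size (e.g. 768 or 1536 dimensions)."""
    counts = Counter(text.lower())
    return [counts.get(chr(c), 0) for c in range(ord("a"), ord("z") + 1)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def recall(query, memories):
    """Return the stored memory most similar to the query."""
    qv = toy_embed(query)
    return max(memories, key=lambda m: cosine(qv, toy_embed(m)))

memories = ["user prefers dark mode", "meeting every Tuesday at noon"]
print(recall("when is the weekly meeting?", memories))
```

The chat model never sees the vectors; it only receives whichever memories recall selects, which is why the two models can come from different providers.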
## Local vs cloud

### Local
llama.cpp and LM Studio run models on your hardware. No API keys, no data leaves your machine. Requires downloading model files and enough RAM/VRAM.
### Cloud
Bedrock, Gemini, and OpenRouter run models on remote infrastructure. Requires an API key. No local hardware requirements beyond the gateway itself.
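Cloud providers expose an HTTP API authenticated with a key or token. As one illustration, OpenRouter follows the OpenAI-compatible chat-completions request shape; the sketch below only builds such a request (nothing is sent), and the base URL and model name are illustrative assumptions:

```python
import json

# Assumed OpenAI-compatible base URL; the model name below is a placeholder.
BASE_URL = "https://openrouter.ai/api/v1"

def build_chat_request(api_key: str, model: str, user_message: str):
    """Build (url, headers, body) for a chat-completions call.
    Actually sending it (e.g. with urllib) is left out."""
    url = f"{BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",  # API-key auth
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    })
    return url, headers, body

url, headers, body = build_chat_request("sk-...", "some/model", "hello")
```

Local servers such as llama.cpp and LM Studio commonly serve the same request shape, which is what makes swapping between local and cloud providers a matter of changing the base URL and key.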
## Available providers
### Local

| Provider | Chat | Embeddings | Auth |
|---|---|---|---|
| llama.cpp | Yes | Yes | None |
| LM Studio | Yes | Yes | None |
### Cloud
| Provider | Chat | Embeddings | Auth |
|---|---|---|---|
| AWS Bedrock | Yes | Yes (Titan / Nova) | Bearer token |
| Google Gemini | Yes | Yes | API key |
| OpenRouter | Yes | No | API key |
## The two-server pattern
A common local setup runs chat and embeddings on separate servers. This works because chat and embeddings are independent subsystems. Configure them separately in Settings:

- Settings > Chat — provider, base URL, model, API key
- Settings > Memory — embedding provider, base URL, model, dimensions
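As a sketch of the pattern, here is one way to launch two llama.cpp servers, one for chat and one for embeddings. The model filenames and ports are placeholders, and the exact flag names may vary between llama.cpp versions:

```shell
# Chat server (placeholder model file and port)
llama-server -m chat-model.gguf --port 8080 &

# Embedding server: --embedding enables the embeddings endpoint
llama-server -m embedding-model.gguf --embedding --port 8081 &

# Then point Settings > Chat at http://localhost:8080
# and Settings > Memory at http://localhost:8081
```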
## Hot-swapping
Chat provider changes (provider, model, base URL, system prompt) take effect immediately — no gateway restart needed. Embedding provider changes require a restart and may invalidate existing vector memory if the model or dimensions change.

## Choosing a provider
| Priority | Recommended |
|---|---|
| Privacy first, no cloud | llama.cpp or LM Studio |
| Best quality, cost is fine | Bedrock (Claude, Nova) or Gemini |
| Widest model selection | OpenRouter |
| Simple local setup | LM Studio (built-in model browser) |
| Maximum control | llama.cpp (direct llama-server flags) |
