Platform + PostgreSQL

In platform mode with PostgreSQL, Embedd reads data from your PostgreSQL database, generates embeddings via your chosen provider, and writes vectors back to your PostgreSQL database using pgvector. Vectors live in your infrastructure — Embedd never stores them.

You'll need:

  • PostgreSQL 14+ with the pgvector extension installed
  • The vector extension created in the target schema: CREATE EXTENSION IF NOT EXISTS vector;
  • An embedding provider API key (OpenAI, Gemini, or Voyage) — unlike Snowflake platform mode, PostgreSQL has no native embedding engine
  • An Embedd API key
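
You can confirm the database-side prerequisites from psql before wiring anything up. A minimal check, assuming you connect as a role with CREATE privilege on the database (required to create the extension):

```sql
-- Check the server version (must be 14 or newer)
SHOW server_version;

-- Install pgvector in the target schema if not already present
CREATE EXTENSION IF NOT EXISTS vector;

-- Confirm the extension is installed and see its version
SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';
```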

Step 1: Create a Connection

Register your PostgreSQL database in platform mode. This tells Embedd where to read source rows and where to write vectors.

curl -X POST https://api.embedd.to/v1/providers/postgresql/connections \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "product-db",
    "mode": "platform",
    "credentials": {
      "host": "your-db-host.com",
      "port": 5432,
      "database": "myapp",
      "user": "embedd_user",
      "password": "your_password"
    }
  }'

Response:

{
  "id": "conn_abc123",
  "name": "product-db",
  "provider": "postgresql",
  "mode": "platform",
  "status": "created",
  "created_at": "2026-03-13T10:00:00Z"
}

Permissions

Platform mode needs more than read access. The user must be able to create tables and indexes for vector storage:

GRANT SELECT ON TABLE public.products TO embedd_user;
GRANT CREATE ON SCHEMA public TO embedd_user;
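
If the embedd_user role does not exist yet, a minimal setup might look like the following sketch. The password is a placeholder, and on PostgreSQL versions before 15 the USAGE grant on public may already be implied via PUBLIC:

```sql
-- Create a dedicated role for Embedd (placeholder password)
CREATE ROLE embedd_user WITH LOGIN PASSWORD 'your_password';

-- Allow reading the source table
GRANT SELECT ON TABLE public.products TO embedd_user;

-- Allow referencing the schema and creating the vector table and its indexes
GRANT USAGE, CREATE ON SCHEMA public TO embedd_user;
```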

Step 2: Test the Connection

Verify that Embedd can reach your database before proceeding.

curl -X POST https://api.embedd.to/v1/connections/conn_abc123/test \
  -H "Authorization: Bearer sk_your_api_key"

Response:

{
  "status": "ok",
  "latency_ms": 42
}

If the test fails, check:

  • Firewall rules — ensure Embedd's IPs can reach your database host and port.
  • Credentials — confirm the username, password, and database name are correct.
  • SSL — if your database requires SSL, set ssl_mode to require in the connection credentials.
  • pgvector — confirm the extension is installed: SELECT * FROM pg_extension WHERE extname = 'vector';

Step 3: Configure an Embedding Provider

PostgreSQL has no native embedding engine, so an embedding provider is required for platform mode. Tell Embedd which provider and model to use.

curl -X POST https://api.embedd.to/v1/embedding-providers \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "openai-prod",
    "provider": "openai",
    "api_key": "sk-proj-your-openai-key",
    "default_model": "text-embedding-3-small"
  }'

Response:

{
  "id": "emb_xyz789",
  "name": "openai-prod",
  "provider": "openai",
  "default_model": "text-embedding-3-small",
  "created_at": "2026-03-13T10:01:00Z"
}

Available providers and models

Provider | Model                  | Dimensions
-------- | ---------------------- | ----------
openai   | text-embedding-3-small | 1536
openai   | text-embedding-3-large | 3072
openai   | text-embedding-ada-002 | 1536
gemini   | gemini-embedding-001   | 3072
gemini   | text-embedding-005     | 768
gemini   | text-embedding-004     | 768
voyage   | voyage-3.5             | 1024
voyage   | voyage-3.5-lite        | 512
voyage   | voyage-code-3          | 1024
voyage   | voyage-3-large         | 1024
voyage   | voyage-3-lite          | 512

Step 4: Create a Vector Table

A vector table maps source columns to embedding and metadata roles. In platform mode, Embedd creates a physical table in your PostgreSQL database to store the vectors.

curl -X POST https://api.embedd.to/v1/vector-tables \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "products-search",
    "connection_id": "conn_abc123",
    "embedding_provider_id": "emb_xyz789",
    "source_table": "public.products",
    "primary_key_column": "id",
    "embedding_model": "text-embedding-3-small",
    "embedding_dimensions": 1536,
    "mode": "platform",
    "columns": [
      {"name": "name", "role": "embedding", "ordinal": 1, "name_prefix": "Product: "},
      {"name": "description", "role": "embedding", "ordinal": 2, "name_prefix": "Description: "},
      {"name": "category", "role": "metadata", "filter_type": "keyword"},
      {"name": "price", "role": "metadata", "filter_type": "float"},
      {"name": "in_stock", "role": "metadata", "filter_type": "boolean"}
    ]
  }'

Response:

{
  "id": "vt_abc123",
  "name": "products-search",
  "connection_id": "conn_abc123",
  "embedding_provider_id": "emb_xyz789",
  "source_table": "public.products",
  "mode": "platform",
  "sync_status": "pending",
  "embedding_model": "text-embedding-3-small",
  "embedding_dimensions": 1536,
  "platform_vector_ref": "embedd_vt_a1b2c3d4_products_search",
  "columns": [
    {"name": "name", "role": "embedding", "ordinal": 1, "name_prefix": "Product: "},
    {"name": "description", "role": "embedding", "ordinal": 2, "name_prefix": "Description: "},
    {"name": "category", "role": "metadata", "filter_type": "keyword"},
    {"name": "price", "role": "metadata", "filter_type": "float"},
    {"name": "in_stock", "role": "metadata", "filter_type": "boolean"}
  ],
  "created_at": "2026-03-13T10:02:00Z"
}

Embedd creates a table in your database with this schema:

CREATE TABLE embedd_vt_a1b2c3d4_products_search (
  pk_value TEXT PRIMARY KEY,
  embedding vector(1536),
  embedded_text TEXT,
  metadata JSONB DEFAULT '{}',
  row_hash TEXT
);

CREATE INDEX ON embedd_vt_a1b2c3d4_products_search USING hnsw (embedding vector_cosine_ops);
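
Because this is an ordinary table, you can also query it directly with pgvector's cosine distance operator (`<=>`). A sketch, where `$1` stands for a query embedding you generate yourself with the same model (1536 dimensions here), and the metadata keys match the columns configured above:

```sql
-- Sketch: direct similarity search against the vector table.
-- $1 is a vector(1536) parameter supplied by your application.
SELECT pk_value,
       embedded_text,
       metadata->>'category' AS category,
       1 - (embedding <=> $1) AS cosine_similarity
FROM embedd_vt_a1b2c3d4_products_search
WHERE (metadata->>'in_stock')::boolean = true
ORDER BY embedding <=> $1
LIMIT 5;
```

For most use cases the query API in Step 6 is simpler, since Embedd generates the query embedding for you.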

The embedding_provider_id is required for PostgreSQL platform mode. Without it, the request will fail.

No tier limits

Platform mode is not subject to max_tables or max_vectors tier limits. Vectors live in your database, so usage is bounded only by your own infrastructure.


Step 5: Trigger Backfill

Kick off the initial backfill to read all source rows, generate embeddings, and write vectors to your PostgreSQL database.

curl -X POST https://api.embedd.to/v1/vector-tables/vt_abc123/backfill \
  -H "Authorization: Bearer sk_your_api_key"

Response:

{
  "task_id": "task_def456",
  "task_type": "backfill",
  "target_id": "vt_abc123",
  "status": "pending",
  "created_at": "2026-03-13T10:03:00Z"
}

Check sync status to track progress:

curl https://api.embedd.to/v1/vector-tables/vt_abc123/sync/status \
  -H "Authorization: Bearer sk_your_api_key"

Response (once complete):

{
  "sync_status": "synced",
  "synced_rows": 12450,
  "total_rows": 12450,
  "last_synced_at": "2026-03-13T10:08:00Z"
}

When synced_rows matches total_rows, all your data has been embedded and is ready to query.
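
Since the vectors live in your own database, you can also cross-check completeness directly in SQL. A sketch, using the table names from the examples above:

```sql
-- Sketch: compare source and vector row counts after backfill.
SELECT
  (SELECT count(*) FROM public.products) AS source_rows,
  (SELECT count(*) FROM embedd_vt_a1b2c3d4_products_search) AS vector_rows;
```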


Step 6: Query

Run a semantic search with optional metadata filters.

curl -X POST https://api.embedd.to/v1/vector-tables/vt_abc123/query \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "something warm for hiking",
    "limit": 5,
    "filters": {
      "in_stock": {"eq": true},
      "price": {"lte": 150}
    }
  }'

Response:

{
  "results": [
    {
      "id": "4821",
      "score": 0.892,
      "metadata": {
        "name": "Alpine Fleece Jacket",
        "description": "Lightweight fleece jacket with wind-resistant outer layer",
        "category": "outerwear",
        "price": 129.99,
        "in_stock": true
      }
    },
    {
      "id": "7733",
      "score": 0.871,
      "metadata": {
        "name": "Merino Wool Base Layer",
        "description": "Moisture-wicking merino wool top for cold-weather hiking",
        "category": "base-layers",
        "price": 89.00,
        "in_stock": true
      }
    }
  ]
}

Filter operators use plain names like eq, lte, gte, ne — no $ prefix. See Filters for the full list of supported operators and types.

Embedd automatically sets hnsw.iterative_scan = on for PostgreSQL platform queries. This ensures consistent results when combining vector similarity with metadata filters, preventing cases where the HNSW index would otherwise return too few candidates before filtering.


Step 7: Monitor Sync

After the initial backfill, Embedd automatically keeps vectors in sync with your source table. Inserts, updates, and deletes in your source PostgreSQL table are detected and reflected in the vector table.

Check sync status:

curl https://api.embedd.to/v1/vector-tables/vt_abc123/sync/status \
  -H "Authorization: Bearer sk_your_api_key"

Pause sync:

curl -X POST https://api.embedd.to/v1/vector-tables/vt_abc123/sync/pause \
  -H "Authorization: Bearer sk_your_api_key"

Resume sync:

curl -X POST https://api.embedd.to/v1/vector-tables/vt_abc123/sync/resume \
  -H "Authorization: Bearer sk_your_api_key"

Re-backfill and atomic swap

If you change the embedding model, dimensions, or column configuration, the vector table enters pending_rebackfill status. The next backfill performs an atomic table swap to replace vectors with zero downtime:

  1. Embedd creates a new swap table (e.g., embedd_vt_a1b2c3d4_products_search_swap)
  2. All rows are re-embedded and written to the swap table
  3. The live table is renamed to _old, and the swap table is renamed to the live name
  4. The _old table is dropped

Your queries continue to hit the live table throughout this process — there is no window where data is unavailable.
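
Conceptually, the rename sequence in steps 3 and 4 corresponds to something like the following sketch. The exact statements Embedd issues may differ; this only illustrates why the swap is atomic (the renames happen inside one transaction):

```sql
-- Sketch: atomic table swap via transactional renames.
BEGIN;
ALTER TABLE embedd_vt_a1b2c3d4_products_search
  RENAME TO embedd_vt_a1b2c3d4_products_search_old;
ALTER TABLE embedd_vt_a1b2c3d4_products_search_swap
  RENAME TO embedd_vt_a1b2c3d4_products_search;
COMMIT;

-- The retired table is then dropped.
DROP TABLE embedd_vt_a1b2c3d4_products_search_old;
```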

See Sync & Backfill for details on how sync works, polling intervals, and re-backfill behavior.


Key Differences from Managed Mode

Aspect                 | Managed                            | Platform (PostgreSQL)
---------------------- | ---------------------------------- | ---------------------
Vector storage         | Qdrant (hosted by Embedd)          | Your PostgreSQL database
Embedding provider     | Required                           | Required
Tier limits            | Enforced (max_tables, max_vectors) | Not enforced
SQL JOINs with vectors | No                                 | Yes — vectors are a regular table in your database
pgvector required      | No                                 | Yes
Re-backfill strategy   | New Qdrant collection              | Atomic table swap

Because vectors live in your PostgreSQL database, you can join the vector table directly with your application tables for hybrid queries — something that is not possible in managed mode.
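
A sketch of such a hybrid query, joining the vector table back to public.products (the cast assumes the source primary key is an integer, since pk_value is stored as TEXT, and `$1` is a query embedding you supply):

```sql
-- Sketch: join vector search results back to the application table.
SELECT p.id,
       p.name,
       p.price,
       v.embedding <=> $1 AS distance
FROM embedd_vt_a1b2c3d4_products_search AS v
JOIN public.products AS p
  ON p.id::text = v.pk_value
ORDER BY v.embedding <=> $1
LIMIT 10;
```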