Platform + Snowflake

In platform mode with Snowflake, Embedd reads data from your Snowflake database and writes vectors back to your Snowflake account using the VECTOR(FLOAT, N) type. You can use Snowflake Cortex for embedding generation — no external embedding provider needed.

You'll need:

  • A Snowflake account with a warehouse, database, and schema accessible to Embedd
  • An Embedd API key
  • (Optional) An embedding provider API key if you prefer OpenAI, Gemini, or Voyage over Cortex

Step 1: Create a Connection

Register your Snowflake account so Embedd can read source rows and write vector tables.

Password authentication:

curl -X POST https://api.embedd.to/v1/providers/snowflake/connections \
-H "Authorization: Bearer sk_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"name": "analytics-snowflake",
"mode": "platform",
"credentials": {
"account": "xy12345.us-east-1",
"user": "EMBEDD_USER",
"password": "your_password",
"warehouse": "COMPUTE_WH",
"database": "ANALYTICS",
"schema": "PUBLIC"
}
}'

Key pair authentication:

curl -X POST https://api.embedd.to/v1/providers/snowflake/connections \
-H "Authorization: Bearer sk_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"name": "analytics-snowflake",
"mode": "platform",
"credentials": {
"account": "xy12345.us-east-1",
"user": "EMBEDD_USER",
"private_key": "-----BEGIN PRIVATE KEY-----\nMIIE...\n-----END PRIVATE KEY-----",
"warehouse": "COMPUTE_WH",
"database": "ANALYTICS",
"schema": "PUBLIC"
}
}'

Response:

{
"id": "conn_abc123",
"name": "analytics-snowflake",
"provider": "snowflake",
"mode": "platform",
"status": "created",
"created_at": "2026-03-13T10:00:00Z"
}
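
The two request bodies above differ only in which credential field they carry. As a minimal sketch, assuming the field names shown in the examples, the body can be assembled in Python. `build_connection_payload` is a hypothetical helper for illustration, not part of any Embedd SDK. Serializing with `json.dumps` also escapes the newlines of a PEM key as `\n`, which is the form the key pair example uses.

```python
import json


def build_connection_payload(name, account, user, warehouse, database, schema,
                             password=None, private_key_pem=None):
    """Build the JSON body for POST /v1/providers/snowflake/connections.

    Hypothetical helper: exactly one of password or private_key_pem
    must be supplied, mirroring the two examples above.
    """
    if (password is None) == (private_key_pem is None):
        raise ValueError("provide exactly one of password or private_key_pem")
    credentials = {"account": account, "user": user}
    if password is not None:
        credentials["password"] = password
    else:
        credentials["private_key"] = private_key_pem
    credentials.update(warehouse=warehouse, database=database, schema=schema)
    return {"name": name, "mode": "platform", "credentials": credentials}


# Placeholder key text copied from the example above, not a real key.
pem = "-----BEGIN PRIVATE KEY-----\nMIIE...\n-----END PRIVATE KEY-----"
payload = build_connection_payload(
    "analytics-snowflake", "xy12345.us-east-1", "EMBEDD_USER",
    "COMPUTE_WH", "ANALYTICS", "PUBLIC", private_key_pem=pem)
body = json.dumps(payload, indent=2)  # real newlines in the key become \n
print(body)
```

The printed body can be passed to curl with `-d @file.json` instead of being quoted inline.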

Platform mode permissions

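Because Embedd creates and manages vector tables in your Snowflake account, the role needs more than read access. To run any query, the role also needs USAGE on the warehouse named in the connection, and it must be granted to the connection's user. The statements below assume a role named EMBEDD_ROLE and the warehouse, database, schema, and user from Step 1:

GRANT USAGE ON WAREHOUSE COMPUTE_WH TO ROLE EMBEDD_ROLE;
GRANT USAGE ON DATABASE ANALYTICS TO ROLE EMBEDD_ROLE;
GRANT USAGE ON SCHEMA ANALYTICS.PUBLIC TO ROLE EMBEDD_ROLE;
GRANT SELECT ON TABLE ANALYTICS.PUBLIC.PRODUCTS TO ROLE EMBEDD_ROLE;
GRANT CREATE TABLE ON SCHEMA ANALYTICS.PUBLIC TO ROLE EMBEDD_ROLE;
GRANT ROLE EMBEDD_ROLE TO USER EMBEDD_USER;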

CREATE TABLE is required because Embedd writes vector tables (and performs atomic table swaps during re-backfill) directly in your schema.


Step 2: Test the Connection

Verify that Embedd can reach your Snowflake account before proceeding.

curl -X POST https://api.embedd.to/v1/connections/conn_abc123/test \
-H "Authorization: Bearer sk_your_api_key"

Response:

{
"status": "ok",
"latency_ms": 185
}

If the test fails, check:

  • Account identifier — ensure the format is correct (e.g., xy12345.us-east-1).
  • Credentials — confirm the username and password (or private key) are correct.
  • Warehouse — verify the warehouse exists and is not suspended.
  • Network policy — if your Snowflake account uses network policies, allow Embedd's IPs.

Step 3: Embedding Provider (Optional)

This is the key difference in Snowflake platform mode: you can use Snowflake Cortex for embedding generation, which means no external provider is needed.

Option A: Use Snowflake Cortex

Skip this step entirely. When you create a vector table in Step 4, omit the embedding_provider_id field and use a Cortex-compatible model name. Cortex handles embedding generation natively inside your Snowflake account.

Available Cortex embedding models:

Model | Dimensions
snowflake-arctic-embed-m-v1.5 | 768
snowflake-arctic-embed-l-v2.0 | 1024
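
The dimension count has to match the model, because it becomes the N in the VECTOR(FLOAT, N) column. A small sketch of a local check before calling the API, using only the two Cortex models listed above. `check_dimensions` is a hypothetical helper, and whether Embedd performs the same validation server-side is not stated here.

```python
# Dimensions copied from the list of Cortex embedding models above.
CORTEX_MODEL_DIMENSIONS = {
    "snowflake-arctic-embed-m-v1.5": 768,
    "snowflake-arctic-embed-l-v2.0": 1024,
}


def check_dimensions(model: str, dimensions: int) -> None:
    """Raise if the requested dimensions do not match the Cortex model."""
    expected = CORTEX_MODEL_DIMENSIONS.get(model)
    if expected is None:
        raise KeyError(f"not a listed Cortex model: {model}")
    if dimensions != expected:
        raise ValueError(
            f"{model} produces {expected}-dimensional vectors, got {dimensions}")


check_dimensions("snowflake-arctic-embed-m-v1.5", 768)  # passes silently
```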

Option B: Use an External Provider

If you prefer OpenAI, Gemini, or Voyage, create an embedding provider the same way as in managed mode:

curl -X POST https://api.embedd.to/v1/embedding-providers \
-H "Authorization: Bearer sk_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"name": "openai-prod",
"provider": "openai",
"api_key": "sk-proj-your-openai-key",
"default_model": "text-embedding-3-small"
}'

Response:

{
"id": "emb_xyz789",
"name": "openai-prod",
"provider": "openai",
"default_model": "text-embedding-3-small",
"created_at": "2026-03-13T10:01:00Z"
}

Use the returned id as the embedding_provider_id in Step 4.


Step 4: Create a Vector Table

A vector table maps source columns to embedding and metadata roles. In platform mode, Embedd creates the vector table directly in your Snowflake account.

Example A: Using Cortex (no external provider)

curl -X POST https://api.embedd.to/v1/vector-tables \
-H "Authorization: Bearer sk_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"name": "products-search",
"connection_id": "conn_abc123",
"source_table": "ANALYTICS.PUBLIC.PRODUCTS",
"primary_key_column": "ID",
"embedding_model": "snowflake-arctic-embed-m-v1.5",
"embedding_dimensions": 768,
"mode": "platform",
"columns": [
{"name": "NAME", "role": "embedding", "ordinal": 1},
{"name": "DESCRIPTION", "role": "embedding", "ordinal": 2},
{"name": "CATEGORY", "role": "metadata", "filter_type": "keyword"},
{"name": "PRICE", "role": "metadata", "filter_type": "float"},
{"name": "IN_STOCK", "role": "metadata", "filter_type": "boolean"}
]
}'

No embedding_provider_id — Cortex handles embeddings natively.

Example B: Using an External Provider

curl -X POST https://api.embedd.to/v1/vector-tables \
-H "Authorization: Bearer sk_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"name": "products-search",
"connection_id": "conn_abc123",
"embedding_provider_id": "emb_xyz789",
"source_table": "ANALYTICS.PUBLIC.PRODUCTS",
"primary_key_column": "ID",
"embedding_model": "text-embedding-3-small",
"embedding_dimensions": 1536,
"mode": "platform",
"columns": [
{"name": "NAME", "role": "embedding", "ordinal": 1},
{"name": "DESCRIPTION", "role": "embedding", "ordinal": 2},
{"name": "CATEGORY", "role": "metadata", "filter_type": "keyword"},
{"name": "PRICE", "role": "metadata", "filter_type": "float"},
{"name": "IN_STOCK", "role": "metadata", "filter_type": "boolean"}
]
}'

Response (shown for Example A; with Example B, embedding_model and embedding_dimensions reflect the external model):

{
"id": "vt_abc123",
"name": "products-search",
"connection_id": "conn_abc123",
"source_table": "ANALYTICS.PUBLIC.PRODUCTS",
"mode": "platform",
"sync_status": "pending",
"embedding_model": "snowflake-arctic-embed-m-v1.5",
"embedding_dimensions": 768,
"columns": [
{"name": "NAME", "role": "embedding", "ordinal": 1},
{"name": "DESCRIPTION", "role": "embedding", "ordinal": 2},
{"name": "CATEGORY", "role": "metadata", "filter_type": "keyword"},
{"name": "PRICE", "role": "metadata", "filter_type": "float"},
{"name": "IN_STOCK", "role": "metadata", "filter_type": "boolean"}
],
"created_at": "2026-03-13T10:02:00Z"
}
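
Examples A and B share every field except embedding_provider_id, the model name, and the dimension count. The sketch below builds either body from one function and leaves embedding_provider_id out entirely when Cortex is used. `build_vector_table_payload` is a hypothetical helper, and numbering embedding columns from 1 in list order is an assumption drawn from the examples.

```python
def build_vector_table_payload(name, connection_id, source_table,
                               primary_key_column, embedding_model,
                               embedding_dimensions, embedding_columns,
                               metadata_columns, embedding_provider_id=None):
    """Build the JSON body for POST /v1/vector-tables (hypothetical helper).

    embedding_columns: column names, embedded in the order given.
    metadata_columns: (column name, filter_type) pairs.
    """
    columns = [{"name": col, "role": "embedding", "ordinal": i}
               for i, col in enumerate(embedding_columns, start=1)]
    columns += [{"name": col, "role": "metadata", "filter_type": ftype}
                for col, ftype in metadata_columns]
    payload = {"name": name, "connection_id": connection_id}
    if embedding_provider_id is not None:
        # Only present for an external provider; omitted for Cortex.
        payload["embedding_provider_id"] = embedding_provider_id
    payload.update(
        source_table=source_table,
        primary_key_column=primary_key_column,
        embedding_model=embedding_model,
        embedding_dimensions=embedding_dimensions,
        mode="platform",
        columns=columns,
    )
    return payload


metadata = [("CATEGORY", "keyword"), ("PRICE", "float"), ("IN_STOCK", "boolean")]
cortex = build_vector_table_payload(
    "products-search", "conn_abc123", "ANALYTICS.PUBLIC.PRODUCTS", "ID",
    "snowflake-arctic-embed-m-v1.5", 768, ["NAME", "DESCRIPTION"], metadata)
external = build_vector_table_payload(
    "products-search", "conn_abc123", "ANALYTICS.PUBLIC.PRODUCTS", "ID",
    "text-embedding-3-small", 1536, ["NAME", "DESCRIPTION"], metadata,
    embedding_provider_id="emb_xyz789")
```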

Embedd creates a table in your Snowflake schema with this structure:

CREATE TABLE EMBEDD_VT_XXXXXXXX_NAME (
PK_VALUE VARCHAR,
EMBEDDING VECTOR(FLOAT, 768),
EMBEDDED_TEXT VARCHAR,
METADATA VARIANT,
ROW_HASH VARCHAR,
PRIMARY KEY (PK_VALUE)
);

Metadata is stored as Snowflake VARIANT, so you can JOIN this table with other Snowflake tables and query metadata using standard Snowflake SQL.
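
As an illustration of such a JOIN, the sketch below assembles a query that matches vector rows to their source rows and filters on one metadata field. Several things here are assumptions: the vector table name is a placeholder you would replace with the real generated name, the metadata keys are assumed to be lowercase as in the query responses, and the `%s` placeholder assumes a driver with pyformat binding (the default in the Snowflake Python connector). `build_join_query` is a hypothetical helper.

```python
import re

# Up to three dot-separated unquoted identifiers: DB.SCHEMA.TABLE.
_TABLE = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*(\.[A-Za-z_][A-Za-z0-9_$]*){0,2}$")
_KEY = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_join_query(vector_table: str, source_table: str, metadata_key: str) -> str:
    """Return SQL joining a vector table to its source table.

    Identifiers are validated because they are interpolated into the SQL;
    the metadata value is left as a bind placeholder.
    """
    for ident in (vector_table, source_table):
        if not _TABLE.match(ident):
            raise ValueError(f"unexpected table identifier: {ident!r}")
    if not _KEY.match(metadata_key):
        raise ValueError(f"unexpected metadata key: {metadata_key!r}")
    return (
        f"SELECT p.ID, p.NAME, v.EMBEDDED_TEXT,\n"
        f"       v.METADATA:{metadata_key}::STRING AS {metadata_key.upper()}\n"
        f"FROM {vector_table} AS v\n"
        # PK_VALUE is VARCHAR, so the source key is cast before comparing.
        f"JOIN {source_table} AS p ON v.PK_VALUE = TO_VARCHAR(p.ID)\n"
        f"WHERE v.METADATA:{metadata_key}::STRING = %s"
    )


sql = build_join_query(
    "ANALYTICS.PUBLIC.MY_VECTOR_TABLE",  # placeholder, not a real table name
    "ANALYTICS.PUBLIC.PRODUCTS",
    "category")
print(sql)
```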

No tier limits

Platform mode is not subject to tier limits. Since vectors are stored in your own Snowflake account, there are no max_tables or max_vectors restrictions.


Step 5: Trigger Backfill

Kick off the initial backfill to embed all existing rows from your source table.

curl -X POST https://api.embedd.to/v1/vector-tables/vt_abc123/backfill \
-H "Authorization: Bearer sk_your_api_key"

Response:

{
"task_id": "task_def456",
"task_type": "backfill",
"target_id": "vt_abc123",
"status": "pending",
"created_at": "2026-03-13T10:03:00Z"
}

Check sync status to track progress:

curl https://api.embedd.to/v1/vector-tables/vt_abc123/sync/status \
-H "Authorization: Bearer sk_your_api_key"

Response (once complete):

{
"sync_status": "synced",
"synced_rows": 12450,
"total_rows": 12450,
"last_synced_at": "2026-03-13T10:08:00Z"
}
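
Progress tracking usually means polling this endpoint until sync_status reads "synced". A minimal sketch using only the Python standard library follows. The polling interval and timeout are arbitrary choices, both function names are hypothetical, and statuses other than "pending" and "synced" are not covered by this guide, so the loop simply keeps waiting on anything else.

```python
import json
import time
import urllib.request

API_BASE = "https://api.embedd.to/v1"


def fetch_sync_status(vector_table_id: str, api_key: str) -> dict:
    """GET /vector-tables/{id}/sync/status and decode the JSON body."""
    req = urllib.request.Request(
        f"{API_BASE}/vector-tables/{vector_table_id}/sync/status",
        headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_until_synced(fetch, interval_s=10.0, timeout_s=3600.0, sleep=time.sleep):
    """Call fetch() until it reports sync_status == "synced".

    fetch is any zero-argument callable returning the status dict, which
    keeps the loop testable without network access.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch()
        if status.get("sync_status") == "synced":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status.get('sync_status')!r} after {timeout_s}s")
        sleep(interval_s)


# Usage (performs real HTTP requests, so it is left commented out):
# final = wait_until_synced(lambda: fetch_sync_status("vt_abc123", "sk_your_api_key"))
# print(final["synced_rows"], "of", final["total_rows"], "rows synced")
```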

Step 6: Query

Run a semantic search with optional metadata filters.

curl -X POST https://api.embedd.to/v1/vector-tables/vt_abc123/query \
-H "Authorization: Bearer sk_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"query": "something warm for hiking",
"limit": 5,
"filters": {
"in_stock": {"eq": true},
"price": {"lte": 150}
}
}'

Response:

{
"results": [
{
"id": "4821",
"score": 0.892,
"metadata": {
"name": "Alpine Fleece Jacket",
"description": "Lightweight fleece jacket with wind-resistant outer layer",
"category": "outerwear",
"price": 129.99,
"in_stock": true
}
},
{
"id": "7733",
"score": 0.871,
"metadata": {
"name": "Merino Wool Base Layer",
"description": "Moisture-wicking merino wool top for cold-weather hiking",
"category": "base-layers",
"price": 89.00,
"in_stock": true
}
}
]
}

Filter operators use plain names like eq, lte, gte, ne — no $ prefix. See Filters for the full list of supported operators and types.


Step 7: Monitor Sync

After the initial backfill, Embedd automatically keeps vectors in sync with your source table. Inserts, updates, and deletes in Snowflake are detected and reflected in the vector table.

Check sync status:

curl https://api.embedd.to/v1/vector-tables/vt_abc123/sync/status \
-H "Authorization: Bearer sk_your_api_key"

Pause sync:

curl -X POST https://api.embedd.to/v1/vector-tables/vt_abc123/sync/pause \
-H "Authorization: Bearer sk_your_api_key"

Resume sync:

curl -X POST https://api.embedd.to/v1/vector-tables/vt_abc123/sync/resume \
-H "Authorization: Bearer sk_your_api_key"
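
When sync is paused around some other operation, it is easy to forget the resume call if that operation fails. One way to guard against this, sketched under the assumption that pause and resume behave as the two requests above suggest, is a context manager that always issues the resume. `sync_paused` and `post` are hypothetical names.

```python
import urllib.request
from contextlib import contextmanager

API_BASE = "https://api.embedd.to/v1"


def post(path: str, api_key: str) -> int:
    """POST to an Embedd API path with no body and return the HTTP status."""
    req = urllib.request.Request(
        f"{API_BASE}{path}", method="POST",
        headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status


@contextmanager
def sync_paused(vector_table_id: str, send):
    """Pause sync on entry and resume on exit, even if the body raises.

    send is a callable taking the API path; passing it in keeps the
    context manager independent of the HTTP client.
    """
    send(f"/vector-tables/{vector_table_id}/sync/pause")
    try:
        yield
    finally:
        send(f"/vector-tables/{vector_table_id}/sync/resume")


# Usage (performs real HTTP requests, so it is left commented out):
# with sync_paused("vt_abc123", lambda path: post(path, "sk_your_api_key")):
#     ...  # work that should run while sync is paused
```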

Re-backfill and atomic swap

When a re-backfill is triggered (for example, after changing the embedding model), Embedd uses an atomic table swap to avoid downtime:

  1. A new swap table is created in your Snowflake schema.
  2. All rows are embedded and written to the swap table.
  3. The live table is renamed to _old, the swap table is renamed to the live name, and _old is dropped — all in a single transaction.

Queries continue hitting the live table throughout the process.
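
To make step 3 concrete, the sketch below lists the kind of statements such a swap involves, in order. It is an illustration only: the exact statements and table names Embedd issues are not shown in this guide, the `_OLD` suffix is appended to the live name as an assumption, and `swap_statements` is a hypothetical helper.

```python
def swap_statements(live_table: str, swap_table: str) -> list[str]:
    """Return the rename, rename, drop sequence described in step 3."""
    old_table = f"{live_table}_OLD"
    return [
        # Move the current live table out of the way.
        f"ALTER TABLE {live_table} RENAME TO {old_table}",
        # Promote the freshly backfilled swap table to the live name.
        f"ALTER TABLE {swap_table} RENAME TO {live_table}",
        # Discard the previous generation.
        f"DROP TABLE {old_table}",
    ]


# Both table names are placeholders for illustration.
for statement in swap_statements("MY_VECTOR_TABLE", "MY_VECTOR_TABLE_SWAP"):
    print(statement + ";")
```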

See Sync & Backfill for details on polling intervals and re-backfill behavior.


Key Differences from Managed Mode

Aspect | Managed | Platform (Snowflake)
Vector storage | Qdrant (hosted by Embedd) | Your Snowflake account
Embedding provider | Required | Optional (Cortex available)
Tier limits | Enforced | Not enforced
SQL JOINs with vectors | No | Yes
Metadata type | JSON | VARIANT
Re-backfill strategy | New Qdrant collection | Atomic table swap