Platform + Snowflake

In platform mode with Snowflake, Embedd reads data from your Snowflake database and writes vectors back to your Snowflake account using the VECTOR(FLOAT, N) type. You can use Snowflake Cortex for embedding generation — no external embedding provider needed.

You'll need:

A Snowflake account with a warehouse, database, and schema accessible to Embedd
An Embedd API key
(Optional) An embedding provider API key if you prefer OpenAI, Gemini, or Voyage over Cortex

Step 1: Create a Connection

Password authentication:

curl -X POST https://api.embedd.to/v1/providers/snowflake/connections \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "analytics-snowflake",
    "mode": "platform",
    "credentials": {
      "account": "xy12345.us-east-1",
      "user": "EMBEDD_USER",
      "password": "your_password",
      "warehouse": "COMPUTE_WH",
      "database": "ANALYTICS",
      "schema": "PUBLIC"
    }
  }'

Key pair authentication:

curl -X POST https://api.embedd.to/v1/providers/snowflake/connections \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "analytics-snowflake",
    "mode": "platform",
    "credentials": {
      "account": "xy12345.us-east-1",
      "user": "EMBEDD_USER",
      "private_key": "-----BEGIN PRIVATE KEY-----\nMIIE...\n-----END PRIVATE KEY-----",
      "warehouse": "COMPUTE_WH",
      "database": "ANALYTICS",
      "schema": "PUBLIC"
    }
  }'

Response:

{
  "id": "conn_abc123",
  "name": "analytics-snowflake",
  "provider": "snowflake",
  "mode": "platform",
  "status": "created",
  "created_at": "2026-03-13T10:00:00Z"
}
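The two request bodies above differ only in the credential field, so a client can build both from one helper. The following is a minimal Python sketch, not an official client: the function name and the "exactly one of password or private key" rule are assumptions made for illustration. It also shows that a PEM key can keep its real newlines, because json.dumps escapes them as \n for you.

```python
import json


def build_connection_payload(name, account, user, warehouse, database, schema,
                             password=None, private_key_pem=None):
    """Build the JSON body for POST /v1/providers/snowflake/connections.

    Hypothetical helper. Requiring exactly one of `password` or
    `private_key_pem` is an assumption that mirrors the two curl examples.
    """
    if (password is None) == (private_key_pem is None):
        raise ValueError("provide exactly one of password or private_key_pem")

    credentials = {"account": account, "user": user}
    if password is not None:
        credentials["password"] = password
    else:
        # Keep the PEM's real newlines; json.dumps escapes them as \n.
        credentials["private_key"] = private_key_pem.strip()
    credentials.update(warehouse=warehouse, database=database, schema=schema)

    return {"name": name, "mode": "platform", "credentials": credentials}


# Placeholder key material, copied from the example above.
pem = "-----BEGIN PRIVATE KEY-----\nMIIE...\n-----END PRIVATE KEY-----\n"
body = json.dumps(build_connection_payload(
    "analytics-snowflake", "xy12345.us-east-1", "EMBEDD_USER",
    "COMPUTE_WH", "ANALYTICS", "PUBLIC", private_key_pem=pem))
```

The resulting body string can be sent as the -d argument of the curl command or as the data of any HTTP client.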

Platform mode permissions

Because Embedd creates and manages vector tables in your Snowflake account, the role needs more than just read access:

GRANT USAGE ON WAREHOUSE COMPUTE_WH TO ROLE EMBEDD_ROLE;
GRANT USAGE ON DATABASE ANALYTICS TO ROLE EMBEDD_ROLE;
GRANT USAGE ON SCHEMA ANALYTICS.PUBLIC TO ROLE EMBEDD_ROLE;
GRANT SELECT ON TABLE ANALYTICS.PUBLIC.PRODUCTS TO ROLE EMBEDD_ROLE;
GRANT CREATE TABLE ON SCHEMA ANALYTICS.PUBLIC TO ROLE EMBEDD_ROLE;

CREATE TABLE is required because Embedd writes vector tables (and performs atomic table swaps during re-backfill) directly in your schema.
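If Embedd will read from several source tables, the same grants repeat with one SELECT per table. The sketch below generates the database, schema, SELECT, and CREATE TABLE statements for review before you run them. The function name is hypothetical, and identifiers are interpolated as-is, so pass only trusted, unquoted names.

```python
def platform_grants(database, schema, role, source_tables):
    """Return the GRANT statements for platform mode, one SELECT per table.

    Hypothetical helper: it only formats SQL text and never connects to
    Snowflake. Identifiers are not quoted or escaped.
    """
    fq_schema = f"{database}.{schema}"
    statements = [
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role};",
        f"GRANT USAGE ON SCHEMA {fq_schema} TO ROLE {role};",
    ]
    statements += [
        f"GRANT SELECT ON TABLE {fq_schema}.{table} TO ROLE {role};"
        for table in source_tables
    ]
    # CREATE TABLE lets Embedd create vector tables and swap tables.
    statements.append(
        f"GRANT CREATE TABLE ON SCHEMA {fq_schema} TO ROLE {role};")
    return statements


grants = platform_grants("ANALYTICS", "PUBLIC", "EMBEDD_ROLE", ["PRODUCTS"])
```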

Step 2: Test the Connection

Verify that Embedd can reach your Snowflake account before proceeding.

curl -X POST https://api.embedd.to/v1/connections/conn_abc123/test \
  -H "Authorization: Bearer sk_your_api_key"

Response:

{
  "status": "ok",
  "latency_ms": 185
}

If the test fails, check:

Account identifier — ensure the format is correct (e.g., xy12345.us-east-1).
Credentials — confirm the username and password (or private key) are correct.
Warehouse — verify the warehouse exists and is not suspended.
Network policy — if your Snowflake account uses network policies, allow Embedd's IPs.
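Some of the checks above can be run locally before calling the test endpoint. The sketch below is only a client-side sanity check: the function name and every rule in it are assumptions, not validation that Embedd documents. It catches missing fields, a credentials object with both or neither of password and private_key, and an account value pasted as a full URL. Warehouse state and network policies still have to be checked in Snowflake itself.

```python
REQUIRED_FIELDS = ("account", "user", "warehouse", "database", "schema")


def preflight(credentials):
    """Return a list of likely problems in a credentials dict (empty if none).

    Hypothetical local check; it cannot detect a suspended warehouse or a
    blocking network policy.
    """
    problems = []
    for field in REQUIRED_FIELDS:
        if not str(credentials.get(field, "")).strip():
            problems.append(f"missing {field}")

    has_password = bool(credentials.get("password"))
    has_key = bool(credentials.get("private_key"))
    if has_password == has_key:
        problems.append("set exactly one of password or private_key")

    account = str(credentials.get("account", ""))
    # Assumption: the account is an identifier such as xy12345.us-east-1.
    if "://" in account or account.endswith(".snowflakecomputing.com"):
        problems.append("account looks like a URL, not an identifier")

    if has_key and not credentials["private_key"].lstrip().startswith("-----BEGIN"):
        problems.append("private_key does not look like a PEM block")
    return problems


good = {"account": "xy12345.us-east-1", "user": "EMBEDD_USER",
        "password": "your_password", "warehouse": "COMPUTE_WH",
        "database": "ANALYTICS", "schema": "PUBLIC"}
bad = dict(good, account="https://xy12345.us-east-1.snowflakecomputing.com")
```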

Step 3: Embedding Provider (Optional)

Unlike managed mode, Snowflake platform mode does not require an external embedding provider: Snowflake Cortex can generate the embeddings inside your account.

Option A: Use Snowflake Cortex

Skip this step entirely. When you create a vector table in Step 4, omit the embedding_provider_id field and use a Cortex-compatible model name. Cortex handles embedding generation natively inside your Snowflake account.

Available Cortex embedding models:

Model	Dimensions
snowflake-arctic-embed-m-v1.5	768
snowflake-arctic-embed-l-v2.0	1024
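The embedding_dimensions value sent in Step 4 has to match the model, so it can help to keep the table above as a lookup. A minimal sketch, assuming only the two models listed here; the constant and function names are hypothetical.

```python
# Dimensions copied from the Cortex model table above.
CORTEX_DIMENSIONS = {
    "snowflake-arctic-embed-m-v1.5": 768,
    "snowflake-arctic-embed-l-v2.0": 1024,
}


def cortex_embedding_fields(model):
    """Return the embedding_model / embedding_dimensions pair for a model."""
    try:
        dimensions = CORTEX_DIMENSIONS[model]
    except KeyError:
        raise ValueError(f"{model!r} is not in the Cortex model table") from None
    return {"embedding_model": model, "embedding_dimensions": dimensions}


fields = cortex_embedding_fields("snowflake-arctic-embed-m-v1.5")
```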

Option B: Use an External Provider

If you prefer OpenAI, Gemini, or Voyage, create an embedding provider the same way as in managed mode:

curl -X POST https://api.embedd.to/v1/embedding-providers \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "openai-prod",
    "provider": "openai",
    "api_key": "sk-proj-your-openai-key",
    "default_model": "text-embedding-3-small"
  }'

Response:

{
  "id": "emb_xyz789",
  "name": "openai-prod",
  "provider": "openai",
  "default_model": "text-embedding-3-small",
  "created_at": "2026-03-13T10:01:00Z"
}

Use the returned id as the embedding_provider_id in Step 4.

Step 4: Create a Vector Table

A vector table maps source columns to embedding and metadata roles. In platform mode, Embedd creates the vector table directly in your Snowflake account.

Example A: Using Cortex (no external provider)

curl -X POST https://api.embedd.to/v1/vector-tables \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "products-search",
    "connection_id": "conn_abc123",
    "source_table": "ANALYTICS.PUBLIC.PRODUCTS",
    "primary_key_column": "ID",
    "embedding_model": "snowflake-arctic-embed-m-v1.5",
    "embedding_dimensions": 768,
    "mode": "platform",
    "columns": [
      {"name": "NAME", "role": "embedding", "ordinal": 1},
      {"name": "DESCRIPTION", "role": "embedding", "ordinal": 2},
      {"name": "CATEGORY", "role": "metadata", "filter_type": "keyword"},
      {"name": "PRICE", "role": "metadata", "filter_type": "float"},
      {"name": "IN_STOCK", "role": "metadata", "filter_type": "boolean"}
    ]
  }'

No embedding_provider_id — Cortex handles embeddings natively.

Example B: Using an External Provider

curl -X POST https://api.embedd.to/v1/vector-tables \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "products-search",
    "connection_id": "conn_abc123",
    "embedding_provider_id": "emb_xyz789",
    "source_table": "ANALYTICS.PUBLIC.PRODUCTS",
    "primary_key_column": "ID",
    "embedding_model": "text-embedding-3-small",
    "embedding_dimensions": 1536,
    "mode": "platform",
    "columns": [
      {"name": "NAME", "role": "embedding", "ordinal": 1},
      {"name": "DESCRIPTION", "role": "embedding", "ordinal": 2},
      {"name": "CATEGORY", "role": "metadata", "filter_type": "keyword"},
      {"name": "PRICE", "role": "metadata", "filter_type": "float"},
      {"name": "IN_STOCK", "role": "metadata", "filter_type": "boolean"}
    ]
  }'

Response (Example A shown; Example B has the same shape with its own embedding_model and embedding_dimensions values):

{
  "id": "vt_abc123",
  "name": "products-search",
  "connection_id": "conn_abc123",
  "source_table": "ANALYTICS.PUBLIC.PRODUCTS",
  "mode": "platform",
  "sync_status": "pending",
  "embedding_model": "snowflake-arctic-embed-m-v1.5",
  "embedding_dimensions": 768,
  "columns": [
    {"name": "NAME", "role": "embedding", "ordinal": 1},
    {"name": "DESCRIPTION", "role": "embedding", "ordinal": 2},
    {"name": "CATEGORY", "role": "metadata", "filter_type": "keyword"},
    {"name": "PRICE", "role": "metadata", "filter_type": "float"},
    {"name": "IN_STOCK", "role": "metadata", "filter_type": "boolean"}
  ],
  "created_at": "2026-03-13T10:02:00Z"
}
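Examples A and B send the same body except for embedding_provider_id and the model fields. The sketch below builds either body from one function and leaves embedding_provider_id out entirely when it is not given, which is how the Cortex example is written. The function name and the check for at least one embedding column are assumptions for illustration.

```python
def build_vector_table_payload(name, connection_id, source_table,
                               primary_key_column, embedding_model,
                               embedding_dimensions, columns,
                               embedding_provider_id=None):
    """Build the JSON body for POST /v1/vector-tables in platform mode.

    Hypothetical helper. Requiring an embedding column is an assumption.
    """
    if not any(col.get("role") == "embedding" for col in columns):
        raise ValueError("at least one column needs role 'embedding'")

    payload = {"name": name, "connection_id": connection_id}
    if embedding_provider_id is not None:
        # Omitted for Cortex, present for OpenAI, Gemini, or Voyage.
        payload["embedding_provider_id"] = embedding_provider_id
    payload.update(
        source_table=source_table,
        primary_key_column=primary_key_column,
        embedding_model=embedding_model,
        embedding_dimensions=embedding_dimensions,
        mode="platform",
        columns=list(columns),
    )
    return payload


columns = [
    {"name": "NAME", "role": "embedding", "ordinal": 1},
    {"name": "DESCRIPTION", "role": "embedding", "ordinal": 2},
    {"name": "CATEGORY", "role": "metadata", "filter_type": "keyword"},
]
cortex_body = build_vector_table_payload(
    "products-search", "conn_abc123", "ANALYTICS.PUBLIC.PRODUCTS", "ID",
    "snowflake-arctic-embed-m-v1.5", 768, columns)
external_body = build_vector_table_payload(
    "products-search", "conn_abc123", "ANALYTICS.PUBLIC.PRODUCTS", "ID",
    "text-embedding-3-small", 1536, columns,
    embedding_provider_id="emb_xyz789")
```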

Embedd creates a table in your Snowflake schema with this structure:

CREATE TABLE EMBEDD_VT_XXXXXXXX_NAME (
    PK_VALUE VARCHAR,
    EMBEDDING VECTOR(FLOAT, 768),
    EMBEDDED_TEXT VARCHAR,
    METADATA VARIANT,
    ROW_HASH VARCHAR,
    PRIMARY KEY (PK_VALUE)
);

Metadata is stored as Snowflake VARIANT, so you can JOIN this table with other Snowflake tables and query metadata using standard Snowflake SQL.
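As an illustration of such a JOIN, the sketch below composes a Snowflake query that joins the vector table back to its source table on the primary key and filters on a metadata field. Several details are assumptions: the function name, the lowercase category key inside METADATA (VARIANT paths are case-sensitive, so check how your keys are stored), and the %s placeholder, whose style depends on your SQL client. The vector table name is passed in because the generated name follows the EMBEDD_VT_XXXXXXXX_NAME pattern shown above.

```python
def metadata_join_sql(vector_table, source_table, source_pk="ID"):
    """Compose a JOIN between a vector table and its source table.

    Hypothetical helper that only builds SQL text. PK_VALUE is VARCHAR,
    so the source key is cast before comparing.
    """
    return (
        "SELECT p.*, v.EMBEDDED_TEXT\n"
        f"FROM {vector_table} AS v\n"
        f"JOIN {source_table} AS p\n"
        f"  ON TO_VARCHAR(p.{source_pk}) = v.PK_VALUE\n"
        # Assumed key name and casing inside the VARIANT column.
        "WHERE v.METADATA:category::STRING = %s"
    )


sql = metadata_join_sql("EMBEDD_VT_XXXXXXXX_NAME", "ANALYTICS.PUBLIC.PRODUCTS")
```

With a client that accepts %s placeholders, the category value (for example "outerwear") would be passed as the single bound parameter.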

No tier limits

Because vectors are stored in your own Snowflake account, platform mode is not subject to tier limits: the max_tables and max_vectors restrictions do not apply.

Step 5: Trigger Backfill

Kick off the initial backfill to embed all existing rows from your source table.

curl -X POST https://api.embedd.to/v1/vector-tables/vt_abc123/backfill \
  -H "Authorization: Bearer sk_your_api_key"

Response:

{
  "task_id": "task_def456",
  "task_type": "backfill",
  "target_id": "vt_abc123",
  "status": "pending",
  "created_at": "2026-03-13T10:03:00Z"
}

Check sync status to track progress:

curl https://api.embedd.to/v1/vector-tables/vt_abc123/sync/status \
  -H "Authorization: Bearer sk_your_api_key"

Response (once complete):

{
  "sync_status": "synced",
  "synced_rows": 12450,
  "total_rows": 12450,
  "last_synced_at": "2026-03-13T10:08:00Z"
}
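Tracking progress usually means polling the status endpoint until sync_status reaches "synced". The sketch below is one way to do that in Python with only the standard library. The function names, the 5-second interval, and the 10-minute timeout are assumptions, and any status other than "synced" is treated as still in progress because this guide does not list the other values.

```python
import json
import time
import urllib.request


def fetch_status_http(vector_table_id, api_key):
    """GET the sync status as a dict (performs a real network call)."""
    request = urllib.request.Request(
        f"https://api.embedd.to/v1/vector-tables/{vector_table_id}/sync/status",
        headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_until_synced(fetch_status, timeout_s=600.0, interval_s=5.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until sync_status is "synced" or time runs out.

    `sleep` and `clock` are injectable so the loop can be tested offline.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status.get("sync_status") == "synced":
            return status
        if clock() >= deadline:
            raise TimeoutError(f"not synced after {timeout_s} seconds")
        sleep(interval_s)
```

A typical call would be wait_until_synced(lambda: fetch_status_http("vt_abc123", "sk_your_api_key")).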

Step 6: Query

Run a semantic search with optional metadata filters.

curl -X POST https://api.embedd.to/v1/vector-tables/vt_abc123/query \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "something warm for hiking",
    "limit": 5,
    "filters": {
      "in_stock": {"eq": true},
      "price": {"lte": 150}
    }
  }'

Response:

{
  "results": [
    {
      "id": "4821",
      "score": 0.892,
      "metadata": {
        "name": "Alpine Fleece Jacket",
        "description": "Lightweight fleece jacket with wind-resistant outer layer",
        "category": "outerwear",
        "price": 129.99,
        "in_stock": true
      }
    },
    {
      "id": "7733",
      "score": 0.871,
      "metadata": {
        "name": "Merino Wool Base Layer",
        "description": "Moisture-wicking merino wool top for cold-weather hiking",
        "category": "base-layers",
        "price": 89.00,
        "in_stock": true
      }
    }
  ]
}

Filter operators use plain names like eq, lte, gte, ne — no $ prefix. See Filters for the full list of supported operators and types.
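To make the filter shape concrete, the sketch below builds the same query body as the curl example from keyword arguments and rejects operator names it does not recognise, including $-prefixed ones. The function name is hypothetical, and the allowed set contains only the four operators named in this guide; the Filters reference may list more.

```python
# Only the operators named in this guide; the full list may be longer.
KNOWN_OPERATORS = {"eq", "ne", "lte", "gte"}


def build_query(query, limit=5, **filters):
    """Build a query body; each filter is passed as field=(operator, value)."""
    body = {"query": query, "limit": limit}
    if filters:
        body["filters"] = {}
    for field, (operator, value) in filters.items():
        if operator not in KNOWN_OPERATORS:
            raise ValueError(f"unknown operator {operator!r} (no $ prefix)")
        body["filters"][field] = {operator: value}
    return body


body = build_query("something warm for hiking", 5,
                   in_stock=("eq", True), price=("lte", 150))
```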

Step 7: Monitor Sync

After the initial backfill, Embedd automatically keeps vectors in sync with your source table. Inserts, updates, and deletes in Snowflake are detected and reflected in the vector table.

Check sync status:

curl https://api.embedd.to/v1/vector-tables/vt_abc123/sync/status \
  -H "Authorization: Bearer sk_your_api_key"

Pause sync:

curl -X POST https://api.embedd.to/v1/vector-tables/vt_abc123/sync/pause \
  -H "Authorization: Bearer sk_your_api_key"

Resume sync:

curl -X POST https://api.embedd.to/v1/vector-tables/vt_abc123/sync/resume \
  -H "Authorization: Bearer sk_your_api_key"
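The three sync endpoints share one URL pattern and differ only in the last path segment and the HTTP method. A small Python sketch, with a hypothetical function name, that prepares the request without sending it:

```python
import urllib.request

BASE_URL = "https://api.embedd.to/v1"


def sync_request(vector_table_id, action, api_key):
    """Prepare (but do not send) a sync status, pause, or resume request."""
    if action == "status":
        method = "GET"
    elif action in ("pause", "resume"):
        method = "POST"
    else:
        raise ValueError("action must be 'status', 'pause', or 'resume'")
    return urllib.request.Request(
        f"{BASE_URL}/vector-tables/{vector_table_id}/sync/{action}",
        method=method,
        headers={"Authorization": f"Bearer {api_key}"})


pause = sync_request("vt_abc123", "pause", "sk_your_api_key")
```

Passing the prepared request to urllib.request.urlopen sends it.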

Re-backfill and atomic swap

When a re-backfill is triggered (for example, after changing the embedding model), Embedd uses an atomic table swap to avoid downtime:

A new swap table is created in your Snowflake schema.
All rows are embedded and written to the swap table.
The live table is renamed to _old, the swap table is renamed to the live name, and _old is dropped — all in a single transaction.

Queries continue hitting the live table throughout the process.
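The rename, rename, drop sequence can be tried out locally. The sketch below demonstrates the pattern with Python's built-in sqlite3 module as a stand-in. It is not Embedd's implementation and does not use Snowflake; the table names and the swap_tables helper are hypothetical.

```python
import sqlite3


def swap_tables(conn, live, swap):
    """Replace `live` with `swap` inside one transaction (SQLite stand-in)."""
    old = f"{live}_old"
    conn.execute("BEGIN")
    try:
        conn.execute(f'ALTER TABLE "{live}" RENAME TO "{old}"')
        conn.execute(f'ALTER TABLE "{swap}" RENAME TO "{live}"')
        conn.execute(f'DROP TABLE "{old}"')
        conn.execute("COMMIT")
    except Exception:
        # On any failure the live table keeps its original name and rows.
        conn.execute("ROLLBACK")
        raise


# isolation_level=None lets the explicit BEGIN/COMMIT above control the
# transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE vt (pk_value TEXT, embedded_text TEXT)")
conn.execute("INSERT INTO vt VALUES ('1', 'embedded with old model')")
conn.execute("CREATE TABLE vt_swap (pk_value TEXT, embedded_text TEXT)")
conn.execute("INSERT INTO vt_swap VALUES ('1', 'embedded with new model')")

swap_tables(conn, "vt", "vt_swap")

rows = conn.execute("SELECT embedded_text FROM vt").fetchall()
tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
```

After the swap, the live name points at the re-embedded rows and neither the swap table nor the _old table remains.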

See Sync & Backfill for details on polling intervals and re-backfill behavior.

Key Differences from Managed Mode

Aspect	Managed	Platform (Snowflake)
Vector storage	Qdrant (hosted by Embedd)	Your Snowflake account
Embedding provider	Required	Optional (Cortex available)
Tier limits	Enforced	Not enforced
SQL JOINs with vectors	No	Yes
Metadata type	JSON	VARIANT
Re-backfill strategy	New Qdrant collection	Atomic table swap

Related

Filters — full filter operator reference
Sync & Backfill — how automatic sync and re-backfill work
Vector Tables API — complete vector table endpoints
Query API — query parameters and response format
