Managed + Snowflake

Embedd reads data from your Snowflake database, generates embeddings via your chosen provider, and stores vectors in Embedd's managed Qdrant instance. This guide walks through the full setup end to end.

What you need:

A Snowflake account with a table you want to make searchable
An API key from an embedding provider (OpenAI, Gemini, or Voyage)
An Embedd API key

Step 1: Create a Connection

Password Authentication

curl -X POST https://api.embedd.to/v1/providers/snowflake/connections \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "X-Environment-Id: env_abc123" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "analytics-warehouse",
    "mode": "managed",
    "credentials": {
      "auth_method": "password",
      "account": "myorg-account",
      "user": "EMBEDD_USER",
      "password": "secure_password",
      "warehouse": "COMPUTE_WH",
      "database": "ANALYTICS",
      "schema": "PUBLIC",
      "role": "EMBEDD_ROLE"
    }
  }'

Key Pair Authentication

curl -X POST https://api.embedd.to/v1/providers/snowflake/connections \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "X-Environment-Id: env_abc123" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "analytics-warehouse",
    "mode": "managed",
    "credentials": {
      "auth_method": "key_pair",
      "account": "myorg-account",
      "user": "EMBEDD_USER",
      "private_key": "-----BEGIN PRIVATE KEY-----\nMIIEv...\n-----END PRIVATE KEY-----",
      "warehouse": "COMPUTE_WH",
      "database": "ANALYTICS",
      "schema": "PUBLIC",
      "role": "EMBEDD_ROLE"
    }
  }'
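When you script the key pair request, the line breaks in the PEM file have to end up as \n escapes inside the JSON string, as in the private_key value above. The following is a minimal Python sketch of one way to do that. The helper name build_key_pair_credentials is hypothetical; the field names are copied from the request above, and the key material is the same truncated placeholder.

```python
import json


def build_key_pair_credentials(pem_text, account, user, warehouse, database, schema, role):
    """Return the "credentials" object for key pair authentication.

    Hypothetical helper: the field names mirror the curl example above.
    """
    return {
        "auth_method": "key_pair",
        "account": account,
        "user": user,
        # Keep the PEM line breaks; json.dumps turns each one into a \n escape.
        "private_key": pem_text.strip(),
        "warehouse": warehouse,
        "database": database,
        "schema": schema,
        "role": role,
    }


# Placeholder key material, truncated exactly as in the example above.
pem = "-----BEGIN PRIVATE KEY-----\nMIIEv...\n-----END PRIVATE KEY-----\n"

body = json.dumps({
    "name": "analytics-warehouse",
    "mode": "managed",
    "credentials": build_key_pair_credentials(
        pem, "myorg-account", "EMBEDD_USER", "COMPUTE_WH", "ANALYTICS", "PUBLIC", "EMBEDD_ROLE"
    ),
})
```

Building the body with json.dumps rather than string formatting avoids hand-escaping the key, which is an easy place to introduce an invalid request.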

Required Snowflake Permissions

The role you provide must have SELECT access on the source table, plus USAGE on the table's database and schema and on the warehouse. Run these grants in Snowflake as an admin:

CREATE ROLE IF NOT EXISTS EMBEDD_ROLE;
GRANT USAGE ON DATABASE ANALYTICS TO ROLE EMBEDD_ROLE;
GRANT USAGE ON SCHEMA ANALYTICS.PUBLIC TO ROLE EMBEDD_ROLE;
GRANT SELECT ON TABLE ANALYTICS.PUBLIC.PRODUCTS TO ROLE EMBEDD_ROLE;
GRANT USAGE ON WAREHOUSE COMPUTE_WH TO ROLE EMBEDD_ROLE;
GRANT ROLE EMBEDD_ROLE TO USER EMBEDD_USER;
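The grants above are written for the example object names. If your database, schema, or table differ, a small sketch like the one below can generate the same six statements for your own names. The function snowflake_grants is a hypothetical helper, not part of Embedd or Snowflake, and it interpolates identifiers without quoting, so it assumes trusted, unquoted names.

```python
def snowflake_grants(database, schema, table, warehouse, role, user):
    """Return the grant statements from the example above for other object names.

    Hypothetical helper: identifiers are inserted as-is (no quoting),
    so only pass names you control.
    """
    fq_schema = f"{database}.{schema}"
    return [
        f"CREATE ROLE IF NOT EXISTS {role};",
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role};",
        f"GRANT USAGE ON SCHEMA {fq_schema} TO ROLE {role};",
        f"GRANT SELECT ON TABLE {fq_schema}.{table} TO ROLE {role};",
        f"GRANT USAGE ON WAREHOUSE {warehouse} TO ROLE {role};",
        f"GRANT ROLE {role} TO USER {user};",
    ]


statements = snowflake_grants(
    "ANALYTICS", "PUBLIC", "PRODUCTS", "COMPUTE_WH", "EMBEDD_ROLE", "EMBEDD_USER"
)
print("\n".join(statements))
```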

Save the id from the connection response (e.g., conn_abc123) -- you will need it in Steps 2 and 4.

Step 2: Test the Connection

Verify that Embedd can reach your Snowflake account with the stored credentials.

curl -X POST https://api.embedd.to/v1/connections/conn_abc123/test \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "X-Environment-Id: env_abc123"

A successful response looks like:

{
  "status": "connected",
  "details": {"latency_ms": 245}
}

If the test fails, check that your account identifier, credentials, and role grants are correct.
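If you prefer to run this check from a script, the sketch below builds the same request with Python's standard library and interprets the response body. The helper names are hypothetical. The shape of a failed response is not shown in this guide, so the sketch simply treats any status other than "connected" as a failure.

```python
import json
import urllib.request

API_BASE = "https://api.embedd.to/v1"


def build_test_request(connection_id, api_key, environment_id):
    """Build (but do not send) the POST request for the connection test."""
    return urllib.request.Request(
        f"{API_BASE}/connections/{connection_id}/test",
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-Environment-Id": environment_id,
        },
    )


def is_connected(response_body):
    """Return True when the JSON body reports status "connected"."""
    return json.loads(response_body).get("status") == "connected"


request = build_test_request("conn_abc123", "sk_your_api_key", "env_abc123")
# To send it for real: urllib.request.urlopen(request).read()

# Sample body copied from the successful response above.
sample = '{"status": "connected", "details": {"latency_ms": 245}}'
print(is_connected(sample))
```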

Step 3: Configure an Embedding Provider

curl -X POST https://api.embedd.to/v1/embedding-providers \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "X-Environment-Id: env_abc123" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "openai-prod",
    "provider": "openai",
    "api_key": "sk-proj-...",
    "default_model": "text-embedding-3-small"
  }'

Save the id from the response (e.g., ep_abc123) -- you will need it in Step 4.

Available Models

Provider	Model	Dimensions
`openai`	`text-embedding-3-small`	1536
`openai`	`text-embedding-3-large`	3072
`openai`	`text-embedding-ada-002`	1536
`gemini`	`text-embedding-004`	768
`gemini`	`gemini-embedding-001`	3072
`gemini`	`gemini-embedding-exp-03`	3072
`voyage`	`voyage-3`	1024
`voyage`	`voyage-3-lite`	512
`voyage`	`voyage-code-3`	1024
`voyage`	`voyage-finance-2`	1024
`voyage`	`voyage-law-2`	1024
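Step 4 asks for both embedding_model and embedding_dimensions, so it can help to check the pair against the table before sending the request. The sketch below is one way to do that. The dimensions are copied from the table above; check_dimensions is a hypothetical helper, and whether the API accepts any other dimension value for a model is not stated here, so the sketch assumes it does not.

```python
# Dimensions copied from the Available Models table above.
MODEL_DIMENSIONS = {
    ("openai", "text-embedding-3-small"): 1536,
    ("openai", "text-embedding-3-large"): 3072,
    ("openai", "text-embedding-ada-002"): 1536,
    ("gemini", "text-embedding-004"): 768,
    ("gemini", "gemini-embedding-001"): 3072,
    ("gemini", "gemini-embedding-exp-03"): 3072,
    ("voyage", "voyage-3"): 1024,
    ("voyage", "voyage-3-lite"): 512,
    ("voyage", "voyage-code-3"): 1024,
    ("voyage", "voyage-finance-2"): 1024,
    ("voyage", "voyage-law-2"): 1024,
}


def check_dimensions(provider, model, embedding_dimensions):
    """Raise ValueError if the model is unknown or the dimensions disagree."""
    expected = MODEL_DIMENSIONS.get((provider, model))
    if expected is None:
        raise ValueError(f"unknown model: {provider}/{model}")
    if embedding_dimensions != expected:
        raise ValueError(
            f"{model} produces {expected} dimensions, got {embedding_dimensions}"
        )
    return expected


print(check_dimensions("openai", "text-embedding-3-small", 1536))
```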

Step 4: Create a Vector Table

Define which source table to embed, which columns to include, and how to map them.

Snowflake stores unquoted identifiers in uppercase, so column names are uppercase unless they were created with double quotes. Use the exact casing that matches your source table.

curl -X POST https://api.embedd.to/v1/vector-tables \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "X-Environment-Id: env_abc123" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "product-search",
    "connection_id": "conn_abc123",
    "embedding_provider_id": "ep_abc123",
    "source_table": "PRODUCTS",
    "primary_key_column": "ID",
    "columns": [
      {"name": "NAME", "role": "embedding", "ordinal": 1, "name_prefix": "name: "},
      {"name": "DESCRIPTION", "role": "embedding", "ordinal": 2},
      {"name": "CATEGORY", "role": "metadata", "filter_type": "keyword"},
      {"name": "PRICE", "role": "metadata", "filter_type": "float"},
      {"name": "IN_STOCK", "role": "metadata", "filter_type": "boolean"}
    ],
    "embedding_model": "text-embedding-3-small",
    "embedding_dimensions": 1536,
    "mode": "managed"
  }'

Column roles:

embedding -- text columns whose content is concatenated and embedded. Use ordinal to control the order and name_prefix to add context (e.g., "name: " prepended to the value).
metadata -- columns stored alongside vectors for filtering at query time. Each metadata column requires a filter_type: keyword, integer, float, or boolean.
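To make the embedding role concrete, the sketch below shows how ordinal and name_prefix could combine a row into the text that gets embedded. It is an illustration, not Embedd's implementation: the single-space separator is inferred from the embedded_text value in the Step 6 response, skipping NULL values is an assumption, and build_embedded_text is a hypothetical name.

```python
def build_embedded_text(row, columns, separator=" "):
    """Concatenate embedding columns in ordinal order, applying name_prefix.

    Assumptions: values are joined with a single space and NULL (None)
    values are skipped.
    """
    embedding_columns = sorted(
        (col for col in columns if col["role"] == "embedding"),
        key=lambda col: col["ordinal"],
    )
    parts = []
    for col in embedding_columns:
        value = row.get(col["name"])
        if value is None:
            continue
        parts.append(f'{col.get("name_prefix", "")}{value}')
    return separator.join(parts)


# Column configuration copied from the request above.
columns = [
    {"name": "NAME", "role": "embedding", "ordinal": 1, "name_prefix": "name: "},
    {"name": "DESCRIPTION", "role": "embedding", "ordinal": 2},
    {"name": "CATEGORY", "role": "metadata", "filter_type": "keyword"},
]
row = {
    "NAME": "Wireless Bluetooth Headphones",
    "DESCRIPTION": "Lightweight over-ear headphones",
    "CATEGORY": "electronics",
}
text = build_embedded_text(row, columns)
print(text)  # name: Wireless Bluetooth Headphones Lightweight over-ear headphones
```

Metadata columns such as CATEGORY do not contribute to the text; they are only stored for filtering.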

Save the id from the response (e.g., vt_abc123) -- you will need it in Steps 5 through 7.

Step 5: Trigger Backfill

Start the initial backfill to read all rows from your Snowflake table, generate embeddings, and store vectors in Qdrant.

curl -X POST https://api.embedd.to/v1/vector-tables/vt_abc123/backfill \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "X-Environment-Id: env_abc123"

The response confirms that the backfill has started:

{
  "vector_table_id": "vt_abc123",
  "status": "backfilling",
  "total_rows": 10000,
  "synced_rows": 0
}

The backfill runs asynchronously. For large tables this may take several minutes. You can check progress with the sync status endpoint (Step 7).
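Because the backfill is asynchronous, a script usually has to poll until it finishes. The sketch below polls until sync_status is no longer "backfilling". The function wait_for_backfill and its parameters are hypothetical; fetch_status is a callable you supply that returns the parsed JSON from the sync status endpoint described in Step 7, which keeps the loop independent of any HTTP library.

```python
import time


def wait_for_backfill(fetch_status, poll_secs=10.0, timeout_secs=3600.0, sleep=time.sleep):
    """Poll until sync_status leaves "backfilling" and return the final status.

    fetch_status: caller-supplied callable returning the parsed JSON body
    of the sync status endpoint. Raises TimeoutError if the backfill is
    still running after roughly timeout_secs of waiting.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status["sync_status"] != "backfilling":
            return status
        if waited >= timeout_secs:
            raise TimeoutError("backfill still running after timeout")
        sleep(poll_secs)
        waited += poll_secs


# Simulated run with canned responses instead of real API calls.
canned = iter([
    {"sync_status": "backfilling", "rows_pending": 6000},
    {"sync_status": "synced", "rows_pending": 0},
])
final_status = wait_for_backfill(lambda: next(canned), poll_secs=0.0)
print(final_status["sync_status"])  # synced
```

The loop returns on any status other than "backfilling", including "error", so check the returned sync_status before querying.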

Step 6: Query

Once the backfill completes, run a semantic search against your vectors.

curl -X POST https://api.embedd.to/v1/vector-tables/vt_abc123/query \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "X-Environment-Id: env_abc123" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "affordable wireless headphones",
    "limit": 5,
    "filters": {
      "CATEGORY": {"eq": "electronics"},
      "PRICE": {"lte": 50},
      "IN_STOCK": {"eq": true}
    }
  }'

Example response:

{
  "results": [
    {
      "id": 42,
      "similarity_score": 0.91,
      "metadata": {
        "CATEGORY": "electronics",
        "PRICE": 29.99,
        "IN_STOCK": true
      },
      "embedded_text": "name: Wireless Bluetooth Headphones Lightweight over-ear headphones with 30-hour battery life..."
    }
  ],
  "query": "affordable wireless headphones",
  "model": "text-embedding-3-small",
  "total_results": 1
}
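Each result carries a similarity_score, so you can discard weak matches on the client before showing results. The sketch below is a client-side illustration only: results_above is a hypothetical helper and the 0.9 cutoff is an arbitrary example value, not a recommended setting. The sample body is a shortened copy of the response above.

```python
import json


def results_above(response_body, min_score):
    """Return the results whose similarity_score is at least min_score."""
    payload = json.loads(response_body)
    return [r for r in payload["results"] if r["similarity_score"] >= min_score]


# Shortened copy of the example response above.
sample_response = json.dumps({
    "results": [
        {
            "id": 42,
            "similarity_score": 0.91,
            "metadata": {"CATEGORY": "electronics", "PRICE": 29.99, "IN_STOCK": True},
        }
    ],
    "query": "affordable wireless headphones",
    "model": "text-embedding-3-small",
    "total_results": 1,
})

strong = results_above(sample_response, 0.9)
print([r["id"] for r in strong])  # [42]
```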

Filter Operators

Filters use operators without a $ prefix:

Operator	Description
`eq`	Equals
`ne`	Not equals
`gt`	Greater than
`gte`	Greater than or equal
`lt`	Less than
`lte`	Less than or equal
`in`	In array
`nin`	Not in array
`exists`	Field exists

Multiple filters are combined with AND logic.
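To illustrate what the operators and the AND rule mean, the sketch below evaluates a filters object against one result's metadata in plain Python. It is a local model of the semantics, not how Embedd evaluates filters. Several details are assumptions: that exists takes a boolean, that a missing field fails every comparison except exists, and that several operators may appear under one field.

```python
import operator

# Comparison operators from the table above, mapped to Python equivalents.
OPERATORS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "in": lambda value, options: value in options,
    "nin": lambda value, options: value not in options,
}


def matches(metadata, filters):
    """Return True only if every condition on every field holds (AND logic)."""
    for field, conditions in filters.items():
        for op, operand in conditions.items():
            if op == "exists":
                # Assumption: {"exists": true} requires the field to be present.
                if (field in metadata) != bool(operand):
                    return False
            elif op not in OPERATORS:
                raise ValueError(f"unsupported operator: {op}")
            elif field not in metadata or not OPERATORS[op](metadata[field], operand):
                # Assumption: a missing field fails any comparison.
                return False
    return True


metadata = {"CATEGORY": "electronics", "PRICE": 29.99, "IN_STOCK": True}
filters = {
    "CATEGORY": {"eq": "electronics"},
    "PRICE": {"lte": 50},
    "IN_STOCK": {"eq": True},
}
print(matches(metadata, filters))  # True
```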

Step 7: Monitor Sync

After the initial backfill, Embedd continuously syncs changes from your Snowflake table. Check the current sync status at any time.

curl https://api.embedd.to/v1/vector-tables/vt_abc123/sync/status \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "X-Environment-Id: env_abc123"

Example response:

{
  "vector_table_id": "vt_abc123",
  "sync_status": "synced",
  "last_sync_at": "2025-01-15T10:30:00Z",
  "rows_pending": 0,
  "staleness_secs": 120,
  "last_error": null,
  "last_error_at": null
}

Sync Statuses

Status	Meaning
`pending`	Created but no backfill run yet
`backfilling`	Initial backfill in progress
`synced`	Up to date and actively syncing
`paused`	Sync manually paused
`error`	Last sync failed (check `last_error`)
`pending_rebackfill`	Configuration changed, needs re-backfill
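A monitoring script can turn the status response into a simple health check. The sketch below maps the statuses in the table to a short reason string. The helper needs_attention is hypothetical, and the 600-second staleness limit is an arbitrary example threshold that you would tune to your own freshness requirements.

```python
def needs_attention(status, max_staleness_secs=600):
    """Return a short reason string when the sync looks unhealthy, else None."""
    state = status["sync_status"]
    if state == "error":
        return f"sync failed: {status.get('last_error')}"
    if state == "pending_rebackfill":
        return "configuration changed; trigger a re-backfill"
    if state == "paused":
        return "sync is paused"
    # Treat a missing or null staleness_secs as zero.
    staleness = status.get("staleness_secs") or 0
    if state == "synced" and staleness > max_staleness_secs:
        return f"data is {staleness}s stale"
    return None


# Sample copied from the status response above.
sample_status = {
    "vector_table_id": "vt_abc123",
    "sync_status": "synced",
    "last_sync_at": "2025-01-15T10:30:00Z",
    "rows_pending": 0,
    "staleness_secs": 120,
    "last_error": None,
    "last_error_at": None,
}
print(needs_attention(sample_status))  # None
```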

You can pause and resume sync as needed:

# Pause
curl -X POST https://api.embedd.to/v1/vector-tables/vt_abc123/sync/pause \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "X-Environment-Id: env_abc123"

# Resume
curl -X POST https://api.embedd.to/v1/vector-tables/vt_abc123/sync/resume \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "X-Environment-Id: env_abc123"

Related

Connections API Reference -- full connection options and update/delete operations
Embedding Providers API Reference -- manage providers and list models
Vector Tables API Reference -- column configuration, schema introspection, re-backfill
Query API Reference -- advanced filtering and similarity thresholds
Sync API Reference -- pause, resume, and status details
