Managed + Snowflake

Embedd reads data from your Snowflake database, generates embeddings via your chosen provider, and stores vectors in Embedd's managed Qdrant instance. This guide walks through the full setup end to end.

What you need:

  • A Snowflake account with a table you want to make searchable
  • An API key from an embedding provider (OpenAI, Gemini, or Voyage)
  • An Embedd API key

Step 1: Create a Connection

Register your Snowflake database as a source. Embedd supports two authentication methods.

Password Authentication

curl -X POST https://api.embedd.to/v1/providers/snowflake/connections \
-H "Authorization: Bearer sk_your_api_key" \
-H "X-Environment-Id: env_abc123" \
-H "Content-Type: application/json" \
-d '{
"name": "analytics-warehouse",
"mode": "managed",
"credentials": {
"auth_method": "password",
"account": "myorg-account",
"user": "EMBEDD_USER",
"password": "secure_password",
"warehouse": "COMPUTE_WH",
"database": "ANALYTICS",
"schema": "PUBLIC",
"role": "EMBEDD_ROLE"
}
}'

Key Pair Authentication

curl -X POST https://api.embedd.to/v1/providers/snowflake/connections \
-H "Authorization: Bearer sk_your_api_key" \
-H "X-Environment-Id: env_abc123" \
-H "Content-Type: application/json" \
-d '{
"name": "analytics-warehouse",
"mode": "managed",
"credentials": {
"auth_method": "key_pair",
"account": "myorg-account",
"user": "EMBEDD_USER",
"private_key": "-----BEGIN PRIVATE KEY-----\nMIIEv...\n-----END PRIVATE KEY-----",
"warehouse": "COMPUTE_WH",
"database": "ANALYTICS",
"schema": "PUBLIC",
"role": "EMBEDD_ROLE"
}
}'
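
The two request bodies above differ only in the credential fields. A minimal Python sketch of building either body is shown here; the helper name `build_connection_payload` and its parameters are hypothetical, not part of any Embedd SDK, and the field names are copied from the curl examples. Serializing with `json.dumps` escapes the PEM line breaks as `\n`, matching the key pair example.

```python
import json
from typing import Optional


def build_connection_payload(
    name: str,
    account: str,
    user: str,
    warehouse: str,
    database: str,
    schema: str,
    role: str,
    password: Optional[str] = None,
    private_key_pem: Optional[str] = None,
) -> dict:
    """Build the JSON body for POST /v1/providers/snowflake/connections.

    Hypothetical helper: exactly one of password or private_key_pem
    must be given, and it decides the auth_method.
    """
    if (password is None) == (private_key_pem is None):
        raise ValueError("provide exactly one of password or private_key_pem")

    credentials = {
        "auth_method": "password" if password is not None else "key_pair",
        "account": account,
        "user": user,
    }
    if password is not None:
        credentials["password"] = password
    else:
        # Assumption: surrounding whitespace in the PEM text is not significant.
        credentials["private_key"] = private_key_pem.strip()
    credentials.update(
        {"warehouse": warehouse, "database": database, "schema": schema, "role": role}
    )
    return {"name": name, "mode": "managed", "credentials": credentials}


def to_request_body(payload: dict) -> str:
    """Serialize the payload; newlines inside the key become the two characters \\n."""
    return json.dumps(payload)
```

Reading the key from a file (for example with `pathlib.Path(...).read_text()`) and passing it as `private_key_pem` avoids hand-escaping the line breaks in a shell command.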

Required Snowflake Permissions

The role you provide needs SELECT on the source table, plus USAGE on the database, the schema, and the warehouse used to run queries. Run these grants in Snowflake as an admin:

CREATE ROLE IF NOT EXISTS EMBEDD_ROLE;
GRANT USAGE ON DATABASE ANALYTICS TO ROLE EMBEDD_ROLE;
GRANT USAGE ON SCHEMA ANALYTICS.PUBLIC TO ROLE EMBEDD_ROLE;
GRANT SELECT ON TABLE ANALYTICS.PUBLIC.PRODUCTS TO ROLE EMBEDD_ROLE;
GRANT USAGE ON WAREHOUSE COMPUTE_WH TO ROLE EMBEDD_ROLE;
GRANT ROLE EMBEDD_ROLE TO USER EMBEDD_USER;
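
If you plan to embed more than one table, the same grants can be generated rather than typed by hand. The following is a sketch only: `grant_statements` is a hypothetical helper, and it accepts plain unquoted Snowflake identifiers and rejects anything else instead of trying to quote it.

```python
import re

# Unquoted Snowflake identifiers: a letter or underscore first,
# then letters, digits, underscores, or dollar signs.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def grant_statements(database, schema, tables, warehouse, role, user):
    """Return the GRANT statements from this section for one or more tables."""
    for ident in (database, schema, warehouse, role, user, *tables):
        if not _IDENTIFIER.match(ident):
            raise ValueError(f"not a plain unquoted identifier: {ident!r}")

    statements = [
        f"CREATE ROLE IF NOT EXISTS {role};",
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role};",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO ROLE {role};",
    ]
    # One SELECT grant per source table.
    statements += [
        f"GRANT SELECT ON TABLE {database}.{schema}.{table} TO ROLE {role};"
        for table in tables
    ]
    statements += [
        f"GRANT USAGE ON WAREHOUSE {warehouse} TO ROLE {role};",
        f"GRANT ROLE {role} TO USER {user};",
    ]
    return statements
```

Calling it with `ANALYTICS`, `PUBLIC`, `["PRODUCTS"]`, `COMPUTE_WH`, `EMBEDD_ROLE`, and `EMBEDD_USER` reproduces the six statements listed above.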

Save the id from the connection response (e.g., conn_abc123) -- you will need it in Steps 2 and 4.


Step 2: Test the Connection

Verify that Embedd can reach your Snowflake account with the stored credentials.

curl -X POST https://api.embedd.to/v1/connections/conn_abc123/test \
-H "Authorization: Bearer sk_your_api_key" \
-H "X-Environment-Id: env_abc123"

A successful response looks like:

{
"status": "connected",
"details": {"latency_ms": 245}
}

If the test fails, check that your account identifier, credentials, and role grants are correct.
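
Every request in this guide sends the same Authorization and X-Environment-Id headers, so a script may want to build them in one place. The sketch below uses only the Python standard library; `build_request`, `send`, and `connection_is_healthy` are hypothetical names, and the shape of a failed test response is not documented here, so anything other than a "connected" status is treated as a failure.

```python
import json
import urllib.request

API_BASE = "https://api.embedd.to/v1"


def build_request(method, path, api_key, environment_id, body=None):
    """Prepare (but do not send) a request with the headers used in this guide."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "X-Environment-Id": environment_id,
    }
    data = None
    if body is not None:
        # Only requests with a JSON body carry a Content-Type header.
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(API_BASE + path, data=data, headers=headers, method=method)


def send(request, timeout_secs=30):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout_secs) as response:
        return json.loads(response.read().decode("utf-8"))


def connection_is_healthy(test_response: dict) -> bool:
    """Assumption: only status == "connected" counts as a passing test."""
    return test_response.get("status") == "connected"
```

For the connection test, that would be `send(build_request("POST", "/connections/conn_abc123/test", api_key, environment_id))` followed by `connection_is_healthy(...)` on the result.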


Step 3: Configure an Embedding Provider

Register the API key for the embedding service you want to use.

curl -X POST https://api.embedd.to/v1/embedding-providers \
-H "Authorization: Bearer sk_your_api_key" \
-H "X-Environment-Id: env_abc123" \
-H "Content-Type: application/json" \
-d '{
"name": "openai-prod",
"provider": "openai",
"api_key": "sk-proj-...",
"default_model": "text-embedding-3-small"
}'

Save the id from the response (e.g., ep_abc123).

Available Models

Provider  Model                    Dimensions
openai    text-embedding-3-small   1536
openai    text-embedding-3-large   3072
openai    text-embedding-ada-002   1536
gemini    text-embedding-004       768
gemini    gemini-embedding-001     3072
gemini    gemini-embedding-exp-03  3072
voyage    voyage-3                 1024
voyage    voyage-3-lite            512
voyage    voyage-code-3            1024
voyage    voyage-finance-2         1024
voyage    voyage-law-2             1024
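
The dimensions in this table reappear as embedding_dimensions when you create a vector table in Step 4, so it can help to check the pair before sending the request. This is a sketch that hard-codes the table above; `MODEL_DIMENSIONS` and `check_dimensions` are hypothetical names, and it assumes the vector table must use exactly the listed dimension for its model.

```python
# (provider, model) -> embedding dimensions, copied from the table above.
MODEL_DIMENSIONS = {
    ("openai", "text-embedding-3-small"): 1536,
    ("openai", "text-embedding-3-large"): 3072,
    ("openai", "text-embedding-ada-002"): 1536,
    ("gemini", "text-embedding-004"): 768,
    ("gemini", "gemini-embedding-001"): 3072,
    ("gemini", "gemini-embedding-exp-03"): 3072,
    ("voyage", "voyage-3"): 1024,
    ("voyage", "voyage-3-lite"): 512,
    ("voyage", "voyage-code-3"): 1024,
    ("voyage", "voyage-finance-2"): 1024,
    ("voyage", "voyage-law-2"): 1024,
}


def check_dimensions(provider: str, model: str, embedding_dimensions: int) -> int:
    """Raise ValueError if the model is unknown or the dimension does not match."""
    expected = MODEL_DIMENSIONS.get((provider, model))
    if expected is None:
        raise ValueError(f"unknown model {model!r} for provider {provider!r}")
    if embedding_dimensions != expected:
        raise ValueError(
            f"{model} produces {expected}-dimensional vectors, got {embedding_dimensions}"
        )
    return expected
```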

Step 4: Create a Vector Table

Define which source table to embed, which columns to include, and how to map them.

Snowflake stores unquoted identifiers in uppercase, so column names are uppercase unless they were created with double quotes. Use the exact casing that matches your source table.

curl -X POST https://api.embedd.to/v1/vector-tables \
-H "Authorization: Bearer sk_your_api_key" \
-H "X-Environment-Id: env_abc123" \
-H "Content-Type: application/json" \
-d '{
"name": "product-search",
"connection_id": "conn_abc123",
"embedding_provider_id": "ep_abc123",
"source_table": "PRODUCTS",
"primary_key_column": "ID",
"columns": [
{"name": "NAME", "role": "embedding", "ordinal": 1, "name_prefix": "name: "},
{"name": "DESCRIPTION", "role": "embedding", "ordinal": 2},
{"name": "CATEGORY", "role": "metadata", "filter_type": "keyword"},
{"name": "PRICE", "role": "metadata", "filter_type": "float"},
{"name": "IN_STOCK", "role": "metadata", "filter_type": "boolean"}
],
"embedding_model": "text-embedding-3-small",
"embedding_dimensions": 1536,
"mode": "managed"
}'

Column roles:

  • embedding -- text columns whose content is concatenated and embedded. Use ordinal to control the order and name_prefix to add context (e.g., "name: " prepended to the value).
  • metadata -- columns stored alongside vectors for filtering at query time. Each metadata column requires a filter_type: keyword, integer, float, or boolean.
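
To make the embedding role concrete, here is a rough local model of how the text for one row might be assembled from ordinal and name_prefix. It is an illustration, not Embedd's implementation: the function name is hypothetical, the single-space separator is inferred from the embedded_text sample in Step 6, and skipping NULL values is an assumption.

```python
def build_embedded_text(row: dict, columns: list, separator: str = " ") -> str:
    """Concatenate the embedding columns of one row in ordinal order."""
    embedding_columns = sorted(
        (col for col in columns if col["role"] == "embedding"),
        key=lambda col: col["ordinal"],
    )
    parts = []
    for col in embedding_columns:
        value = row.get(col["name"])
        if value is None:
            continue  # assumption: NULL columns contribute nothing
        # name_prefix is optional; when present it is prepended to the value.
        parts.append(f"{col.get('name_prefix', '')}{value}")
    return separator.join(parts)


columns = [
    {"name": "DESCRIPTION", "role": "embedding", "ordinal": 2},
    {"name": "NAME", "role": "embedding", "ordinal": 1, "name_prefix": "name: "},
    {"name": "CATEGORY", "role": "metadata", "filter_type": "keyword"},
]
row = {
    "NAME": "Wireless Bluetooth Headphones",
    "DESCRIPTION": "Lightweight over-ear headphones",
    "CATEGORY": "electronics",
}
text = build_embedded_text(row, columns)
```

Metadata columns such as CATEGORY never reach the embedded text; they only travel alongside the vector for filtering.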

Save the id from the response (e.g., vt_abc123).


Step 5: Trigger Backfill

Start the initial backfill to read all rows from your Snowflake table, generate embeddings, and store vectors in Qdrant.

curl -X POST https://api.embedd.to/v1/vector-tables/vt_abc123/backfill \
-H "Authorization: Bearer sk_your_api_key" \
-H "X-Environment-Id: env_abc123"

The response looks like:

{
"vector_table_id": "vt_abc123",
"status": "backfilling",
"total_rows": 10000,
"synced_rows": 0
}

The backfill runs asynchronously. For large tables this may take several minutes. You can check progress with the sync status endpoint (Step 7).
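
Because the backfill is asynchronous, a script usually polls until it finishes. A minimal polling sketch follows; `wait_for_backfill` is a hypothetical helper, `fetch_status` is any callable you supply that returns the parsed JSON from the sync status endpoint, and treating both pending and backfilling as "still running" is an assumption.

```python
import time
from typing import Callable


def wait_for_backfill(
    fetch_status: Callable[[], dict],
    poll_secs: float = 10.0,
    timeout_secs: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the table leaves the pending/backfilling states.

    Returns the final status dict, raises RuntimeError if the sync
    reports an error, and TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout_secs
    while True:
        status = fetch_status()
        state = status.get("sync_status")
        if state == "error":
            raise RuntimeError(f"backfill failed: {status.get('last_error')}")
        if state not in ("pending", "backfilling"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("backfill did not finish before the timeout")
        sleep(poll_secs)
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays; in normal use the default `time.sleep` is fine.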


Step 6: Query

Once the backfill completes, run a semantic search against your vectors.

curl -X POST https://api.embedd.to/v1/vector-tables/vt_abc123/query \
-H "Authorization: Bearer sk_your_api_key" \
-H "X-Environment-Id: env_abc123" \
-H "Content-Type: application/json" \
-d '{
"query": "affordable wireless headphones",
"limit": 5,
"filters": {
"CATEGORY": {"eq": "electronics"},
"PRICE": {"lte": 50},
"IN_STOCK": {"eq": true}
}
}'

The response looks like:

{
"results": [
{
"id": 42,
"similarity_score": 0.91,
"metadata": {
"CATEGORY": "electronics",
"PRICE": 29.99,
"IN_STOCK": true
},
"embedded_text": "name: Wireless Bluetooth Headphones Lightweight over-ear headphones with 30-hour battery life..."
}
],
"query": "affordable wireless headphones",
"model": "text-embedding-3-small",
"total_results": 1
}

Filter Operators

Filters use operators without a $ prefix:

Operator  Description
eq        Equals
ne        Not equals
gt        Greater than
gte       Greater than or equal
lt        Less than
lte       Less than or equal
in        In array
nin       Not in array
exists    Field exists

Multiple filters are combined with AND logic.
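
The operator semantics can be modeled locally, which is handy for unit-testing filter dictionaries before sending them. The sketch below is an illustration rather than the server's implementation: `matches` is a hypothetical function, exists is assumed to take a boolean operand, and a field missing from the metadata is assumed to fail every operator except exists.

```python
import operator

_OPERATORS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "in": lambda value, allowed: value in allowed,
    "nin": lambda value, excluded: value not in excluded,
}


def matches(metadata: dict, filters: dict) -> bool:
    """Return True only if every condition on every field holds (AND logic)."""
    for field, conditions in filters.items():
        for op, operand in conditions.items():
            if op == "exists":
                # Assumption: {"exists": true} requires the field, false forbids it.
                if (field in metadata) != bool(operand):
                    return False
                continue
            if op not in _OPERATORS:
                raise ValueError(f"unknown filter operator: {op!r}")
            if field not in metadata:
                return False  # assumption: missing fields never match
            if not _OPERATORS[op](metadata[field], operand):
                return False
    return True


filters = {
    "CATEGORY": {"eq": "electronics"},
    "PRICE": {"lte": 50},
    "IN_STOCK": {"eq": True},
}
item = {"CATEGORY": "electronics", "PRICE": 29.99, "IN_STOCK": True}
```

With these values, `matches(item, filters)` is true, and raising PRICE above 50 makes it false because all three conditions must hold.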


Step 7: Monitor Sync

After the initial backfill, Embedd continuously syncs changes from your Snowflake table. Check the current sync status at any time.

curl https://api.embedd.to/v1/vector-tables/vt_abc123/sync/status \
-H "Authorization: Bearer sk_your_api_key" \
-H "X-Environment-Id: env_abc123"

The response looks like:

{
"vector_table_id": "vt_abc123",
"sync_status": "synced",
"last_sync_at": "2025-01-15T10:30:00Z",
"rows_pending": 0,
"staleness_secs": 120,
"last_error": null,
"last_error_at": null
}

Sync Statuses

Status              Meaning
pending             Created but no backfill run yet
backfilling         Initial backfill in progress
synced              Up to date and actively syncing
paused              Sync manually paused
error               Last sync failed (check last_error)
pending_rebackfill  Configuration changed, needs re-backfill
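
A monitoring job can turn these statuses into a next step. The mapping below is one possible policy, not something the API prescribes: `next_action` and its return strings are hypothetical, the 600-second staleness threshold is an arbitrary example, and it assumes a pending_rebackfill table is brought up to date with the same backfill endpoint as in Step 5.

```python
def next_action(status: dict, max_staleness_secs: int = 600) -> str:
    """Suggest what an operator or script should do for a sync status response."""
    state = status.get("sync_status")
    if state in ("pending", "pending_rebackfill"):
        return "trigger_backfill"
    if state == "backfilling":
        return "wait"
    if state == "paused":
        return "resume"
    if state == "error":
        return f"investigate: {status.get('last_error')}"
    if state == "synced":
        # synced but lagging beyond the (hypothetical) threshold deserves a look
        if status.get("staleness_secs", 0) > max_staleness_secs:
            return "check_staleness"
        return "ok"
    raise ValueError(f"unknown sync status: {state!r}")
```

For the sample response shown earlier in this step (synced, staleness_secs of 120), this returns "ok".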

You can pause and resume sync as needed:

# Pause
curl -X POST https://api.embedd.to/v1/vector-tables/vt_abc123/sync/pause \
-H "Authorization: Bearer sk_your_api_key" \
-H "X-Environment-Id: env_abc123"

# Resume
curl -X POST https://api.embedd.to/v1/vector-tables/vt_abc123/sync/resume \
-H "Authorization: Bearer sk_your_api_key" \
-H "X-Environment-Id: env_abc123"