Managed + Snowflake
Embedd reads data from your Snowflake database, generates embeddings via your chosen provider, and stores vectors in Embedd's managed Qdrant instance. This guide walks through the full setup end to end.
What you need:
- A Snowflake account with a table you want to make searchable
- An API key from an embedding provider (OpenAI, Gemini, or Voyage)
- An Embedd API key
Step 1: Create a Connection
Register your Snowflake database as a source. Embedd supports two authentication methods.
Password Authentication
curl -X POST https://api.embedd.to/v1/providers/snowflake/connections \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "X-Environment-Id: env_abc123" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "analytics-warehouse",
    "mode": "managed",
    "credentials": {
      "auth_method": "password",
      "account": "myorg-account",
      "user": "EMBEDD_USER",
      "password": "secure_password",
      "warehouse": "COMPUTE_WH",
      "database": "ANALYTICS",
      "schema": "PUBLIC",
      "role": "EMBEDD_ROLE"
    }
  }'
Key Pair Authentication
curl -X POST https://api.embedd.to/v1/providers/snowflake/connections \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "X-Environment-Id: env_abc123" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "analytics-warehouse",
    "mode": "managed",
    "credentials": {
      "auth_method": "key_pair",
      "account": "myorg-account",
      "user": "EMBEDD_USER",
      "private_key": "-----BEGIN PRIVATE KEY-----\nMIIEv...\n-----END PRIVATE KEY-----",
      "warehouse": "COMPUTE_WH",
      "database": "ANALYTICS",
      "schema": "PUBLIC",
      "role": "EMBEDD_ROLE"
    }
  }'
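The private_key field must hold the entire multi-line PEM as a single JSON string with escaped newlines, which is easy to get wrong when pasting by hand. A minimal Python sketch of reading the key from a file and letting the JSON serializer handle the escaping (the helper name is illustrative, not part of any Embedd SDK):

```python
import json

def pem_to_json_field(pem_path):
    """Read a multi-line PEM key file and return it as one Python string.
    When the credentials dict is serialized with json.dumps, the real
    newlines are escaped automatically, matching the curl example above."""
    with open(pem_path) as f:
        return f.read().strip()
```

Building the request body with `json.dumps({"credentials": {..., "private_key": pem_to_json_field("rsa_key.p8")}})` then produces the escaped form without manual editing.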
Required Snowflake Permissions
The role you provide must have SELECT access on the source table. Run these grants in Snowflake as an admin:
CREATE ROLE IF NOT EXISTS EMBEDD_ROLE;
GRANT USAGE ON DATABASE ANALYTICS TO ROLE EMBEDD_ROLE;
GRANT USAGE ON SCHEMA ANALYTICS.PUBLIC TO ROLE EMBEDD_ROLE;
GRANT SELECT ON TABLE ANALYTICS.PUBLIC.PRODUCTS TO ROLE EMBEDD_ROLE;
GRANT USAGE ON WAREHOUSE COMPUTE_WH TO ROLE EMBEDD_ROLE;
GRANT ROLE EMBEDD_ROLE TO USER EMBEDD_USER;
Save the id from the response (e.g., conn_abc123) -- you will need it in Step 4.
Step 2: Test the Connection
Verify that Embedd can reach your Snowflake account with the stored credentials.
curl -X POST https://api.embedd.to/v1/connections/conn_abc123/test \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "X-Environment-Id: env_abc123"
A successful response looks like:
{
  "status": "connected",
  "details": {"latency_ms": 245}
}
If the test fails, check that your account identifier, credentials, and role grants are correct.
Step 3: Configure an Embedding Provider
Register the API key for the embedding service you want to use.
curl -X POST https://api.embedd.to/v1/embedding-providers \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "X-Environment-Id: env_abc123" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "openai-prod",
    "provider": "openai",
    "api_key": "sk-proj-...",
    "default_model": "text-embedding-3-small"
  }'
Save the id from the response (e.g., ep_abc123).
Available Models
| Provider | Model | Dimensions |
|---|---|---|
| openai | text-embedding-3-small | 1536 |
| openai | text-embedding-3-large | 3072 |
| openai | text-embedding-ada-002 | 1536 |
| gemini | text-embedding-004 | 768 |
| gemini | gemini-embedding-001 | 3072 |
| gemini | gemini-embedding-exp-03 | 3072 |
| voyage | voyage-3 | 1024 |
| voyage | voyage-3-lite | 512 |
| voyage | voyage-code-3 | 1024 |
| voyage | voyage-finance-2 | 1024 |
| voyage | voyage-law-2 | 1024 |
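In Step 4, embedding_dimensions must match the model you choose. A client-side sanity check, with the dimensions transcribed from the table above (the dict and function are illustrative, not an Embedd API):

```python
# Model dimensions transcribed from the table above.
MODEL_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
    "text-embedding-004": 768,
    "gemini-embedding-001": 3072,
    "gemini-embedding-exp-03": 3072,
    "voyage-3": 1024,
    "voyage-3-lite": 512,
    "voyage-code-3": 1024,
    "voyage-finance-2": 1024,
    "voyage-law-2": 1024,
}

def check_dimensions(model, dimensions):
    """Fail fast, before calling the API, if the requested dimensions
    do not match the chosen embedding model."""
    expected = MODEL_DIMENSIONS.get(model)
    if expected is None:
        raise ValueError(f"unknown model: {model}")
    if dimensions != expected:
        raise ValueError(
            f"{model} produces {expected}-dim vectors, got {dimensions}")
```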
Step 4: Create a Vector Table
Define which source table to embed, which columns to include, and how to map them.
Snowflake column names are uppercase by default. Use the exact casing that matches your source table.
curl -X POST https://api.embedd.to/v1/vector-tables \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "X-Environment-Id: env_abc123" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "product-search",
    "connection_id": "conn_abc123",
    "embedding_provider_id": "ep_abc123",
    "source_table": "PRODUCTS",
    "primary_key_column": "ID",
    "columns": [
      {"name": "NAME", "role": "embedding", "ordinal": 1, "name_prefix": "name: "},
      {"name": "DESCRIPTION", "role": "embedding", "ordinal": 2},
      {"name": "CATEGORY", "role": "metadata", "filter_type": "keyword"},
      {"name": "PRICE", "role": "metadata", "filter_type": "float"},
      {"name": "IN_STOCK", "role": "metadata", "filter_type": "boolean"}
    ],
    "embedding_model": "text-embedding-3-small",
    "embedding_dimensions": 1536,
    "mode": "managed"
  }'
Column roles:
- `embedding` -- text columns whose content is concatenated and embedded. Use `ordinal` to control the order and `name_prefix` to add context (e.g., `"name: "` prepended to the value).
- `metadata` -- columns stored alongside vectors for filtering at query time. Each metadata column requires a `filter_type`: `keyword`, `integer`, `float`, or `boolean`.
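The embedding-role settings suggest the text for each row is built by taking the embedding columns in ordinal order, applying any name_prefix, and joining the values. A sketch of that assembly, matching the embedded_text shape shown in the Step 6 response (the space separator and skipping of null values are assumptions):

```python
def build_embedded_text(columns, row):
    """Assemble the text to embed for one row: take the embedding-role
    columns in ordinal order, prepend each column's name_prefix if set,
    and join with a space (the separator is an assumption)."""
    parts = []
    embed_cols = [c for c in columns if c["role"] == "embedding"]
    for col in sorted(embed_cols, key=lambda c: c["ordinal"]):
        value = row.get(col["name"])
        if value is None:
            continue  # assumed: null source values are simply skipped
        parts.append(col.get("name_prefix", "") + str(value))
    return " ".join(parts)
```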
Save the id from the response (e.g., vt_abc123).
Step 5: Trigger Backfill
Start the initial backfill to read all rows from your Snowflake table, generate embeddings, and store vectors in Qdrant.
curl -X POST https://api.embedd.to/v1/vector-tables/vt_abc123/backfill \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "X-Environment-Id: env_abc123"
{
  "vector_table_id": "vt_abc123",
  "status": "backfilling",
  "total_rows": 10000,
  "synced_rows": 0
}
The backfill runs asynchronously. For large tables this may take several minutes. You can check progress with the sync status endpoint (Step 7).
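If a script needs to block until the backfill finishes, it can poll the sync status endpoint from Step 7. A minimal sketch with the HTTP call injected as a callable so it is not tied to one client library (the function name is illustrative):

```python
import time

def wait_for_backfill(get_status, timeout_secs=600, interval_secs=5):
    """Poll until the vector table leaves the backfilling state.
    get_status is any zero-argument callable returning the parsed dict
    from GET /v1/vector-tables/{id}/sync/status."""
    deadline = time.monotonic() + timeout_secs
    while time.monotonic() < deadline:
        status = get_status()
        if status["sync_status"] == "synced":
            return status
        if status["sync_status"] == "error":
            raise RuntimeError(status.get("last_error"))
        time.sleep(interval_secs)
    raise TimeoutError("backfill did not finish in time")
```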
Step 6: Query
Once the backfill completes, run a semantic search against your vectors.
curl -X POST https://api.embedd.to/v1/vector-tables/vt_abc123/query \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "X-Environment-Id: env_abc123" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "affordable wireless headphones",
    "limit": 5,
    "filters": {
      "CATEGORY": {"eq": "electronics"},
      "PRICE": {"lte": 50},
      "IN_STOCK": {"eq": true}
    }
  }'
{
  "results": [
    {
      "id": 42,
      "similarity_score": 0.91,
      "metadata": {
        "CATEGORY": "electronics",
        "PRICE": 29.99,
        "IN_STOCK": true
      },
      "embedded_text": "name: Wireless Bluetooth Headphones Lightweight over-ear headphones with 30-hour battery life..."
    }
  ],
  "query": "affordable wireless headphones",
  "model": "text-embedding-3-small",
  "total_results": 1
}
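On the client side you will often want to keep only the strongest matches. A small sketch of consuming the response above (the 0.8 cutoff is an arbitrary example, not an API default):

```python
def strong_matches(response, min_score=0.8):
    """Return (id, metadata) pairs for hits at or above min_score,
    preserving the API's similarity ordering."""
    return [
        (hit["id"], hit["metadata"])
        for hit in response["results"]
        if hit["similarity_score"] >= min_score
    ]
```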
Filter Operators
Filters use operators without a $ prefix:
| Operator | Description |
|---|---|
| eq | Equals |
| ne | Not equals |
| gt | Greater than |
| gte | Greater than or equal |
| lt | Less than |
| lte | Less than or equal |
| in | In array |
| nin | Not in array |
| exists | Field exists |
Multiple filters are combined with AND logic.
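To make the semantics concrete, here is a client-side sketch that evaluates the same operators against one row's metadata, combining all conditions with AND as described above (a mirror for illustration, not how the server is implemented):

```python
def matches(metadata, filters):
    """Evaluate the query filter operators against one row's metadata.
    All field conditions must hold (AND logic)."""
    ops = {
        "eq":     lambda v, x: v == x,
        "ne":     lambda v, x: v != x,
        "gt":     lambda v, x: v is not None and v > x,
        "gte":    lambda v, x: v is not None and v >= x,
        "lt":     lambda v, x: v is not None and v < x,
        "lte":    lambda v, x: v is not None and v <= x,
        "in":     lambda v, x: v in x,
        "nin":    lambda v, x: v not in x,
        "exists": lambda v, x: (v is not None) == x,
    }
    for field, conditions in filters.items():
        value = metadata.get(field)
        for op, operand in conditions.items():
            if not ops[op](value, operand):
                return False
    return True
```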
Step 7: Monitor Sync
After the initial backfill, Embedd continuously syncs changes from your Snowflake table. Check the current sync status at any time.
curl https://api.embedd.to/v1/vector-tables/vt_abc123/sync/status \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "X-Environment-Id: env_abc123"
{
  "vector_table_id": "vt_abc123",
  "sync_status": "synced",
  "last_sync_at": "2025-01-15T10:30:00Z",
  "rows_pending": 0,
  "staleness_secs": 120,
  "last_error": null,
  "last_error_at": null
}
Sync Statuses
| Status | Meaning |
|---|---|
| pending | Created but no backfill run yet |
| backfilling | Initial backfill in progress |
| synced | Up to date and actively syncing |
| paused | Sync manually paused |
| error | Last sync failed (check last_error) |
| pending_rebackfill | Configuration changed, needs re-backfill |
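A monitoring job can dispatch on these statuses. One sketch of that mapping, based on the table above (the action strings are illustrative):

```python
def next_action(sync_status):
    """Map a sync_status value to a suggested operator action,
    per the status table above."""
    actions = {
        "pending": "trigger the initial backfill",
        "backfilling": "wait for the backfill to finish",
        "synced": "nothing to do",
        "paused": "resume sync if the pause was not intentional",
        "error": "inspect last_error, fix, and retry",
        "pending_rebackfill": "trigger a re-backfill",
    }
    return actions.get(sync_status, "unknown status")
```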
You can pause and resume sync as needed:
# Pause
curl -X POST https://api.embedd.to/v1/vector-tables/vt_abc123/sync/pause \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "X-Environment-Id: env_abc123"

# Resume
curl -X POST https://api.embedd.to/v1/vector-tables/vt_abc123/sync/resume \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "X-Environment-Id: env_abc123"
Related
- Connections API Reference -- full connection options and update/delete operations
- Embedding Providers API Reference -- manage providers and list models
- Vector Tables API Reference -- column configuration, schema introspection, re-backfill
- Query API Reference -- advanced filtering and similarity thresholds
- Sync API Reference -- pause, resume, and status details