# Embedd.to — Full Documentation

> Provider-agnostic vectorized table management API

Embedd.to connects to your existing databases (Snowflake, PostgreSQL), automatically generates and maintains vector embeddings, and provides a unified query interface for semantic search.

---

## Key Concepts

- **Environments** — Logical isolation for resources (dev, staging, production)
- **Connections** — Encrypted credentials to source databases (Snowflake, PostgreSQL)
- **Embedding Providers** — API credentials for embedding services (OpenAI, Google Gemini)
- **Vector Tables** — Link a source table to its vector representation
- **Managed Mode** — Embedd.to stores vectors in Qdrant
- **Platform Mode** — Vectors stored in your database (Snowflake VECTOR type or pgvector)
- **Batch Sync** — Periodic full-table comparison using row hashes
- **CDC Sync** — Polling-based change data capture for near-real-time sync
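
To make the row-hash idea behind Batch Sync concrete, here is a minimal sketch. It assumes SHA-256 over a sorted-key JSON encoding of each row; the hash function and encoding Embedd.to actually uses are not specified in this document, and `row_hash` and `diff_rows` are hypothetical names.

```python
import hashlib
import json


def row_hash(row: dict) -> str:
    """Hash a row deterministically (assumed scheme: SHA-256 of sorted-key JSON)."""
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_rows(source_rows: dict, stored_hashes: dict) -> dict:
    """Compare current source rows (keyed by primary key) with stored hashes.

    Returns the primary keys that would need to be embedded, re-embedded,
    or removed from the vector store.
    """
    current = {pk: row_hash(row) for pk, row in source_rows.items()}
    return {
        "inserted": sorted(pk for pk in current if pk not in stored_hashes),
        "updated": sorted(
            pk for pk in current
            if pk in stored_hashes and current[pk] != stored_hashes[pk]
        ),
        "deleted": sorted(pk for pk in stored_hashes if pk not in current),
    }
```

Only rows in `inserted` and `updated` would need new embeddings, which is what makes a full-table comparison affordable.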

## Authentication

Most requests require two headers:

1. API key (required on every request):
```
Authorization: Bearer sk_your_api_key
```

2. Environment ID (required on all endpoints except `POST /v1/environments` and `GET /v1/environments`):
```
X-Environment-Id: env_abc123
```

**IMPORTANT:** You must include `X-Environment-Id` on every request to connections, embedding providers, vector tables, query, and sync endpoints. Omitting it will return an error.
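
The header rules above can be captured in a small helper. This is a sketch only; `build_headers` is a hypothetical name, not part of any official client.

```python
from typing import Optional


def build_headers(api_key: str, environment_id: Optional[str] = None) -> dict:
    """Build request headers for an Embedd.to call.

    environment_id may be omitted only for POST /v1/environments and
    GET /v1/environments; every other endpoint listed above needs it.
    """
    headers = {"Authorization": f"Bearer {api_key}"}
    if environment_id is not None:
        headers["X-Environment-Id"] = environment_id
    return headers
```

For example, `build_headers("sk_your_api_key", "env_abc123")` yields both headers, while `build_headers("sk_your_api_key")` yields only `Authorization`.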

Rate limit: 60 requests per minute per API key (default).

---

## API Reference

### Environments

#### Create Environment
```
POST /v1/environments
Content-Type: application/json

{"name": "production"}
```
Response: `{"id": "env_abc123", "org_id": "org_xyz", "name": "production", "created_at": "..."}`

#### List Environments
```
GET /v1/environments?limit=20&cursor=...
```
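
List endpoints take `limit` and `cursor`, but the shape of the list response is not shown in this document. The sketch below therefore treats the field names as assumptions: `data` for the items and `next_cursor` for the next page token are hypothetical defaults you would adjust to the real response. `fetch_page` is a caller-supplied function that performs the HTTP request.

```python
from typing import Callable, Iterator, Optional


def iter_pages(
    fetch_page: Callable[[Optional[str]], dict],
    items_key: str = "data",          # assumed field name
    cursor_key: str = "next_cursor",  # assumed field name
) -> Iterator[dict]:
    """Yield items from a cursor-paginated list endpoint until no cursor remains."""
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page.get(items_key, [])
        cursor = page.get(cursor_key)
        if not cursor:
            break
```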

#### Get Environment
```
GET /v1/environments/{environment_id}
```

#### Delete Environment
```
DELETE /v1/environments/{environment_id}
```
Returns 204. Fails with 409 if resources are attached.

---

### Connections

#### Create Connection
```
POST /v1/providers/{provider}/connections
Content-Type: application/json

{
  "name": "my-snowflake",
  "mode": "managed",
  "credentials": {
    "auth_method": "password",
    "account": "myorg-account",
    "user": "EMBEDD_USER",
    "password": "secure_password",
    "warehouse": "COMPUTE_WH",
    "database": "ANALYTICS",
    "schema": "PUBLIC",
    "role": "EMBEDD_ROLE"
  }
}
```

Supported providers: `snowflake`, `postgresql`

- Snowflake auth methods: `password`, `key_pair`
- PostgreSQL credentials: `host`, `port`, `database`, `user`, `password`, `ssl_mode`
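
The request body above shows Snowflake. A PostgreSQL request could be assembled along the following lines. This is a sketch: `postgres_connection_request` is a hypothetical helper, and the defaults (`port=5432`, `ssl_mode="require"`) are assumptions borrowed from common PostgreSQL practice, not values confirmed by this document.

```python
def postgres_connection_request(
    name: str,
    host: str,
    database: str,
    user: str,
    password: str,
    port: int = 5432,           # assumed default
    ssl_mode: str = "require",  # assumed value; accepted values are not listed here
    mode: str = "managed",
) -> tuple:
    """Return (path, json_body) for creating a PostgreSQL connection."""
    body = {
        "name": name,
        "mode": mode,
        "credentials": {
            "host": host,
            "port": port,
            "database": database,
            "user": user,
            "password": password,
            "ssl_mode": ssl_mode,
        },
    }
    return "/v1/providers/postgresql/connections", body
```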

#### List Connections
```
GET /v1/connections?limit=20&cursor=...
```

#### Get Connection
```
GET /v1/connections/{connection_id}
```

#### Update Connection
```
PUT /v1/connections/{connection_id}
Content-Type: application/json

{"name": "updated-name", "credentials": {...}}
```

#### Delete Connection
```
DELETE /v1/connections/{connection_id}
```
Returns 204. Fails with 409 if vector tables are attached.

#### Test Connection
```
POST /v1/connections/{connection_id}/test
```
Response: `{"status": "connected", "details": {"latency_ms": 245}}`

---

### Embedding Providers

Supported providers: `openai`, `gemini`

- OpenAI models: `text-embedding-3-small` (1536d), `text-embedding-3-large` (3072d), `text-embedding-ada-002` (1536d)
- Gemini models: `text-embedding-004` (768d)

#### Create Embedding Provider
```
POST /v1/embedding-providers
Content-Type: application/json

{
  "name": "openai-prod",
  "provider": "openai",
  "api_key": "sk-proj-...",
  "default_model": "text-embedding-3-small"
}
```

#### List Embedding Providers
```
GET /v1/embedding-providers?limit=20&cursor=...
```

#### Get / Update / Delete
```
GET /v1/embedding-providers/{provider_id}
PUT /v1/embedding-providers/{provider_id}
DELETE /v1/embedding-providers/{provider_id}
```

#### List Available Models
```
GET /v1/embedding-providers/{provider_id}/models
```
Response: `{"models": [{"model": "text-embedding-3-small", "dimensions": 1536, "max_tokens": 8191, "provider": "openai"}]}`

---

### Vector Tables

#### Create Vector Table
```
POST /v1/vector-tables
Content-Type: application/json

{
  "name": "product-search",
  "connection_id": "conn_abc123",
  "embedding_provider_id": "ep_abc123",
  "source_table": "PRODUCTS",
  "primary_key_column": "ID",
  "columns": [
    {"name": "DESCRIPTION", "role": "embedding", "ordinal": 1},
    {"name": "CATEGORY", "role": "metadata", "filter_type": "keyword"},
    {"name": "PRICE", "role": "metadata", "filter_type": "float"}
  ],
  "embedding_model": "text-embedding-3-small",
  "embedding_dimensions": 1536,
  "mode": "managed",
  "sync_mode": "cdc"
}
```

- Column roles: `embedding` (text to embed), `metadata` (filterable fields)
- Filter types for metadata: `keyword`, `integer`, `float`, `boolean`
- Modes: `managed` (Qdrant), `platform` (customer DB)
- Sync modes: `batch`, `cdc`

`embedding_provider_id` is required for managed mode and for PostgreSQL platform mode. Snowflake platform mode can use Cortex instead, in which case no embedding provider is needed.
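
A client-side check can catch obvious mistakes before a create request is sent. The sketch below encodes only the enumerations and model dimensions listed in this document. `validate_vector_table` is a hypothetical helper, the server's actual validation may differ, and two rules are assumptions: that at least one `embedding` column is needed and that every `metadata` column carries a `filter_type`.

```python
KNOWN_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
    "text-embedding-004": 768,
}
COLUMN_ROLES = {"embedding", "metadata"}
FILTER_TYPES = {"keyword", "integer", "float", "boolean"}
MODES = {"managed", "platform"}
SYNC_MODES = {"batch", "cdc"}


def validate_vector_table(payload: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    columns = payload.get("columns", [])
    if not any(c.get("role") == "embedding" for c in columns):
        problems.append("no column has role 'embedding'")  # assumed rule
    for c in columns:
        if c.get("role") not in COLUMN_ROLES:
            problems.append(f"column {c.get('name')}: unknown role {c.get('role')!r}")
        elif c["role"] == "metadata" and c.get("filter_type") not in FILTER_TYPES:
            problems.append(f"column {c.get('name')}: unknown filter_type")  # assumed rule
    if payload.get("mode") not in MODES:
        problems.append("mode must be 'managed' or 'platform'")
    if payload.get("sync_mode") not in SYNC_MODES:
        problems.append("sync_mode must be 'batch' or 'cdc'")
    # Only models listed in this document are checked; others are skipped.
    expected = KNOWN_DIMENSIONS.get(payload.get("embedding_model"))
    if expected is not None and payload.get("embedding_dimensions") != expected:
        problems.append(f"embedding_dimensions should be {expected}")
    if payload.get("mode") == "managed" and not payload.get("embedding_provider_id"):
        problems.append("managed mode requires embedding_provider_id")
    return problems
```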

#### List / Get / Update / Delete
```
GET /v1/vector-tables
GET /v1/vector-tables/{id}
PUT /v1/vector-tables/{id}
DELETE /v1/vector-tables/{id}
```

Updating `columns`, `embedding_model`, or `embedding_dimensions` sets the table's status to `pending_rebackfill`. The next backfill then performs an atomic swap, so re-embedding incurs no downtime.

#### Trigger Backfill
```
POST /v1/vector-tables/{id}/backfill
```
Asynchronous operation. Returns 409 if a backfill is already in progress.

#### Get Source Schema
```
GET /v1/vector-tables/{id}/schema
```

---

### Query (Semantic Search)

```
POST /v1/vector-tables/{id}/query
Content-Type: application/json

{
  "query": "comfortable running shoes",
  "limit": 5,
  "min_similarity_score": 0.7,
  "filters": {
    "category": {"$eq": "footwear"},
    "price": {"$lte": 150}
  }
}
```

Response:
```json
{
  "results": [
    {
      "id": 42,
      "similarity_score": 0.92,
      "metadata": {"category": "footwear", "price": 129.99},
      "embedded_text": "Lightweight running shoe..."
    }
  ],
  "query": "comfortable running shoes",
  "model": "text-embedding-3-small",
  "total_results": 1
}
```

Filter operators: `$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`, `$in`, `$nin`, `$exists`
Multiple filters are ANDed together.
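
A request body for this endpoint can be built and checked against the operator list above before it is sent. This is a sketch: `build_query` is a hypothetical helper and the default `limit` of 5 simply mirrors the example, not a documented default.

```python
from typing import Optional

FILTER_OPERATORS = {"$eq", "$ne", "$gt", "$gte", "$lt", "$lte", "$in", "$nin", "$exists"}


def build_query(
    query: str,
    limit: int = 5,
    min_similarity_score: Optional[float] = None,
    filters: Optional[dict] = None,
) -> dict:
    """Build the JSON body for POST /v1/vector-tables/{id}/query."""
    body = {"query": query, "limit": limit}
    if min_similarity_score is not None:
        body["min_similarity_score"] = min_similarity_score
    if filters:
        for field, condition in filters.items():
            unknown = set(condition) - FILTER_OPERATORS
            if unknown:
                raise ValueError(f"{field}: unsupported operator(s) {sorted(unknown)}")
        body["filters"] = filters
    return body
```

Rejecting unknown operators locally avoids spending a rate-limited request on a call that would presumably fail validation.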

---

### Sync Control

#### Get Sync Status
```
GET /v1/vector-tables/{id}/sync/status
```
Response includes: `sync_status`, `last_sync_at`, `rows_pending`, `staleness_secs`, `last_error`, `last_error_at`

Statuses: `pending`, `backfilling`, `synced`, `paused`, `error`, `pending_rebackfill`
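
Because backfills are asynchronous, a client typically polls the status endpoint. The sketch below assumes `pending` and `backfilling` are the only in-progress statuses and returns on anything else; that reading is an assumption, as are the helper name `wait_for_sync` and the default intervals. `get_status` is a caller-supplied function that fetches and decodes the status response.

```python
import time
from typing import Callable

IN_PROGRESS = {"pending", "backfilling"}  # assumed to be the transient statuses


def wait_for_sync(
    get_status: Callable[[], dict],
    poll_secs: float = 5.0,
    timeout_secs: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until sync_status leaves the in-progress set, then return the status."""
    deadline = clock() + timeout_secs
    while True:
        status = get_status()
        if status["sync_status"] not in IN_PROGRESS:
            return status
        if clock() >= deadline:
            raise TimeoutError("sync did not settle before the timeout")
        sleep(poll_secs)
```

A 5-second interval uses 12 requests per minute, well inside the default rate limit of 60. Callers should still inspect the returned `sync_status`, since `error` and `paused` also end the loop.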

#### Pause Sync
```
POST /v1/vector-tables/{id}/sync/pause
```

#### Resume Sync
```
POST /v1/vector-tables/{id}/sync/resume
```

---

## Error Format

```json
{
  "error": {
    "code": "error_code",
    "message": "Human-readable description",
    "resolution": "Suggested fix",
    "docs_url": "https://embedd.to/errors/error_code",
    "request_id": "req_abc123"
  }
}
```

Error codes:

- `unauthorized` (401)
- `source_table_access_denied` (403)
- `*_not_found` (404)
- `*_conflict`, `backfill_in_progress` (409)
- `validation_error`, `connection_not_ready`, `vector_table_not_ready` (422)
- `rate_limit_exceeded` (429)
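
The error envelope maps naturally onto an exception type. A minimal sketch follows; `EmbeddError` and `parse_error` are hypothetical names, and the fallback to `"unknown"` for bodies that do not match the envelope is an assumption.

```python
import json


class EmbeddError(Exception):
    """Carries the fields of the error envelope shown above."""

    def __init__(self, status, code, message, resolution=None, docs_url=None, request_id=None):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.resolution = resolution
        self.docs_url = docs_url
        self.request_id = request_id


def parse_error(status: int, body: str) -> EmbeddError:
    """Turn an error response into an EmbeddError, tolerating malformed bodies."""
    try:
        err = json.loads(body).get("error", {})
    except (ValueError, AttributeError):
        err = {}
    if not isinstance(err, dict):
        err = {}
    return EmbeddError(
        status=status,
        code=err.get("code", "unknown"),
        message=err.get("message", body),
        resolution=err.get("resolution"),
        docs_url=err.get("docs_url"),
        request_id=err.get("request_id"),
    )
```

Keeping `request_id` on the exception makes it easy to include in logs or support requests.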

---

## Rate Limiting

- Default: 60 requests per minute per API key.
- Response headers: `X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset`.
- 429 responses include a `Retry-After` header.
- The `/health` endpoint is exempt.
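
A client can honor `Retry-After` with a small helper. This sketch assumes the header carries a number of seconds; HTTP also allows a date in that header, and the format Embedd.to sends is not stated here, so any non-numeric value falls back to a default. `retry_delay` is a hypothetical name.

```python
from typing import Optional


def retry_delay(status: int, headers: dict, default_secs: float = 1.0) -> Optional[float]:
    """Return how many seconds to wait before retrying, or None if no retry is indicated."""
    if status != 429:
        return None
    # Header names are case-insensitive, so normalize before looking up.
    lowered = {k.lower(): v for k, v in headers.items()}
    try:
        return max(0.0, float(lowered.get("retry-after")))
    except (TypeError, ValueError):
        return default_secs
```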

---

## Provider-Specific Details

### Snowflake
- Auth: `password` or `key_pair`
- Platform mode: `VECTOR(FLOAT, N)` type, `VARIANT` metadata
- Native embeddings via Cortex: `snowflake-arctic-embed-m-v1.5` (768d), `snowflake-arctic-embed-l-v2.0` (1024d)
- Filter translation: `metadata:field` syntax

### PostgreSQL
- Auth: `host`/`port`/`user`/`password` with optional `ssl_mode`
- Platform mode: pgvector `vector(N)` type, `JSONB` metadata, HNSW index
- Requires external embedding provider (no native embeddings)
- Filter translation: `metadata->>'field'` JSONB syntax
- `hnsw.iterative_scan` is enabled for consistent filtered queries
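
To illustrate what filter translation into the JSONB syntax might look like, here is a sketch that turns a filter object into a parameterized `WHERE` fragment. It is not Embedd.to's implementation: the casts, the `%s` placeholder style, the restriction to comparison operators, and the name `to_jsonb_where` are all assumptions made for the example.

```python
import re

_SQL_OPS = {"$eq": "=", "$ne": "<>", "$gt": ">", "$gte": ">=", "$lt": "<", "$lte": "<="}
_FIELD = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def to_jsonb_where(filters: dict) -> tuple:
    """Translate comparison filters into (sql_fragment, params), ANDing all clauses."""
    clauses, params = [], []
    for field, condition in filters.items():
        # Field names are interpolated into SQL, so accept plain identifiers only.
        if not _FIELD.match(field):
            raise ValueError(f"unsafe field name: {field!r}")
        for op, value in condition.items():
            if op not in _SQL_OPS:
                raise ValueError(f"operator not covered by this sketch: {op}")
            expr = f"metadata->>'{field}'"
            # ->> yields text, so cast when comparing against non-string values.
            if isinstance(value, bool):
                expr = f"({expr})::boolean"
            elif isinstance(value, (int, float)):
                expr = f"({expr})::numeric"
            clauses.append(f"{expr} {_SQL_OPS[op]} %s")
            params.append(value)
    return " AND ".join(clauses), params
```

Values travel as query parameters rather than being spliced into the SQL string, which keeps user-supplied filter values out of the statement text.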