Sync & Backfill
Overview
After creating a vector table, you need to populate it with vectors (backfill) and keep it up to date as source data changes (sync). This page explains how data flows from your source table into your vector table.
Backfill
Backfill is the initial load. When you trigger a backfill (POST /v1/vector-tables/{id}/backfill), Embedd:
- Reads all rows from the source table
- Generates embeddings for each row using the configured model
- Writes vectors + metadata to the vector store (Qdrant or platform table)
- Tracks progress as a durable task (visible via GET /tasks)
Backfill is resumable — if interrupted, it picks up from the last checkpoint rather than starting over.
In managed mode, backfill respects the max_vectors tier limit. If your table has more rows than your budget allows, backfill processes up to the limit and logs a warning.
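The resumable behavior and the budget cap can be pictured with a small in-memory sketch. This is not Embedd's implementation: `run_backfill`, the `checkpoint` dict, the `embed` callable, and the batch size are all hypothetical names chosen for illustration. The idea is that the checkpoint only advances after a whole batch is written, so an interrupted run redoes at most one batch.

```python
from typing import Callable, Dict, List, Optional


def run_backfill(
    rows: List[dict],
    embed: Callable[[dict], List[float]],
    store: Dict[int, List[float]],
    checkpoint: Dict[str, int],
    max_vectors: Optional[int] = None,
    batch_size: int = 2,
) -> int:
    """Embed rows in batches, resuming from checkpoint["offset"].

    Returns the number of rows skipped because the (hypothetical) budget
    was reached; 0 means every row was processed.
    """
    offset = checkpoint.get("offset", 0)
    while offset < len(rows):
        batch = rows[offset:offset + batch_size]
        for row in batch:
            at_budget = max_vectors is not None and len(store) >= max_vectors
            if at_budget and row["id"] not in store:
                skipped = sum(1 for r in rows[offset:] if r["id"] not in store)
                print(f"warning: budget of {max_vectors} reached, {skipped} rows skipped")
                return skipped
            # Writes are keyed by row id, so redoing a batch is harmless.
            store[row["id"]] = embed(row)
        offset += len(batch)
        checkpoint["offset"] = offset  # advance only after the batch is fully written
    return 0
```

If `embed` raises partway through a batch, calling `run_backfill` again with the same `store` and `checkpoint` continues from the last completed batch instead of from row zero.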
Sync
After backfill completes, sync keeps vectors current as the source data changes.
How sync detects changes
- Each source row gets a hash of its embedded + metadata columns
- On each sync cycle, Embedd compares source hashes to stored hashes
- New hashes → insert, changed hashes → update, missing hashes → delete
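The comparison described above can be sketched in a few lines. The hash algorithm and serialization are not specified on this page, so SHA-256 over a sorted-key JSON payload is an assumption, and `row_hash` and `diff_hashes` are hypothetical helper names. Hashes are keyed by a row identifier so that a changed row can be told apart from a new one.

```python
import hashlib
import json
from typing import Dict, List, Tuple


def row_hash(row: dict, columns: List[str]) -> str:
    """Hash only the embedded + metadata columns (assumed: SHA-256 over JSON)."""
    payload = json.dumps({c: row.get(c) for c in columns}, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_hashes(
    source: Dict[str, str], stored: Dict[str, str]
) -> Tuple[List[str], List[str], List[str]]:
    """Compare {row_key: hash} maps and return (inserts, updates, deletes)."""
    inserts = sorted(k for k in source if k not in stored)
    updates = sorted(k for k in source if k in stored and source[k] != stored[k])
    deletes = sorted(k for k in stored if k not in source)
    return inserts, updates, deletes
```

Because only the listed columns feed the hash, edits to other columns in the source row do not trigger a re-embed in this sketch.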
Sync modes
- Batch: Full-table comparison. Scans the entire source table each cycle. Simple and reliable, best for smaller tables or when latency isn't critical.
- CDC: Polling-based change data capture. Queries for changes since the last sync. Lower latency and reduced compute for large tables.
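To make the difference between the two modes concrete, here is a minimal sketch using an in-memory SQLite table. This page does not say how CDC identifies changed rows; the `updated_at` watermark column, the `docs` table, and both function names are assumptions made for illustration.

```python
import sqlite3
from typing import List, Tuple

Row = Tuple[int, str, int]


def batch_scan(conn: sqlite3.Connection) -> List[Row]:
    """Batch mode: read every row on every cycle."""
    return conn.execute("SELECT id, body, updated_at FROM docs ORDER BY id").fetchall()


def poll_changes(conn: sqlite3.Connection, since: int) -> List[Row]:
    """CDC-style polling: read only rows modified after the last sync watermark."""
    return conn.execute(
        "SELECT id, body, updated_at FROM docs WHERE updated_at > ? ORDER BY updated_at, id",
        (since,),
    ).fetchall()


def demo() -> Tuple[List[Row], List[Row]]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT, updated_at INTEGER)")
    conn.executemany(
        "INSERT INTO docs VALUES (?, ?, ?)", [(1, "a", 100), (2, "b", 100), (3, "c", 100)]
    )
    watermark = 100  # timestamp of the last successful sync
    conn.execute("UPDATE docs SET body = 'b2', updated_at = 150 WHERE id = 2")
    return batch_scan(conn), poll_changes(conn, watermark)
```

In `demo()`, the batch scan returns all three rows while the poll returns only the one that changed, which is where the reduced compute on large tables comes from. A watermark query like this cannot see hard deletes on its own; how deletes are detected in CDC mode is not covered here.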
Sync statuses
| Status | Meaning |
|---|---|
| pending | Vector table created, no backfill run yet |
| backfilling | Initial backfill in progress |
| synced | Up to date and actively syncing |
| paused | Sync manually paused |
| error | Last sync failed (check last_error) |
| pending_rebackfill | Config changed, needs re-backfill with atomic swap |
Staleness
The staleness_secs field in sync status reports the number of seconds since the last successful sync cycle. Low staleness means your vectors closely reflect the source data. High staleness may indicate a sync problem or a paused table.
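A monitoring check built on these fields might look like the sketch below. The `staleness_secs` and `last_error` fields and the status values come from this page; the exact response shape (a top-level `status` key), the function name `sync_health`, and the 300-second threshold are assumptions.

```python
def sync_health(status: dict, max_staleness_secs: int = 300) -> str:
    """Classify a sync status payload (assumed shape) for alerting."""
    state = status.get("status")
    if state == "error":
        return f"error: {status.get('last_error')}"
    if state != "synced":
        # pending, backfilling, paused, pending_rebackfill: report as-is
        return str(state)
    staleness = status.get("staleness_secs")
    if staleness is not None and staleness > max_staleness_secs:
        return "stale"
    return "ok"
```

Pick the threshold relative to your sync interval; a table that syncs hourly will routinely report staleness in the thousands of seconds.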
Vector Budget (Managed Mode Only)
In managed mode, your organization's subscription tier includes a max_vectors limit. This budget is shared across all managed-mode vector tables in the org.
- During backfill: rows are processed up to the budget; excess rows are skipped with a warning
- During sync: new inserts are skipped when at budget; updates and deletes always proceed (they don't increase the count)
- Platform mode is unrestricted — vectors live in your database, not Embedd's infrastructure
See Subscription Tiers for limits per tier.
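The budget rule during sync can be sketched as follows. This is an illustration, not Embedd's code: `apply_changes` is a hypothetical function, a single dict stands in for the org-wide vector count, and applying deletes before inserts is an assumed ordering.

```python
from typing import Dict, List

Vector = List[float]


def apply_changes(
    store: Dict[str, Vector],
    inserts: Dict[str, Vector],
    updates: Dict[str, Vector],
    deletes: List[str],
    max_vectors: int,
) -> List[str]:
    """Apply one sync cycle under a vector budget; return the skipped insert keys."""
    # Deletes and updates never increase the count, so they always proceed.
    for key in deletes:
        store.pop(key, None)
    for key, vec in updates.items():
        if key in store:
            store[key] = vec
    # Inserts are applied only while there is room in the budget.
    skipped: List[str] = []
    for key, vec in inserts.items():
        if len(store) >= max_vectors:
            skipped.append(key)
            continue
        store[key] = vec
    return skipped
```

With a budget of 2 and two existing vectors, one delete frees room for exactly one of two pending inserts; the other is returned in the skipped list.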
Re-Backfill (Atomic Swap)
When you update a vector table's columns, embedding_model, or embedding_dimensions, all vectors need to be regenerated. Embedd handles this with zero downtime.
Managed mode (Qdrant)
- Creates a new Qdrant collection
- Backfills the new collection with updated embeddings
- Updates the database reference to point to the new collection
- Deletes the old collection
Platform mode
- Creates a swap table in your database
- Backfills the swap table with updated embeddings
- Atomically renames: live → _old, swap → live
- Drops the _old table
Queries continue to hit the live table throughout — no downtime, no stale results during the swap.
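The platform-mode rename can be illustrated with SQLite as a stand-in database. The database engine Embedd targets, the table names, and the `swap_tables` helper are all assumptions here; the point is that both renames happen inside one transaction, so readers see either the old live table or the new one, never a missing table.

```python
import sqlite3
from typing import List, Tuple


def swap_tables(conn: sqlite3.Connection, live: str, swap: str) -> None:
    """Rename live -> live_old and swap -> live in one transaction, then drop the old table.

    Table names are interpolated into SQL, so they must come from trusted config.
    """
    old = f"{live}_old"
    conn.execute("BEGIN")
    try:
        conn.execute(f'ALTER TABLE "{live}" RENAME TO "{old}"')
        conn.execute(f'ALTER TABLE "{swap}" RENAME TO "{live}"')
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    conn.execute(f'DROP TABLE "{old}"')


def demo() -> Tuple[List[tuple], List[str]]:
    # isolation_level=None puts the connection in autocommit mode so that
    # the explicit BEGIN/COMMIT above controls the transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE vectors (id INTEGER PRIMARY KEY, model TEXT)")
    conn.execute("INSERT INTO vectors VALUES (1, 'old-model')")
    conn.execute("CREATE TABLE vectors_swap (id INTEGER PRIMARY KEY, model TEXT)")
    conn.execute("INSERT INTO vectors_swap VALUES (1, 'new-model')")
    swap_tables(conn, "vectors", "vectors_swap")
    rows = conn.execute("SELECT model FROM vectors").fetchall()
    tables = sorted(r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'"))
    return rows, tables
```

After `demo()` runs, the only remaining table is `vectors`, and it holds the rows that were backfilled into the swap table. If either rename fails, the rollback leaves the original live table in place.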
Controlling Sync
- Pause: POST /v1/vector-tables/{id}/sync/pause (queries still work against existing vectors)
- Resume: POST /v1/vector-tables/{id}/sync/resume (picks up where it left off)
- Status: GET /v1/vector-tables/{id}/sync/status (check staleness, pending rows, errors)
See the Sync API Reference for full endpoint docs.
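As a rough sketch of calling the sync control endpoints from Python's standard library: the paths and HTTP methods are the ones listed on this page, while the base URL, the bearer-token header, and the `sync_request` helper are assumptions. Check the API reference for the real host and authentication scheme.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host, replace with your Embedd endpoint

_METHODS = {"pause": "POST", "resume": "POST", "status": "GET"}


def sync_request(table_id: str, action: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a request for a sync control endpoint."""
    if action not in _METHODS:
        raise ValueError(f"unknown sync action: {action!r}")
    safe_id = urllib.parse.quote(table_id, safe="")
    url = f"{BASE_URL}/v1/vector-tables/{safe_id}/sync/{action}"
    return urllib.request.Request(
        url,
        method=_METHODS[action],
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
    )
```

Pass the returned object to `urllib.request.urlopen` to send it. Building the request separately keeps the URL and method logic easy to test without network access.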