Sync & Backfill

Overview

After creating a vector table, you need to populate it with vectors (backfill) and keep it up to date as source data changes (sync). This page explains how data flows from your source table into your vector table.

Backfill

Backfill is the initial load. When you trigger a backfill (POST /v1/vector-tables/{id}/backfill), Embedd:

Reads all rows from the source table
Generates embeddings for each row using the configured model
Writes vectors + metadata to the vector store (Qdrant or platform table)
Tracks progress as a durable task (visible via GET /tasks)

Backfill is resumable — if interrupted, it picks up from the last checkpoint rather than starting over.
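
The checkpointing behaviour can be pictured with a small sketch. This is not Embedd's implementation: the `run_backfill` function, the in-memory `store` and `checkpoint` dictionaries, and the batch size are all hypothetical stand-ins used only to show how a checkpoint lets an interrupted run resume.

```python
def run_backfill(rows, embed, store, checkpoint, batch_size=2):
    """Process rows in batches, resuming after the last checkpointed offset.

    rows, store and checkpoint are in-memory stand-ins (assumptions) for the
    source table, the vector store and the durable task record.
    """
    start = checkpoint.get("offset", 0)
    for i in range(start, len(rows), batch_size):
        batch = rows[i:i + batch_size]
        for row in batch:
            # Writes are keyed by row id, so re-processing a row is harmless.
            store[row["id"]] = embed(row["text"])
        # Record progress only after the whole batch has been written.
        checkpoint["offset"] = i + len(batch)
    return checkpoint["offset"]
```

If a run fails partway through a batch, calling `run_backfill` again with the same `checkpoint` repeats only that batch. Batches that were already checkpointed are not read again.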

In managed mode, backfill respects the max_vectors tier limit. If your table has more rows than your budget allows, backfill processes up to the limit and logs a warning.

Sync

After backfill completes, sync keeps vectors current by applying source-table changes to the vector table on each sync cycle.

How sync detects changes

Each source row gets a hash computed over its embedded and metadata columns
On each sync cycle, Embedd compares the source hashes to the stored hashes
New rows → insert, changed hashes → update, rows missing from the source → delete
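
The comparison above can be sketched in a few lines. The hash function (SHA-256 over a JSON encoding), the `row_hash` and `diff_hashes` helpers, and the idea of keying hashes by primary key are assumptions made for illustration. The source does not specify how Embedd computes or stores hashes.

```python
import hashlib
import json


def row_hash(row, columns):
    """Hash only the embedded + metadata columns of a row (illustrative)."""
    payload = json.dumps([row.get(c) for c in columns], default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_hashes(source, stored):
    """Compare {primary_key: hash} maps and return (inserts, updates, deletes)."""
    inserts = sorted(k for k in source if k not in stored)
    updates = sorted(k for k in source if k in stored and source[k] != stored[k])
    deletes = sorted(k for k in stored if k not in source)
    return inserts, updates, deletes
```

Because only the listed columns feed the hash, a change to any other column in the source row leaves the hash unchanged and does not trigger re-embedding.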

Sync modes

Batch: Full-table comparison. Scans the entire source table each cycle. Simple and reliable, best for smaller tables or when latency isn't critical.
CDC: Polling-based change data capture. Queries for changes since the last sync. Lower latency and reduced compute for large tables.
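
To make the difference between the two modes concrete, here is a minimal sketch of a polling query, assuming the source table has an `updated_at` column that is bumped on every write. The `documents` table, its columns, and the `fetch_changes` helper are hypothetical. How Embedd actually identifies changed rows is not described here.

```python
import sqlite3


def fetch_changes(conn, last_sync):
    """Return rows modified after last_sync, plus the new watermark.

    Assumes a hypothetical documents(id, body, updated_at) table.
    """
    rows = conn.execute(
        "SELECT id, body, updated_at FROM documents "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_sync,),
    ).fetchall()
    # Advance the watermark to the newest change seen, or keep it if none.
    new_watermark = rows[-1][2] if rows else last_sync
    return rows, new_watermark
```

Batch mode reads every row on each cycle, whereas a query like this touches only rows past the watermark. A plain timestamp filter cannot see deleted rows, so a polling design needs some other way to detect deletes, such as soft-delete markers.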

Sync statuses

Status	Meaning
`pending`	Vector table created, no backfill run yet
`backfilling`	Initial backfill in progress
`synced`	Up to date and actively syncing
`paused`	Sync manually paused
`error`	Last sync failed (check `last_error`)
`pending_rebackfill`	Config changed, needs re-backfill with atomic swap

Staleness

The staleness_secs field in the sync status reports how many seconds have passed since the last successful sync cycle. Low staleness means your vectors closely reflect the source data. High staleness may mean that sync is failing or paused.
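
A monitoring check built on these fields might look like the sketch below. The `status`, `staleness_secs`, and `last_error` names come from this page, but the exact shape of the response payload and the 300-second threshold are assumptions. Pick a threshold that matches your own freshness requirements.

```python
def assess_sync(status, max_staleness_secs=300):
    """Classify a sync-status payload (assumed to be a flat dict)."""
    state = status.get("status")
    if state == "error":
        return f"error: {status.get('last_error')}"
    if state == "paused":
        return "paused"
    if state != "synced":
        # pending, backfilling, pending_rebackfill: sync is not running yet.
        return f"not syncing ({state})"
    if status.get("staleness_secs", 0) > max_staleness_secs:
        return "stale"
    return "ok"
```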

Vector Budget (Managed Mode Only)

In managed mode, your organization's subscription tier includes a max_vectors limit. This budget is shared across all managed-mode vector tables in the org.

During backfill: rows are processed up to the budget; excess rows are skipped with a warning
During sync: new inserts are skipped when at budget; updates and deletes always proceed (they don't increase the count)
Platform mode is unrestricted — vectors live in your database, not Embedd's infrastructure
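
The budget rules for sync can be sketched as a filter over pending operations. The `apply_with_budget` function and the `(kind, row_id)` tuples are hypothetical. The sketch also assumes that a delete frees budget for later inserts in the same cycle, which this page does not confirm.

```python
def apply_with_budget(ops, current_count, max_vectors):
    """Apply (kind, row_id) operations under a shared vector budget.

    Inserts are skipped once the budget is reached. Updates and deletes
    always proceed because they never increase the vector count.
    """
    applied, skipped = [], []
    for kind, row_id in ops:
        if kind == "insert":
            if current_count >= max_vectors:
                skipped.append(row_id)
                continue
            current_count += 1
        elif kind == "delete":
            # Assumption: deletes free budget immediately.
            current_count -= 1
        applied.append((kind, row_id))
    return applied, skipped, current_count
```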

See Subscription Tiers for limits per tier.

Re-Backfill (Atomic Swap)

When you update a vector table's columns, embedding_model, or embedding_dimensions, all vectors need to be regenerated. Embedd handles this with zero downtime.

Managed mode (Qdrant)

Creates a new Qdrant collection
Backfills the new collection with updated embeddings
Updates the database reference to point to the new collection
Deletes the old collection

Platform mode

Creates a swap table in your database
Backfills the swap table with updated embeddings
Atomically renames: live → _old, swap → live
Drops the _old table

Queries continue to hit the live table throughout — no downtime, no stale results during the swap.
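
The platform-mode rename can be illustrated with SQLite, which allows table renames inside a transaction. This is only a sketch: the `atomic_swap` helper and the `vectors` and `vectors_swap` table names are hypothetical, and the SQL Embedd issues against your database may differ.

```python
import sqlite3


def atomic_swap(conn, live="vectors", swap="vectors_swap"):
    """Rename live -> live_old and swap -> live in one transaction.

    Expects a connection opened with isolation_level=None so the
    transaction boundaries below are explicit.
    """
    old = f"{live}_old"
    conn.execute("BEGIN")
    try:
        conn.execute(f'ALTER TABLE "{live}" RENAME TO "{old}"')
        conn.execute(f'ALTER TABLE "{swap}" RENAME TO "{live}"')
        conn.execute("COMMIT")
    except Exception:
        # Either both renames happen or neither does.
        conn.execute("ROLLBACK")
        raise
    # The old table is dropped only after the swap has committed.
    conn.execute(f'DROP TABLE "{old}"')
```

Because both renames commit together, a reader never sees a moment where the live table name is missing.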

Controlling Sync

Pause: POST /v1/vector-tables/{id}/sync/pause — queries still work against existing vectors
Resume: POST /v1/vector-tables/{id}/sync/resume — picks up where it left off
Status: GET /v1/vector-tables/{id}/sync/status — check staleness, pending rows, errors
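
As a sketch of how a client might call these endpoints, the helper below builds (but does not send) the request for each action using only the standard library. The base URL, the bearer-token header, and the `sync_request` helper are assumptions. Only the paths and HTTP methods come from this page.

```python
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical; use your real API host

# Method and path suffix for each sync control action listed above.
_ACTIONS = {
    "pause": ("POST", "sync/pause"),
    "resume": ("POST", "sync/resume"),
    "status": ("GET", "sync/status"),
}


def sync_request(table_id, action, token):
    """Build a request for a sync control endpoint (bearer auth is assumed)."""
    method, suffix = _ACTIONS[action]
    url = f"{BASE_URL}/v1/vector-tables/{table_id}/{suffix}"
    return urllib.request.Request(
        url, method=method, headers={"Authorization": f"Bearer {token}"}
    )
```

Pass the returned object to `urllib.request.urlopen` to send it. Keeping construction separate from sending makes the helper easy to test without network access.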

See the Sync API Reference for full endpoint docs.
