Sync & Backfill

Overview

After creating a vector table, you need to populate it with vectors (backfill) and keep it up to date as source data changes (sync). This page explains how data flows from your source table into your vector table.

Backfill

Backfill is the initial load. When you trigger a backfill (POST /v1/vector-tables/{id}/backfill), Embedd:

  • Reads all rows from the source table
  • Generates embeddings for each row using the configured model
  • Writes vectors + metadata to the vector store (Qdrant or platform table)
  • Tracks progress as a durable task (visible via GET /tasks)

Backfill is resumable — if interrupted, it picks up from the last checkpoint rather than starting over.
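
As a rough illustration of checkpointed resumption, the sketch below processes rows in batches and records the next offset after each batch, so a second run skips work already done. Everything here is hypothetical: `run_backfill`, the in-memory `checkpoint` dict, the offset-based cursor, and `fake_embed` are stand-ins, not Embedd's actual implementation.

```python
from typing import Callable, Dict, List, Optional, Sequence


def run_backfill(
    rows: Sequence[dict],
    embed: Callable[[str], List[float]],
    store: Dict[int, List[float]],
    checkpoint: Dict[str, int],
    batch_size: int = 2,
    fail_after: Optional[int] = None,
) -> None:
    """Embed rows in batches, saving the next offset after each batch."""
    offset = checkpoint.get("offset", 0)  # resume point; 0 on a first run
    done_this_run = 0
    while offset < len(rows):
        batch = rows[offset : offset + batch_size]
        for row in batch:
            store[row["id"]] = embed(row["text"])
        offset += len(batch)
        checkpoint["offset"] = offset  # a real system would persist this durably
        done_this_run += len(batch)
        if fail_after is not None and done_this_run >= fail_after:
            raise RuntimeError("simulated interruption")


rows = [{"id": i, "text": f"doc {i}"} for i in range(5)]
calls: List[str] = []


def fake_embed(text: str) -> List[float]:
    """Hypothetical embedding function that records each call."""
    calls.append(text)
    return [float(len(text))]


store: Dict[int, List[float]] = {}
checkpoint: Dict[str, int] = {}
try:
    run_backfill(rows, fake_embed, store, checkpoint, fail_after=2)
except RuntimeError:
    pass  # interrupted after the first batch
run_backfill(rows, fake_embed, store, checkpoint)  # resumes at row 2
print(len(calls), checkpoint)
```

The second call starts from the saved offset, so each row is embedded once across both runs.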

In managed mode, backfill respects the max_vectors tier limit. If your table has more rows than your budget allows, backfill processes up to the limit and logs a warning.

Sync

After backfill completes, sync keeps the vector table current by applying inserts, updates, and deletes from the source table on each sync cycle.

How sync detects changes

  • Each source row gets a hash of its embedded + metadata columns
  • On each sync cycle, Embedd compares source hashes to stored hashes
  • New hashes → insert, changed hashes → update, missing hashes → delete
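
The comparison above can be sketched in a few lines. This is a minimal illustration, assuming hashes are keyed by a row's primary key and computed with SHA-256 over a JSON encoding of the relevant columns; the actual hash function and key scheme Embedd uses are not specified here, and `row_hash` and `diff_hashes` are hypothetical names.

```python
import hashlib
import json
from typing import Dict, Iterable, List, Tuple


def row_hash(row: dict, columns: Iterable[str]) -> str:
    """Hash only the embedded + metadata columns (SHA-256 is an assumption)."""
    payload = json.dumps({c: row.get(c) for c in columns}, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_hashes(
    source: Dict[str, str], stored: Dict[str, str]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (inserts, updates, deletes) as sorted lists of row keys."""
    inserts = sorted(k for k in source if k not in stored)
    updates = sorted(k for k in source if k in stored and source[k] != stored[k])
    deletes = sorted(k for k in stored if k not in source)
    return inserts, updates, deletes


cols = ["title", "body"]
stored = {
    "1": row_hash({"title": "a", "body": "x"}, cols),
    "2": row_hash({"title": "b", "body": "y"}, cols),
    "3": row_hash({"title": "c", "body": "z"}, cols),
}
source = {
    "1": row_hash({"title": "a", "body": "x"}, cols),        # unchanged
    "2": row_hash({"title": "b", "body": "y edited"}, cols),  # changed
    "4": row_hash({"title": "d", "body": "w"}, cols),         # new
}
print(diff_hashes(source, stored))
```

Because only the listed columns feed the hash, edits to other columns in the source row do not trigger a re-embed in this sketch.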

Sync modes

  • Batch: Full-table comparison. Scans the entire source table each cycle. Simple and reliable, best for smaller tables or when latency isn't critical.
  • CDC: Polling-based change data capture. Queries for changes since the last sync. Lower latency and reduced compute for large tables.
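
To make "queries for changes since the last sync" concrete, here is a small sketch using SQLite. It assumes the source table has a monotonically increasing `updated_at` column to use as a cursor; the `docs` table, its columns, and `fetch_changes` are hypothetical, and how Embedd actually tracks changes may differ.

```python
import sqlite3
from typing import List, Tuple


def fetch_changes(conn: sqlite3.Connection, since: int) -> Tuple[List[tuple], int]:
    """Return rows changed after `since`, plus the cursor for the next poll."""
    rows = conn.execute(
        "SELECT id, body, updated_at FROM docs "
        "WHERE updated_at > ? ORDER BY updated_at, id",
        (since,),
    ).fetchall()
    next_cursor = rows[-1][2] if rows else since
    return rows, next_cursor


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT, updated_at INTEGER)")
conn.executemany(
    "INSERT INTO docs (id, body, updated_at) VALUES (?, ?, ?)",
    [(1, "first", 1), (2, "second", 2), (3, "third", 3)],
)

changes, cursor = fetch_changes(conn, since=0)       # first poll sees all rows
conn.execute("UPDATE docs SET body = ?, updated_at = ? WHERE id = ?", ("second v2", 4, 2))
changes, cursor = fetch_changes(conn, since=cursor)  # next poll sees only row 2
print(changes, cursor)
```

A batch cycle would instead read every row (`SELECT id, body FROM docs`) and compare hashes. Note that a timestamp query like this one cannot see rows that were deleted, so this sketch would need a separate mechanism for deletes.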

Sync statuses

Status              Meaning
pending             Vector table created, no backfill run yet
backfilling         Initial backfill in progress
synced              Up to date and actively syncing
paused              Sync manually paused
error               Last sync failed (check last_error)
pending_rebackfill  Config changed, needs re-backfill with atomic swap

Staleness

The staleness_secs field in the sync status reports how many seconds have passed since the last successful sync cycle. Low staleness means your vectors closely reflect the source data. High staleness may indicate sync problems or a paused sync.
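
A monitoring check built on these fields might look like the sketch below. The `staleness_secs` and `last_error` fields and the status values come from this page; the `status` key name, the overall response shape, the 300-second threshold, and `describe_sync` are assumptions made for illustration.

```python
def describe_sync(status: dict, max_staleness_secs: int = 300) -> str:
    """Summarize a sync status payload (response shape is assumed)."""
    state = status.get("status")
    if state == "error":
        return f"error: {status.get('last_error')}"
    if state == "paused":
        return "paused"  # staleness grows while paused, which is expected
    staleness = status.get("staleness_secs")
    if staleness is None:
        return "unknown"
    if staleness > max_staleness_secs:
        return f"stale: {staleness}s since last successful sync"
    return "fresh"


print(describe_sync({"status": "synced", "staleness_secs": 12}))
print(describe_sync({"status": "synced", "staleness_secs": 900}))
print(describe_sync({"status": "error", "last_error": "timeout"}))
```

Pick a threshold that matches your sync interval; a value much smaller than the interval would flag every table as stale between cycles.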

Vector Budget (Managed Mode Only)

In managed mode, your organization's subscription tier includes a max_vectors limit. This budget is shared across all managed-mode vector tables in the org.

  • During backfill: rows are processed up to the budget; excess rows are skipped with a warning
  • During sync: new inserts are skipped when at budget; updates and deletes always proceed (they don't increase the count)
  • Platform mode is unrestricted — vectors live in your database, not Embedd's infrastructure
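
The sync-time rule can be sketched as a simple counter check. This is an illustration only: `apply_with_budget` is a hypothetical name, and whether a delete frees a slot for a later insert within the same cycle (as it does here) is an assumption, not documented behavior.

```python
from typing import List, Tuple

Op = Tuple[str, str]  # (kind, row_id), where kind is insert, update, or delete


def apply_with_budget(
    ops: List[Op], current_count: int, max_vectors: int
) -> Tuple[List[Op], List[Op], int]:
    """Return (applied, skipped, final_count) for one sync cycle."""
    applied: List[Op] = []
    skipped: List[Op] = []
    count = current_count
    for kind, row_id in ops:
        if kind == "insert":
            if count >= max_vectors:
                skipped.append((kind, row_id))  # at budget: skip new vectors
                continue
            count += 1
        elif kind == "delete":
            count -= 1  # deletes always proceed and free a slot
        # updates always proceed and leave the count unchanged
        applied.append((kind, row_id))
    return applied, skipped, count


ops = [("insert", "a"), ("update", "b"), ("insert", "c"), ("delete", "b"), ("insert", "d")]
applied, skipped, count = apply_with_budget(ops, current_count=1, max_vectors=2)
print(applied, skipped, count)
```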

See Subscription Tiers for limits per tier.

Re-Backfill (Atomic Swap)

When you update a vector table's columns, embedding_model, or embedding_dimensions, all vectors need to be regenerated. Embedd handles this with zero downtime.

Managed mode (Qdrant)

  1. Creates a new Qdrant collection
  2. Backfills the new collection with updated embeddings
  3. Updates the database reference to point to the new collection
  4. Deletes the old collection

Platform mode

  1. Creates a swap table in your database
  2. Backfills the swap table with updated embeddings
  3. Atomically renames: live → _old, swap → live
  4. Drops the _old table

Queries continue to hit the live table throughout — no downtime, no stale results during the swap.
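
The platform-mode rename step can be illustrated with SQLite, which allows table renames inside a transaction. This is a sketch under assumptions: the table names `vectors` and `vectors_swap` and the helper `atomic_swap` are hypothetical, and your database's DDL and transaction semantics may differ from SQLite's.

```python
import sqlite3


def atomic_swap(conn: sqlite3.Connection, live: str, swap: str) -> None:
    """Rename live -> live_old and swap -> live in one transaction, then drop live_old.

    Table names are interpolated into SQL, so they must be trusted identifiers.
    """
    old = f"{live}_old"
    conn.execute("BEGIN")
    try:
        conn.execute(f'ALTER TABLE "{live}" RENAME TO "{old}"')
        conn.execute(f'ALTER TABLE "{swap}" RENAME TO "{live}"')
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # leave the original live table in place
        raise
    conn.execute(f'DROP TABLE "{old}"')


# isolation_level=None puts the connection in autocommit mode so that the
# explicit BEGIN/COMMIT above control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE vectors (id INTEGER PRIMARY KEY, model TEXT)")
conn.execute("INSERT INTO vectors VALUES (1, 'old-model')")
conn.execute("CREATE TABLE vectors_swap (id INTEGER PRIMARY KEY, model TEXT)")
conn.execute("INSERT INTO vectors_swap VALUES (1, 'new-model')")

atomic_swap(conn, "vectors", "vectors_swap")
print(conn.execute("SELECT model FROM vectors").fetchall())
```

Both renames commit together, so a reader sees either the old table or the new one under the live name, never a missing table.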

Controlling Sync

  • Pause: POST /v1/vector-tables/{id}/sync/pause — queries still work against existing vectors
  • Resume: POST /v1/vector-tables/{id}/sync/resume — picks up where it left off
  • Status: GET /v1/vector-tables/{id}/sync/status — check staleness, pending rows, errors

See the Sync API Reference for full endpoint docs.