Search Tables

What Is a Search Table?

A search table is a vectorized mirror of your source data. You pick which columns to embed (for semantic search) and which to keep as filterable metadata. Embedd reads your source table, generates embeddings, and keeps them in sync.

Your source table stays untouched. Embedd creates a parallel representation where each row becomes a vector — a numerical fingerprint of the text content — along with any metadata you want to filter on. When your source data changes, the search table updates automatically.

Why Not Just Use SQL?

SQL is the right tool for exact lookups, but it falls short when the query is intent-based, fuzzy, or conceptual.

Scenario	SQL	Search Table
"Find product with SKU `TR-PRO-001`"	`WHERE sku = 'TR-PRO-001'` — instant, exact	Unnecessary — use SQL
"Find footwear under $100"	`WHERE category = 'footwear' AND price < 100` — works perfectly	Unnecessary — use SQL
"Something good for a rainy day hike"	`WHERE description LIKE '%rainy%'` — misses "waterproof", "weather-resistant", "all-conditions"	Returns waterproof jackets, treated boots, rain covers — understands intent
"Comfortable shoes for long distances"	`WHERE description LIKE '%comfortable%'` — misses "cushioned", "ergonomic", "supportive"	Returns trail runners with cushioning, hiking boots with arch support — gets the concept
"Gift for someone who loves the outdoors"	No reasonable SQL query exists	Returns curated results across categories based on the concept of outdoor enthusiasm

Search tables shine when the query expresses what the user means, not what they literally typed. If your queries are always exact and structured, stick with SQL. If your users ask questions in natural language, search tables close the gap.

Designing Your Search Table

The key decision is what should be searchable vs filterable.

Embedding columns

Embedding columns get concatenated into a single text block and vectorized. These are the columns that semantic search will match against. Choose columns that contain natural language — descriptions, titles, reviews, notes.

Metadata columns

Metadata columns are stored alongside the vector for filtering. They are not embedded, so they do not affect search relevance. Choose structured attributes — categories, prices, dates, booleans, status fields.

Rules of thumb

Embed natural language — descriptions, titles, reviews, free-text fields
Filter structured data — categories, prices, dates, booleans, enums
Omit internal fields — auto-increment IDs, foreign keys, internal timestamps, SKUs
Fewer embedding columns is often better — combining too many columns dilutes the semantic signal. A product description + name is usually enough. Adding the SKU or a category string to the embedded text adds noise.
Use name_prefix — when combining multiple embedding columns, use name_prefix to label each one. This helps the embedding model understand the structure of the text (e.g., "Product: Trail Runner Pro Description: Lightweight trail shoe...").

Example: choosing columns

Products table

Column	Decision	Reasoning
`name`	Embed	Short natural language, highly relevant to search intent
`description`	Embed	Rich natural language, primary search target
`category`	Metadata	Structured — better as a filter than embedded text
`brand`	Metadata	Structured — users filter by brand, not search semantically
`price`	Metadata	Numeric — use for range filters
`in_stock`	Metadata	Boolean — use for availability filters
`sku`	Omit	Internal identifier, no search or filter value
`created_at`	Omit	Internal timestamp, not useful for product search

Support tickets table

Column	Decision	Reasoning
`subject`	Embed	Short summary of the issue in natural language
`body`	Embed	Detailed description — primary search content
`status`	Metadata	Enum (`open`, `closed`, `pending`) — filter, not search
`priority`	Metadata	Enum (`low`, `medium`, `high`, `critical`) — filter
`assigned_to`	Metadata	Structured — filter by assignee
`created_at`	Metadata	Useful for "recent tickets" filters
`internal_notes`	Omit	Sensitive internal content, should not be in search results
`ticket_id`	Omit	Internal identifier

Full Example: From Relational Table to Search Table

Let's walk through the entire journey with a products table.

Source table

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    description TEXT,
    category VARCHAR(100),
    brand VARCHAR(100),
    price NUMERIC(10, 2),
    in_stock BOOLEAN DEFAULT true,
    sku VARCHAR(50),
    created_at TIMESTAMPTZ DEFAULT NOW()
);

Sample data

id	name	description	category	brand	price	in_stock
1	Trail Runner Pro	Lightweight trail shoe with responsive cushioning and aggressive grip for technical terrain	footwear	Ridgeline	129.99	true
2	Summit Backpack 45L	Durable 45-liter backpack with ventilated back panel and multiple compartment organization	gear	Ridgeline	189.99	true
3	Alpine Shell Jacket	Waterproof 3-layer shell with sealed seams and adjustable hood for alpine conditions	clothing	Peakwear	249.99	false
4	Merino Base Layer	Thermoregulating merino wool base layer that stays warm when wet and resists odor on multi-day trips	clothing	Peakwear	79.99	true
5	Trekking Poles Carbon	Ultralight carbon fiber trekking poles with cork grips and quick-lock adjustment	gear	Ridgeline	149.99	true

Step 1: Plan your columns

Column	Decision	Reasoning
`id`	Primary key	Used internally to track rows — required
`name`	Embed	Product name is natural language, directly relevant to search
`description`	Embed	Rich text that captures what the product is and does
`category`	Metadata	Structured field — "footwear", "gear", "clothing" — useful for filtering
`brand`	Metadata	Structured field — useful for filtering
`price`	Metadata	Numeric — useful for range filters like "under $150"
`in_stock`	Metadata	Boolean — useful for filtering out unavailable products
`sku`	Omit	Internal reference, no search or filter value
`created_at`	Omit	Internal timestamp, not relevant for product search

Step 2: Create a connection

curl -X POST https://api.embedd.to/v1/providers/postgresql/connections \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "product-db",
    "mode": "managed",
    "credentials": {
      "host": "your-db-host.com",
      "port": 5432,
      "database": "myapp",
      "user": "embedd_reader",
      "password": "..."
    }
  }'

Response:

{
  "id": "conn_abc123",
  "name": "product-db",
  "provider": "postgresql",
  "mode": "managed",
  "status": "active",
  "created_at": "2026-03-13T10:00:00Z"
}

Step 3: Configure an embedding provider

curl -X POST https://api.embedd.to/v1/embedding-providers \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "openai",
    "provider": "openai",
    "api_key": "sk-your-openai-key",
    "default_model": "text-embedding-3-small"
  }'

Response:

{
  "id": "emb_xyz789",
  "name": "openai",
  "provider": "openai",
  "default_model": "text-embedding-3-small",
  "created_at": "2026-03-13T10:01:00Z"
}

Step 4: Create the vector table

curl -X POST https://api.embedd.to/v1/vector-tables \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "product-search",
    "connection_id": "conn_abc123",
    "embedding_provider_id": "emb_xyz789",
    "source_table": "products",
    "primary_key": "id",
    "columns": [
      {
        "source_column": "name",
        "type": "embedding",
        "name_prefix": "Product: "
      },
      {
        "source_column": "description",
        "type": "embedding",
        "name_prefix": "Description: "
      },
      {
        "source_column": "category",
        "type": "metadata"
      },
      {
        "source_column": "brand",
        "type": "metadata"
      },
      {
        "source_column": "price",
        "type": "metadata"
      },
      {
        "source_column": "in_stock",
        "type": "metadata"
      }
    ]
  }'

Response:

{
  "id": "vtbl_prod_001",
  "name": "product-search",
  "connection_id": "conn_abc123",
  "embedding_provider_id": "emb_xyz789",
  "source_table": "products",
  "sync_status": "pending_backfill",
  "synced_rows": 0,
  "created_at": "2026-03-13T10:02:00Z"
}

For each row, Embedd concatenates the embedding columns with their prefixes into a single text block before generating the vector. For the first row, the embedded text looks like:

Product: Trail Runner Pro Description: Lightweight trail shoe with responsive cushioning and aggressive grip for technical terrain

The name_prefix labels give the embedding model context about the structure of the text, which improves search relevance.

Step 5: Trigger backfill

curl -X POST https://api.embedd.to/v1/vector-tables/vtbl_prod_001/backfill \
  -H "Authorization: Bearer sk_your_api_key"

Response:

{
  "task_id": "task_bf_001",
  "status": "pending",
  "vector_table_id": "vtbl_prod_001"
}

Check the sync status:

curl https://api.embedd.to/v1/vector-tables/vtbl_prod_001 \
  -H "Authorization: Bearer sk_your_api_key"

{
  "id": "vtbl_prod_001",
  "name": "product-search",
  "sync_status": "synced",
  "synced_rows": 5,
  "last_synced_at": "2026-03-13T10:03:15Z"
}

All 5 rows are now vectorized and queryable. From here, Embedd automatically detects changes to the source table and re-embeds updated rows.

Step 6: Query

Search for products that match an intent, combined with metadata filters:

curl -X POST https://api.embedd.to/v1/vector-tables/vtbl_prod_001/query \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "something warm for a long hike",
    "top_k": 5,
    "filters": {
      "and": [
        { "field": "in_stock", "op": "eq", "value": true },
        { "field": "price", "op": "lte", "value": 150 }
      ]
    }
  }'

Response:

{
  "results": [
    {
      "id": 4,
      "score": 0.91,
      "metadata": {
        "category": "clothing",
        "brand": "Peakwear",
        "price": 79.99,
        "in_stock": true
      }
    },
    {
      "id": 1,
      "score": 0.72,
      "metadata": {
        "category": "footwear",
        "brand": "Ridgeline",
        "price": 129.99,
        "in_stock": true
      }
    },
    {
      "id": 5,
      "score": 0.68,
      "metadata": {
        "category": "gear",
        "brand": "Ridgeline",
        "price": 149.99,
        "in_stock": true
      }
    }
  ]
}

The Merino Base Layer (score: 0.91) is the top result. Notice that the word "warm" does not appear anywhere in its description — the description says "thermoregulating merino wool" and "stays warm when wet." The embedding model understands that merino wool, thermoregulation, and multi-day trips are semantically close to "something warm for a long hike." That is semantic search.

The filters did their job too:

Alpine Shell Jacket was excluded because in_stock is false
Summit Backpack 45L was excluded because its price ($189.99) exceeds the lte: 150 filter

This combination of semantic matching and structured filtering is the core pattern for search tables: let vectors handle meaning, let filters handle constraints.

Real-World Use Case: AI Shopping Assistant

A customer messages your AI shopping assistant:

"I need something for trail running under $150"

The assistant calls the search table API:

curl -X POST https://api.embedd.to/v1/vector-tables/vtbl_prod_001/query \
  -H "Authorization: Bearer sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "trail running shoes",
    "top_k": 3,
    "filters": {
      "and": [
        { "field": "in_stock", "op": "eq", "value": true },
        { "field": "price", "op": "lte", "value": 150 }
      ]
    }
  }'

Response:

{
  "results": [
    {
      "id": 1,
      "score": 0.95,
      "metadata": {
        "category": "footwear",
        "brand": "Ridgeline",
        "price": 129.99,
        "in_stock": true
      }
    },
    {
      "id": 5,
      "score": 0.61,
      "metadata": {
        "category": "gear",
        "brand": "Ridgeline",
        "price": 149.99,
        "in_stock": true
      }
    }
  ]
}

The assistant now has product context and can respond naturally:

"I'd recommend the Trail Runner Pro by Ridgeline at $129.99. It's a lightweight trail shoe with responsive cushioning and aggressive grip — built for technical terrain. It's in stock and within your budget. Would you like to add it to your cart?"

Without the search table, the assistant would either need hardcoded product knowledge (which goes stale) or a complex SQL query that can only match exact keywords. The search table bridges the gap between what the customer asks and what you sell.

Next steps

Filters — Full reference for filter operators and syntax
Sync & Backfill — How Embedd keeps your search table up to date as source data changes
Set up your first search table with one of the guided walkthroughs:

What Is a Search Table?​

Why Not Just Use SQL?​

Designing Your Search Table​

Embedding columns​

Metadata columns​

Rules of thumb​

Example: choosing columns​

Full Example: From Relational Table to Search Table​

Source table​

Sample data​

Step 1: Plan your columns​

Step 2: Create a connection​

Step 3: Configure an embedding provider​

Step 4: Create the vector table​

Step 5: Trigger backfill​

Step 6: Query​

Real-World Use Case: AI Shopping Assistant​

Next steps​

What Is a Search Table?

Why Not Just Use SQL?

Designing Your Search Table

Embedding columns

Metadata columns

Rules of thumb

Example: choosing columns

Full Example: From Relational Table to Search Table

Source table

Sample data

Step 1: Plan your columns

Step 2: Create a connection

Step 3: Configure an embedding provider

Step 4: Create the vector table

Step 5: Trigger backfill

Step 6: Query

Real-World Use Case: AI Shopping Assistant

Next steps