Skip to main content

Search Tables

What Is a Search Table?

A search table is a vectorized mirror of your source data. You pick which columns to embed (for semantic search) and which to keep as filterable metadata. Embedd reads your source table, generates embeddings, and keeps them in sync.

Your source table stays untouched. Embedd creates a parallel representation where each row becomes a vector — a numerical fingerprint of the text content — along with any metadata you want to filter on. When your source data changes, the search table updates automatically.

Why Not Just Use SQL?

SQL is the right tool for exact lookups, but it falls short when the query is intent-based, fuzzy, or conceptual.

ScenarioSQLSearch Table
"Find product with SKU TR-PRO-001"WHERE sku = 'TR-PRO-001' — instant, exactUnnecessary — use SQL
"Find footwear under $100"WHERE category = 'footwear' AND price < 100 — works perfectlyUnnecessary — use SQL
"Something good for a rainy day hike"WHERE description LIKE '%rainy%' — misses "waterproof", "weather-resistant", "all-conditions"Returns waterproof jackets, treated boots, rain covers — understands intent
"Comfortable shoes for long distances"WHERE description LIKE '%comfortable%' — misses "cushioned", "ergonomic", "supportive"Returns trail runners with cushioning, hiking boots with arch support — gets the concept
"Gift for someone who loves the outdoors"No reasonable SQL query existsReturns curated results across categories based on the concept of outdoor enthusiasm

Search tables shine when the query expresses what the user means, not what they literally typed. If your queries are always exact and structured, stick with SQL. If your users ask questions in natural language, search tables close the gap.

Designing Your Search Table

The key decision is what should be searchable vs filterable.

Embedding columns

Embedding columns get concatenated into a single text block and vectorized. These are the columns that semantic search will match against. Choose columns that contain natural language — descriptions, titles, reviews, notes.

Metadata columns

Metadata columns are stored alongside the vector for filtering. They are not embedded, so they do not affect search relevance. Choose structured attributes — categories, prices, dates, booleans, status fields.

Rules of thumb

  • Embed natural language — descriptions, titles, reviews, free-text fields
  • Filter structured data — categories, prices, dates, booleans, enums
  • Omit internal fields — auto-increment IDs, foreign keys, internal timestamps, SKUs
  • Fewer embedding columns is often better — combining too many columns dilutes the semantic signal. A product description + name is usually enough. Adding the SKU or a category string to the embedded text adds noise.
  • Use name_prefix — when combining multiple embedding columns, use name_prefix to label each one. This helps the embedding model understand the structure of the text (e.g., "Product: Trail Runner Pro Description: Lightweight trail shoe...").

Example: choosing columns

Products table

ColumnDecisionReasoning
nameEmbedShort natural language, highly relevant to search intent
descriptionEmbedRich natural language, primary search target
categoryMetadataStructured — better as a filter than embedded text
brandMetadataStructured — users filter by brand, not search semantically
priceMetadataNumeric — use for range filters
in_stockMetadataBoolean — use for availability filters
skuOmitInternal identifier, no search or filter value
created_atOmitInternal timestamp, not useful for product search

Support tickets table

ColumnDecisionReasoning
subjectEmbedShort summary of the issue in natural language
bodyEmbedDetailed description — primary search content
statusMetadataEnum (open, closed, pending) — filter, not search
priorityMetadataEnum (low, medium, high, critical) — filter
assigned_toMetadataStructured — filter by assignee
created_atMetadataUseful for "recent tickets" filters
internal_notesOmitSensitive internal content, should not be in search results
ticket_idOmitInternal identifier

Full Example: From Relational Table to Search Table

Let's walk through the entire journey with a products table.

Source table

CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
description TEXT,
category VARCHAR(100),
brand VARCHAR(100),
price NUMERIC(10, 2),
in_stock BOOLEAN DEFAULT true,
sku VARCHAR(50),
created_at TIMESTAMPTZ DEFAULT NOW()
);

Sample data

idnamedescriptioncategorybrandpricein_stock
1Trail Runner ProLightweight trail shoe with responsive cushioning and aggressive grip for technical terrainfootwearRidgeline129.99true
2Summit Backpack 45LDurable 45-liter backpack with ventilated back panel and multiple compartment organizationgearRidgeline189.99true
3Alpine Shell JacketWaterproof 3-layer shell with sealed seams and adjustable hood for alpine conditionsclothingPeakwear249.99false
4Merino Base LayerThermoregulating merino wool base layer that stays warm when wet and resists odor on multi-day tripsclothingPeakwear79.99true
5Trekking Poles CarbonUltralight carbon fiber trekking poles with cork grips and quick-lock adjustmentgearRidgeline149.99true

Step 1: Plan your columns

ColumnDecisionReasoning
idPrimary keyUsed internally to track rows — required
nameEmbedProduct name is natural language, directly relevant to search
descriptionEmbedRich text that captures what the product is and does
categoryMetadataStructured field — "footwear", "gear", "clothing" — useful for filtering
brandMetadataStructured field — useful for filtering
priceMetadataNumeric — useful for range filters like "under $150"
in_stockMetadataBoolean — useful for filtering out unavailable products
skuOmitInternal reference, no search or filter value
created_atOmitInternal timestamp, not relevant for product search

Step 2: Create a connection

curl -X POST https://api.embedd.to/v1/providers/postgresql/connections \
-H "Authorization: Bearer sk_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"name": "product-db",
"mode": "managed",
"credentials": {
"host": "your-db-host.com",
"port": 5432,
"database": "myapp",
"user": "embedd_reader",
"password": "..."
}
}'

Response:

{
"id": "conn_abc123",
"name": "product-db",
"provider": "postgresql",
"mode": "managed",
"status": "active",
"created_at": "2026-03-13T10:00:00Z"
}

Step 3: Configure an embedding provider

curl -X POST https://api.embedd.to/v1/embedding-providers \
-H "Authorization: Bearer sk_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"name": "openai",
"provider": "openai",
"api_key": "sk-your-openai-key",
"default_model": "text-embedding-3-small"
}'

Response:

{
"id": "emb_xyz789",
"name": "openai",
"provider": "openai",
"default_model": "text-embedding-3-small",
"created_at": "2026-03-13T10:01:00Z"
}

Step 4: Create the vector table

curl -X POST https://api.embedd.to/v1/vector-tables \
-H "Authorization: Bearer sk_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"name": "product-search",
"connection_id": "conn_abc123",
"embedding_provider_id": "emb_xyz789",
"source_table": "products",
"primary_key": "id",
"columns": [
{
"source_column": "name",
"type": "embedding",
"name_prefix": "Product: "
},
{
"source_column": "description",
"type": "embedding",
"name_prefix": "Description: "
},
{
"source_column": "category",
"type": "metadata"
},
{
"source_column": "brand",
"type": "metadata"
},
{
"source_column": "price",
"type": "metadata"
},
{
"source_column": "in_stock",
"type": "metadata"
}
]
}'

Response:

{
"id": "vtbl_prod_001",
"name": "product-search",
"connection_id": "conn_abc123",
"embedding_provider_id": "emb_xyz789",
"source_table": "products",
"sync_status": "pending_backfill",
"synced_rows": 0,
"created_at": "2026-03-13T10:02:00Z"
}

For each row, Embedd concatenates the embedding columns with their prefixes into a single text block before generating the vector. For the first row, the embedded text looks like:

Product: Trail Runner Pro Description: Lightweight trail shoe with responsive cushioning and aggressive grip for technical terrain

The name_prefix labels give the embedding model context about the structure of the text, which improves search relevance.

Step 5: Trigger backfill

curl -X POST https://api.embedd.to/v1/vector-tables/vtbl_prod_001/backfill \
-H "Authorization: Bearer sk_your_api_key"

Response:

{
"task_id": "task_bf_001",
"status": "pending",
"vector_table_id": "vtbl_prod_001"
}

Check the sync status:

curl https://api.embedd.to/v1/vector-tables/vtbl_prod_001 \
-H "Authorization: Bearer sk_your_api_key"
{
"id": "vtbl_prod_001",
"name": "product-search",
"sync_status": "synced",
"synced_rows": 5,
"last_synced_at": "2026-03-13T10:03:15Z"
}

All 5 rows are now vectorized and queryable. From here, Embedd automatically detects changes to the source table and re-embeds updated rows.

Step 6: Query

Search for products that match an intent, combined with metadata filters:

curl -X POST https://api.embedd.to/v1/vector-tables/vtbl_prod_001/query \
-H "Authorization: Bearer sk_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"query": "something warm for a long hike",
"top_k": 5,
"filters": {
"and": [
{ "field": "in_stock", "op": "eq", "value": true },
{ "field": "price", "op": "lte", "value": 150 }
]
}
}'

Response:

{
"results": [
{
"id": 4,
"score": 0.91,
"metadata": {
"category": "clothing",
"brand": "Peakwear",
"price": 79.99,
"in_stock": true
}
},
{
"id": 1,
"score": 0.72,
"metadata": {
"category": "footwear",
"brand": "Ridgeline",
"price": 129.99,
"in_stock": true
}
},
{
"id": 5,
"score": 0.68,
"metadata": {
"category": "gear",
"brand": "Ridgeline",
"price": 149.99,
"in_stock": true
}
}
]
}

The Merino Base Layer (score: 0.91) is the top result. Notice that the word "warm" does not appear anywhere in its description — the description says "thermoregulating merino wool" and "stays warm when wet." The embedding model understands that merino wool, thermoregulation, and multi-day trips are semantically close to "something warm for a long hike." That is semantic search.

The filters did their job too:

  • Alpine Shell Jacket was excluded because in_stock is false
  • Summit Backpack 45L was excluded because its price ($189.99) exceeds the lte: 150 filter

This combination of semantic matching and structured filtering is the core pattern for search tables: let vectors handle meaning, let filters handle constraints.

Real-World Use Case: AI Shopping Assistant

A customer messages your AI shopping assistant:

"I need something for trail running under $150"

The assistant calls the search table API:

curl -X POST https://api.embedd.to/v1/vector-tables/vtbl_prod_001/query \
-H "Authorization: Bearer sk_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"query": "trail running shoes",
"top_k": 3,
"filters": {
"and": [
{ "field": "in_stock", "op": "eq", "value": true },
{ "field": "price", "op": "lte", "value": 150 }
]
}
}'

Response:

{
"results": [
{
"id": 1,
"score": 0.95,
"metadata": {
"category": "footwear",
"brand": "Ridgeline",
"price": 129.99,
"in_stock": true
}
},
{
"id": 5,
"score": 0.61,
"metadata": {
"category": "gear",
"brand": "Ridgeline",
"price": 149.99,
"in_stock": true
}
}
]
}

The assistant now has product context and can respond naturally:

"I'd recommend the Trail Runner Pro by Ridgeline at $129.99. It's a lightweight trail shoe with responsive cushioning and aggressive grip — built for technical terrain. It's in stock and within your budget. Would you like to add it to your cart?"

Without the search table, the assistant would either need hardcoded product knowledge (which goes stale) or a complex SQL query that can only match exact keywords. The search table bridges the gap between what the customer asks and what you sell.


Next steps