Search Tables
What Is a Search Table?
A search table is a vectorized mirror of your source data. You pick which columns to embed (for semantic search) and which to keep as filterable metadata. Embedd reads your source table, generates embeddings, and keeps them in sync.
Your source table stays untouched. Embedd creates a parallel representation where each row becomes a vector — a numerical fingerprint of the text content — along with any metadata you want to filter on. When your source data changes, the search table updates automatically.
Why Not Just Use SQL?
SQL is the right tool for exact lookups, but it falls short when the query is intent-based, fuzzy, or conceptual.
| Scenario | SQL | Search Table |
|---|---|---|
"Find product with SKU TR-PRO-001" | WHERE sku = 'TR-PRO-001' — instant, exact | Unnecessary — use SQL |
| "Find footwear under $100" | WHERE category = 'footwear' AND price < 100 — works perfectly | Unnecessary — use SQL |
| "Something good for a rainy day hike" | WHERE description LIKE '%rainy%' — misses "waterproof", "weather-resistant", "all-conditions" | Returns waterproof jackets, treated boots, rain covers — understands intent |
| "Comfortable shoes for long distances" | WHERE description LIKE '%comfortable%' — misses "cushioned", "ergonomic", "supportive" | Returns trail runners with cushioning, hiking boots with arch support — gets the concept |
| "Gift for someone who loves the outdoors" | No reasonable SQL query exists | Returns curated results across categories based on the concept of outdoor enthusiasm |
Search tables shine when the query expresses what the user means, not what they literally typed. If your queries are always exact and structured, stick with SQL. If your users ask questions in natural language, search tables close the gap.
Designing Your Search Table
The key decision is what should be searchable vs filterable.
Embedding columns
Embedding columns get concatenated into a single text block and vectorized. These are the columns that semantic search will match against. Choose columns that contain natural language — descriptions, titles, reviews, notes.
Metadata columns
Metadata columns are stored alongside the vector for filtering. They are not embedded, so they do not affect search relevance. Choose structured attributes — categories, prices, dates, booleans, status fields.
Rules of thumb
- Embed natural language — descriptions, titles, reviews, free-text fields
- Filter structured data — categories, prices, dates, booleans, enums
- Omit internal fields — auto-increment IDs, foreign keys, internal timestamps, SKUs
- Fewer embedding columns is often better — combining too many columns dilutes the semantic signal. A product description + name is usually enough. Adding the SKU or a category string to the embedded text adds noise.
- Use
name_prefix— when combining multiple embedding columns, usename_prefixto label each one. This helps the embedding model understand the structure of the text (e.g.,"Product: Trail Runner Pro Description: Lightweight trail shoe...").
Example: choosing columns
Products table
| Column | Decision | Reasoning |
|---|---|---|
name | Embed | Short natural language, highly relevant to search intent |
description | Embed | Rich natural language, primary search target |
category | Metadata | Structured — better as a filter than embedded text |
brand | Metadata | Structured — users filter by brand, not search semantically |
price | Metadata | Numeric — use for range filters |
in_stock | Metadata | Boolean — use for availability filters |
sku | Omit | Internal identifier, no search or filter value |
created_at | Omit | Internal timestamp, not useful for product search |
Support tickets table
| Column | Decision | Reasoning |
|---|---|---|
subject | Embed | Short summary of the issue in natural language |
body | Embed | Detailed description — primary search content |
status | Metadata | Enum (open, closed, pending) — filter, not search |
priority | Metadata | Enum (low, medium, high, critical) — filter |
assigned_to | Metadata | Structured — filter by assignee |
created_at | Metadata | Useful for "recent tickets" filters |
internal_notes | Omit | Sensitive internal content, should not be in search results |
ticket_id | Omit | Internal identifier |
Full Example: From Relational Table to Search Table
Let's walk through the entire journey with a products table.
Source table
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
description TEXT,
category VARCHAR(100),
brand VARCHAR(100),
price NUMERIC(10, 2),
in_stock BOOLEAN DEFAULT true,
sku VARCHAR(50),
created_at TIMESTAMPTZ DEFAULT NOW()
);
Sample data
| id | name | description | category | brand | price | in_stock |
|---|---|---|---|---|---|---|
| 1 | Trail Runner Pro | Lightweight trail shoe with responsive cushioning and aggressive grip for technical terrain | footwear | Ridgeline | 129.99 | true |
| 2 | Summit Backpack 45L | Durable 45-liter backpack with ventilated back panel and multiple compartment organization | gear | Ridgeline | 189.99 | true |
| 3 | Alpine Shell Jacket | Waterproof 3-layer shell with sealed seams and adjustable hood for alpine conditions | clothing | Peakwear | 249.99 | false |
| 4 | Merino Base Layer | Thermoregulating merino wool base layer that stays warm when wet and resists odor on multi-day trips | clothing | Peakwear | 79.99 | true |
| 5 | Trekking Poles Carbon | Ultralight carbon fiber trekking poles with cork grips and quick-lock adjustment | gear | Ridgeline | 149.99 | true |
Step 1: Plan your columns
| Column | Decision | Reasoning |
|---|---|---|
id | Primary key | Used internally to track rows — required |
name | Embed | Product name is natural language, directly relevant to search |
description | Embed | Rich text that captures what the product is and does |
category | Metadata | Structured field — "footwear", "gear", "clothing" — useful for filtering |
brand | Metadata | Structured field — useful for filtering |
price | Metadata | Numeric — useful for range filters like "under $150" |
in_stock | Metadata | Boolean — useful for filtering out unavailable products |
sku | Omit | Internal reference, no search or filter value |
created_at | Omit | Internal timestamp, not relevant for product search |
Step 2: Create a connection
curl -X POST https://api.embedd.to/v1/providers/postgresql/connections \
-H "Authorization: Bearer sk_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"name": "product-db",
"mode": "managed",
"credentials": {
"host": "your-db-host.com",
"port": 5432,
"database": "myapp",
"user": "embedd_reader",
"password": "..."
}
}'
Response:
{
"id": "conn_abc123",
"name": "product-db",
"provider": "postgresql",
"mode": "managed",
"status": "active",
"created_at": "2026-03-13T10:00:00Z"
}
Step 3: Configure an embedding provider
curl -X POST https://api.embedd.to/v1/embedding-providers \
-H "Authorization: Bearer sk_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"name": "openai",
"provider": "openai",
"api_key": "sk-your-openai-key",
"default_model": "text-embedding-3-small"
}'
Response:
{
"id": "emb_xyz789",
"name": "openai",
"provider": "openai",
"default_model": "text-embedding-3-small",
"created_at": "2026-03-13T10:01:00Z"
}
Step 4: Create the vector table
curl -X POST https://api.embedd.to/v1/vector-tables \
-H "Authorization: Bearer sk_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"name": "product-search",
"connection_id": "conn_abc123",
"embedding_provider_id": "emb_xyz789",
"source_table": "products",
"primary_key": "id",
"columns": [
{
"source_column": "name",
"type": "embedding",
"name_prefix": "Product: "
},
{
"source_column": "description",
"type": "embedding",
"name_prefix": "Description: "
},
{
"source_column": "category",
"type": "metadata"
},
{
"source_column": "brand",
"type": "metadata"
},
{
"source_column": "price",
"type": "metadata"
},
{
"source_column": "in_stock",
"type": "metadata"
}
]
}'
Response:
{
"id": "vtbl_prod_001",
"name": "product-search",
"connection_id": "conn_abc123",
"embedding_provider_id": "emb_xyz789",
"source_table": "products",
"sync_status": "pending_backfill",
"synced_rows": 0,
"created_at": "2026-03-13T10:02:00Z"
}
For each row, Embedd concatenates the embedding columns with their prefixes into a single text block before generating the vector. For the first row, the embedded text looks like:
Product: Trail Runner Pro Description: Lightweight trail shoe with responsive cushioning and aggressive grip for technical terrain
The name_prefix labels give the embedding model context about the structure of the text, which improves search relevance.
Step 5: Trigger backfill
curl -X POST https://api.embedd.to/v1/vector-tables/vtbl_prod_001/backfill \
-H "Authorization: Bearer sk_your_api_key"
Response:
{
"task_id": "task_bf_001",
"status": "pending",
"vector_table_id": "vtbl_prod_001"
}
Check the sync status:
curl https://api.embedd.to/v1/vector-tables/vtbl_prod_001 \
-H "Authorization: Bearer sk_your_api_key"
{
"id": "vtbl_prod_001",
"name": "product-search",
"sync_status": "synced",
"synced_rows": 5,
"last_synced_at": "2026-03-13T10:03:15Z"
}
All 5 rows are now vectorized and queryable. From here, Embedd automatically detects changes to the source table and re-embeds updated rows.
Step 6: Query
Search for products that match an intent, combined with metadata filters:
curl -X POST https://api.embedd.to/v1/vector-tables/vtbl_prod_001/query \
-H "Authorization: Bearer sk_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"query": "something warm for a long hike",
"top_k": 5,
"filters": {
"and": [
{ "field": "in_stock", "op": "eq", "value": true },
{ "field": "price", "op": "lte", "value": 150 }
]
}
}'
Response:
{
"results": [
{
"id": 4,
"score": 0.91,
"metadata": {
"category": "clothing",
"brand": "Peakwear",
"price": 79.99,
"in_stock": true
}
},
{
"id": 1,
"score": 0.72,
"metadata": {
"category": "footwear",
"brand": "Ridgeline",
"price": 129.99,
"in_stock": true
}
},
{
"id": 5,
"score": 0.68,
"metadata": {
"category": "gear",
"brand": "Ridgeline",
"price": 149.99,
"in_stock": true
}
}
]
}
The Merino Base Layer (score: 0.91) is the top result. Notice that the word "warm" does not appear anywhere in its description — the description says "thermoregulating merino wool" and "stays warm when wet." The embedding model understands that merino wool, thermoregulation, and multi-day trips are semantically close to "something warm for a long hike." That is semantic search.
The filters did their job too:
- Alpine Shell Jacket was excluded because
in_stockisfalse - Summit Backpack 45L was excluded because its price ($189.99) exceeds the
lte: 150filter
This combination of semantic matching and structured filtering is the core pattern for search tables: let vectors handle meaning, let filters handle constraints.
Real-World Use Case: AI Shopping Assistant
A customer messages your AI shopping assistant:
"I need something for trail running under $150"
The assistant calls the search table API:
curl -X POST https://api.embedd.to/v1/vector-tables/vtbl_prod_001/query \
-H "Authorization: Bearer sk_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"query": "trail running shoes",
"top_k": 3,
"filters": {
"and": [
{ "field": "in_stock", "op": "eq", "value": true },
{ "field": "price", "op": "lte", "value": 150 }
]
}
}'
Response:
{
"results": [
{
"id": 1,
"score": 0.95,
"metadata": {
"category": "footwear",
"brand": "Ridgeline",
"price": 129.99,
"in_stock": true
}
},
{
"id": 5,
"score": 0.61,
"metadata": {
"category": "gear",
"brand": "Ridgeline",
"price": 149.99,
"in_stock": true
}
}
]
}
The assistant now has product context and can respond naturally:
"I'd recommend the Trail Runner Pro by Ridgeline at $129.99. It's a lightweight trail shoe with responsive cushioning and aggressive grip — built for technical terrain. It's in stock and within your budget. Would you like to add it to your cart?"
Without the search table, the assistant would either need hardcoded product knowledge (which goes stale) or a complex SQL query that can only match exact keywords. The search table bridges the gap between what the customer asks and what you sell.
Next steps
- Filters — Full reference for filter operators and syntax
- Sync & Backfill — How Embedd keeps your search table up to date as source data changes
- Set up your first search table with one of the guided walkthroughs: