# Vector Search for AI Applications
This guide uses the VSS (Vector Similarity Search) core extension and the community-maintained faiss extension. Thanks to the contributors who made vector search accessible in DuckDB!

Build production-ready semantic search and RAG (Retrieval Augmented Generation) APIs using DuckDB's vector extensions. Perfect for AI agents, documentation search, recommendation engines, and knowledge bases.
## Why Vector Search with flAPI?

> "RAG as a service, deployed in minutes."

Modern AI applications need semantic search:

- 🤖 **RAG for LLMs** - Give AI context from your data
- 🔍 **Semantic search** - Find by meaning, not keywords
- 📚 **Knowledge bases** - Intelligent document retrieval
- 💡 **Recommendations** - Similar products, content, users
- 🎯 **Classification** - Categorize by similarity

flAPI makes it production-ready:

- ✅ Sub-10ms queries with vector indexes
- ✅ SQL-native vector operations (no new query language)
- ✅ REST API + MCP for AI agents
- ✅ Hybrid search (combine vectors + keywords + filters)
- ✅ Multiple backends (Faiss, VSS, built-in)
## Vector Search Options

DuckDB provides two powerful vector extensions:

### Option 1: VSS (Vector Similarity Search)

**Best for:** General purpose, simpler setup

```yaml
init: |
  INSTALL vss;
  LOAD vss;
```

- Native DuckDB extension
- HNSW index support
- Good for < 1M vectors
- Simpler API

### Option 2: Faiss

**Best for:** Large scale, high performance

```yaml
init: |
  INSTALL faiss FROM community;
  LOAD faiss;
```

- Facebook's battle-tested library
- Multiple index types (IVF, HNSW, Flat)
- Optimized for 10M+ vectors
- Advanced quantization
## Real-World Use Cases

### 🤖 RAG for Customer Support

**Scenario:** An AI chatbot needs to find relevant help articles (see the end-to-end sketch after these scenarios).

User: "How do I reset my password?"
→ Vector search finds the top 3 relevant articles
→ The LLM generates an answer using that context

### 📚 Documentation Search

**Scenario:** Semantic code search across your codebase

Query: "authentication middleware"
→ Finds auth-related functions by meaning
→ Not just keyword matching

### 🛍️ Product Recommendations

**Scenario:** "Find similar products"

User views: "Wireless Keyboard"
→ Vector search finds similar products
→ Based on features, not just category

### 🎓 Academic Research

**Scenario:** Find related research papers

Input: A paper abstract
→ Vector search finds similar papers
→ By research topic and methodology, not just keywords
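The support-bot flow above is only a few calls end to end. Here is a minimal sketch using the openai Python SDK, assuming a flAPI endpoint at `/search/semantic/` shaped like the one built in Example 1 below (the endpoint path, response shape, and model names are assumptions, not flAPI defaults):

```python
# rag_sketch.py -- minimal RAG loop (endpoint and models are assumptions)
import requests
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer(question: str) -> str:
    # 1. Embed the user's question
    emb = client.embeddings.create(
        model="text-embedding-3-small", input=question
    ).data[0].embedding

    # 2. Retrieve the top 3 relevant articles via flAPI
    docs = requests.post(
        "http://localhost:8080/search/semantic/",
        json={"embedding": emb, "limit": 3},
    ).json()["data"]

    # 3. Let the LLM answer from the retrieved context
    context = "\n\n".join(d["content"] for d in docs)
    chat = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat model works here
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return chat.choices[0].message.content

print(answer("How do I reset my password?"))
```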
## Architecture

A client (or AI agent) generates an embedding for its query, sends it to a flAPI REST or MCP endpoint, and flAPI runs the vector search inside DuckDB via the VSS or Faiss extension, returning ranked documents.
## Installation & Setup

### Install VSS Extension

```yaml
connections:
  vector-db:
    init: |
      -- Install vector similarity search
      INSTALL vss;
      LOAD vss;

      -- Needed for file-backed databases: HNSW index
      -- persistence is still experimental in VSS
      SET hnsw_enable_experimental_persistence = true;

      -- Create embeddings table
      CREATE TABLE IF NOT EXISTS document_embeddings (
        id VARCHAR PRIMARY KEY,
        content TEXT,
        embedding FLOAT[1536],  -- OpenAI embedding dimension (ada-002 / 3-small)
        metadata JSON,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
      );

      -- Create HNSW index for fast search; the metric must match
      -- the distance function your queries use (cosine here)
      CREATE INDEX IF NOT EXISTS embedding_idx ON document_embeddings
      USING HNSW (embedding) WITH (metric = 'cosine');
```
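Before wiring this into flAPI, it can help to sanity-check VSS directly from Python. A minimal sketch with the duckdb client, using toy 3-dimensional vectors in place of the 1536-dimensional column above:

```python
# vss_smoke_test.py -- quick check that VSS works (toy 3-dim vectors)
import duckdb

conn = duckdb.connect()  # in-memory database
conn.execute("INSTALL vss; LOAD vss;")

conn.execute("""
    CREATE TABLE docs (id VARCHAR, embedding FLOAT[3]);
    INSERT INTO docs VALUES
        ('a', [1.0, 0.0, 0.0]),
        ('b', [0.0, 1.0, 0.0]);
""")

# Cosine similarity against a query vector; 'a' should rank first
rows = conn.execute("""
    SELECT id,
           array_cosine_similarity(embedding, CAST([0.9, 0.1, 0.0] AS FLOAT[3])) AS sim
    FROM docs ORDER BY sim DESC
""").fetchall()
print(rows)
```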
### Or Install Faiss

```yaml
connections:
  vector-db-faiss:
    init: |
      INSTALL faiss FROM community;
      LOAD faiss;

      CREATE TABLE IF NOT EXISTS document_embeddings (
        id VARCHAR PRIMARY KEY,
        content TEXT,
        embedding FLOAT[],
        metadata JSON
      );
```
## Example 1: Knowledge Base Search

### Step 1: Generate and Store Embeddings

```python
# generate_embeddings.py
import json

import duckdb
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Connect to DuckDB
conn = duckdb.connect('vector_db.duckdb')

# Your documents
documents = [
    {"id": "doc1", "content": "How to reset your password: Go to settings..."},
    {"id": "doc2", "content": "API authentication uses JWT tokens..."},
    {"id": "doc3", "content": "Database backup runs daily at 2am..."},
]

# Generate embeddings and store them
for doc in documents:
    embedding = client.embeddings.create(
        input=doc['content'],
        model="text-embedding-3-small"
    ).data[0].embedding

    # Insert into DuckDB (metadata serialized as JSON)
    conn.execute("""
        INSERT INTO document_embeddings (id, content, embedding, metadata)
        VALUES (?, ?, ?, ?)
    """, [
        doc['id'],
        doc['content'],
        embedding,
        json.dumps({'category': 'docs'})
    ])

conn.close()
```
### Step 2: Create Search Endpoint

```sql
-- semantic_search.sql
WITH query_embedding AS (
    -- In production, you'd pass the pre-computed embedding
    SELECT CAST({{{params.embedding}}} AS FLOAT[1536]) AS emb
)
SELECT
    d.id,
    d.content,
    d.metadata,
    -- Calculate cosine similarity
    array_cosine_similarity(d.embedding, q.emb) AS similarity
FROM document_embeddings d, query_embedding q
WHERE 1=1
    {{#params.category}}
    AND d.metadata->>'category' = '{{{params.category}}}'
    {{/params.category}}
    {{#params.min_similarity}}
    AND array_cosine_similarity(d.embedding, q.emb) >= {{{params.min_similarity}}}
    {{/params.min_similarity}}
ORDER BY similarity DESC
LIMIT {{#params.limit}}{{{params.limit}}}{{/params.limit}}{{^params.limit}}10{{/params.limit}}
```

And the endpoint configuration:

```yaml
url-path: /search/semantic/
template-source: semantic_search.sql
connection:
  - vector-db
description: Semantic search using vector embeddings

parameters:
  embedding:
    type: string
    description: Query embedding as a JSON array
    required: true
  category:
    type: string
    description: Filter by metadata category
  min_similarity:
    type: number
    description: Minimum similarity threshold (0-1)
    minimum: 0
    maximum: 1
  limit:
    type: integer
    description: Maximum results
    minimum: 1
    maximum: 100
```
### Step 3: Search from Client

```python
# client.py
import requests
from openai import OpenAI

client = OpenAI()

# 1. Generate an embedding for the query
query = "How do I authenticate with the API?"
query_embedding = client.embeddings.create(
    input=query,
    model="text-embedding-3-small"
).data[0].embedding

# 2. Search via flAPI
response = requests.post(
    'http://localhost:8080/search/semantic/',
    json={
        'embedding': query_embedding,
        'limit': 5,
        'min_similarity': 0.7
    }
)

results = response.json()['data']
for doc in results:
    print(f"{doc['similarity']:.2f}: {doc['content'][:100]}...")
```
## Example 2: Hybrid Search (Vectors + Keywords)

Combine semantic search with traditional filters:

```sql
-- hybrid_search.sql
WITH query_embedding AS (
    SELECT CAST({{{params.embedding}}} AS FLOAT[1536]) AS emb
)
SELECT
    d.id,
    d.content,
    d.metadata,
    array_cosine_similarity(d.embedding, q.emb) AS vector_score,
    -- Keyword-match bonus
    CASE
        WHEN LOWER(d.content) LIKE LOWER('%{{{params.keywords}}}%') THEN 0.2
        ELSE 0
    END AS keyword_score,
    -- Combined score (DuckDB allows reusing SELECT aliases)
    (vector_score + keyword_score) AS final_score
FROM document_embeddings d, query_embedding q
WHERE 1=1
    -- Metadata filters
    {{#params.doc_type}}
    AND d.metadata->>'type' = '{{{params.doc_type}}}'
    {{/params.doc_type}}
    {{#params.date_after}}
    AND d.created_at >= '{{{params.date_after}}}'
    {{/params.date_after}}
    -- Vector similarity floor
    AND array_cosine_similarity(d.embedding, q.emb) >= 0.5
ORDER BY final_score DESC
LIMIT 20
```
**Benefits:**

- ✅ Semantic understanding (vectors)
- ✅ Exact phrase matching (keywords)
- ✅ Metadata filtering (structured data)
- ✅ Date ranges, categories, tags
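A hedged client sketch for this template (the `/search/hybrid/` url-path is an assumption; the parameter names mirror the template above):

```python
# hybrid_client.py -- calling the hybrid template (endpoint path assumed)
import requests
from openai import OpenAI

client = OpenAI()
query = "JWT token expiry"

emb = client.embeddings.create(
    model="text-embedding-3-small", input=query
).data[0].embedding

resp = requests.post(
    "http://localhost:8080/search/hybrid/",  # hypothetical url-path
    json={
        "embedding": emb,
        "keywords": "JWT",           # exact-match boost
        "doc_type": "api-docs",      # metadata filter
        "date_after": "2024-01-01",  # recency filter
    },
)
for doc in resp.json()["data"]:
    print(doc["final_score"], doc["id"])
```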
## Example 3: RAG for AI Agents (MCP)

Expose vector search as MCP tools for Claude/GPT:

```yaml
url-path: /rag/search/
template-source: semantic_search.sql
connection:
  - vector-db

# MCP tool description
mcp-tool-description: |
  search_knowledge_base: Search the company knowledge base semantically.
  Use this to find relevant documentation, policies, or information
  to answer user questions.

  Input parameters:
  - query: The user's question or search query (string, required)
  - category: Filter by document category (string, optional)
  - limit: Max results to return (integer, default 5)

  Returns: Array of relevant documents with content and similarity scores.
```
**AI Agent Integration:**

```python
# mcp_server.py -- uses the FastMCP helper from the MCP Python SDK
import requests
from mcp.server.fastmcp import FastMCP
from openai import OpenAI

mcp = FastMCP("knowledge-base")
client = OpenAI()

@mcp.tool(name="search_knowledge_base")
def search_kb(query: str, category: str | None = None, limit: int = 5):
    """Search the company knowledge base semantically."""
    # Generate an embedding for the query
    embedding = client.embeddings.create(
        input=query,
        model="text-embedding-3-small"
    ).data[0].embedding

    # Search via flAPI
    response = requests.post(
        'http://localhost:8080/rag/search/',
        json={
            'embedding': embedding,
            'category': category,
            'limit': limit
        }
    )
    return response.json()['data']

if __name__ == "__main__":
    mcp.run()
```
## Performance Optimization

### Index Selection

**HNSW (Hierarchical Navigable Small World):**

```sql
-- Best for: < 10M vectors, balanced speed/accuracy
-- The metric must match the distance function your queries use
CREATE INDEX embedding_idx ON embeddings USING HNSW (embedding)
WITH (metric = 'cosine', m = 16, ef_construction = 200);
```

**IVF (Inverted File Index) with Faiss:**

```sql
-- Best for: 10M+ vectors, extreme scale
CREATE INDEX embedding_idx ON embeddings USING IVF (embedding)
WITH (nlist = 100);
```
### Query Optimization

```sql
-- ❌ Slow: full scan when no matching index exists
SELECT * FROM embeddings
ORDER BY array_cosine_similarity(embedding, query_emb) DESC
LIMIT 10;

-- ✅ Fast: HNSW accelerates top-k "ORDER BY ... LIMIT" queries whose
-- distance function matches the index metric (cosine here)
SELECT * FROM embeddings
ORDER BY array_cosine_distance(embedding, query_emb)
LIMIT 10;
```

A similarity floor (`WHERE array_cosine_similarity(...) >= 0.7`) is still useful for keeping low-quality matches out of the response.
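To check whether the index is actually helping, a rough timing harness is enough. A sketch with the duckdb Python client (table and column names follow the schema from Example 1):

```python
# time_query.py -- rough latency check for the top-k query
import time

import duckdb

conn = duckdb.connect("vector_db.duckdb")
conn.execute("LOAD vss;")

sql = """
    SELECT id FROM document_embeddings
    ORDER BY array_cosine_distance(embedding, CAST(? AS FLOAT[1536]))
    LIMIT 10
"""
query_vec = [0.1] * 1536  # stand-in; use a real embedding here

conn.execute(sql, [query_vec]).fetchall()  # warm-up run
start = time.perf_counter()
conn.execute(sql, [query_vec]).fetchall()
print(f"query took {(time.perf_counter() - start) * 1000:.1f} ms")
```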
### Caching Strategies

```yaml
# Cache frequently accessed embeddings
cache-template: popular_docs_cache.sql
cache-schedule: "0 */6 * * *"  # every 6 hours
```

```sql
-- popular_docs_cache.sql: pre-compute results for popular documents
SELECT *
FROM document_embeddings
WHERE id IN (
    SELECT document_id
    FROM search_analytics
    GROUP BY document_id
    HAVING COUNT(*) > 100
)
```
## Embedding Models Comparison

| Model | Dimensions | Cost | Speed | Quality |
|---|---|---|---|---|
| OpenAI text-embedding-ada-002 | 1536 | $0.0001/1K tokens | Fast | Excellent |
| OpenAI text-embedding-3-small | 1536 | $0.00002/1K tokens | Fast | Good |
| OpenAI text-embedding-3-large | 3072 | $0.00013/1K tokens | Medium | Best |
| Cohere embed-v3 | 1024 | $0.0001/1K tokens | Fast | Excellent |
| Sentence Transformers | 384-768 | Free (self-hosted) | Fast | Good |

**Recommendation:** Start with OpenAI text-embedding-3-small (the cheapest hosted option with good quality).
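If cost or index size is a concern, the text-embedding-3 models also accept a `dimensions` parameter that returns shortened vectors; just remember to match the `FLOAT[n]` column and any validation to the dimension you pick. A short sketch:

```python
# shrink_embeddings.py -- requesting lower-dimensional embeddings
from openai import OpenAI

client = OpenAI()

# text-embedding-3-* models support a `dimensions` parameter;
# shorter vectors mean smaller indexes and faster searches.
resp = client.embeddings.create(
    model="text-embedding-3-small",
    input="How do I reset my password?",
    dimensions=512,  # down from the default 1536
)
print(len(resp.data[0].embedding))  # 512
```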
## Distance Metrics

### Cosine Similarity (most common)

```sql
array_cosine_similarity(embedding1, embedding2)
-- Returns: -1 to 1 (higher = more similar)
```

**Use when:** Embeddings are normalized (like OpenAI's)

### Euclidean Distance

```sql
array_distance(embedding1, embedding2)
-- Returns: 0 to infinity (lower = more similar)
```

**Use when:** Magnitude matters

### Dot Product

```sql
array_inner_product(embedding1, embedding2)
-- Returns: -infinity to infinity (higher = more similar)
```

**Use when:** Embeddings are normalized; it then equals cosine similarity but is cheaper to compute
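A small numpy sketch makes the relationship between the three metrics concrete, including why dot product is a cheap stand-in for cosine on normalized vectors:

```python
# metrics_demo.py -- relationships between the three metrics
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 3.0, 4.0])

cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
euclidean = np.linalg.norm(a - b)
dot = a @ b
print(cosine, euclidean, dot)

# After L2-normalizing, the dot product equals cosine similarity,
# which is why dot product is the cheap choice for normalized embeddings.
an, bn = a / np.linalg.norm(a), b / np.linalg.norm(b)
assert np.isclose(an @ bn, cosine)
```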
## Security & Best Practices

### 1. Secure Embedding Generation

```python
# ❌ Don't: expose a raw embedding endpoint for clients to abuse
@app.post("/generate-embedding")
def generate_embedding(text: str):
    return client.embeddings.create(input=text, model="text-embedding-3-small")

# ✅ Do: generate embeddings server-side; never trust client-supplied embeddings
@app.post("/search")
def search(query: str):
    embedding = client.embeddings.create(
        input=query, model="text-embedding-3-small"
    ).data[0].embedding
    return flapi.search(embedding)  # forward to your flAPI endpoint
```
### 2. Rate Limiting

```yaml
# Prevent abuse of expensive vector queries
rate-limit:
  requests-per-minute: 60
  requests-per-hour: 1000
```
### 3. Input Validation

```yaml
parameters:
  embedding:
    type: string
    required: true
    # Validate the dimension (1536 comma-separated numbers)
    pattern: '^\[(-?\d+\.?\d*,\s*){1535}-?\d+\.?\d*\]$'
```
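A regex over a 1536-number string is brittle; since the query embedding should be generated server-side anyway (see above), a Python-side check before forwarding is simpler. A hedged sketch (`EXPECTED_DIM` is a hypothetical constant matching the `FLOAT[1536]` column):

```python
# validate_embedding.py -- server-side check before forwarding to flAPI
EXPECTED_DIM = 1536  # hypothetical constant; must match the FLOAT[1536] column

def validate_embedding(vec: object) -> list[float]:
    """Raise ValueError unless vec is a list of EXPECTED_DIM numbers."""
    if not isinstance(vec, list) or len(vec) != EXPECTED_DIM:
        raise ValueError(f"embedding must be a list of {EXPECTED_DIM} numbers")
    return [float(x) for x in vec]  # raises on non-numeric entries
```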
### 4. Content Filtering

```sql
-- Prevent unauthorized access to private documents
WHERE d.metadata->>'visibility' = 'public'
   OR d.metadata->>'owner_id' = '{{{auth.user_id}}}'
```
## Troubleshooting

### Issue: Slow vector searches

**Problem:** Queries take > 100 ms

**Solutions:**

- Create an HNSW index (if missing)
- Add a similarity threshold (`WHERE similarity >= 0.7`)
- Reduce the vector dimension (use a smaller model)
- Use Faiss for large datasets (> 1M vectors)
### Issue: Poor search quality

**Problem:** Irrelevant results returned

**Solutions:**

- Increase the `min_similarity` threshold
- Use hybrid search (combine with keywords)
- Try a better embedding model (ada-002 → text-embedding-3-large)
- Add metadata filters (category, date, etc.)
- Re-generate embeddings with a domain-specific model
### Issue: High embedding costs

**Problem:** OpenAI embedding costs are too high

**Solutions:**

- Use a cheaper model (`text-embedding-3-small` instead of `text-embedding-3-large`)
- Cache embeddings (don't regenerate them)
- Batch embedding generation (up to 2048 inputs per request; see the sketch below)
- Consider open-source models (Sentence Transformers)
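The batching sketch referenced above: the openai SDK accepts a list of inputs per request, so many documents can share a single call:

```python
# batch_embed.py -- embed many documents in one API call
from openai import OpenAI

client = OpenAI()
texts = [
    "How to reset your password: Go to settings...",
    "API authentication uses JWT tokens...",
    "Database backup runs daily at 2am...",
]

# One request can carry up to 2048 inputs; results come back in input order
resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
vectors = [item.embedding for item in resp.data]
print(len(vectors), len(vectors[0]))  # 3 1536
```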
### Issue: Index creation fails

**Error:** Index too large

```sql
-- Solution: increase memory, or partition with IVF
SET memory_limit = '8GB';

CREATE INDEX embedding_idx ON embeddings USING IVF (embedding)
WITH (nlist = 100);  -- partition for large datasets
```
## Scaling Considerations

### Up to 100K vectors

```sql
-- Use VSS with HNSW
INSTALL vss;
CREATE INDEX embedding_idx ON embeddings USING HNSW (embedding);
```

### 100K - 10M vectors

```sql
-- Use Faiss with IVF
INSTALL faiss FROM community;
CREATE INDEX embedding_idx ON embeddings USING IVF (embedding)
WITH (nlist = 1000);
```

### 10M+ vectors

```sql
-- Use Faiss with IVF plus quantization
CREATE INDEX embedding_idx ON embeddings USING IVF (embedding) WITH (
  nlist = 4096,
  quantizer = 'SQ8'  -- 8-bit scalar quantization
);
```
## Next Steps
- Vector Search RAG Example: Complete RAG implementation
- AI Integration: MCP tools for AI agents
- Claude Integration: Use with Claude Desktop
- SQL Templating: Dynamic vector queries
- Caching: Cache popular embeddings
- Google Sheets: Combine with spreadsheet data
## Additional Resources
- DuckDB VSS Extension: Official VSS docs
- DuckDB Faiss Extension: Faiss extension docs
- OpenAI Embeddings: Embedding generation guide
- Faiss Wiki: Faiss documentation
> **Tip:** For production RAG systems, use hybrid search (vectors + keywords + metadata filters). Pure vector search can miss exact matches, while pure keyword search misses semantic meaning. Combining both gives the best results.