Tools
pgvector with PostgreSQL
Add vector search capabilities to PostgreSQL using the pgvector extension.
pgvector
pgvector is a PostgreSQL extension that adds vector similarity search to your existing Postgres database. If you're already using PostgreSQL, this is often the simplest path to adding vector search.
Key Features
- Store vectors as a native PostgreSQL type
- Index vectors with HNSW or IVFFlat for fast search
- Three similarity operators: L2 distance (
<->), cosine distance (<=>), inner product (<#>) - Works with all PostgreSQL features: transactions, joins, filters
When to Use pgvector
- You already use PostgreSQL
- Your vector data is alongside relational data
- You need transactional consistency
- Your scale is millions (not billions) of vectors
Setup
sql
CREATE EXTENSION IF NOT EXISTS vector;Example
sql
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Create a table with a vector column
CREATE TABLE documents (
id BIGSERIAL PRIMARY KEY,
content TEXT NOT NULL,
embedding VECTOR(1536), -- OpenAI text-embedding-3-small dimension
source TEXT,
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Create HNSW index for fast similarity search
-- (better query performance, slower indexing)
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Or IVFFlat index (faster to build, slightly slower queries)
-- CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
-- WITH (lists = 100);
-- Insert a document with embedding (pass as array)
INSERT INTO documents (content, embedding, source)
VALUES (
'Vector databases are optimized for similarity search',
'[0.1, 0.2, 0.3, ...]'::vector, -- actual 1536-dim vector
'documentation'
);
-- Semantic search: find most similar documents
-- <-> = L2 distance, <=> = cosine distance, <#> = negative inner product
SELECT
id,
content,
source,
1 - (embedding <=> '[0.1, 0.2, ...]'::vector) AS similarity
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 10;
-- Filter + semantic search (hybrid)
SELECT
id,
content,
1 - (embedding <=> '[0.1, 0.2, ...]'::vector) AS similarity
FROM documents
WHERE source = 'documentation' -- metadata filter
AND created_at > '2024-01-01'
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 5;Try it yourself — SQL