41.7 pgvector: Storing and Querying Vector Embeddings for AI Workloads

Right, so you’ve decided to join the AI party, and your model is spitting out these magical lists of numbers called “embeddings.” They’re the secret sauce, turning words, images, and songs into a geometry your database can actually reason about. But where do you put them? Not in a TEXT column, for heaven’s sake. You stuff them into PostgreSQL using the pgvector extension, which is basically giving your rock-solid relational database a new superpower: understanding proximity and similarity. It’s the least-boring way to bridge the gap between your AI models and your structured data.

What Are We Even Doing Here? Vectors 101

An embedding is just a list of floating-point numbers—a vector. The model’s genius is in its placement; semantically similar things (like “king” and “queen”) end up close together in this high-dimensional space. The “distance” between two vectors measures their dissimilarity. pgvector adds three new distance operators to PostgreSQL: <-> (Euclidean distance), <#> (negative inner product), and <=> (cosine distance). Cosine distance is often the go-to for text embeddings, as it’s concerned with the angle between vectors rather than their magnitude, which tends to work better for this use case.

Getting pgvector Up and Running

First, you need to install it. If you’re using a managed service like Amazon RDS or Google Cloud SQL, you can usually enable it with a couple of clicks in the console or a quick parameter group change. For a local Postgres instance, you’ll have to compile and install it. Once that’s done, enabling it in your database is the same familiar drill:

CREATE EXTENSION IF NOT EXISTS vector;

And just like that, you’re in business. No big ceremony.

Defining Your Vector Columns

The new vector data type is what you’ll use. You have to define its dimensions. Most models output a fixed size; OpenAI’s text-embedding-ada-002 model, for instance, spits out 1536 dimensions. So you’d define your column like:

CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    embedding vector(1536)
);

This is the first “gotcha.” You must get the dimension right. Inserting a 768-dimension vector into a 1536-dimension column will fail spectacularly. It’s not going to pad it with zeros for you; it’ll just throw an error. This is a good thing. It saves you from silent, horrific data corruption.

Inserting Data: No Magic, Just Arrays

Inserting is straightforward. Your AI model will output a Python list or a NumPy array, which you can easily serialize into a PostgreSQL array literal. Using a client library like psycopg2 in Python, it looks like this:

# Assume 'embedding_model' generates a list of 1536 floats
embedding = embedding_model("The quick brown fox jumps over the lazy dog.")

# Using psycopg2's execute method
cur.execute(
    "INSERT INTO documents (content, embedding) VALUES (%s, %s);",
    ("The quick brown fox...", embedding)  # psycopg2 handles the list-to-vector conversion
)

Querying for Similarity: The Whole Point

This is where the magic happens. You take your query—“canine companion”—generate an embedding for it, and then ask PostgreSQL: “show me the documents with embeddings closest to this one.”

SELECT
    id,
    content,
    embedding <=> $1 AS cosine_distance
FROM documents
ORDER BY cosine_distance
LIMIT 5;

You’d pass the embedding vector of “canine companion” as the parameter $1. This is a “nearest neighbor” search, or K-NN. It’s brutally slow if you have a large table because it has to calculate the distance between your query vector and every single row in the table. This is called a sequential scan, and it’s a recipe for a bad time once you have more than a few thousand vectors.

Indexing: Making It Actually Scalable

To avoid that full-table nightmare, pgvector provides a specialized index using the Hierarchical Navigable Small Worlds (HNSW) algorithm. It’s brilliant. It essentially creates a graph of your vectors, allowing the database to hop through this graph to find neighbors quickly without visiting every single point. Creating the index is an expensive operation, but it’s a one-time cost for a massive query speed-up.

CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

Now, the WITH clause is where you see the engineers letting you tweak the knobs. m defines how many connections each node in the graph has. A higher value makes a more accurate but larger and slower-to-build index. ef_construction influences the index build time and accuracy. The default values are… fine. For a production system, you’ll want to benchmark with your actual data. The vector_cosine_ops part is crucial—it tells the index to use the cosine distance operators. There are equivalent ops for the other distance types (vector_l2_ops for Euclidean, vector_ip_ops for inner product).

Once the index is built, your ORDER BY ... LIMIT query will automatically use it and return results in milliseconds instead of seconds (or minutes).

The Rough Edges and Best Practices

Let’s be honest. This isn’t a polished, 20-year-old SQL standard. It’s a powerful tool with some sharp edges.

Indexing is Slow and Big: Building an HNSW index on a million-row table can take minutes and consume a significant amount of RAM. Plan for this. Do it during a maintenance window. The resulting index will also be large, often bigger than your table itself. This is the price of speed.
Choose Your Distance Metric Wisely: Don’t just default to cosine. Your model might be optimized for a specific distance metric. The choice between inner product and cosine can be subtle but important. Know your model’s output.
Filtering is Tricky: The biggest architectural challenge is combining vector similarity searches with traditional WHERE clauses. A query for “find documents about dogs that were created last week” is hard. The index only helps with the vector part. The best practice is to use the vector search as a first-pass filter to get, say, 1000 candidates, and then apply your other filters to those results. It’s not perfect, but it’s what we have. Some newer techniques involve partitioning your data to make this easier.