29.5 Embeddings API: text-embedding-3 Models

Right, embeddings. This is where we stop just chatting with the model and start getting it to do real work. Forget the parlor tricks; this is the API’s workhorse. An embedding is essentially a mathematical fingerprint for a piece of text. The embedding model takes your words and translates them into a dense vector (just a long list of numbers) in a high-dimensional space. The magic is that semantically similar pieces of text end up close together in this space. “King” and “queen” are neighbors; “apple” and “fruit” are closer than “apple” and “truck.”

OpenAI’s text-embedding-3 models are the latest iteration of this, and they come with a party trick: you can shorten the vector. Why would you want to do that? We’ll get to it. First, let’s get our hands dirty.

Your First Embedding: It’s Just Numbers

Here’s the absolute baseline. You send text, you get back a list of numbers. It’s almost comically simple.

from openai import OpenAI
client = OpenAI()

response = client.embeddings.create(
    model="text-embedding-3-small",  # We'll talk model choice in a sec
    input="The quick brown fox jumps over the lazy dog.",
)

# The embedding vector is here:
embedding = response.data[0].embedding
print(f"Got a vector with {len(embedding)} dimensions")
print(f"First 5 values: {embedding[:5]}")
# Output: Got a vector with 1536 dimensions
# First 5 values: [0.005348123546689749, -0.025032455891370773, -0.01457887515425682, -0.038050200790166855, 0.0820203572511673]

See? Not so scary. You’ve just converted a sentence into a point in a 1536-dimensional universe. The values themselves are meaningless; it’s their relative positions to other vectors that hold all the meaning.

The Big Choice: Model and Dimensions

OpenAI offers two main text-embedding-3 models: -small and -large. Don’t read the names as being only about size; the real choice is a trade-off between cost, speed, and accuracy.

text-embedding-3-small: This is your new best friend. It’s brutally efficient and dirt cheap, and in OpenAI’s published benchmarks it beats its predecessor, text-embedding-ada-002, while costing a fraction of the price. For the large majority of use cases, this is the one you want. It’s fast, accurate, and won’t burn a hole in your wallet. Its default vector size is 1536.
text-embedding-3-large: The heavyweight contender. It generates a default vector of 3072 dimensions. It’s more powerful, but also more expensive and slower. You pull this out when you have an extremely complex task and every ounce of performance matters, and you’re willing to pay for it.

Now, the clever bit: the dimensions parameter. Both models let you truncate their output vector.

# Getting a shorter vector from the large model
response = client.embeddings.create(
    model="text-embedding-3-large",
    input="The quick brown fox jumps over the lazy dog.",
    dimensions=256  # Chop the 3072-dim vector down to 256
)

“Why on earth would I want less information?” I hear you cry. Mostly storage, memory, and search speed. A 256-dim vector takes a twelfth of the space of a 3072-dim one, which matters when you’re dealing with millions of them, and every similarity comparison gets proportionally cheaper. You do give up some accuracy, but less than you’d expect: these models are trained so that the leading dimensions carry the most important information, and the vector degrades gracefully as you shorten it. It’s a knob for tuning your downstream application. The OpenAI team did their homework here; by their own MTEB numbers, a -large vector cut down to 256 dimensions still beats a full 1536-dim vector from the older text-embedding-ada-002.
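
To put numbers on the storage argument, here is a back-of-the-envelope sketch. It assumes each dimension is stored as a 4-byte float32 and ignores any index overhead your vector store adds; the function name and the one-million-vector corpus are made up for illustration.

```python
def storage_bytes(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw size of an embedding table, assuming float32 values and no index overhead."""
    return num_vectors * dimensions * bytes_per_value


corpus_size = 1_000_000  # hypothetical corpus of one million chunks

for dims in (3072, 1536, 256):
    gigabytes = storage_bytes(corpus_size, dims) / 1e9
    print(f"{dims:>4} dims: {gigabytes:.2f} GB")
```

Under those assumptions a million vectors come to roughly 12.3 GB at 3072 dimensions, 6.1 GB at 1536, and 1.0 GB at 256.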

Normalization: The Secret Sauce of Similarity

You’ve got vectors. How do you compare them? You use a similarity measure, almost always cosine similarity. Cosine similarity cares about the direction of the vectors, not their magnitude. With these models magnitude carries no information anyway: the API returns every embedding scaled to length 1, so a long document’s vector is no “bigger” than a short one’s.
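
To make “direction, not magnitude” concrete, here is a minimal sketch of cosine similarity in plain Python. The three-dimensional vectors are invented toy values, not real embeddings, and the function is a textbook implementation rather than anything from the OpenAI SDK.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors, invented for illustration (real embeddings have 1536+).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
truck = [0.0, 0.2, 0.9]
loud_cat = [x * 10 for x in cat]  # same direction as cat, ten times the magnitude

print(cosine_similarity(cat, kitten))    # close to 1: similar direction
print(cosine_similarity(cat, truck))     # close to 0: nearly orthogonal
print(cosine_similarity(cat, loud_cat))  # 1.0 up to rounding: magnitude is ignored
```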

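Here’s the pro tip: make sure every vector you store is a unit vector. For unit vectors, cosine similarity is a plain dot product, which is wildly efficient. Vectors straight from the API already have length 1, so for them the normalization step below is a cheap safety net; it becomes essential when you slice a vector down yourself or average several vectors together, because both operations break unit length.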

import numpy as np

# Get your embedding
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="The quick brown fox jumps over the lazy dog.",
)
vector = np.array(response.data[0].embedding)

# Normalize it to a unit vector
normalized_vector = vector / np.linalg.norm(vector)

# Now, cosine similarity between two vectors 'a' and 'b' is just:
# similarity = np.dot(a_normalized, b_normalized)

Store the normalized version. Thank me later when your similarity searches are blazing fast.
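
One case where the normalization step really earns its keep is shortening a vector yourself after the fact, rather than passing dimensions to the API. Slicing off the tail leaves a vector shorter than unit length, so the dot-product shortcut stops being a true cosine until you re-normalize. A minimal sketch in plain Python, with a made-up 4-dimensional vector standing in for a real embedding; the helper names are illustrative and not part of the OpenAI SDK:

```python
import math


def normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit (L2) length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)  # leave an all-zero vector untouched
    return [x / norm for x in vector]


def shorten(vector: list[float], dimensions: int) -> list[float]:
    """Keep the first `dimensions` values, then restore unit length."""
    return normalize(vector[:dimensions])


# Made-up unit vector standing in for a real embedding.
full = normalize([0.5, 0.5, 0.5, 0.5])
short = shorten(full, 2)

length = math.sqrt(sum(x * x for x in short))
print(short, length)  # two values, length back at 1.0 (up to rounding)
```

With NumPy the same idea is a slice followed by the division shown earlier.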

Common Pitfalls and How to Avoid Them

Batching is Non-Negotiable: Don’t embed one sentence at a time. The API accepts arrays for a reason. Batch your requests to avoid drowning in latency. The max batch size is 2048 elements, but I find batches of 100-500 to be the sweet spot for most applications.

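texts = ["sentence one", "sentence two"]  # ...and so on: your list of texts
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts,  # Just pass the whole list
)
# Each item carries an index matching its position in texts; sort to be safe
ordered = sorted(response.data, key=lambda item: item.index)
all_embeddings = [item.embedding for item in ordered]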
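
When the corpus is bigger than one request should carry, a small helper that slices the list into batches keeps things tidy. The sketch below is plain Python. The default batch_size of 256 is just a value inside the range suggested above, and embed_batch is a hypothetical callable you would supply yourself (for example a thin wrapper around client.embeddings.create); it is stubbed out here so the example runs offline.

```python
from typing import Callable, Iterator


def batched(items: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_all(
    texts: list[str],
    embed_batch: Callable[[list[str]], list[list[float]]],
    batch_size: int = 256,
) -> list[list[float]]:
    """Embed texts batch by batch, preserving input order."""
    embeddings: list[list[float]] = []
    for batch in batched(texts, batch_size):
        embeddings.extend(embed_batch(batch))
    return embeddings


# Offline stand-in for the real API call: one fake 1-dim "embedding" per text.
def fake_embed_batch(batch: list[str]) -> list[list[float]]:
    return [[float(len(text))] for text in batch]


texts = [f"sentence {i}" for i in range(5)]
print([len(b) for b in batched(texts, 2)])  # [2, 2, 1]
print(len(embed_all(texts, fake_embed_batch, batch_size=2)))  # 5
```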

The Token Limit Trap: Remember, these are text models underneath. The text-embedding-3-* models accept at most 8191 tokens per input. If you shove a massive document in, you don’t get a quietly truncated embedding; the request fails with an error. Your options are to truncate the text yourself or to chunk it and embed the pieces. For retrieval, embedding chunks is usually the right call anyway.
Apples to Apples: You must use the same model for all vectors you plan to compare. An embedding from -small and an embedding from -large exist in completely different mathematical universes. Comparing them is meaningless. Pick one model and stick with it for a given project.
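
The chunking option from the token-limit pitfall can be sketched as follows. To stay dependency-free, the sketch counts whitespace-separated words as a crude stand-in for tokens. That is an assumption, not how the API counts; in practice you would measure with a real tokenizer such as tiktoken and leave headroom below the limit. The function name, chunk size, and overlap are all illustrative.

```python
def chunk_words(text: str, max_words: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of at most max_words words.

    Consecutive chunks share `overlap` words so that text near a boundary
    keeps some surrounding context. Words are only a rough proxy for
    tokens; a real tokenizer would give exact counts.
    """
    if max_words < 1 or not 0 <= overlap < max_words:
        raise ValueError("need max_words >= 1 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the current chunk already reaches the end of the text
    return chunks


document = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(document, max_words=4, overlap=1):
    print(chunk)
```

Each chunk then goes through the batching path like any other input, and all of them must be embedded with the same model.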

The Embeddings API is the backbone of modern AI-powered search, recommendation, and clustering systems. It feels like magic, but it’s just good, solid, vector math. Start with -small, keep your vectors unit length, batch your requests, and you’ll be building powerful stuff in no time.