3.3 Dot Products, Norms, and Projections

Alright, let’s get our hands dirty with the real workhorses of linear algebra: dot products, norms, and projections. These aren’t just abstract mathematical curiosities; they are the fundamental tools that let AI models measure similarity, understand distance, and even “learn” by nudging things in the right direction. If you’ve ever used a recommendation system or seen a neural network classify an image, these concepts were working overtime under the hood.

What the Heck is a Dot Product, Really?

The dot product (or inner product) between two vectors is more than just a sum of multiplied elements. That’s the how, and it’s simple enough: for vectors u and v, it’s sum(u_i * v_i). But the why is where the magic is.

Geometrically, the dot product u · v tells you how much one vector “goes in the direction” of another. It’s a measure of alignment. Think of it like this: if you’re pushing a box across the floor, the work you do isn’t just about how hard you push (the magnitude of your force vector), it’s about how much of that push is actually in the direction of movement. The dot product captures that exact idea.

The geometric formula makes this precise: u · v = ||u|| ||v|| cos(θ), where θ is the angle between them. For vectors of fixed length, the angle alone decides the outcome, and this is the key to unlocking its power:

If u and v point the same way (θ = 0°, cos(θ)=1), the dot product is positive and maximized. They’re best friends.
If they are perpendicular (θ = 90°, cos(θ)=0), the dot product is zero. They’re complete strangers; one has no component in the direction of the other.
If they point in opposite directions (θ = 180°, cos(θ)=-1), the dot product is negative and minimized. They’re arch-nemeses.
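
To make the three cases concrete, here is a minimal sketch. The vectors are hypothetical and chosen so each angle is obvious. It computes the dot product and recovers cos(θ) by dividing out the two norms.

```python
import numpy as np

# Hypothetical example vectors, picked so the angles are easy to see
u = np.array([2.0, 0.0])
cases = {
    "same direction": np.array([3.0, 0.0]),   # theta = 0 degrees
    "perpendicular": np.array([0.0, 3.0]),    # theta = 90 degrees
    "opposite": np.array([-3.0, 0.0]),        # theta = 180 degrees
}

dots = {}
cosines = {}
for name, v in cases.items():
    dots[name] = u @ v
    # Rearranging u · v = ||u|| ||v|| cos(theta) to solve for cos(theta)
    cosines[name] = dots[name] / (np.linalg.norm(u) * np.linalg.norm(v))
    print(f"{name}: dot = {dots[name]}, cos(theta) = {cosines[name]}")
```

The signs of the dot products (positive, zero, negative) line up with the three cases above.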

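This is why the dot product shows up in similarity measures everywhere, from nearest-neighbor search to word embeddings like Word2Vec. One caveat: the raw dot product also grows with vector length, so a high value means “similar” only when the vectors have comparable norms. Dividing by ||u|| ||v|| gives cosine similarity, which depends only on the angle.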
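
Here is a small sketch of how vector length can skew a raw dot product when it is used as a similarity score. The helper name cosine_similarity and the example vectors are assumptions for illustration, not taken from any particular library.

```python
import numpy as np

def cosine_similarity(u, v):
    """Hypothetical helper: the dot product divided by the product of the norms."""
    norm_product = np.linalg.norm(u) * np.linalg.norm(v)
    if norm_product == 0:
        raise ValueError("Cosine similarity is undefined for the zero vector.")
    return (u @ v) / norm_product

a = np.array([1.0, 1.0])
b = np.array([10.0, 10.0])  # same direction as a, ten times longer
c = np.array([1.0, 0.9])    # slightly different direction, similar length

# The raw dot product rewards b mostly for being long
print("dot(a, b):", a @ b)
print("dot(a, c):", a @ c)

# Cosine similarity looks only at the angle
print("cos(a, b):", cosine_similarity(a, b))
print("cos(a, c):", cosine_similarity(a, c))
```

The raw dot products differ by roughly a factor of ten, while both cosine similarities sit close to 1, because both b and c point almost the same way as a.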

Let’s see it in code. NumPy makes this trivial, but I’m going to show you the manual way first so you feel what’s happening, then the smart way you’ll actually use.

# The "I want to understand it" way
def naive_dot_product(u, v):
    if len(u) != len(v):
        raise ValueError("Vectors must be the same length, because otherwise this is nonsense.")
    result = 0
    for i in range(len(u)):
        result += u[i] * v[i]
    return result

u = [1, 2, 3]
v = [4, 5, 6]
print(naive_dot_product(u, v)) # Output: 1*4 + 2*5 + 3*6 = 32

# The "I have work to do" way
import numpy as np

u_np = np.array([1, 2, 3])
v_np = np.array([4, 5, 6])
print(np.dot(u_np, v_np)) # Output: 32
# Or even better, using the slick @ operator:
print(u_np @ v_np) # Output: 32

Getting a Grip on Norms

If the dot product tells you about alignment, the norm tells you about size. The most common norm, the L2 norm (or Euclidean norm), is just the length of a vector. You’ve known it since Pythagoras: for a vector v, its L2 norm is ||v|| = sqrt(v_1² + v_2² + ... + v_n²).

Notice something? Look back at the geometric definition of the dot product: u · v = ||u|| ||v|| cos(θ). Now look at the dot product of a vector with itself: v · v = v_1² + v_2² + … + v_n². That’s ||v||²! So ||v|| = sqrt(v · v). The dot product intrinsically contains the concept of length. This isn’t a coincidence; it’s by design.
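
A quick numerical check of that identity, using a hypothetical vector whose length works out to a whole number:

```python
import numpy as np

v = np.array([3.0, 4.0, 12.0])  # 9 + 16 + 144 = 169, so the length is 13

length_from_dot = np.sqrt(v @ v)      # sqrt(v · v)
length_from_norm = np.linalg.norm(v)  # NumPy's default (L2) norm

print(length_from_dot, length_from_norm)
```

Both routes give the same length, which is all the identity ||v|| = sqrt(v · v) claims.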

Norms are crucial for normalization. A vector’s direction often matters more than its length. To focus purely on direction, you create a unit vector—a vector of length 1—by dividing the vector by its norm: û = v / ||v||. This is called normalization, and it’s a preprocessing step you’ll do constantly.

v = np.array([3, 4])
v_norm = np.linalg.norm(v) # Calculates ||v|| = sqrt(3² + 4²) = 5
print("Norm of v:", v_norm) # Output: 5.0

v_hat = v / v_norm # Normalize to a unit vector
print("Unit vector:", v_hat) # Output: [0.6 0.8]
print("Norm of unit vector:", np.linalg.norm(v_hat)) # Output: 1.0 (or very, very close due to floating-point)

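Pitfall Alert: Always check for the zero vector before normalizing! With NumPy arrays, dividing by ||v|| = 0 doesn’t raise an exception; it emits a RuntimeWarning and quietly fills your result with nan, which then poisons every calculation downstream. The zero vector is the one vector that has no direction, so asking for it is philosophically and mathematically invalid.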
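
One way to guard against this is sketched below. The function name safe_normalize and the eps threshold are assumptions made for illustration; choose a tolerance that suits the scale of your data.

```python
import numpy as np

def safe_normalize(v, eps=1e-12):
    """Hypothetical helper: return v / ||v||, raising for a (near-)zero vector."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    # Treat very small norms as zero; eps is an assumed tolerance, not a standard value
    if norm < eps:
        raise ValueError("Cannot normalize a zero (or near-zero) vector.")
    return v / norm

print(safe_normalize([3, 4]))

try:
    safe_normalize([0, 0])
except ValueError as err:
    print("Caught:", err)
```

Raising an error is one design choice. Depending on the application, returning the zero vector unchanged may be more convenient, but that decision is better made explicitly than left to a stray nan.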

Projection: The Art of Dropping Perpendiculars

Now let’s combine these ideas. Projection is the process of dropping a metaphorical perpendicular from one vector onto another. The projection of u onto v is the component of u that points in the direction of v. It’s the shadow u casts on the line through v when the light shines perpendicular to v.

How do we find it? We use our two tools:

We know the (signed) length of the projection should be ||u|| cos(θ). From the dot product formula, that’s (u · v) / ||v||.
We want the projection as a vector, so we multiply that length by the unit vector in the direction of v, which is v / ||v||. The two factors of ||v|| combine into ||v||² = v · v, which gives us the final formula:

proj_v(u) = [(u · v) / (v · v)] * v

The term (u · v) / (v · v) is a scalar that tells us how much of v we need. The same idea is the workhorse behind least-squares regression: the fitted values are the projection of your target vector onto the space spanned by your features.
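
To see the regression connection in the simplest setting, the sketch below assumes a single feature and no intercept, so the model is y ≈ slope * x. The data values are made up for illustration. Under that assumption the least-squares slope is the same scalar factor, (x · y) / (x · x), and the sketch compares it against np.linalg.lstsq.

```python
import numpy as np

# Made-up data, roughly following y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

# Scalar projection factor of y onto x
slope_from_projection = (x @ y) / (x @ x)

# The same fit via NumPy's least-squares solver (x must be a column matrix)
solution, _, _, _ = np.linalg.lstsq(x[:, np.newaxis], y, rcond=None)
slope_from_lstsq = solution[0]

print(slope_from_projection, slope_from_lstsq)

# The fitted values are the projection of y onto x
fitted = slope_from_projection * x
print(fitted)
```

With more features or an intercept, the projection is onto a higher-dimensional subspace and the one-line formula no longer applies directly, but the geometric picture is the same.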

def project(u, v):
    """
    Projects vector u onto vector v.
    Returns the projection vector.
    """
    # u · v
    dot_product = np.dot(u, v)
    # v · v (which is ||v||^2)
    v_norm_squared = np.dot(v, v)

    # Avoid division by zero
    if v_norm_squared == 0:
        raise ValueError("Cannot project onto the zero vector. It has no direction.")

    # The scalar projection factor
    scalar_proj = dot_product / v_norm_squared
    # Multiply by v to get the vector projection
    return scalar_proj * v

# Example:
u = np.array([5, 6])
v = np.array([1, 0]) # The x-axis basis vector

projection = project(u, v)
print("Projection of u onto v:", projection) # Output: [5. 0.]
# Makes perfect sense: all of u's x-component, none of its y-component.

Why is this such a big deal? Because the vector connecting u to its projection, u - proj_v(u), is perpendicular to v, which makes the projection the closest point to u on the line through v. No other multiple of v leaves a smaller error. That minimal-error property is the foundation of least squares, and the same goal runs through much of machine learning: training a model often means minimizing the “distance” (the norm of an error vector) between its predictions and reality. So when you train a model, you’re often doing something that looks a lot like projection. Now you know. You’re welcome.
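
Both claims, perpendicularity and minimal error, are easy to check numerically. This is a minimal sketch with hypothetical vectors. It compares the error left by the projection coefficient against the error left by two nearby multiples of v.

```python
import numpy as np

u = np.array([5.0, 6.0])
v = np.array([2.0, 1.0])

best_coeff = (u @ v) / (v @ v)  # scalar projection factor
projection = best_coeff * v
residual = u - projection       # the "error" vector

# Perpendicularity: the residual has no component along v
print("residual · v:", residual @ v)

# Minimality: nudging the coefficient either way makes the error longer
errors = {}
for coeff in (best_coeff - 0.5, best_coeff, best_coeff + 0.5):
    errors[coeff] = np.linalg.norm(u - coeff * v)
    print(f"coeff = {coeff:.2f}, error norm = {errors[coeff]:.4f}")
```

The residual's dot product with v comes out at zero up to floating-point rounding, and the projection coefficient gives the smallest error norm of the three.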