3.1 Vectors, Matrices, and Tensors: The Language of ML
Right, let’s get this out of the way. You’re not here to learn about vectors in the abstract, geometric sense, like some arrow pointing into space from your high school physics class. In our world—the world of machine learning—vectors, matrices, and tensors are just data containers. They’re the fundamental structures we use to shove numbers into a model’s mouth. A vector is a list of numbers. A matrix is a list of lists of numbers. A tensor is just a fancy, multi-dimensional array of numbers (and yes, a vector is a 1D tensor, a matrix is a 2D tensor; don’t let anyone make it sound more mystical than that).
The real magic isn’t the container itself but the brutally efficient math we can perform on it. Modern hardware, especially a GPU, is built to apply the same simple operation (like addition or multiplication) to huge numbers of values at once, running thousands of these operations side by side. This is called parallelization (more precisely, data parallelism), and it’s the engine that makes modern AI possible. Without these structures, we’d be stuck writing for loops until the heat death of the universe.
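To see what “no for loops” means in practice, here is a minimal NumPy sketch. It runs on the CPU, not a GPU, and the helper `add_with_loop` is hypothetical, written only for comparison. Both versions compute the same element-wise sum. The vectorized one hands the whole array to NumPy’s compiled routines in a single call instead of stepping through it in Python.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(100_000)
b = rng.random(100_000)

def add_with_loop(x, y):
    """Hypothetical helper: element-wise addition, one Python-level step per element."""
    out = np.empty_like(x)
    for i in range(len(x)):
        out[i] = x[i] + y[i]
    return out

loop_result = add_with_loop(a, b)   # 100,000 separate Python iterations
vector_result = a + b               # one vectorized call

print(np.array_equal(loop_result, vector_result))  # True
```

If you time both versions with `time.perf_counter()`, the vectorized one will typically win by a wide margin. The exact numbers depend on your machine.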
The Humble Vector: Your First Fancy List
Think of a vector as a single point in a multi-dimensional space. A vector with two numbers, [3, 4], is a point on a 2D plane. One with three numbers, [1, 2, 5], is a point in 3D space. Your model’s input? A 784-dimensional point representing a flattened MNIST digit (a 28x28 grayscale image, and 28 × 28 = 784). Yeah, we can’t visualize that, but the math doesn’t care.
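Flattening is how a 28x28 image becomes that 784-dimensional point. In this minimal sketch, random integers stand in for a real MNIST digit. No dataset is loaded.

```python
import numpy as np

# Stand-in for one 28x28 grayscale digit: random pixel values, not real MNIST data
image = np.random.default_rng(0).integers(0, 256, size=(28, 28))

# Flatten row by row into a single 784-dimensional vector
x = image.reshape(-1)  # image.flatten() also works, but always makes a copy

print(image.shape)  # (28, 28)
print(x.shape)      # (784,)
# The first 28 entries of x are simply the first row of the image
print(np.array_equal(x[:28], image[0]))  # True
```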
The two most common operations you’ll do are element-wise operations and the dot product.
Element-wise is easy: just add or multiply each pair of corresponding elements. The dot product, however, is where the secret sauce is. It multiplies corresponding elements and sums the results into a single number. Geometrically, v · w = |v| |w| cos θ, where θ is the angle between the vectors. That means it tells you how much two vectors point the same way, and if w has length 1, it’s exactly the length of v’s projection onto w. If the dot product of two nonzero vectors is zero, they’re perpendicular. In ML, it’s the absolute workhorse for everything from calculating weighted sums in a neuron (w·x + b) to measuring similarity.
import numpy as np
# Define two vectors
v = np.array([1, 2, 3])
w = np.array([4, 5, 6])
# Element-wise addition and multiplication
print("v + w =", v + w)  # Output: [5 7 9]
print("v * w =", v * w)  # Output: [ 4 10 18]
# Dot product: multiply corresponding elements, then sum
dot_product = np.dot(v, w)  # equivalently: v @ w
print("v · w =", dot_product)  # Output: 32  (1*4 + 2*5 + 3*6)
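You can check both views of the dot product numerically: the algebraic one (multiply and sum) and the geometric one (angle). This sketch defines a small `cosine_similarity` helper of our own. It is a common similarity measure built from the dot product, but it is not a NumPy function.

```python
import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| |b|); ranges from -1 to 1
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

v = np.array([1, 2, 3])
w = np.array([4, 5, 6])

# Algebraic definition: sum of element-wise products
print(np.sum(v * w))  # 32, same as np.dot(v, w)

# Perpendicular vectors have a dot product (and cosine) of zero
x_axis = np.array([1, 0])
y_axis = np.array([0, 1])
print(np.dot(x_axis, y_axis))  # 0

# Vectors pointing in similar directions give a cosine close to 1
print(round(cosine_similarity(v, w), 4))  # 0.9746
```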
Matrices: Where the Party Gets Loud
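If a vector is a point, a matrix is a transformation: a whole set of rules for how to rotate, stretch, shear, and squish vector space. (Strictly speaking, a matrix alone can’t shift space sideways, because it always maps the origin to itself; that’s the bias vector’s job.) Every fully connected (dense) layer of a neural network applies such a transformation, via its weight matrix, to its input vector. It then adds a bias and applies an activation function.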
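This sketch shows the core computation of a dense layer. The layer sizes, the random weights, and the convention that `W` has shape (outputs, inputs) are assumptions for illustration. Frameworks differ on which way round they store weights: PyTorch’s `nn.Linear`, for instance, stores shape (out_features, in_features) and computes `x @ W.T + b` on batches of row vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 input features -> 3 output units
n_in, n_out = 4, 3
W = rng.standard_normal((n_out, n_in))  # weight matrix: one row of weights per output unit
b = np.zeros(n_out)                     # bias vector: the "shift" a matrix alone can't do
x = np.array([0.5, -1.0, 2.0, 0.0])     # a single input vector

z = W @ x + b           # linear part: matrix-vector product plus bias
a = np.maximum(z, 0.0)  # ReLU activation, applied element-wise

print(z.shape)  # (3,)
# Each output is one neuron's weighted sum: row i of W dotted with x, plus b[i]
print(np.isclose(z[0], np.dot(W[0], x) + b[0]))  # True
```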
The most important operation here is matrix multiplication, and it’s not element-wise. To multiply matrices A and B, the number of columns in A must equal the number of rows in B. The result has as many rows as A and as many columns as B. Its element in row i, column j is the dot product of row i of A and column j of B. Geometrically, it’s the composition of two transformations: applying A @ B to a vector is the same as applying B first, then A. This rule is non-negotiable, and getting the shapes right accounts for most of your debugging pain early on.
import numpy as np

# A is 2x3, B is 3x2 -> result will be 2x2
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])
# Matrix multiplication
C = np.matmul(A, B)  # or use the @ operator: A @ B
print("A @ B =\n", C)
# Output:
# [[ 58  64]   # (1*7 + 2*9 + 3*11), (1*8 + 2*10 + 3*12)
#  [139 154]]  # (4*7 + 5*9 + 6*11), (4*8 + 5*10 + 6*12)
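You can verify the composition claim and the row-times-column rule on the same A and B. In this sketch, the vector `x` is an arbitrary choice.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])    # shape (2, 3)
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])     # shape (3, 2)
x = np.array([1, -1])        # shape (2,), something B can act on

step_by_step = A @ (B @ x)   # apply B, then A
composed = (A @ B) @ x       # compose once, then apply the single matrix
print(step_by_step, composed)  # [ -6 -15] [ -6 -15]

# Entry (i, j) of A @ B is row i of A dotted with column j of B
C = A @ B
print(C[1, 0] == np.dot(A[1], B[:, 0]))  # True

# Order matters: B @ A is a completely different (3, 3) matrix
print((B @ A).shape)  # (3, 3)
```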
Pitfall #1: Shape Mismatch. This will be the cause of most of your linear algebra errors. Always, always know the shape of your tensors; print(x.shape) is your best friend. If you try to multiply a (5, 3) matrix by a (5, 2) matrix, NumPy will rightly throw a fit (a ValueError), because the inner dimensions, 3 and 5, don’t match.
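This sketch reproduces that exact failure and shows one way to make the shapes line up. Transposing is only the right fix if the axes actually mean what you need them to mean. The error message text varies between NumPy versions, so it isn’t reproduced here.

```python
import numpy as np

X = np.ones((5, 3))
Y = np.ones((5, 2))

raised = False
try:
    X @ Y  # inner dimensions are 3 and 5: incompatible
except ValueError as err:
    raised = True
    print("ValueError:", err)

print(X.shape, Y.shape)  # check the shapes first: (5, 3) (5, 2)

# Transposing X gives (3, 5) @ (5, 2): the inner dimensions now match
Z = X.T @ Y
print(Z.shape)  # (3, 2)
```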
Tensors: The Multidimensional Rabbit Hole
A tensor is just a generalization of these concepts. A 0D tensor is a scalar (a single number), a 1D tensor is a vector, a 2D tensor is a matrix, a 3D tensor is a “cube” of numbers, and so on. In deep learning, we constantly work with high-dimensional tensors.
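NumPy exposes this rank directly as `ndim`, and `shape` lists the size along each axis. Here is a quick sketch:

```python
import numpy as np

scalar = np.array(7.0)               # 0D
vector = np.array([1.0, 2.0, 3.0])   # 1D
matrix = np.array([[1.0, 2.0],
                   [3.0, 4.0]])      # 2D
cube = np.zeros((2, 3, 4))           # 3D

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    print(f"{name}: ndim={t.ndim}, shape={t.shape}")
# scalar: ndim=0, shape=()
# vector: ndim=1, shape=(3,)
# matrix: ndim=2, shape=(2, 2)
# cube: ndim=3, shape=(2, 3, 4)
```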
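For example, a batch of data is almost always represented as a single tensor. A batch of 32 color images, each 64 by 64 pixels, would be a 4D tensor of shape (32, 64, 64, 3), with axes (batch_size, height, width, channels). That’s the “channels-last” layout TensorFlow/Keras uses by default; PyTorch instead expects “channels-first” tensors, here (32, 3, 64, 64).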
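Because the two conventions disagree, moving data between them means reordering axes. This minimal sketch uses `np.transpose`, with random values standing in for real images:

```python
import numpy as np

rng = np.random.default_rng(0)

# Channels-last batch: (batch, height, width, channels)
batch_nhwc = rng.random((32, 64, 64, 3))

# Reorder axes to channels-first: (batch, channels, height, width)
batch_nchw = np.transpose(batch_nhwc, (0, 3, 1, 2))

print(batch_nhwc.shape)  # (32, 64, 64, 3)
print(batch_nchw.shape)  # (32, 3, 64, 64)

# The same pixel, addressed through each layout
print(batch_nhwc[5, 10, 20, 2] == batch_nchw[5, 2, 10, 20])  # True
```

`np.transpose` returns a view here, so no pixel data is copied. Only the way the library walks through memory changes.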
import numpy as np

# Simulate a batch of 32 grayscale 28x28 images (like MNIST) with raw 0-255 pixel values
batch_of_images = np.random.randint(0, 256, size=(32, 28, 28, 1))
print("Tensor shape:", batch_of_images.shape)  # Output: (32, 28, 28, 1)
# We can perform operations on the entire batch at once.
# Let's scale every pixel value from the 0-255 range down to 0-1
normalized_batch = batch_of_images / 255.0
# The scalar 255.0 is broadcast across all 32 * 28 * 28 * 1 = 25,088 pixel values. No loops needed.
print(normalized_batch.min() >= 0.0, normalized_batch.max() <= 1.0)  # Output: True True
Pitfall #2: Broadcasting. This is NumPy’s (and PyTorch’s and TensorFlow’s) amazing but sometimes confusing feature where the library tries to be “helpful” and performs element-wise operations on tensors of different shapes by virtually stretching the smaller one. The rule: compare shapes starting from the last axis. Two dimensions are compatible if they’re equal or if one of them is 1, and missing leading dimensions count as 1. If you don’t internalize that rule, you can get silently wrong results instead of a clear error. Always double-check that the operation is being applied to the dimensions you think it is.
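The two cases in this sketch sit side by side. In the first, broadcasting does exactly what you want: it subtracts a per-channel mean, a common preprocessing step. In the second, it silently does something else. The random batch is a stand-in for real data.

```python
import numpy as np

# Intended use: per-channel means, shape (3,), line up with the trailing axis
batch = np.random.default_rng(0).random((32, 64, 64, 3))
channel_mean = batch.mean(axis=(0, 1, 2))  # shape (3,)
centered = batch - channel_mean            # (32, 64, 64, 3) - (3,) -> fine
print(centered.shape)  # (32, 64, 64, 3)

# The silent trap: a (3,) vector minus a (3, 1) column is NOT three differences
a = np.array([1, 2, 3])
b = np.array([[1], [2], [3]])
diff = a - b  # broadcasts to (3, 3): every pair a[j] - b[i]
print(diff.shape)  # (3, 3)
print(diff)
# [[ 0  1  2]
#  [-1  0  1]
#  [-2 -1  0]]
```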
The Truth About Implementation
Here’s the part the pure math books leave out: on a computer, a tensor is typically one flat block of memory plus a little metadata. That metadata includes its shape, its data type, and its strides, which record how many bytes to skip to move one step along each axis. The shape is a convenient fiction, a way of telling the library how to interpret that block. In the default row-major layout, a 2x3 matrix is just six numbers in a row, and the library knows to jump to the next “row” every three elements. This is why operations on slices can be tricky: NumPy’s basic slicing returns “views” that share memory with the original, not copies. It’s also why reshaping a contiguous array is usually a blindingly fast, nearly free operation. It changes the interpretation of the memory block, not the data itself. (Reshaping a non-contiguous view, such as a transpose, can force a copy.) It’s a brilliant, efficient hack that we get to build upon.
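NumPy lets you inspect this machinery directly through `strides`, `np.shares_memory`, and the `flags` attribute. The sketch below fixes the dtype to int64 so the stride values are predictable (8 bytes per element).

```python
import numpy as np

m = np.arange(6, dtype=np.int64).reshape(2, 3)  # [[0 1 2] [3 4 5]]
print(m.strides)  # (24, 8): skip 24 bytes for the next row, 8 for the next column

# Reshaping a contiguous array returns a view of the same memory
flat = m.reshape(6)
print(np.shares_memory(m, flat))  # True

# Basic slices are views too: writing through one changes the original
row = m[0]
row[0] = 100
print(m[0, 0])  # 100

# A transpose just swaps the strides, so it's no longer C-contiguous...
t = m.T
print(t.strides, t.flags["C_CONTIGUOUS"])  # (8, 24) False
# ...and flattening it in row-major order forces a real copy
print(np.shares_memory(t, t.reshape(6)))  # False

# Use .copy() when you need independent data
safe = m[1].copy()
safe[0] = -1
print(m[1, 0])  # 3
```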