3.2 Matrix Operations: Multiplication, Transpose, Inverse, Eigendecomposition
Right, let’s talk about the things you’ll actually do with matrices. You’ve got these grids of numbers, and you can add them, which is delightfully sane. But multiplication? That’s where the designers of our universe decided to get weird.
Matrix multiplication isn’t just multiplying each element by its corresponding partner. That operation exists; it’s called the Hadamard product (A * B in NumPy), and it is not what anyone means when they say “matrix multiplication.” No, true matrix multiplication is a more profound, and frankly, more useful beast. It’s the mathematical embodiment of composing linear transformations. If I have a matrix A that rotates a vector, and a matrix B that scales it, then A @ B (in Python parlance) gives me a new matrix that does the scaling then the rotation, in one step. The order reads right to left because (A @ B) @ v = A @ (B @ v): B touches the vector first. The key rule: the number of columns in the first matrix must equal the number of rows in the second. If A is (m x n) and B is (n x p), then the result C is (m x p). The element C[i, j] is the dot product of the i-th row of A and the j-th column of B.
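To make the “composition” claim concrete, here is a small sketch. The particular matrices (a 90-degree rotation and a scale of 2 in x, 3 in y) and the test vector are made up for illustration; any rotation and scaling would do.

```python
import numpy as np

# Hypothetical example matrices: A rotates by 90 degrees, B scales x by 2 and y by 3.
theta = np.pi / 2
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
B = np.array([[2.0, 0.0],
              [0.0, 3.0]])
v = np.array([1.0, 1.0])

two_steps = A @ (B @ v)   # scale first, then rotate
one_step = (A @ B) @ v    # the composed matrix does both at once
print(np.allclose(two_steps, one_step))   # True

# Swapping the order gives a different transformation (rotate, then scale).
swapped = (B @ A) @ v
print(np.allclose(one_step, swapped))     # False

# The element-wise (Hadamard) product is a different operation altogether.
print(np.allclose(A * B, A @ B))          # False
```

The second check is the one worth remembering: matrix multiplication is generally not commutative, so A @ B and B @ A are different transformations.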
Matrix Multiplication: It’s All About the Dot Products
This is the core of it. You’re systematically taking every row of the first guy and shaking hands with every column of the second guy. Let’s see it in code because reading the definition is a surefire way to fall asleep.
import numpy as np
# Let's say we have a matrix for a transformation (2x3)
A = np.array([[1, 2, 3],
              [4, 5, 6]])
# And a matrix of data points (3x2). 3 features, 2 samples.
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])
# The result will be a (2x2) matrix.
# C[0,0] = (1*7) + (2*9) + (3*11) = 7 + 18 + 33 = 58
# C[0,1] = (1*8) + (2*10) + (3*12) = 8 + 20 + 36 = 64
# ...and so on.
C = A @ B # The modern, clean way. Use this.
# Or: C = np.matmul(A, B)
# Or, the terrible way that will confuse you: C = np.dot(A, B) # Avoid for matrices.
print(C)
# Output:
# [[ 58  64]
#  [139 154]]
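If you want to see the “row shakes hands with column” rule with nothing hidden, the sketch below spells it out as two loops. The function name matmul_by_hand is made up for this example, and it is far slower than @; it exists only to show what @ computes.

```python
import numpy as np

def matmul_by_hand(A, B):
    """Naive matrix product: C[i, j] is the dot product of row i of A and column j of B."""
    m, n = A.shape
    n_b, p = B.shape
    if n != n_b:
        raise ValueError(f"inner dimensions differ: {n} vs {n_b}")
    C = np.zeros((m, p), dtype=np.result_type(A, B))
    for i in range(m):
        for j in range(p):
            C[i, j] = np.dot(A[i, :], B[:, j])
    return C

A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[7, 8], [9, 10], [11, 12]])
print(matmul_by_hand(A, B))
print(np.array_equal(matmul_by_hand(A, B), A @ B))  # True
```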
The most common pitfall? Getting the shapes wrong. (3x2) @ (3x2) will fail miserably. You need (m x n) @ (n x p). Remember that. It will save you hours of debugging.
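Here is what that failure looks like, as a minimal sketch with throwaway matrices of ones (P, Q and R are placeholder names). Printing .shape before multiplying is a cheap habit that catches most of these.

```python
import numpy as np

P = np.ones((3, 2))
Q = np.ones((3, 2))
R = np.ones((2, 4))

print(P.shape, Q.shape)       # (3, 2) (3, 2): inner dimensions are 2 and 3

try:
    P @ Q
    failed = False
except ValueError as e:
    failed = True
    print("Shape mismatch:", e)

print((P @ R).shape)          # (3, 4): inner dimensions are both 2
```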
The Transpose: Flipping It (But Not Real Life)
The transpose, denoted A.T, is hilariously simple: you flip the rows and columns. Row i becomes column i. It’s like turning your head sideways to look at the matrix. Why do we care? It’s essential for making the math work. For example, if you have a single data point stored as a column vector x (shape (n, 1)) and a matrix W (shape (m, n)), then W @ x just works: (m, n) @ (n, 1) gives an (m, 1) column. The trouble starts when that same point is stored as a row vector (shape (1, n)), which is how most datasets keep it. Now W @ x_row is (m, n) @ (1, n), and the inner dimensions don’t match. You could transpose the row back into a column, but the usual move is to flip the whole product instead: x_row @ W.T is (1, n) @ (n, m), which is valid and gives a (1, m) row containing the same numbers. In practice, we just keep our data points as rows in a big matrix X, which is why you’ll often see the equation written as X @ W.T instead. The transpose is our duct tape for fixing dimension mismatches.
# Let's see the dimension issue from above in action.
x = np.array([[1], [2], [3]])  # A column vector, shape (3, 1)
W = np.array([[1, 2, 3],
              [4, 5, 6]])      # A matrix, shape (2, 3)
# W @ x works as-is: (2,3) @ (3,1) gives a (2,1) column vector.
result = W @ x
print("W @ x:\n", result)
# But if our data was stored as a row vector (1,3)...
x_row = x.T  # shape (1, 3)
# ...we can't do W @ x_row, because (2,3) @ (1,3) is invalid.
# Instead, we do: x_row @ W.T because (1,3) @ (3,2) is valid and gives (1,2)
result_2 = x_row @ W.T
print("x_row @ W.T:\n", result_2)  # Same values, but as a row vector.
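The same trick scales from one row to a whole dataset. The sketch below assumes a made-up data matrix X with four points stored as rows; W is the same (2, 3) matrix as above.

```python
import numpy as np

W = np.array([[1, 2, 3],
              [4, 5, 6]])          # shape (2, 3): maps 3 features to 2 outputs

# Hypothetical data matrix: 4 data points stored as rows, 3 features each.
X = np.array([[1, 2, 3],
              [0, 1, 0],
              [1, 0, 1],
              [2, 2, 2]])          # shape (4, 3)

Y = X @ W.T                        # (4, 3) @ (3, 2) -> (4, 2), one output row per data point
print(Y.shape)

# Row k of Y matches W applied to data point k treated as a column vector.
for k in range(X.shape[0]):
    column = X[k].reshape(-1, 1)   # shape (3, 1)
    assert np.array_equal(Y[k], (W @ column).ravel())

# Equivalent view: transposing the whole product flips the order of the factors.
print(np.array_equal(Y.T, W @ X.T))   # True
```

One matrix product handles every data point at once, which is the practical reason the row-per-sample convention is so common.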
The Inverse: The “Undo” Button (When It Exists)
If I have a matrix A that applies a transformation, its inverse, A^{-1}, is the matrix that reverses it. A @ A^{-1} = I, the identity matrix (the matrix equivalent of the number 1). This is monumentally useful for solving systems of equations. But here’s the catch: not every matrix has an inverse. A matrix must be square (n x n) and have full rank (meaning its columns/rows are linearly independent). If the matrix squishes space into a lower dimension (imagine flattening a 3D shape into a 2D plane), that operation is irreversible. You can’t un-flatten it. There’s no inverse. NumPy will be very unhappy if you try.
# A good matrix
A = np.array([[4, 7],
              [2, 6]])
A_inv = np.linalg.inv(A)
print("Inverse of A:\n", A_inv)
# Verify it works
I = A @ A_inv
print("A @ A_inv (should be ~I):\n", np.round(I, 10))  # Round to avoid tiny floating-point errors
# A bad matrix (the second row is just 2x the first row)
B = np.array([[1, 2],
              [2, 4]])  # This squishes 2D space onto a line.
try:
    np.linalg.inv(B)
except np.linalg.LinAlgError as e:
    print(f"Rightfully exploded: {e}")
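Since solving systems of equations is the headline use case, here is a short sketch of it. The system itself (the right-hand side b) is invented for the example. It also shows np.linalg.solve, which solves A @ x = b directly and is generally the recommended route when you only need the solution rather than the inverse itself.

```python
import numpy as np

# Hypothetical system: 4x + 7y = 1, 2x + 6y = 2
A = np.array([[4.0, 7.0],
              [2.0, 6.0]])
b = np.array([1.0, 2.0])

x_via_inverse = np.linalg.inv(A) @ b   # the textbook "multiply by the inverse" route
x_via_solve = np.linalg.solve(A, b)    # solves A @ x = b without forming the inverse

print(x_via_solve)
print(np.allclose(x_via_inverse, x_via_solve))   # True
print(np.allclose(A @ x_via_solve, b))           # True

# A rank check before inverting avoids the exception for singular matrices.
B = np.array([[1.0, 2.0],
              [2.0, 4.0]])
print(np.linalg.matrix_rank(B))   # 1, less than 2, so B has no inverse
```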
Eigendecomposition: The True Nature Revealed
This is the crown jewel. If a matrix is a transformation, the eigendecomposition tells you what that transformation really is, at its heart. It finds the special vectors—eigenvectors—that don’t get knocked off their span by the transformation. They only get stretched or squished by a factor—the eigenvalue. So A @ v = λ * v. For a transformation, these are the fundamental “directions” of change.
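You can check the defining equation numerically. The sketch below uses a small symmetric matrix picked purely for illustration, verifies A @ v = λ * v for each eigenpair, and then rebuilds A from its eigenvectors and eigenvalues, which is the “decomposition” part of the name.

```python
import numpy as np

# Hypothetical example matrix, chosen to be symmetric so the eigenvalues are real.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eigh(A)   # eigenvectors are the COLUMNS

for k in range(len(eigenvalues)):
    lam = eigenvalues[k]
    v = eigenvectors[:, k]
    # A @ v should be the same vector, just scaled by lam.
    print(lam, np.allclose(A @ v, lam * v))

# Rebuild A from the pieces: A = V @ diag(eigenvalues) @ V.T for a symmetric matrix.
reconstructed = eigenvectors @ np.diag(eigenvalues) @ eigenvectors.T
print(np.allclose(reconstructed, A))   # True

# A vector that is not an eigenvector gets knocked off its span.
w = np.array([1.0, 0.0])
print(A @ w)   # [2. 1.], which is not a multiple of [1, 0]
```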
Why is this the secret weapon of AI? Imagine your data. The eigenvectors of its covariance matrix point in the directions of maximum variance—the directions that contain the most information. The eigenvalues tell you how much information is in each direction. This is the absolute core of Principal Component Analysis (PCA), which is just a fancy name for “let’s look at the data using its eigendecomposition so we can reduce dimensions without losing the good stuff.”
# Let's create a covariance matrix of some fake data
np.random.seed(42)
data = np.random.randn(100, 2) # 100 points in 2D
# Make it correlated
data[:, 1] = 0.5 * data[:, 0] + 0.5 * data[:, 1]
cov_matrix = np.cov(data, rowvar=False) # Variables are columns
print("Covariance matrix:\n", cov_matrix)
# Perform eigendecomposition (eigh is the routine for symmetric matrices)
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)
print("Eigenvalues:", eigenvalues)
print("Eigenvectors:\n", eigenvectors)
# The eigenvector with the largest eigenvalue is the direction of greatest variance.
# eigh returns eigenvalues in ascending order, so pick the largest explicitly.
top = np.argmax(eigenvalues)
# We can project our (mean-centered) data onto this new, more informative basis.
centered = data - data.mean(axis=0)
principal_component = centered @ eigenvectors[:, top]  # Project onto top eigenvector
print("Projected data shape:", principal_component.shape)
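A quick sanity check on the “eigenvalue = how much information” claim: the variance of the data projected onto an eigenvector should equal that eigenvector’s eigenvalue. The sketch below is self-contained and uses its own fake data (the seed and sample size are arbitrary choices), so the printed fraction will differ from anything computed above.

```python
import numpy as np

rng = np.random.default_rng(0)                 # seed is an arbitrary choice
data = rng.standard_normal((500, 2))
data[:, 1] = 0.5 * data[:, 0] + 0.5 * data[:, 1]   # same correlation trick as above

cov_matrix = np.cov(data, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)   # ascending eigenvalues

top = np.argmax(eigenvalues)
centered = data - data.mean(axis=0)
projected = centered @ eigenvectors[:, top]

# Variance along the top eigenvector equals the top eigenvalue.
print(np.isclose(projected.var(ddof=1), eigenvalues[top]))   # True

# Share of total variance kept by the single component.
explained = eigenvalues[top] / eigenvalues.sum()
print(round(explained, 3))
```

The ddof=1 matters: np.cov divides by N - 1 by default, so the projected variance has to use the same convention for the two numbers to match.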
The big “gotcha” here is that eigendecomposition is only defined for square matrices, and it is only guaranteed to behave nicely for real symmetric ones (like covariance matrices), where you get real eigenvalues and orthogonal eigenvectors. For the weird, non-symmetric ones, the eigenvalues can be complex numbers and the eigenvectors need not be orthogonal. Things get complex. Literally.
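To see “literally complex” for yourself, a rotation matrix is the classic example: it turns every real vector, so no real direction can stay on its own span. The matrices below are illustrative picks, not anything from earlier in the section.

```python
import numpy as np

# A 90-degree rotation: no real direction survives it, so no real eigenvectors.
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])
values, vectors = np.linalg.eig(R)
print(values.dtype)          # complex128
print(values)                # the pair +1j and -1j (order not guaranteed)

# A symmetric matrix, by contrast, gives real eigenvalues and orthonormal eigenvectors.
S = np.array([[2.0, 1.0],
              [1.0, 2.0]])
s_values, s_vectors = np.linalg.eigh(S)
print(s_values.dtype)                                   # float64
print(np.allclose(s_vectors.T @ s_vectors, np.eye(2)))  # True
```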