78.4 Signal Processing: FFT, Filtering, and Spectral Analysis

Right, let’s talk about making your data sing. Or at least making it stop screaming. Signal processing is how you take a raw, noisy, often infuriatingly messy signal from the real world and extract the information you actually care about. It’s the digital equivalent of tuning an old radio—you’re turning the knobs (applying filters) to bring the station (your signal) into focus and drown out the static (the noise).

We’ll use SciPy for the heavy-duty signal processing math because, frankly, it’s a beast. But since our signals often live in big, beautiful DataFrames, we’ll use Polars to manage them before we hand things off to SciPy’s algorithms. This is the classic one-two punch: Polars for fast, efficient data wrangling, SciPy for the rigorous numerical analysis.

The FFT: Seeing in the Frequency Domain

Your first and most powerful tool is the Fast Fourier Transform (FFT). Think of it like this: you’re looking at a complex sound wave—a chord played on a guitar. Your ears (and a microphone) hear it as a single, complicated pressure wave over time (the time domain). The FFT is your magical spectrograph glasses. Put them on, and suddenly you can see the individual notes (frequencies) that make up that chord, along with their volumes (amplitudes). It decomposes any signal into its constituent sinusoidal waves.

The absolute number one thing everyone gets wrong with the FFT is the sample rate. You must know how many data points you collect per second (Hz). Without it, your fancy frequency plot is just a pretty, meaningless shape.

Let’s generate a clean signal with some known frequencies and then use scipy.fft to crack it open.

import numpy as np
from scipy.fft import fft, fftfreq
import matplotlib.pyplot as plt

# Define our parameters. This is the stuff you MUST know from your data acquisition system.
sample_rate = 1000  # We sampled 1000 times per second (1000 Hz)
duration = 1  # seconds
N = sample_rate * duration  # total number of samples

# Generate a time vector. It's just a bunch of points in time.
t = np.linspace(0, duration, N, endpoint=False)

# Create a test signal: a 50 Hz sine wave and a 120 Hz sine wave, plus some noise.
# Why 50 and 120? Because I said so. They're nice, distinct frequencies.
signal = 0.7 * np.sin(2 * np.pi * 50 * t) + np.sin(2 * np.pi * 120 * t)
signal += 0.5 * np.random.randn(len(t))  # "randn" gives Gaussian (normal) noise

# Compute the FFT
yf = fft(signal)
xf = fftfreq(N, 1 / sample_rate)  # fftfreq gives us the corresponding frequencies

# Plot the noisy signal
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(t, signal)
plt.title("Time Domain Signal (Noisy)")
plt.xlabel('Time [sec]')

# Plot the FFT (we only care about the positive frequencies)
plt.subplot(1, 2, 2)
plt.plot(xf[:N//2], 2.0/N * np.abs(yf[0:N//2])) # Magnitude, scaled appropriately
plt.title("Frequency Domain (FFT)")
plt.xlabel('Frequency [Hz]')
plt.grid()
plt.tight_layout()
plt.show()

Run that. See those two sharp peaks poking through the noise at exactly 50 Hz and 120 Hz? That’s the FFT working its magic. The noise shows up as that low, squiggly baseline across all frequencies. The FFT didn’t remove the noise, but it did show us exactly what our signal is made of, which is the crucial first step.

Filter Design: Making Unwanted Frequencies Vanish

Knowing what frequencies are in your signal is cool. Choosing which ones get to stay is even cooler. That’s filtering. You’ll mostly use two types: Low-pass filters (allow low frequencies, block high ones; great for smoothing and removing high-frequency noise) and Band-pass filters (allow a specific range of frequencies, block the rest; great for isolating a specific signal).

SciPy’s signal module is our toolbox here. We design a filter by specifying its characteristics, get its coefficients, and then apply it. The most common workhorse is the Butterworth filter, because it has a nice flat response in the passband without ripples.

from scipy import signal

# Design a low-pass Butterworth filter to smooth our noisy signal
cutoff_freq = 80  # Hz (we want to keep frequencies below 80 Hz)
nyquist = sample_rate / 2  # The absolute maximum frequency we can detect
normal_cutoff = cutoff_freq / nyquist  # Normalize the cutoff to Nyquist

# Create a 4th-order Butterworth low-pass filter
b, a = signal.butter(4, normal_cutoff, btype='low', analog=False)

# Apply the filter to our signal
filtered_signal = signal.filtfilt(b, a, signal)

# Plot the original vs. the filtered
plt.figure(figsize=(10, 4))
plt.plot(t, signal, label='Noisy Original', alpha=0.7)
plt.plot(t, filtered_signal, label='Filtered (Low-Pass)', linewidth=2, color='k')
plt.legend()
plt.xlabel('Time [sec]')
plt.show()

Why filtfilt instead of lfilter? Ah, a fantastic question. lfilter applies the filter in one direction, which introduces a time lag or phase shift—your filtered signal will be slightly delayed. filtfilt applies the filter once forward and once backward, resulting in zero phase shift and a sharper frequency roll-off. It’s almost always what you want for offline processing. The tradeoff? It’s roughly twice the computational cost. A fair price for clean, unshifted data.

Spectral Analysis with Welch’s Method

Here’s the dirty secret of the beautiful FFT plot we made earlier: it’s kind of a liar. It presents a single, perfect view of the entire frequency spectrum. But for real, noisy, non-stationary signals, that’s a statistical fantasy. The plain FFT has high variance.

The professional’s choice is Welch’s method. Instead of taking one giant FFT of your entire million-point signal, it chops the signal into overlapping segments, computes the FFT for each segment, and then averages those spectra together. The result is a much smoother, more statistically reliable estimate of the power spectral density (PSD). It trades a bit of frequency resolution for a massive reduction in variance. It’s an excellent trade.

from scipy.signal import welch

# Compute the Welch PSD
frequencies, psd = welch(signal, sample_rate, nperseg=256)

plt.figure(figsize=(10, 4))
plt.semilogy(frequencies, psd)  # Often plotted on a log-y scale to see small details
plt.title("Welch's Power Spectral Density Estimate")
plt.xlabel('Frequency [Hz]')
plt.ylabel('Power / Frequency [dB/Hz]')
plt.grid()
plt.show()

See how much cleaner that looks compared to the raw FFT magnitude plot? The noise floor is a smooth line, and our 50 Hz and 120 Hz peaks stand out clearly. This is the plot you take to a conference. When someone asks you “what’s in your signal?”, you show them the Welch PSD. It says you know how to handle real data.

The key parameter here is nperseg—the number of samples per segment. A larger value gives you better frequency resolution (you can distinguish closer-together peaks) but noisier averages. A smaller value gives you a smoother average but blurs nearby frequencies together. You have to tune it for your specific signal. There’s no free lunch, only parameter tuning. Welcome to science.