75.7 Random Number Generation: numpy.random

Right, let’s talk about making stuff up. Not in a dishonest way, but in the foundational, “we-need-fake-data-to-test-real-things” way. That’s what numpy.random is for. It’s your one-stop shop for generating arrays of random numbers, and it’s one of those parts of NumPy you’ll use constantly for everything from prototyping a machine learning model to running a Monte Carlo simulation. It’s deceptively simple on the surface, but there’s a critical, modern nuance you absolutely must understand from the start, or you’ll accidentally build irreproducible, non-portable code. And we are not about that life.

The First and Most Important Lesson: The Generator Object

For years, NumPy had a straightforward, global-state way of doing things: functions like np.random.rand(). You called them, and they magically produced numbers. This was convenient but also a bit of a nightmare. The entire module shared one hidden state. If some other part of your code (or another library) used an old function, it would silently change the sequence of numbers you got later in your code. This made debugging stochastic programs a special kind of hell.

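The modern, correct way to do this is to use an explicit generator object. Think of it as your own personal random number dispenser: you create it once, and then you use it for every draw. Because its state belongs to you alone, nothing else in the program can silently advance it, which keeps your code predictable and reproducible. It also makes multithreaded code easier to reason about, since each thread can own its own generator instead of fighting over one shared global.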

import numpy as np

# This is the way. Create your own generator.
rng = np.random.default_rng()  # 'rng' is a common abbreviation

# Now use it for everything.
random_floats = rng.random(5)  # 5 numbers in [0.0, 1.0)
random_ints = rng.integers(0, 10, size=5)  # 5 integers from [0, 10)
print("Floats:", random_floats)
print("Ints:  ", random_ints)

Floats: [0.12345678 0.78978978 0.34534534 0.9019019  0.56756757]  # Your numbers will differ!
Ints:   [3 7 2 9 4]
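One detail from that example is worth pinning down: integers() treats its upper bound as exclusive, exactly as the comment says. When you actually want an inclusive range, like rolling a six-sided die, the endpoint parameter flips that. Here's a minimal sketch; the seed and the dice framing are just illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(seed=7)  # seed chosen arbitrarily for the sketch

# By default integers() excludes `high`: these values come from {0, ..., 9}
digits = rng.integers(0, 10, size=1000)

# endpoint=True makes `high` inclusive, which reads naturally for dice: {1, ..., 6}
dice = rng.integers(1, 6, size=1000, endpoint=True)

print("Digits range:", digits.min(), "to", digits.max())
print("Dice range:  ", dice.min(), "to", dice.max())
```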

The key here is default_rng(). Without an argument, it seeds itself from the system’s entropy source (like os.urandom), which is great for production. But for testing and debugging, you need reproducibility.

Seeding for Reproducibility and Debugging

The whole point of science and engineering is that someone else should be able to get the exact same result you did. With randomness, you do this by seeding your generator. A seed is a number that initializes the generator’s internal state. The same seed produces the same sequence of numbers every time. It’s like giving your “random” story the same first sentence.

# Create two generators with the same seed. They are clones.
deterministic_rng_1 = np.random.default_rng(seed=42)
deterministic_rng_2 = np.random.default_rng(seed=42)

# They will produce identical sequences.
seq1 = deterministic_rng_1.random(3)
seq2 = deterministic_rng_2.random(3)

print("Sequence 1:", seq1)
print("Sequence 2:", seq2)
print("Are they identical?", np.array_equal(seq1, seq2))

Sequence 1: [0.77395605 0.43887844 0.85859792]
Sequence 2: [0.77395605 0.43887844 0.85859792]
Are they identical? True

This is non-negotiable for debugging. If your model fails mysteriously one run in a hundred and you log the seed each run uses, you can restart your script with the seed from the failing run and step through the code in a debugger, watching the exact same “random” events unfold. It turns a nightmare into a manageable problem.
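Here's one way to make that seed capture possible, as a sketch rather than the one true approach: if you don't pick a seed yourself, ask NumPy's SeedSequence for fresh OS entropy, log that number, and build the generator from it. The helper name make_logged_rng is hypothetical, and print stands in for whatever logging you actually use.

```python
import numpy as np

def make_logged_rng(seed=None):
    """Hypothetical helper: return a Generator plus the seed it was built from.

    If no seed is given, draw fresh OS entropy via SeedSequence and record it,
    so a failing run can be replayed later.
    """
    if seed is None:
        seed = np.random.SeedSequence().entropy  # a large random integer
    print(f"Using seed: {seed}")  # in real code, write this to your log
    return np.random.default_rng(seed), seed

# A normal run: the seed is chosen automatically and logged
rng, used_seed = make_logged_rng()
first_run = rng.random(3)

# Later: rebuild the generator from the logged seed and replay the same draws
replay_rng, _ = make_logged_rng(used_seed)
replayed = replay_rng.random(3)

print("Replay matches?", np.array_equal(first_run, replayed))  # True
```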

The Old Global Way (And Why You Should Avoid It)

I have to show you the old way because you’ll see it in a million Stack Overflow answers and legacy codebases. It uses the module’s hidden global state.

# The old, frowned-upon way
np.random.seed(42)  # Sets the seed for the hidden global generator
old_style_numbers = np.random.rand(3)
print("Old style:", old_style_numbers)

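Do not do this in new code. NumPy keeps these functions around for backward compatibility, but they are frozen “legacy” code built on the older MT19937 algorithm, which is slower than the PCG64 algorithm behind default_rng(). And as I mentioned, the shared state is a global variable, which is just asking for subtle, mind-bending bugs. NumPy hasn’t formally deprecated the old functions, but you should treat them as deprecated in your own code: the generator object approach avoids every one of these problems.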
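To see the hidden-state problem in action, here's a small sketch. The function third_party_helper is hypothetical, standing in for any library code that calls the legacy global functions behind your back.

```python
import numpy as np

def third_party_helper():
    # Hypothetical library code that quietly draws from the legacy global generator
    return np.random.rand()

# Legacy style: your results depend on who else touched the global state
np.random.seed(42)
expected = np.random.rand(3)

np.random.seed(42)
third_party_helper()              # advances the shared hidden state by one draw
actual = np.random.rand(3)        # so this sequence is shifted
legacy_ok = np.array_equal(expected, actual)

# Generator style: a private generator is unaffected by outside calls
rng = np.random.default_rng(42)
expected_g = rng.random(3)

rng = np.random.default_rng(42)
third_party_helper()              # touches only the global state, not rng
actual_g = rng.random(3)
generator_ok = np.array_equal(expected_g, actual_g)

print("Legacy reproducible?   ", legacy_ok)     # False
print("Generator reproducible?", generator_ok)  # True
```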

Drawing from Different Distributions

The Generator object has a whole menu of distributions beyond uniform floats and integers. The naming is usually very clear.

rng = np.random.default_rng(seed=123)  # Let's get a fresh one

# Normal distribution: location (mean) and scale (standard deviation)
normal_vals = rng.normal(loc=100, scale=15, size=5)
print("Normal (mean=100, std=15):", normal_vals)

# Choice: randomly pick from a given array (with or without replacement)
my_array = ['win', 'lose', 'draw']
choices = rng.choice(my_array, size=10, p=[0.5, 0.3, 0.2])  # weighted probabilities!
print("Choices:", choices)

# Shuffle: modifies an array in-place. Careful now!
deck = np.arange(52)  # A deck of cards 0 to 51
rng.shuffle(deck)
print("Shuffled deck (first 10):", deck[:10])
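The choice comment mentions sampling without replacement, which the example above doesn't show, so here's a short sketch of that alongside a few other distributions the Generator offers. The parameter values (lam=3.0, scale=2.0 and so on) are arbitrary picks for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=123)

# Without replacement: deal a 5-card hand, so no card can appear twice
deck = np.arange(52)
hand = rng.choice(deck, size=5, replace=False)

# A few more distributions from the same generator
uniform_vals = rng.uniform(low=-1.0, high=1.0, size=5)  # floats in [-1.0, 1.0)
poisson_counts = rng.poisson(lam=3.0, size=5)           # e.g. events per minute
exp_waits = rng.exponential(scale=2.0, size=5)          # scale is the mean wait

print("Hand:", hand)
print("Uniform:", uniform_vals)
print("Poisson:", poisson_counts)
print("Exponential:", exp_waits)
```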

Advanced Usage: Permutations and Passing Generators Around

Two more incredibly useful tricks. permutation is like shuffle but it returns a shuffled copy instead of messing with your original data, which is usually what you want. It’s safer.

original = np.array([10, 20, 30, 40])
shuffled_copy = rng.permutation(original)
print("Original:", original)  # Untouched!
print("Shuffled copy:", shuffled_copy)
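A common place permutation earns its keep: shuffling two arrays that must stay aligned, like features and labels. Passing an integer n gives you a shuffled copy of arange(n), which you can then use as an index into both arrays. X and y below are made-up toy data.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Toy paired data: row i of X belongs with label y[i]
X = np.array([[1, 1], [2, 2], [3, 3], [4, 4]])
y = np.array([10, 20, 30, 40])

# permutation(n) with an int returns a shuffled copy of arange(n)
idx = rng.permutation(len(y))

# Index both arrays with the same order so the pairs stay aligned
X_shuffled, y_shuffled = X[idx], y[idx]

print("Order:", idx)
print("X:", X_shuffled.tolist())
print("y:", y_shuffled.tolist())
```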

Finally, when you write your own functions that use randomness, accept an rng object as an argument instead of creating a generator deep inside them. This is a best practice: it lets the caller control the randomness.

def my_random_algorithm(data, rng=None):
    if rng is None:
        rng = np.random.default_rng()  # Create one if none provided
    # ... use rng for all random operations inside ...
    random_index = rng.integers(0, len(data))
    return data[random_index]

# Now the caller can pass their own seeded generator for reproducibility.
result = my_random_algorithm([1, 2, 3, 4, 5], rng=np.random.default_rng(seed=999))
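A handy refinement of that pattern, sketched below under the assumption that you also want to accept plain integer seeds: np.random.default_rng() itself accepts None, an integer seed, or an existing Generator (which it hands back unchanged), so one call can replace the if rng is None branch. sample_rows is a hypothetical example function.

```python
import numpy as np

def sample_rows(data, k, rng=None):
    """Hypothetical example: pick k distinct rows from data.

    rng may be None (fresh entropy), an int seed, or an existing Generator;
    np.random.default_rng turns all three into a Generator, and returns a
    Generator argument unchanged.
    """
    rng = np.random.default_rng(rng)
    idx = rng.choice(len(data), size=k, replace=False)
    return data[idx]

data = np.arange(20).reshape(10, 2)

a = sample_rows(data, 3, rng=5)                         # plain int seed
b = sample_rows(data, 3, rng=np.random.default_rng(5))  # seeded Generator
print("Same draw?", np.array_equal(a, b))  # True: same seed, same rows
```

Because a passed-in Generator comes back as-is, the caller can thread one generator through many such functions, and the whole pipeline stays reproducible from a single seed.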

The takeaway: np.random.default_rng() is your new best friend. Use it, seed it for testing, and pass it around. It gives you control over the chaos, which is pretty much the entire job of an engineer.