75.1 ndarray: Creating, Reshaping, and Slicing
Right, let’s talk about the ndarray. It’s the heart, soul, and occasionally the frustratingly stubborn backbone of NumPy. Forget everything you think you know about Python lists. We’re not in Kansas anymore. This is a homogeneous, n-dimensional array, normally backed by a single contiguous block of memory, and designed for one thing: brutally efficient numerical computation. It’s a list that went to the gym, got a degree in mechanical engineering, and refuses to mess around.
Creating Arrays: Your First Real Step
You don’t build an ndarray; you summon it from the void of raw data. The main incantation is np.array(). The key thing to watch here is the dtype (data type). NumPy, in its quest for speed, needs to know exactly what kind of data it’s dealing with upfront.
import numpy as np
# From a humble list. NumPy looks at this and sighs at its inefficiency.
my_list = [1, 2, 3, 4, 5]
arr_from_list = np.array(my_list)
print(arr_from_list) # Output: [1 2 3 4 5]
print(arr_from_list.dtype) # Output: int64 on most platforms (int32 on Windows before NumPy 2.0)
# Now, let's break its brain with mixed types. What *is* this, a Python list?
mixed_list = [1, 2.5, 'three']
arr_mixed = np.array(mixed_list)
print(arr_mixed) # Output: ['1' '2.5' 'three'] <-- Everything got cast to strings. Yikes.
print(arr_mixed.dtype) # Output: <U32 (Unicode strings of up to 32 characters, wide enough for a float's text form)
See that? NumPy hates ambiguity. It found strings and said, “Fine, we’ll do it your way, but we’re doing it all the way.” To avoid this tragic comedy, be explicit.
# Be the adult in the room. Specify the dtype.
arr_proper = np.array([1, 2.5, 3], dtype=np.float64)
print(arr_proper) # Output: [1.  2.5 3. ]
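Because the dtype is fixed at creation, it also decides what happens to values you write in later. Here is a minimal sketch (the variable names are illustrative only) showing silent truncation when a float is assigned into an integer array, with astype() as the explicit way to convert:

```python
import numpy as np

# An integer array keeps its dtype no matter what you assign into it.
counts = np.array([1, 2, 3])
counts[0] = 2.9            # the float is truncated to fit the integer dtype
print(counts)              # Output: [2 2 3]

# astype() returns a NEW array with the requested dtype; the original is untouched.
as_float = counts.astype(np.float64)
as_float[0] = 2.9
print(as_float)            # Output: [2.9 2.  3. ]
print(counts.dtype.kind)   # Output: i  (still an integer array)
```

No warning, no error, just a quietly wrong number. If fractional values are on the way, pick a float dtype before they arrive.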
For laziness (a virtue among programmers), use the helper functions. np.zeros(), np.ones(), and my personal favorite, np.full(), which is delightfully literal.
# A 3x4 grid of utter nothingness.
zeros = np.zeros((3, 4))
print(zeros)
# A 2x2 grid of pure, unadulterated 99s.
nines = np.full((2, 2), 99)
print(nines)
# Output:
# [[99 99]
#  [99 99]]
# `np.arange` is like range() on steroids. `np.linspace` is for when you want a specific NUMBER of points.
sequential = np.arange(0, 10, 2) # start, stop, step
print(sequential) # Output: [0 2 4 6 8]
spaced_out = np.linspace(0, 100, 5) # start, stop, num_points
print(spaced_out) # Output: [  0.  25.  50.  75. 100.]
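The helpers also pick a dtype for you, which is worth checking rather than assuming. A small sketch of the defaults as I understand them: np.zeros() produces floats unless told otherwise, np.full() infers the dtype from the fill value, and np.linspace() lands on the stop value while np.arange() stops short of it.

```python
import numpy as np

zeros = np.zeros((3, 4))
print(zeros.dtype)                 # Output: float64

int_zeros = np.zeros((3, 4), dtype=np.int32)   # override the default explicitly
print(int_zeros.dtype)             # Output: int32

nines = np.full((2, 2), 99)
print(nines.dtype.kind)            # Output: i  (integer, inferred from the fill value 99)

# arange stops BEFORE the stop value; linspace lands exactly on it.
print(np.arange(0, 10, 2)[-1])     # Output: 8
print(np.linspace(0, 100, 5)[-1])  # Output: 100.0
```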
Reshaping: The Art of Data Origami
Your data rarely arrives in the perfect shape. Reshaping is how you fold it into submission. The crucial rule: the total number of elements must remain the same. You can’t turn 12 elements into a 5x3 array (15 elements); NumPy will quite rightly throw a tantrum.
arr = np.arange(12) # [ 0 1 2 3 4 5 6 7 8 9 10 11]
print(arr.shape) # Output: (12,) <-- A one-element tuple: one axis of length 12, so a 1D array.
# Reshape it into a 3x4 matrix.
matrix = arr.reshape(3, 4)
print(matrix)
# Output:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
# The -1 trick is genius. It means "figure this dimension out for me."
# "Make this a 2D array with 6 columns, and you do the math on the rows."
auto_rows = arr.reshape(-1, 6)
print(auto_rows.shape) # Output: (2, 6)
# You can also go back to 1D with `flatten` (which always makes a copy) or `ravel` (which returns a view when the memory layout allows it, and a copy otherwise).
flattened_copy = matrix.flatten()
raveled_view = matrix.ravel()
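To make those claims concrete, here is a hedged sketch: a mismatched reshape raises a ValueError, and np.shares_memory() reports whether the result of ravel() or flatten() still points at the original buffer.

```python
import numpy as np

arr = np.arange(12)

# 12 elements cannot fill a 5x3 grid (15 slots), so NumPy refuses.
try:
    arr.reshape(5, 3)
except ValueError as err:
    print("reshape failed:", err)

matrix = arr.reshape(3, 4)

# ravel() on a contiguous array gives a view; flatten() always copies.
raveled = matrix.ravel()
flattened = matrix.flatten()
print(np.shares_memory(matrix, raveled))    # Output: True
print(np.shares_memory(matrix, flattened))  # Output: False

# On a non-contiguous array (here, a transpose), ravel() has to copy.
print(np.shares_memory(matrix, matrix.T.ravel()))  # Output: False
```

That last line is why "usually a view" is the honest description of ravel(): the answer depends on how the data is laid out in memory.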
Slicing: The Reason You’ll Never Go Back to Lists
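This is where the magic happens. Slicing in NumPy is breathtakingly powerful and, for the uninitiated, a fantastic way to shoot yourself in the foot. The first thing to internalize: basic slicing (anything written as start:stop:step) returns a view, not a copy. Only fancy indexing with lists or arrays of indices, which appears at the end of this section, hands you a copy.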
A view is a new window into the exact same data. Change the view, and you change the original array. This is for performance; copying large chunks of memory all the time would be insanely slow. It’s also a classic pitfall.
arr = np.arange(10) # [0 1 2 3 4 5 6 7 8 9]
print(arr)
# Get a view of the elements at indices 2 through 5 (the stop index, 6, is excluded).
slice_of_arr = arr[2:6]
print(slice_of_arr) # Output: [2 3 4 5]
# Now change the first element of the SLICE.
slice_of_arr[0] = 999
# Look at the original array. It got mutated!
print(arr) # Output: [  0   1 999   3   4   5   6   7   8   9]
If you need an actual, honest-to-goodness independent copy, you must be explicit.
arr = np.arange(10)
actual_copy = arr[2:6].copy() # This is the safe way.
actual_copy[0] = 999
print(arr) # Output: [0 1 2 3 4 5 6 7 8 9] <-- Unchanged. Safe.
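If you are ever unsure which one you are holding, you can ask NumPy. A minimal sketch using np.shares_memory() and the .base attribute; the variable names are just for illustration.

```python
import numpy as np

arr = np.arange(10)
view = arr[2:6]
copy = arr[2:6].copy()

print(view.base is arr)              # Output: True  (the view borrows arr's memory)
print(copy.base is None)             # Output: True  (the copy owns its own memory)
print(np.shares_memory(arr, view))   # Output: True
print(np.shares_memory(arr, copy))   # Output: False
```

Dropping a check like this into a debugging session is much cheaper than discovering the answer from corrupted data three functions later.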
For multi-dimensional arrays, you separate indices and slices with commas. And yes, you can use the ellipsis (...) to say “and all the dimensions in between.”
matrix = np.arange(16).reshape(4, 4)
print(matrix)
# Output:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]
#  [12 13 14 15]]
# Get the first two rows and all columns.
print(matrix[:2, :]) # Rows 0-1, all columns
# Get the second column for all rows.
print(matrix[:, 1]) # Output: [ 1  5  9 13] <-- Note: the integer index drops that axis, so the result is 1D.
# Get a 2x2 block from the center.
block = matrix[1:3, 1:3]
print(block)
# Output:
# [[ 5  6]
#  [ 9 10]]
# Fancy indexing: grab specific rows in a specific order. Unlike a slice, this returns a copy, not a view.
print(matrix[[2, 0, 3]]) # Output: rows 2, then 0, then 3.
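The ellipsis only earns its keep above two dimensions, so here is a short sketch on a made-up 3-D array (cube is a hypothetical name), along with two details that tend to surprise people: an integer index drops an axis while a length-1 slice keeps it, and fancy indexing hands back a copy.

```python
import numpy as np

# A hypothetical 3-D array: 2 blocks, each 3 rows by 4 columns.
cube = np.arange(24).reshape(2, 3, 4)

# The ellipsis stands in for "every dimension I did not mention".
last_col = cube[..., -1]         # same as cube[:, :, -1]
print(last_col.shape)            # Output: (2, 3)
print(np.array_equal(last_col, cube[:, :, -1]))  # Output: True

matrix = np.arange(16).reshape(4, 4)

# An integer index drops a dimension; a length-1 slice keeps it.
print(matrix[:, 1].shape)        # Output: (4,)
print(matrix[:, 1:2].shape)      # Output: (4, 1)

# Fancy indexing returns a copy, so edits do not reach the original.
picked = matrix[[2, 0, 3]]
picked[0, 0] = -1
print(matrix[2, 0])              # Output: 8
```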
This view-versus-copy behavior is NumPy’s way of trusting you with power. It assumes you know what you’re doing. It’s a brilliant design for performance, but it’s the number one “gotcha” for newcomers. Master it, and you’ve mastered one of the most important concepts in the entire library. Now go reshape and slice something. Responsibly.