11.6 Nested Lists and Shallow vs Deep Copy
In Python, a nested list is a list that contains other lists as its elements. This structure is incredibly powerful for representing multi-dimensional data, such as matrices, grids, or hierarchical information. However, the mutable nature of list objects introduces a critical distinction when copying them: the difference between a shallow copy and a deep copy. Understanding this distinction is paramount to avoiding subtle and often frustrating bugs.
The Nature of Nested Lists
A nested list is not a special data type; it is simply a list whose elements are references to other list objects. When you create a list like outer = [[1, 2], [3, 4]], the variable outer holds a reference to a list container. The first element of that container is not the values [1, 2] themselves, but a reference (a pointer) to another, separate list object in memory that contains 1 and 2. This structure of references is what leads to the copying complexities.
# Creating a nested list
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(matrix[1][0]) # Output: 4 (Accessing the first element of the second list)
The Pitfall of Assignment
A common mistake is attempting to create a copy of a list using a simple assignment. This does not create a new list; it merely creates a new reference (a new name) pointing to the exact same list object in memory. Consequently, any modification made through either variable will be reflected when accessing the data through the other variable.
original = [[1, 2], [3, 4]]
assigned_copy = original # This is NOT a copy
# Modifying the 'assigned_copy'...
assigned_copy[0][0] = 'A'
# ...also affects the 'original' because they are the same object.
print(original) # Output: [['A', 2], [3, 4]]
Creating a Shallow Copy
A shallow copy creates a new, outer list object but populates it with references to the same inner list objects as the original. The copy() method from the list class or slicing [:] are the most common ways to create a shallow copy.
original = [[1, 2], [3, 4]]
shallow_copy = original.copy() # Alternatively: shallow_copy = original[:]
# The outer list is a new object...
shallow_copy.append([5, 6])
print(original) # Output: [[1, 2], [3, 4]] (Unaffected by the append)
# ...but the inner lists are shared references.
shallow_copy[0][0] = 'A'
print(original) # Output: [['A', 2], [3, 4]] (The inner list was modified!)
The id() function reveals this relationship. The outer lists have different IDs, but the first inner list in both the original and the shallow copy share the same ID.
print(id(original) == id(shallow_copy)) # False (outer lists are different)
print(id(original[0]) == id(shallow_copy[0])) # True (inner lists are the same)
Creating a Deep Copy
To create a completely independent copy of a nested list, where all levels are new objects, you must perform a deep copy. The copy module provides the deepcopy() function for this exact purpose. It recursively traverses the entire data structure, creating new copies of every object it encounters.
import copy
original = [[1, 2], [3, 4]]
deep_copy = copy.deepcopy(original)
# Now, modifications to any level of the deep copy...
deep_copy[0][0] = 'A'
# ...leave the original completely untouched.
print(original) # Output: [[1, 2], [3, 4]]
print(deep_copy) # Output: [['A', 2], [3, 4]]
All objects, both the outer and inner lists, now have distinct identities.
print(id(original) == id(deep_copy)) # False
print(id(original[0]) == id(deep_copy[0])) # False
When to Use Shallow vs. Deep Copy
The choice between a shallow and deep copy is a choice between performance and safety.
- Use a Shallow Copy when the inner objects are immutable (e.g., integers, strings, tuples) or when you are certain you only need to modify the structure of the outer list (adding/removing elements) and will not mutate the inner objects. It is faster and more memory-efficient.
- Use a Deep Copy when the nested structure contains mutable objects (like other lists or dictionaries) and you need complete independence from the original, or when you are unsure of the data’s mutability. It is safer but computationally more expensive for large structures.
Best Practices and Summary
- Always be aware of mutability. The core of this issue stems from lists being mutable. This problem applies equally to other mutable collections like sets and dictionaries nested within lists.
- Prefer
copy.deepcopy()for safety. If your code’s correctness depends on the copied data being independent, default to using a deep copy. The performance cost is often negligible compared to the time spent debugging a shallow copy error. - Understand what you are copying. A shallow copy is sufficient if you are only changing the outer container (e.g.,
my_list.append(x)ormy_list.pop()). The moment you plan to change an element within a nested mutable element (e.g.,my_list[0][1] = 10), you must question if that element is shared and if you need a deep copy instead. - For very large or complex structures, consider if a deep copy is feasible or if an alternative design (e.g., using immutable data structures like tuples where possible) can avoid the need for copying altogether.