51.7 shutil: Copying, Moving, and Deleting Trees
The shutil module, short for “shell utilities,” is an indispensable part of Python’s standard library for high-level file operations. While pathlib and os excel at path manipulation and low-level system calls, shutil provides a suite of functions designed for everyday tasks like copying files, moving directories, and recursively deleting entire folder structures. It abstracts away the complexities and platform-specific nuances of these operations, offering a clean, Pythonic interface.
Copying Files and Metadata
The most basic operation is copying a single file. shutil.copy2(src, dst) is the preferred function for this task. It copies the file’s content and its metadata, including timestamps (creation and modification) and permissions. This is in contrast to shutil.copy(src, dst), which only copies the data and the file’s permission mode. The dst can be a target directory or a full path to a new filename.
import shutil
from pathlib import Path
source_file = Path('data.txt')
dest_dir = Path('/backups/')
# Copy to a directory (keeps original filename 'data.txt')
shutil.copy2(source_file, dest_dir)
# Copy and rename the file in the destination
shutil.copy2(source_file, dest_dir / 'data_backup.txt')
# Demonstrating the difference between copy and copy2
import os
shutil.copy(source_file, 'copy.txt') # Copies data and permissions
shutil.copy2(source_file, 'copy2.txt') # Copies data, permissions, *and* timestamps
print("Original modified time:", os.path.getmtime('data.txt'))
print("copy() modified time: ", os.path.getmtime('copy.txt')) # Time of copy operation
print("copy2() modified time: ", os.path.getmtime('copy2.txt')) # Matches original
Copying Directories Recursively
To duplicate an entire directory tree, you use shutil.copytree(src, dst). This function creates a new destination directory and recursively copies every file and subdirectory from the source into it. A crucial feature is its dirs_exist_ok parameter (introduced in Python 3.8). By default, copytree requires the destination to not exist, preventing accidental overwrites. Setting dirs_exist_ok=True allows the function to merge the contents of the source into an existing destination directory.
import shutil
# This will fail if '/backups/my_project' already exists
try:
shutil.copytree('./my_project', '/backups/my_project')
except FileExistsError as e:
print(f"Error: {e}")
# The safe, modern way: allow merging with an existing directory
shutil.copytree('./my_project', '/backups/my_project', dirs_exist_ok=True)
# A useful pattern is to combine it with pathlib for path resolution
source_dir = Path('./reports/q3')
dest_dir = Path('/archive/2023/reports')
dest_dir.mkdir(parents=True, exist_ok=True) # Ensure the parent exists
shutil.copytree(source_dir, dest_dir / 'q3', dirs_exist_ok=True)
Moving Files and Directories
The shutil.move(src, dst) function is used to relocate files and directories. Its behavior is analogous to the Unix mv command. If the destination is a directory, the source is moved into it. If the destination is a file path or doesn’t exist, the source is effectively renamed. Crucially, if the destination is on the same filesystem, move will use an atomic os.rename() operation, which is very fast. If the destination is on a different filesystem, shutil must fall back to a slower copy-and-delete process.
# Move a file into a directory
shutil.move('report.pdf', '/archive/2023/')
# Rename a file (moving it to a new name in the same directory)
shutil.move('old_name.txt', 'new_name.txt')
# Move and rename a file in one operation
shutil.move('document.txt', '/archive/2023/renamed_doc.txt')
Deleting Entire Directory Trees
The most dangerous and powerful function in shutil is shutil.rmtree(path). It deletes a directory and all of its contents, recursively. There is no built-in trash/recycle bin functionality; this operation is permanent and immediate. This is why it is one of the most common sources of catastrophic bugs. Always double and triple-check the path you are passing to this function. A best practice is to use pathlib.Path objects to construct paths safely and to print the path for visual confirmation before execution, especially in scripts.
import shutil
from pathlib import Path
# DANGER: Permanently deletes '/tmp/build_cache' and everything inside it.
cache_dir = Path('/tmp/build_cache')
# BEST PRACTICE: Add a safety check before calling rmtree.
if cache_dir.exists() and cache_dir.is_dir():
print(f"About to permanently delete: {cache_dir.resolve()}")
confirm = input("Type 'YES' to confirm: ")
if confirm == 'YES':
shutil.rmtree(cache_dir)
print("Directory deleted.")
else:
print("Deletion cancelled.")
else:
print("Directory does not exist.")
Common Pitfalls and Best Practices
- Race Conditions: Be aware that between checking for a path’s existence (with
.exists()) and performing an operation on it, another process could delete or create it. Handle exceptions (FileNotFoundError,PermissionError) instead of relying solely on checks. - Symbolic Links: By default,
copytreewill follow symbolic links and copy the files they point to. This can lead to unintended data duplication or copying massive directory structures. Use thesymlinks=Trueparameter to preserve the links instead. - Metadata and Permissions: If preserving all metadata is critical, use
copy2andcopytree(copy_function=shutil.copy2). Be mindful that copying files to a different filesystem might not preserve all metadata (like creation time on some systems) due to OS limitations. - Error Handling: File operations can fail for many reasons: missing files, permission issues, or full disks. Always wrap
shutiloperations intry...exceptblocks to handle these scenarios gracefully rather than letting your script crash. - The Nuclear Option (
rmtree): As a final safety measure, consider using a dedicated library likesend2trash(which requires installation) for operations that should be reversible, especially in user-facing applications.