50.2 Capturing stdout and stderr

When executing external commands, capturing their standard output (stdout) and standard error (stderr) is a fundamental requirement for programmatic interaction. The subprocess module provides several ways to do this, each suited to different use cases.

The `subprocess.run()` Function and `stdout`/`stderr` Arguments

The primary method for capturing output is through the stdout and stderr arguments of subprocess.run(). These arguments accept several constants that define the handling of these streams.

subprocess.PIPE: This is the most common choice for capture. It instructs the run() function to create a pipe between your Python program and the new process. The process’s output is collected into the corresponding attribute (e.g., result.stdout) of the returned CompletedProcess object.

import subprocess

# Capture both stdout and stderr into separate attributes
result = subprocess.run(['ls', '-l', '/nonexistent'], 
                      stdout=subprocess.PIPE, 
                      stderr=subprocess.PIPE,
                      text=True)  # Decode bytes to string automatically

print(f"Return code: {result.returncode}")
print(f"STDOUT:\n{result.stdout}")  # This will likely be empty for this command
print(f"STDERR:\n{result.stderr}")  # This will contain the 'No such file or directory' error
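On Python 3.7 and later, capture_output=True is shorthand for passing subprocess.PIPE to both stdout and stderr; it cannot be combined with explicit stdout or stderr arguments. The following is a minimal sketch, assuming a child launched through sys.executable so that it does not depend on ls being installed. The inline child script is a hypothetical stand-in for a real command.

```python
import subprocess
import sys

# Hypothetical child script: writes one line to each stream, then exits with code 3
child = "import sys; print('to stdout'); print('to stderr', file=sys.stderr); sys.exit(3)"

result = subprocess.run([sys.executable, "-c", child],
                        capture_output=True,  # same as stdout=PIPE, stderr=PIPE
                        text=True)

print(result.returncode)
print(result.stdout, end="")
print(result.stderr, end="")
```

The two streams still arrive in separate attributes, exactly as with the explicit PIPE arguments.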

subprocess.DEVNULL: A special value that connects the stream to the operating system's null device (os.devnull), so anything the child writes to it is discarded. This is useful when you want to suppress output entirely, such as for a noisy command whose output you do not care about.

# Run a command and completely ignore both its output and errors
result = subprocess.run(['curl', '--silent', 'https://example.com'],
                      stdout=subprocess.DEVNULL,
                      stderr=subprocess.DEVNULL)
print("Command executed, output discarded.")
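The two constants can also be mixed. The sketch below discards stdout while still capturing stderr, which may suit a command that prints progress noise but reports real problems on stderr. The inline child script is a hypothetical stand-in, launched through sys.executable for portability.

```python
import subprocess
import sys

# Hypothetical child script: noise on stdout, a meaningful message on stderr
child = "import sys; print('progress noise'); print('real problem', file=sys.stderr)"

result = subprocess.run([sys.executable, "-c", child],
                        stdout=subprocess.DEVNULL,  # discarded
                        stderr=subprocess.PIPE,     # captured
                        text=True)

print(result.stdout)           # None: nothing was captured for this stream
print(result.stderr, end="")
```

Note that an attribute is None, not an empty string, for any stream that was not captured with PIPE.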

subprocess.STDOUT: A special value used only for the stderr argument. It tells the subprocess to redirect its stderr stream to its stdout stream. This allows you to capture both standard output and error messages intermixed in a single stream, result.stdout.

# Redirect stderr to stdout, capturing both in a single stream
result = subprocess.run(['ls', '-l', '/nonexistent', '/tmp'],
                      stdout=subprocess.PIPE,
                      stderr=subprocess.STDOUT,  # Critical: redirects stderr to stdout
                      text=True)

print(f"Combined output:\n{result.stdout}")
# The output will contain both the listing for /tmp and the error for /nonexistent

Understanding the `text` Argument and Encoding

A crucial and often confusing aspect is the difference between bytes and strings. By default, PIPE captures output as raw bytes (bytes objects). This is the safest default because it preserves the exact output of the command, regardless of its encoding. The text=True argument (added in Python 3.7 as a clearer alias for the older universal_newlines=True) instructs subprocess to decode those bytes into strings using the locale's preferred encoding, which can be overridden with the encoding parameter.

Why this matters: bytes and str do not mix. Without text=True, result.stdout is a bytes object, so any operation that combines it with a str (e.g., result.stdout.split('\n') or 'error' in result.stdout) raises a TypeError. Always decide consciously: use text=True for text-based processing or work directly with bytes for binary data.

# Default behavior: capture as bytes
result_bytes = subprocess.run(['echo', 'hello'], stdout=subprocess.PIPE)
print(type(result_bytes.stdout))  # <class 'bytes'>
print(result_bytes.stdout)        # b'hello\n'

# Using text=True: capture as string
result_str = subprocess.run(['echo', 'hello'], stdout=subprocess.PIPE, text=True)
print(type(result_str.stdout))    # <class 'str'>
print(repr(result_str.stdout))    # 'hello\n'
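When the command's encoding is known, it can be stated explicitly instead of relying on the locale. Passing encoding (or errors) also switches run() to text mode. A minimal sketch, assuming a hypothetical child script that writes UTF-8 bytes; errors="replace" is one possible policy for undecodable bytes, not the only reasonable one.

```python
import subprocess
import sys

# Hypothetical child script: writes the UTF-8 bytes for 'caf\u00e9' plus a newline
child = "import sys; sys.stdout.buffer.write('caf\\u00e9\\n'.encode('utf-8'))"

# Without text mode: the raw bytes, exactly as the child wrote them
raw = subprocess.run([sys.executable, "-c", child], stdout=subprocess.PIPE)

# With an explicit encoding: a str, decoded as UTF-8 whatever the locale says
decoded = subprocess.run([sys.executable, "-c", child], stdout=subprocess.PIPE,
                         encoding="utf-8", errors="replace")

print(raw.stdout)
print(ascii(decoded.stdout))   # ascii() keeps the printout safe on any terminal
```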

The Risk of Deadlocks and How to Avoid Them

A significant pitfall when using PIPE is the potential for a deadlock. This occurs when the parent process (your Python script) and the child process wait indefinitely for each other. The most common scenario is when both stdout=PIPE and stderr=PIPE are used, and the child process writes a large amount of data to stderr. The OS pipe buffers fill up, and the child process blocks, waiting for the parent to read from the stderr pipe. Meanwhile, the parent is waiting for the child to finish and close its stdout pipe before it reads the stderr data. Both processes are stuck waiting.

Solution: The subprocess.run() function avoids this deadlock for you. It calls Popen.communicate() internally, which reads both pipes concurrently until the child closes them and only then waits for the process to terminate.

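However, if you are using the lower-level Popen object directly for more complex interactions, you must manage the pipes yourself to prevent deadlocks. The simplest safe option is Popen.communicate(). If you need the output incrementally, use threads or the select module to read from stdout and stderr as data becomes available. Calling proc.wait(), or reading one pipe to the end before touching the other, is what creates the deadlock.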

# run() handles this deadlock risk automatically; with Popen you must drain the pipes yourself.
import sys
from subprocess import Popen, PIPE

# A child that writes about 1 MB to stderr, more than a typical OS pipe buffer holds
cmd = [sys.executable, '-c', "import sys; sys.stderr.write('x' * 1_000_000)"]
proc = Popen(cmd, stdout=PIPE, stderr=PIPE)

# UNSAFE: calling proc.wait() here can deadlock, because the child blocks on the full stderr pipe
# SAFE: communicate() drains both pipes concurrently, then waits for the child to exit
stdout, stderr = proc.communicate()
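For cases where communicate() is not enough, the thread-based approach mentioned above can be sketched as follows. This is one possible arrangement, not the only one: the drain helper and the noisy child script are hypothetical names introduced for illustration.

```python
import subprocess
import sys
import threading

# Hypothetical noisy child: about 200 KB on each stream, more than a typical pipe buffer
child = ("import sys; "
         "sys.stdout.write('o' * 200_000); "
         "sys.stderr.write('e' * 200_000)")

def drain(stream, chunks):
    """Read a pipe to EOF so the child never blocks on a full buffer."""
    for chunk in iter(lambda: stream.read(8192), b""):
        chunks.append(chunk)
    stream.close()

proc = subprocess.Popen([sys.executable, "-c", child],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)

out_chunks, err_chunks = [], []
readers = [threading.Thread(target=drain, args=(proc.stdout, out_chunks)),
           threading.Thread(target=drain, args=(proc.stderr, err_chunks))]
for t in readers:
    t.start()
for t in readers:
    t.join()               # returns once each pipe has reached EOF
returncode = proc.wait()   # safe now: nothing is left unread in either pipe

stdout_data = b"".join(out_chunks)
stderr_data = b"".join(err_chunks)
print(returncode, len(stdout_data), len(stderr_data))
```

Because each pipe has its own reader, neither buffer can fill up while the other is being read. Inside drain, each chunk could be processed as it arrives instead of being stored.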

Best Practices and Common Pitfalls

Check the Return Code: Always check result.returncode after running a command. A zero typically indicates success, while a non-zero value indicates an error. Relying solely on the presence of output in stderr is unreliable, as some successful commands may write warnings to stderr.
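A short sketch of why the return code, not stderr, should decide success. It assumes a hypothetical child script that succeeds but still prints a warning to stderr.

```python
import subprocess
import sys

# Hypothetical child script: exits with code 0 but writes a warning to stderr
child = "import sys; print('warning: deprecated option', file=sys.stderr); print('done')"

result = subprocess.run([sys.executable, "-c", child],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)

succeeded = result.returncode == 0         # the reliable success signal
has_stderr = bool(result.stderr.strip())   # True here, yet the command did not fail
print(succeeded, has_stderr)
```

A check based only on has_stderr would wrongly report this run as a failure.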

Beware of Large Output: Using PIPE captures the entire output of the command in memory. For commands that can produce gigabytes of output, this can exhaust your system’s memory. In such cases, consider redirecting the output directly to a file using the stdout and stderr arguments.

# Redirect output directly to files to avoid memory issues with large data
with open('stdout.log', 'w') as out_file, open('stderr.log', 'w') as err_file:
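    # The child writes straight to the files, so nothing is buffered in Python
    # and text=True is unnecessary (it only affects captured output).
    result = subprocess.run(['dd', 'if=/dev/zero', 'bs=1M', 'count=1000'],
                          stdout=out_file,
                          stderr=err_file)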
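If the output has to be processed rather than stored, another option is to read it line by line from a Popen pipe so that only one line is held in memory at a time. The sketch below assumes a hypothetical child script that prints many lines. It merges stderr into stdout so that there is only one pipe to drain.

```python
import subprocess
import sys

# Hypothetical child script: prints 10,000 numbered lines
child = "for i in range(10000): print('line', i)"

line_count = 0
with subprocess.Popen([sys.executable, "-c", child],
                      stdout=subprocess.PIPE,
                      stderr=subprocess.STDOUT,  # single pipe, so no second buffer can fill
                      text=True) as proc:
    for line in proc.stdout:   # yields lines as the child produces them
        line_count += 1        # process each line here instead of storing it

# Leaving the with-block closes the pipe and waits for the child
print(proc.returncode, line_count)
```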

Use check=True for Automatic Failure: If a non-zero return code should be treated as an exception, use check=True. This will cause a CalledProcessError to be raised if the command fails, which often simplifies error handling logic.
```
try:
    result = subprocess.run(['false'], check=True, stdout=subprocess.PIPE, text=True)
except subprocess.CalledProcessError as e:
    print(f"Command failed with return code {e.returncode}")
```
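The CalledProcessError exception also carries whatever output was captured, in its stdout and stderr attributes, which is often what you need for a useful error message. A minimal sketch, assuming a hypothetical failing child script:

```python
import subprocess
import sys

# Hypothetical failing child: reports a problem on stderr and exits with code 2
child = "import sys; print('disk full', file=sys.stderr); sys.exit(2)"

try:
    subprocess.run([sys.executable, "-c", child],
                   check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    failure = None
except subprocess.CalledProcessError as e:
    # e.stdout and e.stderr hold the captured streams (None if not captured)
    failure = (e.returncode, e.stderr.strip())

print(failure)
```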
