52.2 CSV: csv.reader, csv.writer, DictReader, DictWriter
The Comma-Separated Values (CSV) format is a deceptively simple text format for tabular data. No single standard is followed universally (RFC 4180 documents a common variant, but many applications deviate from it), so numerous dialects exist and robust parsing is non-trivial. Python’s csv module handles these complexities for you, taking care of delimiters, quoting, and escaping so you never have to split strings or escape characters by hand. It represents each row either as a list (csv.reader and csv.writer) or as a dictionary (csv.DictReader and csv.DictWriter), and uses file objects, or any file-like object, as the source or destination of the data.
The csv.reader Object
The csv.reader() function is the foundational tool for reading CSV data. It takes any iterable that yields lines of text, typically a file object, and returns a reader object that iterates over the rows of the CSV data, yielding each row as a list of strings.
A key feature is the dialect parameter, which bundles a set of formatting rules such as the delimiter, the quote character, and how quotes inside fields are escaped. The default dialect, 'excel', describes the format Excel produces and handles most files; you can register a custom dialect with csv.register_dialect() or override individual rules with keyword arguments such as delimiter=';'. The reader automatically handles quoted fields, which can contain the delimiter or even newline characters, a common point of failure for naive str.split() approaches.
```python
import csv
from io import StringIO

# Sample CSV data: one quoted field contains a comma,
# another quoted field contains a newline
csv_data = '''Name,Department,Quote
Alice,Engineering,"I love Python, it's great!"
Bob,Marketing,"A simple quote"
Carol,IT,"A quote
with a newline"'''

# Wrap the string in a file-like object
file = StringIO(csv_data)
reader = csv.reader(file)
for row in reader:
    print(row)

# Output:
# ['Name', 'Department', 'Quote']
# ['Alice', 'Engineering', "I love Python, it's great!"]
# ['Bob', 'Marketing', 'A simple quote']
# ['Carol', 'IT', 'A quote\nwith a newline']
```
Notice how the comma within the quoted string in Alice’s row was not treated as a delimiter, and the newline in Carol’s row was preserved correctly. This automatic handling of field quoting is the primary reason to use the csv module over manual parsing.
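To make the contrast with manual parsing concrete, here is a small sketch that parses Alice’s line from the example above both ways. It relies only on the fact that csv.reader accepts any iterable of strings, so a one-element list stands in for a file.

```python
import csv

line = 'Alice,Engineering,"I love Python, it\'s great!"'

# Naive parsing: every comma is treated as a delimiter and quotes are ignored
naive = line.split(',')

# csv.reader accepts any iterable of strings; a one-item list stands in for a file
parsed = next(csv.reader([line]))

print(len(naive), naive)    # the quoted field is split in two
print(len(parsed), parsed)  # three fields, with the surrounding quotes removed
```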
The csv.writer Object
The counterpart to csv.reader is csv.writer. It writes rows (iterables of strings or numbers) to a file object, applying the formatting rules of its dialect. Its two main methods are writerow(), which writes a single row, and writerows(), which writes every row from an iterable of rows.
A key best practice is to open the file with newline='', as shown below. By default the writer ends each row with '\r\n' (the lineterminator of the 'excel' dialect). Without newline='', Python’s text-mode newline translation converts the '\n' in that terminator to the platform’s line ending, so on Windows every row ends in '\r\r\n' and appears to be followed by a blank line. With newline='', no translation takes place and the csv writer controls the line endings itself.
```python
import csv

data_to_write = [
    ['Name', 'Age', 'City'],
    ['Dave', '28', 'Boston'],
    ['Eve', '32', 'San Francisco'],
    ['Frank', '45', 'Austin, TX'],  # This field will be quoted automatically
]

# newline='' lets the csv writer control the line endings
with open('people.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerows(data_to_write)

# Contents of people.csv:
# Name,Age,City
# Dave,28,Boston
# Eve,32,San Francisco
# Frank,45,"Austin, TX"
```
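Under the default quoting mode, csv.QUOTE_MINIMAL, the writer quotes only fields that contain special characters such as the delimiter, the quote character, or a line break. Because “Austin, TX” contains the delimiter, it was quoted automatically, so the file can be read back unambiguously.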
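A quick way to see exactly what the writer emits, including the default '\r\n' line terminator and how embedded quotes are escaped, is to write to an in-memory StringIO buffer instead of a file. The rows below are illustrative; the second one, with a made-up quoted remark, shows the default behavior of doubling the quote character (doublequote=True).

```python
import csv
from io import StringIO

buffer = StringIO()          # in-memory buffer; no newline translation on write
writer = csv.writer(buffer)  # default 'excel' dialect with QUOTE_MINIMAL
writer.writerow(['Frank', 45, 'Austin, TX'])   # contains the delimiter
writer.writerow(['Ivan', 38, 'He said "hi"'])  # contains the quote character
output = buffer.getvalue()
print(repr(output))

# Reading the text back recovers the original values (as strings)
rows = list(csv.reader(StringIO(output, newline='')))
print(rows)
```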
DictReader and DictWriter for Enhanced Clarity
While the basic reader and writer use lists, DictReader and DictWriter use dictionaries, mapping the row data to the column headers. This makes code significantly more readable and maintainable, as you access fields by name (row['Age']) rather than by a potentially magic index (row[1]).
By default, DictReader uses the values in the first row of the CSV file as the dictionary keys. If you pass the fieldnames parameter instead, those names become the keys and the first row is read as ordinary data.
```python
import csv

with open('people.csv', 'r', newline='', encoding='utf-8') as file:
    reader = csv.DictReader(file)  # keys come from the header row
    for row in reader:
        print(f"{row['Name']} is {row['Age']} years old.")

# Output:
# Dave is 28 years old.
# Eve is 32 years old.
# Frank is 45 years old.
```
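When a file has no header row, you can supply the keys yourself via fieldnames. The sketch below uses made-up, headerless data and also sets restval (the value used when a row has too few fields) and restkey (the key under which surplus fields are collected as a list), both of which are DictReader parameters; the key name '_extra' is an arbitrary choice for illustration.

```python
import csv
from io import StringIO

# Made-up headerless data: a complete row, a short row, and a long row
raw = StringIO('Grace,29,Portland\r\nHeidi,41\r\nIvan,38,Denver,extra\r\n', newline='')

reader = csv.DictReader(
    raw,
    fieldnames=['Name', 'Age', 'City'],  # supplied keys; the first row is data
    restval='',                          # fills fields missing from short rows
    restkey='_extra',                    # collects surplus fields in a list
)
rows = list(reader)
for row in rows:
    print(row)
```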
DictWriter requires the fieldnames parameter to define the structure and order of the columns. The writeheader() method is used to write the column names as the first row.
```python
import csv

new_people = [
    {'Name': 'Grace', 'Age': '29', 'City': 'Portland'},
    {'Name': 'Heidi', 'City': 'Chicago', 'Age': '41'},  # Note the different key order
]
fieldnames = ['Name', 'Age', 'City']  # Defines the column order

with open('new_people.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    writer.writeheader()  # writes "Name,Age,City"
    writer.writerows(new_people)

# Contents of new_people.csv:
# Name,Age,City
# Grace,29,Portland
# Heidi,41,Chicago
```
The DictWriter correctly ordered the fields for Heidi’s row based on the fieldnames list, not the order of the dictionary keys.
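DictWriter is strict about keys by default: a dictionary containing a key that is not in fieldnames raises ValueError (extrasaction='raise'), while missing keys are filled with restval, which defaults to an empty string. The sketch below, using made-up rows, shows the strict default and the lenient extrasaction='ignore' alternative; the 'N/A' placeholder is an arbitrary choice.

```python
import csv
from io import StringIO

fieldnames = ['Name', 'Age', 'City']
rows = [
    {'Name': 'Judy', 'Age': 35},  # no 'City' key
    {'Name': 'Ken', 'Age': 50, 'City': 'Denver', 'Email': 'ken@example.com'},  # extra key
]

# Default behavior (extrasaction='raise'): an unknown key is an error
error = None
strict = csv.DictWriter(StringIO(), fieldnames=fieldnames)
try:
    strict.writerow(rows[1])
except ValueError as exc:
    error = str(exc)

# Lenient behavior: drop unknown keys, fill missing ones with a placeholder
buffer = StringIO()
lenient = csv.DictWriter(buffer, fieldnames=fieldnames,
                         restval='N/A', extrasaction='ignore')
lenient.writeheader()
lenient.writerows(rows)
output = buffer.getvalue()

print(error)
print(output)
```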
Common Pitfalls and Best Practices
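A common pitfall is forgetting newline='' when opening files for csv.writer or DictWriter; on Windows this produces an extra blank line after every row. Use newline='' when reading as well, because without it line breaks embedded in quoted fields may not be interpreted correctly. Specifying it everywhere keeps the code portable across platforms.
Handling non-ASCII data is another critical consideration. Always specify the encoding explicitly (e.g., 'utf-8') when opening the file; otherwise Python falls back to a platform-dependent default, which can cause a UnicodeDecodeError on read or a UnicodeEncodeError on write.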
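One encoding detail catches many people: some spreadsheet exports begin UTF-8 files with a byte order mark (BOM). Decoded as plain 'utf-8', the BOM stays attached to the first header name; the 'utf-8-sig' codec strips it. The sketch below simulates such a file in memory, with io.BytesIO and io.TextIOWrapper standing in for open(path, encoding=..., newline='').

```python
import csv
import io

# Simulated file bytes: a UTF-8 BOM followed by ordinary CSV text
raw_bytes = '\ufeffName,City\r\nZoë,Zürich\r\n'.encode('utf-8')

def read_rows(encoding):
    # TextIOWrapper over BytesIO mimics open(path, encoding=encoding, newline='')
    text = io.TextIOWrapper(io.BytesIO(raw_bytes), encoding=encoding, newline='')
    return list(csv.reader(text))

plain = read_rows('utf-8')    # the BOM survives as '\ufeff' on the first header
sig = read_rows('utf-8-sig')  # the BOM is stripped
print(plain[0])
print(sig[0])
```

With a DictReader, the first symptom is usually a KeyError for 'Name', because the actual key is '\ufeffName'.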
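The csv readers return every field as a string, so any type conversion (e.g., to integers or floats) must be performed manually after reading. The one exception is a reader created with quoting=csv.QUOTE_NONNUMERIC, which converts unquoted fields to float. Writers are more forgiving: non-string values are converted with str() before being written, and None is written as an empty string, so explicit conversion is needed only when you want control over the formatting, such as a fixed number of decimal places.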
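The following sketch, with made-up data, shows both directions: the writer stringifies numbers and writes None as an empty field, and the values read back must be converted explicitly. Treating an empty Age field as None is an assumption of this example, not something the csv module does.

```python
import csv
from io import StringIO

buffer = StringIO()
writer = csv.writer(buffer)
writer.writerow(['Name', 'Age', 'Score'])
writer.writerow(['Liam', 30, 91.5])   # numbers are converted with str()
writer.writerow(['Mia', None, 88.0])  # None becomes an empty field
text = buffer.getvalue()

# Reading back: every value is a string, so convert each column explicitly
records = []
for row in csv.DictReader(StringIO(text, newline='')):
    records.append({
        'Name': row['Name'],
        'Age': int(row['Age']) if row['Age'] else None,  # assumption: empty -> None
        'Score': float(row['Score']),
    })

print(repr(text))
print(records)
```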
When dealing with messy or non-standard CSV files, configure dialect parameters such as delimiter (e.g., '\t' for TSV files), quotechar, and escapechar. The csv.Sniffer class can often deduce the dialect of an existing file from a sample of its contents. For writing, setting quoting=csv.QUOTE_ALL (quote every field) or quoting=csv.QUOTE_NONNUMERIC (quote every non-numeric field) gives more predictable output than the default csv.QUOTE_MINIMAL.
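Here is a brief sketch of both ideas, assuming a small semicolon-delimited sample held in a string. Sniffer.sniff() is given the candidate delimiters to consider, which makes detection more reliable on short samples; the writers at the end contrast QUOTE_NONNUMERIC with a tab-delimited QUOTE_ALL configuration.

```python
import csv
from io import StringIO

# Hypothetical semicolon-delimited sample, e.g. the first lines of a file
sample = 'Name;City;Score\nNora;Oslo;7\nOmar;Cairo;9\n'

# Restrict the candidates the Sniffer may choose from
dialect = csv.Sniffer().sniff(sample, delimiters=';,\t')
rows = list(csv.reader(StringIO(sample, newline=''), dialect))
print(dialect.delimiter)
print(rows)

# QUOTE_NONNUMERIC: quote every field that is not a number
nonnumeric = StringIO()
csv.writer(nonnumeric, quoting=csv.QUOTE_NONNUMERIC).writerow(['Nora', 7, 2.5])

# QUOTE_ALL with a tab delimiter: TSV output with every field quoted
quoted_tsv = StringIO()
csv.writer(quoted_tsv, delimiter='\t', quoting=csv.QUOTE_ALL).writerow(['Nora', 7, 2.5])

print(repr(nonnumeric.getvalue()))
print(repr(quoted_tsv.getvalue()))
```

Sniffer results are heuristic guesses, so on real files it is worth checking the detected delimiter before relying on it.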