90.2 Migrating a Python 2 Codebase to Python 3
Alright, let’s get our hands dirty. Migrating from Python 2 to 3 isn’t just a version bump; it’s a language transplant. The core DNA is the same, but a lot of the organs work differently. The good news? The Python community has poured an immense amount of effort into making this as painless as possible. The bad news? If your codebase is large and ancient, “painless” is a relative term. We’re going to methodically break this beast down.
First, a moment of silence for Python 2.7. It had a good, long, frankly too long run. But its time is up: it reached end of life on January 1, 2020, which means no security updates and no bug fixes from the core team. Running it now is like using a phone with a cracked screen that also occasionally catches fire. Let’s move on.
The First Step: Know Thy Enemy (Your Codebase)
Before you change a single character, you need to see what you’re dealing with. Blindly running 2to3 is a recipe for a nervous breakdown. Use caniusepython3 to check your dependencies. This is non-negotiable.
pip install caniusepython3
caniusepython3 -r requirements.txt
If a critical library is blocking you, you have your first problem to solve. Sometimes you’ll need to find a fork, sometimes you’ll have to beg the maintainers, and sometimes you’ll have to bite the bullet and replace it.
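Next, run 2to3 on your codebase just to see what it suggests. Don’t apply the changes yet. Think of it as a very verbose, slightly pedantic code reviewer. One caveat: 2to3 was deprecated in Python 3.11 and removed in 3.13, so run it from an older Python 3 interpreter.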
2to3 --output-dir=/tmp/converted_code --add-suffix='3' --write-unchanged-files --nobackups your_project_dir/
Now, wade through the /tmp/converted_code directory. This will show you the sheer scale of the changes, especially around the big-ticket items we’re about to cover.
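If wading through by hand sounds grim, a small script can rank files by how much 2to3 touched them. This is a minimal sketch, not part of 2to3 itself: the helper name count_changed_lines is made up for illustration, and it assumes the converted files mirror the original directory layout with the '3' suffix appended (foo.py becomes foo.py3), as the command above requests.

```python
import difflib
import os


def count_changed_lines(original_dir, converted_dir, suffix="3"):
    """Return {relative_path: changed_line_count} for files that differ.

    Hypothetical helper. Assumes converted files sit at the same relative
    path as the originals, with `suffix` appended to the file name.
    """
    report = {}
    for root, _dirs, files in os.walk(original_dir):
        for name in files:
            if not name.endswith(".py"):
                continue
            original = os.path.join(root, name)
            relative = os.path.relpath(original, original_dir)
            converted = os.path.join(converted_dir, relative + suffix)
            if not os.path.exists(converted):
                continue  # the assumed layout did not hold for this file
            # errors="replace" keeps the scan going on oddly encoded legacy files
            with open(original, encoding="utf-8", errors="replace") as f:
                before = f.readlines()
            with open(converted, encoding="utf-8", errors="replace") as f:
                after = f.readlines()
            matcher = difflib.SequenceMatcher(None, before, after)
            changed = sum(
                max(i2 - i1, j2 - j1)
                for tag, i1, i2, j1, j2 in matcher.get_opcodes()
                if tag != "equal"
            )
            if changed:
                report[relative] = changed
    return report
```

Sorting the returned dictionary by value, largest first, gives a rough "where will this hurt most" list to plan the work around.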
The Big One: Strings and Bytes
This is the heart of the migration. Python 3 made the brilliant, correct, and infuriating decision to strictly separate text from binary data. In Python 2, str was bytes, and unicode was text. It was a mess and we all pretended it wasn’t. Python 3 forces you to be honest.
- str is for text (Unicode).
- bytes is for, well, bytes (like data from a network socket or a binary file).
The most common pain point? You can’t just open a file without specifying your intent.
# Python 2 (the old, reckless way)
f = open('data.bin', 'r')
data = f.read() # "data" is a string of bytes. Maybe. Who knows?
# Python 3 (the correct, adult way)
# You MUST decide: is this text or bytes?
with open('data.txt', 'r', encoding='utf-8') as f_text: # text mode returns str; name the encoding instead of trusting the platform default
    contents = f_text.read() # contents is a str
with open('data.bin', 'rb') as f_binary: # 'rb' mode returns bytes
    data = f_binary.read() # data is a bytes object
Trying to write bytes to a text file (or vice-versa) will get you a loud TypeError. This is a good thing! It’s catching bugs before they happen. The rule of thumb: if you’re dealing with a file that isn’t a plain text file (e.g., images, PDFs, pickled objects), you almost certainly want 'rb' or 'wb'.
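To make the text/bytes boundary concrete, here is a small Python 3 sketch of crossing it in both directions with encode and decode, plus the TypeError you get when you forget. The sample string and the temporary file are made up for the example.

```python
import os
import tempfile

text = "café"                  # str: four Unicode code points
raw = text.encode("utf-8")     # bytes: five bytes, because é takes two in UTF-8
assert len(text) == 4 and len(raw) == 5
assert raw.decode("utf-8") == text  # decoding with the same codec round-trips

path = os.path.join(tempfile.mkdtemp(), "example.txt")

error_message = None
try:
    with open(path, "w", encoding="utf-8") as f:
        f.write(raw)           # bytes into a text-mode file: refused
except TypeError as exc:
    error_message = str(exc)

# Binary mode accepts the bytes; text mode decodes them again on the way back.
with open(path, "wb") as f:
    f.write(raw)
with open(path, "r", encoding="utf-8") as f:
    round_tripped = f.read()
```

The habit to build is: decode bytes to str as soon as data enters your program, work with str internally, and encode back to bytes only at the edge where data leaves.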
The Print Debacle (and Other Syntax Changes)
This one is simple but everywhere. print went from a statement to a function. This is mostly an easy fix.
# Python 2
print "Hello, world!"
print item1, item2
# Python 3
print("Hello, world!")
print(item1, item2)
The 2to3 tool will handle 99% of these. The other 1% is when you did something truly cursed like overriding stdout, but you already knew you were living on borrowed time.
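Since print is now an ordinary function, the old special syntax turns into keyword arguments. The sketch below shows the Python 3 spellings of the common Python 2 idioms; the messages are placeholders. On Python 2.6 and 2.7 you can opt in to the same behavior per module by putting from __future__ import print_function at the top of the file.

```python
import io
import sys

print("Hello,", "world!")                        # Python 2: print "Hello,", "world!"
print("no newline at the end", end="")           # Python 2: trailing comma
print()                                          # Python 2: bare print
print("something went wrong", file=sys.stderr)   # Python 2: print >>sys.stderr, "..."

# Any object with a write() method can be the target, which makes capturing easy.
buffer = io.StringIO()
print("a", "b", "c", sep="-", file=buffer)
captured = buffer.getvalue()
```

Passing file= explicitly is usually a cleaner replacement for code that used to reassign sys.stdout.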
Other syntax changes include the except clause and inequality operators:
# Python 2
try:
    do_something()
except Exception, e: # comma
    handle_error()
if 1 <> 2: # less common, but exists
    pass
# Python 3
try:
    do_something()
except Exception as e: # 'as' keyword
    handle_error()
if 1 != 2: # use !=
    pass
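One behavior change hides behind the new except syntax: Python 3 deletes the bound name when the except block ends, so code that reads e afterwards breaks. A minimal sketch of the problem and the usual workaround follows; parse_port and name_survives_except_block are hypothetical functions invented for the illustration.

```python
def parse_port(value):
    """Return (port, error_message); exactly one of the two is None."""
    try:
        return int(value), None
    except ValueError as e:
        message = str(e)  # copy what you need while `e` still exists
    # `e` is gone by this point on Python 3, but `message` survives
    return None, message


def name_survives_except_block():
    """Report whether `e` is still readable after the except block."""
    try:
        int("not a number")
    except ValueError as e:
        pass
    try:
        e  # raises UnboundLocalError on Python 3
    except NameError:  # UnboundLocalError is a subclass of NameError
        return False
    return True
```

The "as" form also works on Python 2.6 and 2.7, so this is one change you can make before switching interpreters.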
The Iterator Apocalypse
In Python 2, the world was eager. Functions like range, zip, map, and dict.keys() returned lists. This was convenient but incredibly wasteful for large data sets. Python 3, being more environmentally conscious, made them lazy. zip and map return one-shot iterators, range returns a lazy sequence object, and dict.keys() returns a view onto the dictionary.
# Python 2
my_range = range(1000000) # allocates a list of a million integers. Ouch.
my_zip = zip(list_a, list_b) # another full list
# Python 3
my_range = range(1000000) # returns a range object, tiny memory footprint
my_zip = zip(list_a, list_b) # returns a zip iterator
# If you *need* a list (e.g., to index into it), be explicit
my_list = list(range(10))
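Not every lazy object is equally restrictive, which matters when you decide where a list() call is actually needed. A short Python 3 sketch with made-up sample values: range still behaves like a sequence, while zip has to be materialized before you can index it.

```python
numbers = range(1000000)
assert numbers[10] == 10               # indexing works without building a list
assert len(numbers) == 1000000
assert 999999 in numbers               # membership tests work too
assert list(numbers[:3]) == [0, 1, 2]  # slicing yields another range

pairs = zip("abc", [1, 2, 3])
try:
    pairs[0]                           # zip objects are not subscriptable
    zip_is_indexable = True
except TypeError:
    zip_is_indexable = False

pair_list = list(pairs)                # materialize once, then index freely
```

So a Python 2 expression like range(n)[i] keeps working unchanged, but zip(a, b)[i] needs a list() around the zip call.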
This is a huge win for memory use, but it can break code that assumed these functions returned lists. The most common pitfall is iterating over the result of zip, map, or filter more than once. In Python 3 those are one-shot iterators, and once they’re consumed, they’re empty. (dict.keys() is different: it returns a view, which you can loop over as often as you like, but which can’t be indexed.)
pairs = zip(['a', 'b'], [1, 2]) # zip object (a one-shot iterator)
# First iteration works
for k, v in pairs:
    print(k, v)
# Second iteration does nothing because the iterator is exhausted
for k, v in pairs:
    print("This won't print", k, v)
The fix is simple: if you need to use the result multiple times, materialize it as a list: pairs = list(zip(['a', 'b'], [1, 2])).
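One more thing worth knowing about dict.keys() in Python 3: the object it returns tracks the dictionary and doesn’t support indexing, both of which can surprise code written for Python 2 lists. A small sketch with a made-up dictionary:

```python
d = {"a": 1, "b": 2}
keys = d.keys()

d["c"] = 3
assert "c" in keys        # the view is live: it sees the new key
assert len(keys) == 3

first_key = None
try:
    keys[0]               # Python 2 idiom; views are not subscriptable
except TypeError:
    first_key = next(iter(d))  # insertion order is guaranteed from Python 3.7

snapshot = sorted(d)             # a real, independent list of keys
common = d.keys() & {"a", "z"}   # key views support set operations
```

If the old code sorted or indexed the key list, sorted(d) or list(d) is usually the most direct replacement.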
Division and the Quest for Sanity
Another one of Python 2’s silent horrors was integer division. Thankfully, Python 3 fixed it.
# Python 2
print 5 / 2 # 2 (floor division, because both are integers)
print 5 // 2 # 2 (explicit floor division)
# Python 3
print(5 / 2) # 2.5 (true division, finally!)
print(5 // 2) # 2 (explicit floor division)
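If your old code relies on / flooring its integer operands, you’ll need to hunt down every division and decide if it should be / or //. This is a subtle bug that can change numerical results without throwing an error. Adding from __future__ import division to a Python 2 module gives it the Python 3 behavior, so you can flush these out before you switch interpreters.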
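When you audit those divisions, keep in mind that // floors rather than truncates, which only shows up with negative numbers. A quick Python 3 sketch of the cases worth checking; midpoint is a hypothetical function standing in for any index calculation in your code.

```python
import math

assert 7 / 2 == 3.5               # true division always gives a float
assert 7 // 2 == 3                # floor division
assert -7 // 2 == -4              # floors toward negative infinity
assert int(-7 / 2) == -3          # int() truncates toward zero instead
assert math.trunc(-7 / 2) == -3
assert 7.0 // 2 == 3.0            # // with a float operand still returns a float


def midpoint(low, high):
    """Index arithmetic must stay an int, so use // rather than /."""
    return (low + high) // 2
```

Anything used as a list index, a range() argument, or a loop count needs //, because a float there raises TypeError in Python 3.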
The Modern Toolchain: Your Best Friends
Don’t be a hero. Use the tools built for this.
- futurize: A more modern and nuanced tool than 2to3. It can often help you write code that is compatible with both Python 2 and 3, which is invaluable for a gradual migration. Run futurize --stage1 and then futurize --stage2 on your code to see the suggestions.
- python -3: Run your Python 2 interpreter with the -3 flag. It will warn you about Python 3 incompatibilities that it can detect at runtime. It’s not comprehensive, but every warning it gives you is a future bug squashed.
- Testing, Testing, Testing: You absolutely must have a robust test suite. The migration is a massive refactoring operation. Without tests, you are flying blind. The goal is to get your test suite passing on Python 3 while still producing the exact same results on Python 2. Once that’s true, you flip the switch.
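During the stretch when the code has to run on both interpreters, a tiny compatibility helper at the text/bytes boundary goes a long way. This is a minimal hand-rolled sketch, not the API of futurize or any other library; the names PY2, text_type, and ensure_text are chosen here for illustration.

```python
import sys

PY2 = sys.version_info[0] == 2
# `unicode` only exists on Python 2; the conditional keeps Python 3 from evaluating it.
text_type = unicode if PY2 else str  # noqa: F821


def ensure_text(value, encoding="utf-8"):
    """Return `value` as text on either interpreter, decoding bytes if needed."""
    if isinstance(value, text_type):
        return value
    if isinstance(value, bytes):  # on Python 2, `bytes` is an alias for str
        return value.decode(encoding)
    raise TypeError("expected text or bytes, got %s" % type(value).__name__)
```

A test that feeds the same inputs through helpers like this on both interpreters and compares the results is one way to pin down the "exact same results" goal from the list above.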
The migration is a grind, but it’s a rewarding one. You’ll emerge on the other side with a cleaner, faster, and more maintainable codebase. And you’ll never have to see another u'string' prefix again. Trust me, it’s worth it.