90.1 Python 2 vs Python 3: The Breaking Changes

Alright, let’s get into the weeds. The Python 2 vs. Python 3 schism wasn’t just an update; it was a deliberate, “break-everything-for-the-greater-good” release that gave up backward compatibility. The core dev team, led by the BDFL (Guido van Rossum), decided that certain warts in the language were too ugly to live with anymore. They chose a hard break to clean things up, knowing full well it would cause years of migraines. And oh boy, did it. But they were right. Python 3 is a cleaner, more consistent language. Our job now is to map the minefield they left behind so you don’t have to step on every mine yourself.

The Big One: Print is a Function, Not a Statement

This is the change everyone stubs their toe on first. In Python 2, print was a special magical statement, like if or for. In Python 3, it’s a built-in function. This sounds pedantic until you try the old syntax in Python 3 and get a SyntaxError.

import sys  # Both versions need this for sys.stderr

# Python 2 (RIP)
print "Hello, world!"
print item1, item2,  # Trailing comma suppresses the newline
print >>sys.stderr, "Error!"

# Python 3 (The Way)
print("Hello, world!")
print(item1, item2, end=" ")  # Use the `end` parameter
print("Error!", file=sys.stderr)  # Use the `file` parameter

Why? Because functions are first-class citizens in Python. Making print a function means you can pass it around, use it in lambdas, override it more cleanly, and its behavior is controlled with explicit keyword arguments (file, end, sep) instead of arcane punctuation. It’s a win for consistency, even if your muscle memory disagrees.
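To make the “first-class citizen” point concrete, here’s a small Python 3 sketch. It binds keyword arguments to print with functools.partial and hands print-based callables to other code. The names log and emit_all are just illustrative, and output goes into an io.StringIO buffer so nothing hits the terminal.

```python
import functools
import io

# Capture output in memory so the example has no side effects on the terminal
buffer = io.StringIO()

# print is an ordinary function object, so its keyword arguments can be pre-bound
log = functools.partial(print, sep=" | ", file=buffer)

log("a", "b", "c")        # writes "a | b | c\n" into buffer
log("done", end=".\n")    # end replaces the default "\n"

# It can also be passed to higher-order code as a callback
def emit_all(items, emit):
    for item in items:
        emit(item)

emit_all(["x", "y"], functools.partial(print, file=buffer, end=";"))

print(repr(buffer.getvalue()))  # 'a | b | c\ndone.\nx;y;'
```

None of this was possible with the Python 2 statement form, which couldn’t be referenced, stored, or partially applied.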

The String Type Schism: Text vs. Data

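This is the most important conceptual change and the source of most migration pain. Python 2’s str was really a sequence of bytes, and a separate unicode type held, well, text. This was a mess because Python 2 silently converted between them (decoding with the ASCII codec) whenever you mixed them, so your code worked fine on ASCII input until suddenly, spectacularly, a non-ASCII character raised a UnicodeDecodeError.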

Python 3 draws a line in the sand: str is for text, bytes is for data. Always. Forever. No implicit conversion.

# Python 2 - The Bad Old Days
s = 'café'          # A byte string (needs a "# -*- coding: ... -*-" header). The 'é' is a landmine.
u = u'café'         # A unicode string.
print len(s)        # 5 with a UTF-8 source file, 4 with Latin-1. It's a nightmare.
print len(u)        # 4

# Python 3 - Clarity
s = 'café'          # A text string (str, i.e. Unicode). Always.
print(len(s))       # 4. Always. Because it's 4 characters.
# b = b'café'       # SyntaxError: bytes literals may only contain ASCII characters,
                    # so non-ASCII bytes have to be written as escapes:
b = b'caf\xc3\xa9'  # A bytes object. See the 'b' prefix?
print(len(b))       # 5

# The usual way to get bytes from non-ASCII text is to encode it.
b = s.encode('utf-8')  # b'caf\xc3\xa9' (5 bytes)

The Pitfall: You will see TypeError: a bytes-like object is required, not 'str' and its evil twin TypeError: must be str, not bytes more times than you can count. The fix is always explicit encoding/decoding at the boundary of your program (file I/O, network calls). Know your encode() (text -> bytes) and decode() (bytes -> text).
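Here’s a minimal sketch of the “encode/decode at the boundary” rule, assuming UTF-8 on the wire. io.BytesIO stands in for a file or socket opened in binary mode, and read_text/write_text are hypothetical helper names, not a library API.

```python
import io

def read_text(source: io.BytesIO, encoding: str = "utf-8") -> str:
    # Boundary in: bytes from the outside world -> str for the program
    return source.read().decode(encoding)

def write_text(sink: io.BytesIO, text: str, encoding: str = "utf-8") -> None:
    # Boundary out: str from the program -> bytes for the outside world
    sink.write(text.encode(encoding))

incoming = io.BytesIO("café".encode("utf-8"))
message = read_text(incoming)          # 'café', a str: work with text inside
outgoing = io.BytesIO()
write_text(outgoing, message.upper())
print(outgoing.getvalue())             # b'CAF\xc3\x89'

forgot_to_encode = False
try:
    outgoing.write(message)            # str handed to a binary stream
except TypeError as exc:
    forgot_to_encode = True
    print("TypeError:", exc)
```

Everything between read_text and write_text deals only in str; bytes exist only at the edges.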

Integer Division

A classic “wut” moment for newcomers to Python 2. The designers originally decided that dividing two integers should return an integer, rounding the result down (flooring it). This has a certain mathematical logic, but it’s practically infuriating.

# Python 2
print 5 / 2   # 2. Nope. Not a float. Just 2.

# Python 3
print(5 / 2)  # 2.5. Finally, the result you actually expect.
print(5 // 2) # 2. The explicit floor division operator.

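Why? Because expecting 5 / 2 to be 2.5 is intuitive. Forcing everyone to remember to use floats (5.0 / 2) was a usability bug, and a sneaky one: the same a / b expression floored or didn’t depending on whether a and b happened to be ints or floats at runtime. The // operator (available since Python 2.2) covers the cases where you really do want floor division.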
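A few Python 3 one-liners show how / and // actually behave, including the negative-number case that trips people up: // rounds toward negative infinity (exactly what Python 2’s integer / did), not toward zero. The items list is just sample data.

```python
# True division: always a float, even when the division is exact
print(6 / 3)          # 2.0
print(-7 / 2)         # -3.5

# Floor division rounds toward negative infinity, not toward zero
print(7 // 2)         # 3
print(-7 // 2)        # -4
print(int(-7 / 2))    # -3 (int() truncates toward zero, which is different)
print(7.0 // 2)       # 3.0 (float operands give a float result)

# divmod pairs // with %, keeping a == (a // b) * b + a % b
q, r = divmod(-7, 2)
print(q, r)           # -4 1

# A typical legitimate use of //: computing an integer index
items = ["a", "b", "c", "d", "e"]
middle = items[len(items) // 2]
print(middle)         # c
```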

Iteration: The Great Generification

Python 3 took an “if it can be an iterator, it should be an iterator” approach (or at least a lazy view). Instead of building a full list up front, these APIs produce values on demand, which saves memory.

dict.keys(), .values(), and .items() now return view objects instead of lists. Views are iterable and dynamic (they reflect later changes to the dict), but you can’t just blindly index into them anymore.
map(), filter(), and zip() return iterators, not lists. This is great for memory, but it means that if you just want to print the result, you’ll see a cryptic <map object at 0x...>. You need to convert it explicitly: list(map(...)).

d = {'a': 1, 'b': 2}
# Python 2
key_list = d.keys()   # ['a', 'b'] (a list)
# Python 3
keys_view = d.keys()   # dict_keys(['a', 'b']) (an iterable view, not a list)
# print(keys_view[0])  # TypeError: 'dict_keys' object is not subscriptable
# Do this instead:
list_of_keys = list(d.keys())

Best Practice: Stop assuming these methods return lists. Start writing loops that expect iterables. It’s a better habit and saves memory.
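The snippet below (Python 3.7+, where dicts keep insertion order) shows the two behaviors worth internalizing: views stay in sync with their dict, and iterators like map() can only be consumed once, so convert to a list when you need a second pass.

```python
d = {"a": 1, "b": 2}

# Views are live: they reflect changes made after the view was created
keys = d.keys()
d["c"] = 3
print(list(keys))          # ['a', 'b', 'c']

# Key views also support set operations
print(keys & {"a", "z"})   # {'a'}

# Iterators are single-use: a second pass gets nothing
squares = map(lambda n: n * n, [1, 2, 3])
print(list(squares))       # [1, 4, 9]
print(list(squares))       # []

# When one pass is enough, consume the lazy object directly
total = sum(value for _, value in d.items())
print(total)               # 6
```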

The __future__ Import: Your Time Machine

You can write Python 3 code in Python 2.7! Well, some of it. The __future__ module lets you opt in to Python 3 behavior one feature at a time, on a per-module basis.

# Put these at the VERY TOP of your Python 2.7 module to ease migration.
# Only the module docstring, comments, and blank lines may come before them.
from __future__ import print_function
from __future__ import division
from __future__ import unicode_literals  # Makes plain string literals unicode by default

import sys  # Needed for sys.stderr below

# Now you can (mostly) write Py3 code.
print("Hello!", file=sys.stderr)
result = 5 / 2  # 2.5
s = 'hello'     # This is a unicode string
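On Python 3 itself, these imports are still legal; they just do nothing, because the behavior is already mandatory. This sketch reads the standard __future__ module’s feature records to show when each feature became available and when it became the default, then runs a Python 2.7-style header through exec() to check that it still works unchanged.

```python
import __future__

# Each feature records (major, minor, ...) for its optional and mandatory releases
releases = {}
for name in ("division", "print_function", "unicode_literals"):
    feature = getattr(__future__, name)
    releases[name] = (feature.getOptionalRelease()[:2],
                      feature.getMandatoryRelease()[:2])
    print(name, releases[name])

# A Python 2.7-style header still compiles on Python 3; the imports are no-ops there
source = (
    "from __future__ import print_function, division, unicode_literals\n"
    "result = 5 / 2\n"
)
namespace = {}
exec(source, namespace)
print(namespace["result"])  # 2.5
```

That’s what makes these imports safe for code that has to run on both versions during a migration.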

Use these imports. They’re one of the best first steps in porting a large codebase: they take care of the low-hanging fruit and get you thinking in the new paradigm.

The migration wasn’t a walk in the park, but the destination is unquestionably better, and the language’s consistency is worth the trip. Now, let’s talk about how you actually make that trip.
