82.1 hashlib: MD5, SHA-1, SHA-256, and SHA-3
Alright, let’s talk about hashing. You’ve probably heard the term thrown around—“we hashed the passwords”—and it sounds vaguely technical and secure. But what does it actually mean? In simple terms, a hash function is a one-way street. You feed it any amount of data—a password, the complete works of Shakespeare, a picture of your cat—and it spits out a fixed-size string of gibberish, called a digest or just a hash.
The key properties of a cryptographic hash function are:
- Deterministic: The same input will always, always, produce the same hash. If it didn’t, the internet would fall apart.
- One-way: Given only the hash output, it should be computationally infeasible to recover the original data. You can’t un-scramble these eggs.
- Avalanche Effect: A tiny change in the input (even one bit) produces a drastically different, unrecognizable hash. The hashes of hello vs. hellp should look completely unrelated.
- Collision Resistant: It should be computationally infeasible to find two different inputs that produce the same hash output.
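The properties above are easy to poke at yourself. Here is a minimal sketch using SHA-256; the helper names sha256_hex and differing_bits are made up for this illustration and are not part of hashlib.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions where two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

first = sha256_hex("hello")
second = sha256_hex("hellp")  # one character changed

# Deterministic: hashing the same input again gives the same digest.
assert first == sha256_hex("hello")

# Fixed size: 256 bits (64 hex characters), no matter how long the input is.
assert len(first) == len(sha256_hex("hello" * 10_000)) == 64

# Avalanche: expect roughly half of the 256 bits to flip.
print(first)
print(second)
print(f"{differing_bits(first, second)} of 256 bits differ")
```

Collision resistance and one-wayness can’t be shown in a few lines, which is rather the point: nobody has a practical way to demonstrate a failure of either for SHA-256.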
Why do you care? Hashes are the workhorses of security. They’re used for password storage, verifying file integrity (did this download get corrupted or tampered with?), and digital signatures. Python gives us a great toolbox for this in the hashlib module. No need to install anything; it’s baked right in.
The Usual Suspects: MD5, SHA-1, and Why You Shouldn’t Use Them
Let’s get this out of the way immediately: Do not use MD5 or SHA-1 for anything security-related. I’m serious. It’s not a suggestion; it’s a rule. They’re the cryptographic equivalent of using a screen door on a submarine.
Why? Because clever cryptographers have broken them. They’re vulnerable to collision attacks, meaning it’s now feasible for an attacker to craft two different pieces of data that produce the same hash. Practical MD5 collisions were demonstrated in 2004, and the first public SHA-1 collision followed in 2017. If you rely on them to verify a file, an attacker can prepare a harmless file and a malicious file with the same hash, get the harmless one approved, then swap in the malicious one, and your verification would still pass. Yikes.
So why are we even talking about them? Because you’ll see them everywhere in the wild (checksums for old files, etc.), and you need to know what they are so you can know to avoid them like the plague. Here’s what they look like, for academic purposes only.
import hashlib
data = "my_super_secret_password".encode('utf-8')
# DON'T USE THESE. I'M WATCHING YOU.
md5_hash = hashlib.md5(data).hexdigest()
sha1_hash = hashlib.sha1(data).hexdigest()
print(f"MD5: {md5_hash}")
print(f"SHA-1: {sha1_hash}")
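Since you mostly need to recognize these in the wild, digest length is a handy first clue. The sketch below compares hex digest lengths; guess_algorithm is a hypothetical helper for illustration, and length alone is only a hint because different algorithms can share an output size.

```python
import hashlib

data = b"my_super_secret_password"

# Collect the hex digest length each algorithm produces.
hex_lengths = {
    name: len(hashlib.new(name, data).hexdigest())
    for name in ("md5", "sha1", "sha256")
}
print(hex_lengths)  # {'md5': 32, 'sha1': 40, 'sha256': 64}

def guess_algorithm(hex_digest: str) -> str:
    """Hypothetical helper: guess a hash family from the digest length alone."""
    by_length = {
        32: "MD5",
        40: "SHA-1",
        64: "SHA-256 (or another 256-bit hash)",
    }
    return by_length.get(len(hex_digest.strip()), "unknown")
```

If a checksum on a download page is 32 or 40 hex characters long, treat it as a corruption check at best, not as protection against tampering.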
The Modern Workhorse: SHA-256
When in doubt, you probably want SHA-256. It’s part of the SHA-2 family and is currently considered secure and robust. It’s what powers a lot of the modern web, including TLS certificates and blockchain stuff (don’t get me started on that). It’s your default choice for general-purpose hashing: file integrity checks, content fingerprints, a building block for key derivation, you name it. (Password storage is the exception, as we’ll see below.)
Using it in hashlib is straightforward. Note the .encode('utf-8') call. Hash functions work on bytes, not strings. Forgetting this is the most common beginner mistake. If you get a TypeError saying that Unicode-objects (or, on newer Python versions, strings) must be encoded before hashing, you’ve been initiated.
import hashlib
# Always encode your string to bytes
data = "my_super_secret_password".encode('utf-8')
# Create a sha256 hash object
hash_object = hashlib.sha256(data)
# Get the hexadecimal digest of the hash
hex_dig = hash_object.hexdigest()
print(f"SHA-256: {hex_dig}") # Outputs a 64-character hex string
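To see why the encoding step matters, here is a small sketch that triggers the error on purpose and then shows that the choice of encoding changes the digest. The sample string and variable names are just for illustration.

```python
import hashlib

text = "caf\u00e9"  # "café", written with an escape to keep this file ASCII

# Passing a str raises TypeError; the exact message varies by Python version.
raised_type_error = False
try:
    hashlib.sha256(text)
except TypeError as exc:
    raised_type_error = True
    print(f"TypeError: {exc}")

# The digest depends on the bytes, so the encoding is part of the input.
utf8_digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
latin1_digest = hashlib.sha256(text.encode("latin-1")).hexdigest()
print(utf8_digest == latin1_digest)  # False: different bytes, different hash
```

The practical takeaway is to pick one encoding (UTF-8 is the usual choice) and use it everywhere a given hash is computed or checked.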
What if you have a lot of data? You don’t have to load the entire file into memory (a bad idea for a large file). You can update the hash object in chunks.
import hashlib
def get_file_hash(filename):
    hash_object = hashlib.sha256()
    with open(filename, 'rb') as f:  # Note 'rb' for read-bytes mode
        # Read the file in chunks of 4096 bytes
        for chunk in iter(lambda: f.read(4096), b""):
            hash_object.update(chunk)
    return hash_object.hexdigest()
file_hash = get_file_hash('large_file.iso')
print(f"The SHA-256 hash of your file is: {file_hash}")
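Two things are worth checking for yourself: that chunked hashing gives the same answer as hashing everything at once, and how you would compare the result against a published checksum. The sketch below uses a throwaway temp file standing in for a real download; sha256_of_file and matches_checksum are illustrative names, and the 64 KiB chunk size is an arbitrary choice.

```python
import hashlib
import hmac
import os
import tempfile

def sha256_of_file(path: str, chunk_size: int = 64 * 1024) -> str:
    """Hash a file incrementally so memory use is bounded by chunk_size."""
    hash_object = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hash_object.update(chunk)
    return hash_object.hexdigest()

def matches_checksum(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA-256 with a published hex checksum, ignoring case."""
    actual = sha256_of_file(path)
    return hmac.compare_digest(actual, expected_hex.strip().lower())

# Demo: write random bytes to a temp file, then hash it both ways.
payload = os.urandom(200_000)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    demo_path = tmp.name

try:
    one_shot = hashlib.sha256(payload).hexdigest()
    chunked = sha256_of_file(demo_path)
    is_match = matches_checksum(demo_path, one_shot.upper())
    is_mismatch = matches_checksum(demo_path, "0" * 64)
finally:
    os.remove(demo_path)

print(chunked == one_shot)  # True: update() in pieces equals one big hash
print(is_match)             # True
print(is_mismatch)          # False
```

A checksum only protects you if it comes from a source the attacker can’t also modify, so fetch it over a channel you trust.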
The New Kid: SHA-3 (Keccak)
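SHA-3 is the latest member of the Secure Hash Algorithm family, and it won a public NIST competition to get there. It’s not a derivative of SHA-2; it’s based on an algorithm called Keccak, which uses a completely different internal structure (a sponge construction). This is good! Diversity in our cryptographic algorithms is healthy, because a future weakness in SHA-2’s design would not automatically carry over to SHA-3.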
Do you need to use SHA-3 over SHA-256 right now? Probably not. SHA-256 is still rock solid. But SHA-3 is another excellent, future-proof option. It’s available if you want the absolute latest and greatest or have a specific requirement for its different internal design.
Using it is identical to the others.
import hashlib
data = "my_super_secret_password".encode('utf-8')
sha3_hash = hashlib.sha3_256(data).hexdigest()
print(f"SHA3-256: {sha3_hash}")
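If you are curious how the two families compare from the caller’s side, this sketch hashes the same bytes with both and lists the fixed-size SHA-3 variants that hashlib provides. Nothing here is specific to any application; it only inspects digest sizes.

```python
import hashlib

data = b"my_super_secret_password"

sha2_digest = hashlib.sha256(data).hexdigest()
sha3_digest = hashlib.sha3_256(data).hexdigest()

# Same output size (256 bits), unrelated values: the internals are different.
print(len(sha2_digest), len(sha3_digest))  # 64 64
print(sha2_digest == sha3_digest)          # False

# The fixed-size SHA-3 variants follow the same naming pattern.
sha3_bits = {
    name: hashlib.new(name, data).digest_size * 8
    for name in ("sha3_224", "sha3_256", "sha3_384", "sha3_512")
}
print(sha3_bits)
```

Because the interface is the same, switching algorithms later is mostly a matter of changing one function name and re-hashing whatever you stored.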
Beyond the Hash: Salting Your Passwords
Here’s the most critical insight I can give you: Never, ever hash passwords alone.
If you just hash a password like 'password123' and store it, you’re vulnerable to rainbow table attacks. An attacker can precompute the hashes of millions of common passwords and just do a reverse lookup. If a digest in your database matches an entry in their table, they instantly know that the password was password123.
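To make the attack concrete, here is a toy version. It uses a plain dictionary rather than a true rainbow table (which compresses the same idea with hash chains), and the user names, password list, and helper name are all invented for the example.

```python
import hashlib

def unsalted_hash(password: str) -> str:
    """What NOT to do: a bare SHA-256 of the password."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# The attacker precomputes digests for a (tiny, illustrative) password list.
common_passwords = ["123456", "password", "password123", "qwerty", "letmein"]
lookup_table = {unsalted_hash(pw): pw for pw in common_passwords}

# A leaked database: hypothetical users with unsalted hashes.
leaked_rows = {
    "alice": unsalted_hash("password123"),
    "bob": unsalted_hash("correct horse battery staple"),
}

# Cracking is a dictionary lookup per row, with no hashing needed at attack time.
cracked = {user: lookup_table.get(digest) for user, digest in leaked_rows.items()}
for user, password in cracked.items():
    print(user, "->", password if password else "not in table")
```

Notice that every user who picked password123 would have the same stored digest, so one lookup cracks all of them at once.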
The solution is a salt. A salt is a random value unique to each user that you mix with the password before hashing. You store the salt right next to the hash in your database. This completely obliterates rainbow tables because an attacker would have to generate a new table for every single unique salt.
import hashlib
import hmac
import os
def hash_password(password):
    # Generate a cryptographically secure random salt (16 bytes)
    salt = os.urandom(16)
    # Combine the salt and password, then hash them
    salted_password = salt + password.encode('utf-8')
    hash_obj = hashlib.sha256(salted_password)
    # Return the salt and the hash, both stored in the database for this user
    return salt, hash_obj.hexdigest()
def verify_password(stored_salt, stored_hash, password_attempt):
    # Use the stored salt to hash the attempt
    salted_attempt = stored_salt + password_attempt.encode('utf-8')
    hash_obj = hashlib.sha256(salted_attempt)
    attempted_hash = hash_obj.hexdigest()
    # Compare the hashes in a constant-time way to avoid timing attacks
    # (a plain == can return early at the first differing character)
    return hmac.compare_digest(attempted_hash, stored_hash)
# Example usage
salt, pw_hash = hash_password('my_password')
print(f"Salt: {salt.hex()}")
print(f"Hash: {pw_hash}")
# Later, when verifying...
is_correct = verify_password(salt, pw_hash, 'wrong_guess')
print(f"Password verified: {is_correct}") # False
is_correct = verify_password(salt, pw_hash, 'my_password')
print(f"Password verified: {is_correct}") # True
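For real-world applications, you should use a dedicated password hashing function like bcrypt, scrypt, or Argon2. They are intentionally slow, with a tunable cost (the “work factor”) that you can raise as hardware gets faster, which is what thwarts brute-force attacks. A single salted SHA-256, like the example above, is far too fast: commodity GPUs can try billions of guesses per second. But understanding that these functions are built on the same principles (hashing plus a salt plus iteration) is what matters. hashlib is your foundation; now you know how to build on it properly.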
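bcrypt and Argon2 need third-party packages, so as a rough illustration of “hash plus salt plus iteration” this sketch swaps in PBKDF2-HMAC-SHA256, which ships with hashlib as pbkdf2_hmac. The function names and the iteration count of 600,000 are assumptions for the example, not a recommendation from this chapter; check current guidance and tune the number for your own hardware.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; higher is slower for attackers and for you

def hash_password_slow(password: str, iterations: int = ITERATIONS):
    """Derive a salted, iterated hash with PBKDF2-HMAC-SHA256."""
    salt = os.urandom(16)
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    # Store all three values: the iteration count is needed to verify later.
    return salt, iterations, derived

def verify_password_slow(salt: bytes, iterations: int, stored: bytes, attempt: str) -> bool:
    """Recompute with the stored salt and iteration count, compare in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", attempt.encode("utf-8"), salt, iterations
    )
    return hmac.compare_digest(candidate, stored)

demo_salt, demo_iterations, demo_hash = hash_password_slow("my_password")
good_attempt = verify_password_slow(demo_salt, demo_iterations, demo_hash, "my_password")
bad_attempt = verify_password_slow(demo_salt, demo_iterations, demo_hash, "wrong_guess")
print(good_attempt)  # True
print(bad_attempt)   # False
```

Storing the iteration count alongside the salt and hash lets you raise the work factor for new passwords later without locking out existing users.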