71.6 Python's Memory Allocator: pymalloc

Alright, let’s pull back the curtain on one of the most brilliant and underappreciated pieces of CPython: pymalloc. You’re about to see why a language that prizes developer happiness spends so much time optimizing for the tiny, boring task of asking for memory.

Think of your program’s memory usage not as a monolithic slab, but as a constant, frantic stream of requests for small, short-lived bits of stuff. x = [42], my_list.append(...), that id() call you used once: they all need a little bit of memory, and they need it now. If CPython went to the system’s general-purpose allocator (the C library’s malloc) for every single one of these tiny requests, it would be like buying an entire industrial warehouse just to store a single bicycle. The overhead would be absurd. The system allocator is powerful, but it’s also a generalist; it has to handle requests from a few bytes to gigabytes. For our little Python objects, that’s overkill.

So, the CPython developers did something sensible: they built a special-purpose allocator, pymalloc, designed specifically for the memory patterns of a Python runtime. Its entire reason for existence is to be blisteringly fast for small allocations and to reduce memory fragmentation.

How pymalloc Works: Arenas, Pools, and Blocks

pymalloc is a classic example of a hierarchical memory allocator. It carves the big chunks of memory it gets from the OS (called “arenas”) into smaller, fixed-size “pools,” and each pool hands out blocks of one specific size. Say you create a small tuple and CPython needs 48 bytes for it. pymalloc looks for a pool dedicated to 48-byte blocks. If one exists with a free block, boom, you get it instantly. This is incredibly fast because it’s just a few pointer operations, with no system call involved.
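To make the “few pointer operations” concrete, here is a toy model of a single pool, written in Python rather than C. ToyPool is a hypothetical class, not part of CPython: it tracks byte offsets instead of real memory, ignores the pool header, and takes the pool size as a parameter because the real value has varied across CPython builds. What it does share with pymalloc is the shape of the fast path: freed blocks go onto a last-in, first-out free list, so both allocating and freeing are a single push or pop.

```python
class ToyPool:
    """Hypothetical, simplified model of a single pymalloc pool.

    Every block has the same size, and free blocks sit on a LIFO free list,
    so handing one out or taking one back is a single O(1) pop or push.
    The real pool also reserves room for a header; this sketch ignores it.
    """

    def __init__(self, block_size, pool_size=4096):
        self.block_size = block_size
        n_blocks = pool_size // block_size
        # Byte offsets of every block, reversed so offset 0 is popped first.
        self.free = [i * block_size for i in range(n_blocks)][::-1]

    def alloc(self):
        """Return the offset of a free block, or None if the pool is full."""
        return self.free.pop() if self.free else None

    def release(self, offset):
        """Push a block back; it becomes the next one handed out."""
        self.free.append(offset)


pool = ToyPool(block_size=48)
a = pool.alloc()   # offset 0
b = pool.alloc()   # offset 48
pool.release(a)
c = pool.alloc()   # offset 0 again: the most recently freed block is reused first
print(a, b, c)     # 0 48 0
```

No searching, no splitting, no coalescing: that is what a general-purpose malloc has to worry about and a fixed-size pool does not.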

This design is a work of art. It’s like a well-organized workshop where every tool has its own specific-sized slot, so finding a free block of the right size is a constant-time operation. You won’t find a pool for every possible byte size; that would be chaos. Instead, requests are rounded up to a “size class,” and each pool serves exactly one class. Size classes are multiples of the allocator’s alignment: 16 bytes on typical 64-bit builds of current CPython, 8 bytes on older or 32-bit builds. Ask for 49 bytes on a modern 64-bit build and you’ll get a 64-byte block. The few wasted bytes are more than worth it for the speed and reduced bookkeeping.
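The rounding rule fits in a couple of lines. The sketch below assumes 16-byte alignment (a typical 64-bit build) and the default 512-byte small-request limit; the index arithmetic mirrors the shift pymalloc performs in C, (nbytes - 1) >> ALIGNMENT_SHIFT, but size_class itself is a hypothetical helper for illustration.

```python
ALIGNMENT = 16                 # assumed: typical 64-bit build; 8 on older/32-bit builds
SMALL_REQUEST_THRESHOLD = 512  # largest request pymalloc serves itself


def size_class(nbytes):
    """Return (class_index, block_size) for a small request of nbytes."""
    if not 1 <= nbytes <= SMALL_REQUEST_THRESHOLD:
        raise ValueError("pymalloc only serves requests of 1..512 bytes")
    index = (nbytes - 1) // ALIGNMENT       # same as (nbytes - 1) >> 4
    return index, (index + 1) * ALIGNMENT   # block size is the class's upper bound


for request in (1, 16, 17, 48, 49, 512):
    index, block = size_class(request)
    print(f"request {request:>3} bytes -> class {index:>2}, {block:>3}-byte block")
```

With these assumptions there are 512 / 16 = 32 size classes, so pymalloc only has to keep 32 kinds of pool around.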

Here’s a quick way to see how many bytes some common objects occupy:

import sys

# Let's create some small objects
small_list = [1, 2, 3]
small_tuple = (1, 2, 3)
an_int = 42

# sys.getsizeof reports the object's own footprint in bytes (including the
# GC header for container objects), not the pymalloc block that holds it
print(f"List size: {sys.getsizeof(small_list)} bytes")
print(f"Tuple size: {sys.getsizeof(small_tuple)} bytes")
print(f"Int size: {sys.getsizeof(an_int)} bytes")

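Run that and look closely at what the numbers mean. sys.getsizeof() reports how many bytes the object itself needs (for a list, that includes its separately allocated array of item pointers); it knows nothing about pymalloc. For a single-allocation object such as an int or a tuple, pymalloc rounds that number up to the next size class, so the true footprint can be slightly larger than what you see.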

The Limits of pymalloc’s Kingdom

Now, here’s the part that trips people up: pymalloc is not a total replacement for malloc. It has a strict domain. By default, it only handles allocations up to 512 bytes. Anything larger than that is politely handed off to the system’s malloc. This is a sane design choice. Large allocations are less frequent and have different patterns; the system allocator is better suited for them.
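The dispatch itself is just a size check. Here is a minimal sketch of it, assuming the default 512-byte threshold; who_serves is a hypothetical helper and the request sizes are made up, chosen only to show where the line falls.

```python
from collections import Counter

SMALL_REQUEST_THRESHOLD = 512  # pymalloc's default cutoff, in bytes


def who_serves(nbytes):
    """Hypothetical helper: name the allocator that would satisfy a request."""
    return "pymalloc" if 0 < nbytes <= SMALL_REQUEST_THRESHOLD else "system malloc"


# A made-up mix of request sizes, just to show where the line falls.
requests = [24, 48, 64, 512, 513, 4096, 1_000_000]
for n in requests:
    print(f"{n:>9} bytes -> {who_serves(n)}")

tally = Counter(who_serves(n) for n in requests)
print(dict(tally))  # {'pymalloc': 4, 'system malloc': 3}
```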

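This leads to a crucial distinction in the CPython C API. Allocations belong to one of three “domains,” each with its own matched pair of functions:

The raw domain: PyMem_RawMalloc() and PyMem_RawFree(), thin wrappers around the system malloc that can be called without holding the GIL.
The mem domain: PyMem_Malloc() and PyMem_Free(), which go through pymalloc by default since Python 3.6.
The object domain: PyObject_Malloc() and PyObject_Free(), used for Python objects and also backed by pymalloc.

Note that the split is by API family, not by size: in the mem and object domains, pymalloc serves small requests itself and quietly forwards anything over 512 bytes to malloc. This is why you can’t just call C’s free() on a pointer you got from PyObject_Malloc(). Always release memory with the function from the family that allocated it; PyObject_Free() works out for itself whether a pointer lives in a pymalloc pool or came from malloc. Mix the families up, and you’ll spectacularly corrupt your heap. It’s the equivalent of putting diesel fuel in a gasoline engine.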
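You can watch the pairing rule from Python itself through ctypes, which exposes CPython’s own C API as ctypes.pythonapi and calls it with the GIL held. This is a CPython-only curiosity, not something to do in production code, and it assumes your build exports these four functions (standard CPython builds do). The point it illustrates: one PyObject_Free() call handles both a small request that pymalloc served and a large one it forwarded to malloc, while raw-domain memory goes back through PyMem_RawFree().

```python
import ctypes

api = ctypes.pythonapi  # CPython's C API; calls through it keep the GIL held

# Declare the C signatures: void *alloc(size_t) and void release(void *).
for name in ("PyObject_Malloc", "PyMem_RawMalloc"):
    func = getattr(api, name)
    func.restype = ctypes.c_void_p
    func.argtypes = [ctypes.c_size_t]
for name in ("PyObject_Free", "PyMem_RawFree"):
    func = getattr(api, name)
    func.restype = None
    func.argtypes = [ctypes.c_void_p]

small = api.PyObject_Malloc(48)    # object domain, small: served by pymalloc
large = api.PyObject_Malloc(4096)  # object domain, large: forwarded to malloc
raw = api.PyMem_RawMalloc(48)      # raw domain: always the system malloc
print(small is not None, large is not None, raw is not None)

# Release each pointer with the function from the family that allocated it.
api.PyObject_Free(small)   # PyObject_Free figures out that this one is pymalloc's...
api.PyObject_Free(large)   # ...and that this one belongs to malloc
api.PyMem_RawFree(raw)     # raw-domain memory goes back through the raw function
```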

When pymalloc Bites Back

For all its genius, pymalloc introduces a few quirks. The most famous one is that it can hold onto memory.

pymalloc never hands individual pools back to the OS. When every block in a pool is freed, the empty pool goes onto its arena’s list of free pools, ready to be reused for whatever size class is needed next. Only when every pool in an arena is empty does pymalloc release the whole arena to the OS, so a single long-lived object can pin an entire arena in place. This is fantastic for performance, since it avoids constantly begging the OS for memory, but it’s terrible if you’re a developer looking at your process’s memory usage from the outside.
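The arena rule is what makes this quirk bite: memory goes back to the OS a whole arena at a time, and only once every pool in it is empty. The toy model below is a hypothetical simulation, not CPython code. It only counts live blocks per pool, but it shows how one survivor out of four hundred keeps the entire arena from being released.

```python
class ToyArena:
    """Hypothetical model: an arena can be released only when every pool is empty."""

    def __init__(self, n_pools, blocks_per_pool):
        self.live = [0] * n_pools          # live[i] = blocks in use in pool i
        self.capacity = blocks_per_pool

    def alloc(self):
        """Place a block in the first pool with room; return that pool's index."""
        for i, used in enumerate(self.live):
            if used < self.capacity:
                self.live[i] += 1
                return i
        raise MemoryError("arena full")

    def free(self, pool_index):
        self.live[pool_index] -= 1

    def releasable(self):
        """True only when no pool holds a live block."""
        return not any(self.live)


arena = ToyArena(n_pools=4, blocks_per_pool=100)
handles = [arena.alloc() for _ in range(400)]  # fill every pool
survivor = handles.pop()                       # keep one object alive
for h in handles:
    arena.free(h)
print(arena.live, arena.releasable())          # [0, 0, 0, 1] False
pinned = not arena.releasable()

arena.free(survivor)
print(arena.live, arena.releasable())          # [0, 0, 0, 0] True
```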

Your OS tools (top, htop, Task Manager) may show that your Python process is still using a lot of RAM, even after you’ve deleted a huge collection of small objects. This is because pymalloc is sitting on arenas that are mostly empty but can’t be released, like a hoarder who insists they’ll use all those empty jars someday. This isn’t a memory leak in the traditional sense; it’s a high-water mark. The memory is available for the Python process to reuse instantly. It’s just not available for other processes on your machine.

You can nudge this behavior a little. gc.collect() frees objects that are kept alive only by reference cycles; if that happens to empty whole arenas, pymalloc returns them to the OS. There’s no guarantee it will.

import gc
# After a large memory operation and subsequent cleanup...
unreachable = gc.collect()  # frees cyclic garbage; returns how many unreachable objects it found
# If that empties whole arenas, pymalloc *may* release them to the OS.
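It helps to see what gc.collect() actually has to work with. Ordinary objects are freed the moment their reference count drops to zero; the collector only deals with garbage trapped in reference cycles. The sketch below builds such cycles on purpose, using a hypothetical Node class, and turns the automatic collector off while doing so, so that the explicit call is the one that finds them.

```python
import gc


class Node:
    """Hypothetical small object used to build two-node reference cycles."""

    def __init__(self):
        self.partner = None


def make_cycles(count):
    for _ in range(count):
        a, b = Node(), Node()
        a.partner, b.partner = b, a  # a <-> b: refcounts never reach zero on their own


gc.disable()          # keep the automatic collector from running mid-demo
gc.collect()          # start from a clean slate
make_cycles(10_000)   # 20,000 Node objects, now unreachable but not yet freed
found = gc.collect()  # the collector breaks the cycles and frees them
gc.enable()
print(f"gc.collect() found {found} unreachable objects")
```

Whether freeing those objects also shrinks your process depends on whether it empties whole arenas, which is exactly the part you don’t control.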

The best practice? Don’t panic. Understand that the RSS (Resident Set Size) reported by your OS is often a poor indicator of the memory your Python objects are actively using. If you need a true picture of object allocation, use the standard library’s tracemalloc module. pymalloc is optimizing for speed, not for making your system monitor look pretty. It’s a trade-off, and on balance, it’s one of the main reasons your Python code runs as fast as it does.
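Here is a minimal tracemalloc session along those lines. It uses only the module’s documented calls (start, get_traced_memory, stop); the workload, a list of small tuples, is an arbitrary choice that keeps every object well under pymalloc’s 512-byte cutoff. tracemalloc’s “current” figure drops as soon as the objects are freed, regardless of whether the process’s RSS does.

```python
import tracemalloc

tracemalloc.start()

# Lots of small objects: each tuple and each short string is well under 512 bytes.
data = [(i, str(i)) for i in range(100_000)]
current, peak = tracemalloc.get_traced_memory()
print(f"while alive: current={current:,} B, peak={peak:,} B")

del data
after, _ = tracemalloc.get_traced_memory()
print(f"after del:   current={after:,} B")

tracemalloc.stop()
```

For a look at pymalloc itself, CPython also offers sys._debugmallocstats(), which prints per-size-class pool and arena statistics to stderr; the leading underscore marks it as an implementation detail rather than a stable API.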