71.7 Reference Counting in the C API
Alright, let’s pull back the curtain on one of CPython’s most fundamental and yet most notorious mechanisms: reference counting. Forget the GIL for a moment; this is the real bedrock of memory management in your Python runtime. It’s a brutally simple concept: every object carries a counter (ob_refcnt) that tracks how many references point to it. When you create a reference, you increment it. When you’re done with a reference, you decrement it. When that count hits zero, the object is deallocated on the spot. No waiting for a collector to come around, no fuss (CPython’s cyclic garbage collector exists only to break reference cycles that counting alone can’t free). It’s deterministic, and in a language like C, that’s a godsend.
But here’s the catch: it’s your job. The C API gives you the power, which means it also gives you the responsibility to not blow your own foot off. Forget to decrement? You’ve got a memory leak. Decrement too much? The object is freed while something still points at it, and you get a segmentation fault (or worse, silent corruption) the next time that freed memory is touched. It’s a tightrope walk, but once you get the rhythm, it becomes second nature.
The Core Macros: Your New Best Friends
You’ll live and die by these four macros: Py_INCREF and Py_DECREF, plus their NULL-tolerant siblings Py_XINCREF and Py_XDECREF. They are the incantations you must use to manipulate an object’s life.
#include <Python.h>
// A fresh list is used here because small ints such as 42 are cached by CPython,
// so their refcount is never the tidy "1" you might expect.
PyObject *my_object = PyList_New(0); // new reference: refcount is now 1 (NULL on failure)
Py_INCREF(my_object); // refcount becomes 2
Py_DECREF(my_object); // refcount becomes 1
// Object is still alive.
Py_DECREF(my_object); // refcount becomes 0. Object is destroyed. Poof.
// 'my_object' is now a dangling pointer. Don't touch it.
The most important rule: You own a reference if you’ve called Py_INCREF on it or if a function returns a new reference. You borrow a reference if you’re just looking at an object someone else owns. Your job is to Py_DECREF the references you own when you’re done with them. The most common mistake is thinking a borrowed reference is yours to keep; it’s not. If you need to hang onto a borrowed reference, you must Py_INCREF it to promote it to an owned reference.
The Function Contract: New vs. Stolen References
This is where most newcomers get tripped up. CPython functions have a contract regarding references, and you must read the API docs to know what you’re getting.
- A function that returns a new reference (like PyLong_FromLong) hands you a reference that you own. The object may be freshly created or a cached one; either way, that one reference is yours and you are responsible for it.
- A function that steals a reference (like PyList_SetItem) takes ownership of your reference to an object. You are no longer responsible for it. The list will now manage the object’s lifetime. This is a performance optimization to avoid unnecessary INCREF/DECREF calls.
PyObject *list = PyList_New(3); // New ref for the list
PyObject *num = PyLong_FromLong(1); // New ref for the number
// PyList_SetItem STEALS the reference to 'num'.
// We no longer have to DECREF it. The list owns it now.
PyList_SetItem(list, 0, num);
// This would be a catastrophic mistake. The reference was stolen!
// Py_DECREF(num); // DO NOT DO THIS.
// Correctly clean up the list, which will also clean up the number it contains.
Py_DECREF(list);
Conversely, a function like PyList_GetItem returns a borrowed reference. You’re just peeking at the object in the list. If you need that object to stick around after the list is destroyed or modified, you must INCREF it.
The Pit of Despair: Common Bugs
- The Double DECREF: You decrement a reference more times than you own it. The object is freed while something else still uses it, and the interpreter crashes sooner or later. The best way to avoid this is to set pointers to NULL after you Py_DECREF them and to use Py_XDECREF wherever a pointer may be NULL. Py_XDECREF(NULL) is a safe no-op; Py_DECREF(NULL) is not, because it dereferences the NULL pointer. Py_CLEAR does the release and the NULL assignment in one macro.
- The Forgotten INCREF: You hold onto a borrowed reference (e.g., from a tuple element) and then the container gets DECREF’d and destroyed, taking your borrowed object with it. Now you’re pointing to freed memory.
- The Leak: You create a new reference (or INCREF a borrowed one) and then simply forget to ever DECREF it. Your program’s memory usage slowly balloons. These are a pain to track down.
The Golden Rule and Best Practices
The single best piece of advice I can give you is to be ferociously consistent in your ownership logic.
- Comment your code. For every Py_INCREF, note why you’re taking ownership. For every Py_DECREF, note what reference you’re finally releasing.
- Use Py_XDECREF. This is the safe version of Py_DECREF that checks for NULL first. It’s invaluable for cleanup code in error paths.
- Understand the abstraction you’re working with. When writing an extension type or a function, know exactly what references you own at every point in your code. Draw little diagrams if you have to. It seems pedantic until it saves you from a three-day debugging session.
Reference counting isn’t glamorous, but it’s the engine that makes embedding and extending Python possible. It’s a system built on raw, unforgiving C, and it demands respect. Give it that respect, and it will be the most reliable part of your code.