Alright, let’s talk about the gc module. You’ve probably been happily letting Python’s garbage collector do its thing in the background, which is usually the right move. But sometimes, you need to roll up your sleeves and get a look under the hood. Maybe your application is acting like a memory hog, or you’re dealing with a gnarly reference cycle, or you’re just pathologically curious. That’s where gc comes in. It’s our direct line to the automatic memory management system, and it gives us the tools to interrogate, tweak, and occasionally give it a good prod.

The Garbage Collector’s Triggers: Understanding Thresholds

First, a quick recap. Python’s garbage collector exists primarily to break reference cycles. Objects that reference each other but are no longer referenced from the outside can’t be caught by the simple reference counting mechanism. The garbage collector is a generational collector, meaning it categorizes objects into three “generations” ({{< bibleref “Genesis 0 ” >}}, {{< bibleref “Genesis 1 ” >}}, {{< bibleref “Genesis 2 ” >}}) based on how many collection sweeps they’ve survived.

The collector doesn’t run after every single object creation. That would be insane. Instead, it runs when the number of allocations minus deallocations since the last collection exceeds a threshold for that generation. You can check these thresholds; they’re a tuple (threshold0, threshold1, threshold2) accessible via gc.get_threshold().

import gc

print(f"Current GC thresholds: {gc.get_threshold()}")
# Typical output: (700, 10, 10)

So what does (700, 10, 10) mean? It means:

  • Generation 0: When 700 new objects are allocated without being collected, a {{< bibleref “Genesis 0 ” >}} collection is triggered.
  • Generation 1: A {{< bibleref “Genesis 1 ” >}} collection happens after 10 {{< bibleref “Genesis 0 ” >}} collections.
  • Generation 2: A {{< bibleref “Genesis 2 ” >}} collection (a full collection) happens after 10 {{< bibleref “Genesis 1 ” >}} collections.

This system is brilliantly pragmatic. It assumes most objects die young (they do), so it checks the youngest generation most often. Why waste time constantly trawling through old, stable objects?

You can change these thresholds with gc.set_threshold(). Lower them if you’re paranoid about memory and want more frequent, smaller collections. Raise them if you’re in a performance-critical section and want to postpone the small performance hit of a collection run. Tweak with caution and profile before and after. The defaults are sensible for most workloads.

Forcing the Issue: Manual Collection

Sometimes you don’t want to wait. Maybe you just freed a massive data structure and you want that RAM back now, or you’re writing a benchmark and need a clean slate. This is where gc.collect() shines.

import gc
import os
import psutil  # You might need to 'pip install psutil' for this

process = psutil.Process(os.getpid())

print(f"Memory usage: {process.memory_info().rss / 1024 / 1024:.2f} MB")

# Create and then forget a whole bunch of garbage
big_list = [x for x in range(10_000_000)]
del big_list

print(f"Memory after deletion: {process.memory_info().rss / 1024 / 1024:.2f} MB")
# Wait, why is it still so high? The GC hasn't run yet!

# Force a full collection (generation 2)
collected = gc.collect()
print(f"GC collected {collected} objects.")
print(f"Memory after collect(): {process.memory_info().rss / 1024 / 1024:.2f} MB")

Calling gc.collect() without an argument runs a full collection ({{< bibleref “Genesis 2 ” >}}). You can also pass a generation integer (0, 1, 2) to only collect that specific generation. The return value tells you how many objects were actually collected. It’s your manual override button. Use it sparingly; the automatic system is usually smarter than you about timing.

Debugging the Mess: Finding the Leaks

This is the gc module’s killer feature. Your program’s memory usage is climbing, and you have no idea why. You suspect a reference cycle you can’t find, or worse, a cycle that’s somehow keeping a huge object alive. This is where you turn on the debug flags.

gc.set_debug(gc.DEBUG_STATS | gc.DEBUG_SAVEALL)

Let’s break that down:

  • gc.DEBUG_STATS: Prints statistics to stdout every time a collection runs. It’s great for seeing how often collections happen and how effective they are.
  • gc.DEBUG_SAVEALL: This is the magic one. It makes the GC save all objects it found to be unreachable into the gc.garbage list instead of actually freeing them.

Why would you want this? Because if an object in gc.garbage has a __del__ method, the collector can’t safely destroy it due to a fundamental and, frankly, annoying limitation in the collector’s design. By saving them, you get a chance to inspect them.

# After doing something suspicious that might cause a leak...

# Force a collection to populate gc.garbage
gc.collect()

if len(gc.garbage) > 0:
    print(f"Warning: {len(gc.garbage)} uncollectable objects found.")
    for obj in gc.garbage:
        print(f"  Object: {obj}, type: {type(obj)}")
        # Often, inspecting `obj.__dict__` or `obj.__repr__` can give clues

Crucial Pitfall: Once you set DEBUG_SAVEALL, those objects never get freed. Your program’s memory will balloon. This is strictly a debugging tool. Use it in your tests or a dedicated debugging script, not in production. Remember to gc.set_debug(0) and del gc.garbage[:] to clear the list and reset when you’re done.

The Best Practice: Mostly, Leave It Alone

Here’s the honest truth: 99% of the time, you should not be messing with the gc module. The defaults are excellent. The most common cause of memory leaks in Python isn’t the GC failing—it’s you, my friend, accidentally holding onto a reference you didn’t mean to. A large cache, a global list that keeps getting appended to, an object registered as a callback that never gets unregistered.

Use gc as a diagnostic tool. Use tracemalloc for finding where objects are allocated. Use objgraph (a great third-party package) to visualize object references. Use your debugger. The gc module is the scalpel you use to perform the autopsy after you’ve already found the patient dead. It’s not the first tool you grab. But when you need it, you’ll be profoundly grateful it’s there.