67.2 cProfile and pstats: Function-Level Profiling

Right, so you’ve written some code. It works. You’re feeling pretty good about yourself. But is it fast? Or does it secretly run like a dog walking on its hind legs—technically impressive that it works at all, but you can’t help but wince while watching it? Guessing which part is slow is a fantastic way to be wrong and waste an afternoon. We don’t guess. We measure. And for that, we bring in the heavy artillery: cProfile.

Think of cProfile as a high-precision scientific instrument, not a blurry Instagram filter. It doesn’t sample; it is a deterministic profiler that records every function call, every return, every exception event. It’s brutally honest, and sometimes that honesty hurts. It tells you where your program is spending its time, down to the specific function. The overhead is non-trivial and depends on how many function calls your code makes: call-heavy code can run noticeably slower under the profiler, sometimes by a multiple. So you wouldn’t normally run it in production, but for local optimization work, it’s indispensable.

Let’s start with the simplest way to use it: running it as a module against an existing script.

python -m cProfile -o my_script.prof my_script.py arg1 arg2

This runs my_script.py and dumps all the profiling data into my_script.prof. The .prof file is binary, so to make sense of it, you need another tool. That’s where pstats comes in.
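If you would rather stay inside Python, a similar dump can be produced without the command line. This is a minimal sketch, assuming a hypothetical busy_work function standing in for your script's entry point and a temporary file in place of my_script.prof:

```python
import cProfile
import os
import pstats
import tempfile


def busy_work(n):
    """Hypothetical stand-in for the script's real entry point."""
    return sum(i * i for i in range(n))


# Profile a single call; runcall hands back the function's return value.
profiler = cProfile.Profile()
result = profiler.runcall(busy_work, 50_000)

# Write the collected data to disk in the binary format pstats reads.
prof_path = os.path.join(tempfile.mkdtemp(), "busy_work.prof")
profiler.dump_stats(prof_path)

# The file can be loaded later, even from a different Python session.
loaded = pstats.Stats(prof_path)
loaded.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(5)
```

The runcall approach is handy when the interesting work sits behind one function and you don't want interpreter startup and imports in the numbers.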

The pstats Interactive Stat Browser

Reading the raw .prof file is like trying to read a database file by opening it in Notepad. Pointless. You use pstats to create a stats object and then interrogate it. Fire up a Python shell.

import pstats
p = pstats.Stats('my_script.prof')

Now you have this p object, which is your gateway to all the data. The most common command is sort_stats followed by print_stats.

p.sort_stats(pstats.SortKey.TIME)
p.print_stats(10)  # Show the top 10 offenders

This sorts the stats by the internal time spent in each function (excluding time spent in calls to sub-functions) and prints the top ten. The output looks like a dense phone book, but it’s pure gold.

         1000004 function calls in 2.145 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   100000    1.234    0.000    1.234    0.000 {method 'append' of 'list' objects}
        1    0.911    0.911    2.145    2.145 my_script.py:15(expensive_loop)
   200000    0.000    0.000    0.000    0.000 {built-in method builtins.len}

Here’s how to read this:

ncalls: Number of calls to this function. If there are two numbers (e.g., 100/10), it means the function recursed, and the first number is total calls, the second is primitive (non-recursive) calls.
tottime: Total time spent inside this function, excluding time in sub-functions. This is “internal time.”
percall: tottime divided by ncalls.
cumtime: Cumulative time spent in this function and all the functions it called. This is the truly telling metric.
percall: cumtime divided by the number of primitive calls.
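To see where these columns come from, here is a small sketch that profiles a recursive function and reads the numbers back programmatically. The countdown function is made up for illustration, and get_stats_profile() needs Python 3.9 or later.

```python
import cProfile
import pstats


def countdown(n):
    """Hypothetical recursive function: calls itself n more times."""
    if n == 0:
        return 0
    return 1 + countdown(n - 1)


profiler = cProfile.Profile()
profiler.enable()
countdown(5)
profiler.disable()

stats = pstats.Stats(profiler)
# get_stats_profile() (Python 3.9+) exposes the table rows as objects
# keyed by function name.
row = stats.get_stats_profile().func_profiles["countdown"]

print("ncalls :", row.ncalls)    # "total/primitive" for recursive functions
print("tottime:", row.tottime)   # time spent in countdown's own frames
print("cumtime:", row.cumtime)   # also includes time in everything it called
```

One outer call plus five recursive ones gives six calls in total and one primitive call, so ncalls should read 6/1. The timings will be close to zero for a function this small.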

In the sample output above, the list.append method is the biggest culprit by internal time. But look at the cumtime for expensive_loop: it’s the entire runtime. The loop’s own body accounts for 0.911 seconds of that, and the rest is the 100,000 append calls it makes. So the fix belongs in expensive_loop. There is nothing to tune inside append itself, only how often the loop calls it.
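When a built-in dominates the table, the next question is who is calling it. pstats can answer that with print_callers. A sketch under assumed names (build_list is hypothetical and plays the role of expensive_loop):

```python
import cProfile
import io
import pstats


def build_list(n):
    """Hypothetical stand-in for expensive_loop: appends in a tight loop."""
    items = []
    for i in range(n):
        items.append(i)
    return items


profiler = cProfile.Profile()
profiler.enable()
build_list(10_000)
profiler.disable()

out = io.StringIO()
stats = pstats.Stats(profiler, stream=out)
stats.strip_dirs().sort_stats(pstats.SortKey.TIME)
# Restrict the report to entries whose name matches the regex "append"
# and list the functions that called them.
stats.print_callers("append")
report = out.getvalue()
print(report)
```

The report pairs the append entry with build_list as its caller, which is the function you would actually go and edit. print_callees works the same way in the other direction.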

Profiling Just One Piece of Code

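Running your whole script via the command line is great, but sometimes you want to profile just a specific chunk of code. For that, create a cProfile.Profile object and switch it on and off around the code you care about with enable() and disable(). (On Python 3.8 and later, the same object also works as a context manager.)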

import cProfile
import pstats
import io

def some_function_i_suspect():
    # ... some potentially slow code ...
    return sum(i * i for i in range(1_000_000))  # placeholder workload

# Create a profiler and run the function
profiler = cProfile.Profile()
profiler.enable()
some_function_i_suspect()
profiler.disable()

# Create a stream for the output and print it
s = io.StringIO()
sortby = pstats.SortKey.CUMULATIVE
stats = pstats.Stats(profiler, stream=s).sort_stats(sortby)
stats.print_stats()
print(s.getvalue())

This is the way to go when you’ve isolated a problem area and want to avoid the noise of the rest of your application’s startup time.
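If you keep profiling the same few functions, the enable/disable boilerplate can be wrapped in a decorator. What follows is a sketch, not a standard library feature: the profiled name and its top parameter are invented here, and the with form of cProfile.Profile requires Python 3.8 or later.

```python
import cProfile
import functools
import io
import pstats


def profiled(top=10):
    """Hypothetical decorator: profile each call and print the top rows."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Profile works as a context manager on Python 3.8+;
            # it enables on entry and disables on exit.
            with cProfile.Profile() as profiler:
                result = func(*args, **kwargs)
            stream = io.StringIO()
            stats = pstats.Stats(profiler, stream=stream)
            stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(top)
            print(stream.getvalue())
            return result
        return wrapper
    return decorator


@profiled(top=5)
def some_function_i_suspect():
    # Placeholder workload, as in the example above.
    return sum(i * i for i in range(100_000))


total = some_function_i_suspect()
```

The decorator returns the wrapped function's result unchanged, so it can be added and removed without touching the call sites. Take it off before shipping, since every call pays the profiling overhead.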

Common Pitfalls and How to Avoid Them

Optimizing the Wrong Thing: The biggest mistake is looking at tottime and immediately trying to micro-optimize the function at the top. Check ncalls and percall as well. A function with a high tottime but a tiny percall is already efficient; it’s just being called a lot. The real problem is likely the caller that invokes it a million times, and that caller is the one with the high cumtime. Optimize algorithms, not instructions.
Ignoring Built-in Functions: See {method 'append' of 'list' objects} at the top? You can’t optimize that. It’s a C built-in. Its presence at the top is a gigantic, flashing neon sign telling you to look at the caller: how often it invokes the built-in, and whether the algorithm needs that many calls at all. Maybe you’re using a data structure wrong, or you’ve written an O(n²) algorithm when an O(n log n) one exists.
Not Cleaning Up Your Code First: Don’t profile code that’s full of debug print statements or unnecessary disk writes. Profile the clean, logical version of your algorithm. Otherwise, you’re just optimizing your own noise.
Forgetting About Overhead: Remember, cProfile adds overhead, and it adds it disproportionately to functions that are called frequently. A function called 100,000 times will seem slower than it is relative to a function called 10 times. The ranking of what’s slow is usually reliable when the gaps are large; the exact timings should be taken with a grain of salt, so confirm any fix by timing the code without the profiler attached.
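As a concrete case of the built-in pitfall, the sketch below profiles two hypothetical functions that build the same list and counts the append calls the profiler recorded for each. The function names and the size are illustrative only, and get_stats_profile() assumes Python 3.9 or later.

```python
import cProfile
import pstats


def squares_loop(n):
    """Builds the list with an explicit append call per element."""
    out = []
    for i in range(n):
        out.append(i * i)
    return out


def squares_comprehension(n):
    """Builds the same list without calling any list method."""
    return [i * i for i in range(n)]


def count_append_calls(func, n):
    """Profile func(n); return its result and the recorded append calls."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, n)
    rows = pstats.Stats(profiler).get_stats_profile().func_profiles
    appends = sum(
        int(row.ncalls.split("/")[0])  # ncalls is a string such as "1000"
        for name, row in rows.items()
        if "append" in name
    )
    return result, appends


loop_result, loop_appends = count_append_calls(squares_loop, 1_000)
comp_result, comp_appends = count_append_calls(squares_comprehension, 1_000)
print("loop:", loop_appends, "comprehension:", comp_appends)
```

Both functions return the same list, but only the loop version has append entries in its profile. Nothing was done to append; the caller simply stopped making the calls.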

The cProfile and pstats combo is one of the most powerful tools in your Python performance toolkit. It replaces superstition with data. It’s the difference between saying “I think the database call is slow” and knowing, with measurements to back it up, that 73% of your runtime is spent in one poorly written function that’s concatenating strings in a loop. Now go find that loop and fix it. Your brilliant friend has spoken.