67.3 line_profiler: Line-By-Line Timing

Alright, let’s get our hands dirty. You’ve probably used cProfile or timeit and found yourself staring at a function name, thinking, “Great, I know my_awful_function() is the problem. Now, which of the 200 lines of spaghetti inside it is actually causing the pain?” This is the universal frustration of function-level profilers. They get you to the crime scene, but not the specific bullet casing.

Enter line_profiler. This beautiful tool is the detective that goes in and tells you exactly how many times each line was executed and, crucially, how long each one took on average. It’s the difference between knowing “the engine is broken” and knowing “spark plug #3 is misfiring.” We’re about to perform some open-heart surgery on your code, and this is our MRI machine.

Installing the Damn Thing

First, the obligatory installation. You can’t just pip install and go; this thing has a C extension. Don’t worry, it’s usually painless.

pip install line_profiler

The main thing you get is a console utility called kernprof, which is the script that actually does the profiling. You’ll also get the line_profiler module itself, which we use to view the results.

Profiling a Simple (and Flawed) Example

Let’s create a function so inefficient that it offends me to write it. It’s perfect for our purposes.

# poorly_optimized.py

def slow_function():
    total = 0
    for i in range(10000):  # Loop 1
        for j in range(10000): # Loop 2, the real villain
            total += i * j
    return total

def fast_function():
    return sum(range(10000)) * sum(range(10000)) / 2  # A vaguely mathy better way

if __name__ == "__main__":
    slow_result = slow_function()
    fast_result = fast_function()
    print(f"Slow got: {slow_result}, Fast got: {fast_result}")

Now, to profile slow_function, we need to decorate it. This is the one slightly annoying bit. You have to explicitly tell kernprof which functions to profile by tagging them with the @profile decorator. Important: Do NOT import profile from anywhere. The kernprof script injects this decorator into the global namespace magically when it runs your script. Just slap it on there.

# poorly_optimized.py (with decorator)

@profile  # <-- This is magically injected by kernprof
def slow_function():
    total = 0
    for i in range(10000):
        for j in range(10000):
            total += i * j
    return total

# ... rest of the code ...

Now, run the profiler:

kernprof -l -v poorly_optimized.py

The -l flag tells it to do line-by-line profiling, and -v immediately outputs the results to the console after the script runs (instead of saving to a .lprof file). Be prepared to wait a moment. That nested loop is doing 100,000,000 iterations. I did warn you.

Interpreting the Glorious, Horrifying Output

After an eon, you’ll get a table that looks something like this (trimmed for brevity):

Total time: 15.247 s
File: poorly_optimized.py
Function: slow_function at line 3

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     3                                           @profile
     4                                           def slow_function():
     5         1          1.0      1.0      0.0      total = 0
     6     10001      5,821.0      0.6      0.0      for i in range(10000):
     7  10001000  101,002,100.0     10.1     66.3          for j in range(10000):
     8  10000000  5,123,456,789.0    512.3     33.6              total += i * j
     9         1          1.0      1.0      0.0      return total

Let’s read this masterpiece of despair:

Hits: How many times each line was executed. Line 8 was hit 10 million times. Yikes.
Time: The total time spent on that line, in whatever obscure units the timer uses (often microseconds). This is the most important column. Don’t get hung up on the absolute number; focus on the relative difference between lines.
Per Hit: Time / Hits. The average cost of a single operation on that line.
% Time: The percentage of the function’s total runtime spent on that line. This is the other most important column. It tells you precisely where to aim your optimization efforts.

The diagnosis is immediately obvious: 99.9% of the time is spent inside the two loops, with the overwhelming majority (33.6%) on the innocent-looking total += i * j. The inner for j in range(10000) line is also a major culprit at 66.3%. This is the smoking gun. Our optimization must target these two lines. Replacing the nested loop with the fast_function approach would eliminate this bottleneck entirely.

Best Practices and “Gotchas”

The Decorator is Magic: Remember, you only add @profile for kernprof. If you try to run your script normally with python poorly_optimized.py, it’ll crash with a NameError: name 'profile' is not defined. I often use a hack to avoid this:

# Add this at the top of your file
try:
    # __file__ is defined when run by kernprof
    if __file__:
        profile  # Check if 'profile' is defined in the magic namespace
except NameError:
    # If we're running normally, create a no-op decorator
    def profile(func):
        return func

This lets you run the script normally or with the profiler without changing the code.

Profile in Production-Like Environments: Profiling on your laptop while you have 50 browser tabs open will give you noisy results. Profile on the same machine (or one with similar specs) that will run the code in production. The absolute times will be different, but the relative percentages (% Time) should be consistent.
It Adds Overhead. A Lot of It. line_profiler is intrusive. It dramatically slows down your code. That’s why we use it on small, specific sections. Don’t profile your entire web server with it; you’ll be waiting for years. Isolate the suspect function and profile that.
It’s Not for Threads or Async: line_profiler can get confused by concurrent code. It’s best used on straightforward, synchronous, CPU-bound functions. For async, you’re better off with other tools.

line_profiler isn’t the tool you leave running all the time. It’s the scalpel you pull out when you know something’s wrong and you need to find the exact artery that’s bleeding. It turns a vague sense of slowness into a precise, actionable report. Now go find that O(n²) algorithm hiding in your code and make it regret its life choices.