69.4 Cython: Annotating Python for C-Speed Compilation
Right, so you want to go fast. You’ve got a Python function that’s become the bottleneck, grinding your elegant script to a halt in a loop of a million iterations. Rewriting it all in C sounds like a nightmare of buffer overflows and segfaults. Enter Cython. It’s not magic, but it’s the closest thing we have to a “go faster” button for Python. The deal is simple: you take your perfectly good Python code and give the Cython compiler a few hints—type annotations, mostly—about what things are. In return, it transpiles your Python into fiercely optimized C code, which then gets compiled into a native binary extension module you can import directly. It’s Python wearing a C-shaped performance skin suit.
The key thing to understand is that Cython’s power comes from reducing ambiguity. The Python interpreter is brilliant, but it has to check the type of a variable every single time it uses it. a + b could be integers, floats, strings, or custom objects with an __add__ method. That dynamism is what makes Python Python, but every operation pays for it in type checks, method lookups, and heap-allocated objects. By telling Cython cdef int a, b, you take the interpreter’s dispatch machinery out of that operation. The generated C code performs a single, direct integer addition. No checks, no lookups, just raw math. This is the fundamental trade-off: you surrender a little dynamic flexibility for a lot of static speed.
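To make that dispatch cost concrete, here is a small pure-Python sketch. The Vec class and the add helper are hypothetical, invented only for illustration. It shows the same + expression resolving to three different pieces of code depending on the runtime types of its operands, which is the lookup a C-typed variable lets Cython skip.

```python
class Vec:
    """Hypothetical 2D vector, used only to illustrate __add__ dispatch."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):
        return Vec(self.x + other.x, self.y + other.y)


def add(a, b):
    # The interpreter inspects the operand types at runtime to decide
    # what "+" means for this particular call.
    return a + b


int_sum = add(2, 3)                    # integer addition
str_sum = add("py", "thon")            # string concatenation
vec_sum = add(Vec(1, 2), Vec(3, 4))    # user-defined __add__

print(int_sum, str_sum, (vec_sum.x, vec_sum.y))
```

One function body, three behaviours. A C compiler given int a, b has exactly one behaviour to emit.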
The .pyx File and the setup.py
Cython code normally lives in .pyx files rather than .py files. (Cython also has a pure Python mode that works in .py files, but .pyx is what this section uses.) The .pyx file is the source Cython compiles. To build it, you need a setup.py file that uses setuptools to invoke the Cython compiler. Let’s look at the canonical “Hello, World” of performance: a naive Fibonacci function.
First, the pure Python version in fib.py:
    def fib(n):
        a, b = 0, 1
        for i in range(n):
            a, b = b, a + b
        return a
Now, let’s Cythonize it. We create cython_fib.pyx:
    def fib(int n):
        cdef int i
        cdef int a = 0
        cdef int b = 1
        for i in range(n):
            a, b = b, a + b
        return a
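Notice the changes? We’ve typed the function argument n as a C integer directly in the signature (int n), and declared the loop variable i and the accumulators a and b with cdef int. This tells Cython that these are C integers, not Python integers, for the entire scope of the function. One consequence: a C int has a fixed width, usually 32 bits, so unlike the pure Python version this one overflows silently once the result no longer fits, which happens at fib(47). Declaring a and b as long long pushes that limit out but does not remove it.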
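The fixed width of cdef int is easy to underestimate, so here is a rough pure-Python simulation of it. It assumes a 32-bit C int and two's-complement wraparound; strictly speaking, signed overflow is undefined behaviour in C, so this models what is commonly observed rather than what is guaranteed. The helper names wrap_int32 and fib_int32 are made up for this sketch.

```python
import ctypes


def fib(n):
    """Pure Python version: integers grow as needed."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def wrap_int32(value):
    """Reduce a Python int to the range of a signed 32-bit integer."""
    return ctypes.c_int32(value).value


def fib_int32(n):
    """Same loop, with every sum wrapped as a 32-bit C int would be."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, wrap_int32(a + b)
    return a


for n in (46, 47):
    print(n, fib(n), fib_int32(n))
```

The two functions agree up to n = 46 and diverge at n = 47, where the simulated C int goes negative. If your inputs can reach that range, use a wider C type or leave those variables as Python objects.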
Now, the setup.py to build it:
    from setuptools import setup
    from Cython.Build import cythonize

    setup(
        ext_modules=cythonize("cython_fib.pyx")
    )
You build it by running python setup.py build_ext --inplace. This translates the .pyx into a .c file, then compiles the .c file into a .so (or .pyd on Windows) file right in your working directory. You can then import cython_fib and call cython_fib.fib() just like any other Python module, and the typed loop will typically run many times faster than the pure Python one. Measure it on your own machine rather than taking that on faith.
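A minimal timing harness for that comparison might look like the following. It assumes the module was built under the name cython_fib as above, and it degrades to timing only the pure Python version if the import fails. The best_time helper and its parameters are illustrative choices, not part of Cython.

```python
import importlib
import timeit


def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def best_time(func, n, repeats=5, number=1000):
    """Best-of-`repeats` wall time, in seconds, for `number` calls of func(n)."""
    timer = timeit.Timer(lambda: func(n))
    return min(timer.repeat(repeat=repeats, number=number))


# n = 40 keeps the result inside a 32-bit C int.
py_time = best_time(fib, 40)
print(f"pure Python: {py_time:.6f} s")

try:
    cython_fib = importlib.import_module("cython_fib")
except ImportError:
    cython_fib = None
    print("cython_fib is not built; run the build step first")
else:
    cy_time = best_time(cython_fib.fib, 40)
    print(f"Cython:      {cy_time:.6f} s")
```

Taking the minimum of several repeats reduces noise from other processes, which matters when the function under test runs in well under a microsecond.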
cdef, cpdef, and When to Use Which
The cdef keyword is your main tool. It defines a variable or function at the C level. It’s fast, but it’s invisible from the Python runtime. This is a crucial distinction.
- cdef for variables: As shown above, use this for local variables inside a function that are part of your tight loop. This is the #1 easiest win.
- cdef for functions: A cdef function is a pure C function. It cannot be called from Python. You use this for internal helper functions that are called from other Cython functions. It has zero Python call overhead.

      cdef double _internal_calc(double x, double y):
          return x * x + y * y

      def public_func(a, b):
          return _internal_calc(a, b)

- cpdef for functions: This is the best of both worlds, but with a small cost. It creates both a C function (for fast Cython calls) and a Python wrapper (so you can call it from Python). Use this for functions you need to call both from Python and from within your Cythonized code.

      cpdef double calculate(double x, double y):
          return x * x + y * y

The rule of thumb: use def for the API, cdef for internal speed, and cpdef when you genuinely need both.
Working with C Libraries
This is where Cython stops being neat and starts being utterly brilliant. You can talk directly to C libraries. Let’s say you want to use the cos function from math.h. You can just declare it and use it.
    # cython_math.pyx
    cdef extern from "math.h":
        double cos(double theta)

    def fast_cos(double x):
        return cos(x)
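That cdef extern block tells Cython about the C function’s signature. The generated code calls the C cos function directly: no Python function call, no importing the math module, nothing. The function itself comes from the C math library at link time; on some Unix-like systems libm is not linked by default, so you may need to add libraries=["m"] to the extension definition in setup.py. For the standard C library, Cython also ships ready-made declarations, so from libc.math cimport cos does the same job without the extern block. The extern pattern works for any C library. You can wrap entire APIs this way.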
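For contrast, this is roughly what the same signature declaration looks like when done at runtime with the standard library's ctypes module instead of at compile time with Cython. This is a different technique, shown only to highlight what the extern block buys you. It assumes a platform where ctypes.util.find_library can locate a libm, and the load_c_cos helper is hypothetical.

```python
import ctypes
import ctypes.util
import math


def load_c_cos():
    """Return the C library's cos via ctypes, or None if libm is not found."""
    path = ctypes.util.find_library("m")  # assumption: a POSIX-style libm
    if path is None:
        return None
    try:
        libm = ctypes.CDLL(path)
    except OSError:
        return None
    c_cos = libm.cos
    # Runtime counterpart of `double cos(double theta)` in the extern block.
    c_cos.argtypes = [ctypes.c_double]
    c_cos.restype = ctypes.c_double
    return c_cos


c_cos = load_c_cos()
if c_cos is not None:
    print(c_cos(0.0), math.cos(0.0))
```

With ctypes, every call still converts arguments and results through Python objects at runtime. With cdef extern, the C compiler sees an ordinary function call, which is why the Cython route is the one to reach for inside a hot loop.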
The Pitfalls: The Devil in the Details
- The Wrong Types: The biggest mistake is using cdef on a variable but then assigning a Python object of the wrong type to it. cdef int x; x = "hello" will not end well. Cython rejects the mismatches it can see at compile time; the rest surface at runtime as a TypeError or OverflowError when a Python object is converted to the C type. Your type annotations are a promise. Keep it.
- The Global Interpreter Lock (GIL): Cython code still holds the GIL by default, meaning it does not run in parallel with other Python threads. If you have a function that does a long, CPU-bound calculation with no Python objects, you can release the GIL to let other threads run using a with nogil: block. This is advanced, but incredibly powerful.

      def calculate_nogil(double x):
          cdef double result  # C variable: Python objects cannot be assigned without the GIL
          with nogil:
              # ... do pure C operations on x ...
              # _pure_c_function must be a cdef function declared nogil
              result = _pure_c_function(x)
          return result  # GIL is re-acquired upon exiting the block

- Over-optimizing: Don’t just cdef everything in sight. Profile your code first. The 90/10 rule applies: 90% of the time is spent in 10% of the code. Annotate that 10%. Adding cdef to a variable used once outside a loop is a pointless complication that makes your code less readable. Cython provides a great annotation tool (cython -a myfile.pyx) that generates an HTML file showing which lines interact heavily with Python (in yellow) and which are pure C (in white). Your goal is to turn the hot loops white.
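Before reaching for cdef at all, the "profile first" advice can be followed with nothing but the standard library. The sketch below uses cProfile and pstats on a made-up workload (cheap_setup and workload are hypothetical stand-ins for your own code) to show which function actually dominates the run time.

```python
import cProfile
import io
import pstats


def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def cheap_setup():
    # Runs once; not worth annotating.
    return list(range(10))


def workload():
    cheap_setup()
    return [fib(500) for _ in range(2000)]


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Sort by time spent inside each function itself and show the top entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("tottime").print_stats(5)
report = buffer.getvalue()
print(report)
```

In a run like this, fib should sit at the top of the report and cheap_setup near the bottom, which tells you where type annotations will pay off and where they are just noise.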
Cython respects Python too much to be a mere preprocessor. It lets you start with pure Python and gradually add type information until you hit the performance target you need. It’s a pragmatic, incremental path from a beautiful, slow script to a beautiful, fast shared library. Use it.