Optimization | mikePietsch.com

37.8 Benchmarking Best Practices and Avoiding Compiler Tricks

Right, let’s get our hands dirty. Benchmarking in Go is deceptively simple, which is precisely why so many people get it subtly, tragically wrong. The testing package gives you just enough rope to hang yourself with, and the compiler—oh, the clever, clever compiler—is actively looking for a reason to snip your code into oblivion. Our job is to outsmart it, to force it to show us the real performance cost, not the cost of a cleverly optimized mirage.

37.7 String Interning and bytes.Buffer vs strings.Builder

Right, let’s talk about strings. You love them, I love them, the Go runtime tolerates them. They’re the duct tape of our programs, holding everything together until they suddenly become the number one reason your elegant service is now gasping for memory like a fish on a sidewalk. The fundamental problem is that strings in Go are immutable. This is a fantastic feature for concurrency and safety, but a real pain when you’re building them up in a hot loop. Every time you write s += "new piece", you’re not just appending; you’re allocating a whole new string, copying both s and "new piece" into it, and then sending the old s off to be cleaned up by the garbage collector (GC). Do this a few thousand times and your GC is going to be working overtime, putting a serious damper on your throughput.

37.6 Reducing Allocations: sync.Pool, Value Types, and Preallocating Slices

Right, let’s talk about allocations. In the world of Go, allocations are like trips to the garbage can: you have to do them, but if you’re running back and forth every five seconds, you’re not getting any real work done. The garbage collector is incredibly smart, but it’s not clairvoyant. Every time you escape to the heap, you’re giving it more work to do later, which means eventually, it will have to stop your world (or at least a big part of it) to clean up your mess.

37.5 Escape Analysis: go build -gcflags -m

Alright, let’s get our hands dirty with one of Go’s coolest party tricks: escape analysis. This isn’t some abstract academic concept; it’s the compiler’s way of making a crucial decision for you: “Should this variable live on the stack, nice and cheap, or does it need to escape to the heap, the land of garbage collection and slower allocations?” To see the compiler’s thought process laid bare, we use the -gcflags="-m" flag. Running go build -gcflags="-m" your_file.go will spit out a torrent of messages telling you exactly what escapes and, more importantly, why. Let’s decode this output together.

37.4 go tool trace: Goroutine and Scheduler Traces

Alright, let’s get our hands dirty with go tool trace. You’ve probably been staring at CPU and memory profiles until your eyes cross, wondering why your beautifully concurrent Go application isn’t going as fast as it should. Sometimes, the problem isn’t what your code is doing, but how and when the goroutines are being scheduled to do it. That’s where the execution tracer comes in. It’s like getting a top-down view of a busy highway system; a CPU profile just tells you which cars are revving their engines the hardest.

37.3 go tool pprof: Reading Profiles and Flame Graphs

Right, let’s get our hands dirty. You’ve just run your Go service under pprof, you’ve captured a profile, and now you’re staring at a terminal prompt or a scary-looking SVG. It feels like you’ve been handed the blueprints to a skyscraper written in a foreign language. Don’t panic. We’re going to learn that language together. The first thing to internalize is that pprof is not a single tool; it’s a Swiss Army knife with a dozen blades. The most common profiles you’ll grab are the CPU profile and the Heap (memory) profile. They answer two fundamentally different questions: “What is burning my CPU time?” and “Where is my memory getting allocated?”.

37.2 net/http/pprof: Live Profiling of Running Servers

Right, so you’ve got a service running. It’s chugging along, but something’s off. Maybe it’s a bit sluggish under load, or perhaps memory usage is doing a concerning impression of a ski jump. You need to see what’s happening right now, on its terms, in production. You don’t get to stop the world and attach a debugger. This is where net/http/pprof becomes your best friend—a Swiss Army knife that’s mostly sharp blades for introspection.

37.1 pprof: CPU and Memory Profiling

Right, let’s talk about pprof. This isn’t some abstract academic concept; it’s the scalpel you use when your application starts coughing up blood. You don’t just “think” your code is slow—you know it, with data. pprof is how you get that data. It’s the single most powerful tool in the Go profiler’s arsenal, and it’s built right into the standard library. The designers at Google, for all their quirks, absolutely nailed this one.

37. Performance Profiling and Optimization in Go

14. Image Optimization

25. Performance Profiling and Optimization

8. Image Layers and the Build Cache

67.9 Avoiding Global Lookups and Repeated Attribute Access

Alright, let’s talk about one of the most common, and frankly, easiest-to-fix performance drains I see in the wild: the double-whammy of global lookups and repeated attribute access. This isn’t about fancy algorithmic wizardry; this is about cleaning up the sloppy, lazy code we all write at 2 AM so it doesn’t embarrass us in the light of day. Think of your code’s namespace as a series of concentric circles. The innermost circle is your local scope. It’s the VIP lounge—getting in there is fast, cheap, and easy. The outermost circle is the global scope, which includes built-in names like len or str. That’s the parking lot. Every time you reference a name in that global scope, the interpreter has to leave the comfy VIP lounge, trudge out to the parking lot, and yell for it. This process—a global lookup—isn’t cripplingly slow, but do it enough times inside a tight loop and it starts to add up like a bar tab at a developer conference.

67.8 String Concatenation Performance: join() vs +=

Right, let’s settle this. You’ve probably heard, in some hushed, serious tone, that you should “never, ever use += for string concatenation in Python.” And you’ve probably nodded along, thinking it’s one of those sacred rules, like not using goto. But here’s the thing: like most absolute rules in programming, it’s a simplification. A useful one, but a simplification nonetheless. The reality is more interesting, and knowing why is what separates you from someone who just parrots dogma.

67.7 Python-Specific Wins: Local Variables, Attribute Lookup, Slots

Right, let’s talk about making your Python code less… pokey. We’ve all been there. You’ve written something beautiful, it’s logically pristine, and then you run it. And you go get a coffee. And you come back. And it’s still chugging away. Before you start rewriting the whole thing in Rust (a noble, if dramatic, impulse), let’s look at some of the low-hanging, Python-specific fruit you can pluck for some easy speed wins.

67.6 Algorithmic Optimization: Big-O Thinking in Python

Look, we need to talk. Your code is slow. It’s not your fault—well, maybe it is a little, but we can fix it. You’ve probably been micro-optimizing: swapping out lists for deques, using local_vars, and trying to save nanoseconds. That’s like rearranging the deck chairs on the Titanic. The real iceberg, the one that will sink your entire application, is a bad algorithm. And the only way to spot it is to start thinking in terms of Big-O notation.

67.5 py-spy: Sampling Profiler for Production

Alright, let’s talk about something that actually works: py-spy. This is the profiling tool you use when your application is on fire in production and you can’t just restart it with cProfile attached. It’s a sampling profiler, which is a fancy way of saying it peeks at what your Python process is doing, at regular intervals, without your code having to know it’s being watched. It’s like a wildlife documentary filmmaker hiding in a bush, not a stage actor performing for a camera. The key thing here is that it’s low-overhead and safe to run on live, production systems.

67.4 memory_profiler: Tracking Memory Usage

Right, let’s talk about memory. It’s the one resource your application can’t just politely ask for more of from the operating system until things get awkward and it gets killed. Most performance tutorials focus on CPU time, but memory bloat is a silent killer. It slows everything down thanks to garbage collection, and it can lead to catastrophic, opaque crashes. So we’re going to stop guessing and start measuring with memory_profiler. This tool is like putting a fuel gauge on your code’s gas-guzzling SUV.

67.3 line_profiler: Line-By-Line Timing

Alright, let’s get our hands dirty. You’ve probably used cProfile or timeit and found yourself staring at a function name, thinking, “Great, I know my_awful_function() is the problem. Now, which of the 200 lines of spaghetti inside it is actually causing the pain?” This is the universal frustration of function-level profilers. They get you to the crime scene, but not the specific bullet casing. Enter line_profiler. This beautiful tool is the detective that goes in and tells you exactly how many times each line was executed and, crucially, how long each one took on average. It’s the difference between knowing “the engine is broken” and knowing “spark plug #3 is misfiring.” We’re about to perform some open-heart surgery on your code, and this is our MRI machine.

67.2 cProfile and pstats: Function-Level Profiling

Right, so you’ve written some code. It works. You’re feeling pretty good about yourself. But is it fast? Or does it secretly run like a dog walking on its hind legs—technically impressive that it works at all, but you can’t help but wince while watching it? Guessing which part is slow is a fantastic way to be wrong and waste an afternoon. We don’t guess. We measure. And for that, we bring in the heavy artillery: cProfile.

67.1 timeit: Micro-Benchmarking Code Snippets

Right, let’s talk about timeit. You’ve probably had a thought like, “Is method A faster than method B?” and then, like a chump, wrapped it in a time.time() call and run it once. I’ve been there. The results are a lie. Your operating system is a chaotic, beautiful mess of processes fighting for CPU time, and your one-off measurement just captured a moment when a background antivirus scan decided to sigh heavily. We need to do better. timeit is how we do better. It’s the statistical sledgehammer we use to smash uncertainty about tiny, repetitive code.