Performance | mikePietsch.com

26.7 skipLibCheck and Its Trade-offs

Alright, let’s talk about skipLibCheck. This is one of those compiler options that sounds like a free performance boost, and honestly, it mostly is. But like most things that seem too good to be true, it comes with a small, gnarly catch. I’m going to explain what it is, why you should almost always turn it on, and what that catch actually means for you. In a nutshell, skipLibCheck: true tells the TypeScript compiler, “Hey, don’t bother doing a full type check on .d.ts files (declaration files).” Most of those live in your node_modules folder, but the option covers every declaration file in the program, including any you wrote yourself. Think of it as a bouncer at an exclusive club. Instead of meticulously checking the ID of every single person in the line (your dependencies’ dependencies’ dependencies), the bouncer just lets in everyone who’s already on the list, trusting that the list is probably correct. This saves a massive amount of work.

26.6 SWC and esbuild: Transpile-Only Builds for Speed

Right, let’s talk about speed. You’ve felt it, that agonizing lag between hitting save and your TypeScript project finishing its compile cycle. It starts as a minor annoyance and slowly grows into a soul-crushing time sink. The culprit? tsc, the official TypeScript compiler, is doing a lot of heavy lifting for you: type checking, transpiling to your target JavaScript, handling all those fancy tsconfig.json options. It’s brilliant, but it’s a scholar, not a sprinter.

26.5 Avoiding Expensive Type Patterns: Deep Recursive Types

Let’s be honest: you’re not thinking about TypeScript’s type system performance until your IDE starts to stutter and your tsc --watch feels like it’s running on a potato. That’s when you meet the deep recursive type. It’s the type-level equivalent of a Rube Goldberg machine—impressively clever, but you wouldn’t want to use it to make your morning coffee. The core of the problem is simple: some types are just expensive to compute. The TypeScript compiler is brilliantly fast, but it’s not magic. When you create a type that forces it to perform a deep, recursive calculation across a massive structure, you’re asking it to solve a complex puzzle. Every. Single. Time. You. Save. A. File.

26.4 Type-Only Imports and Reducing Declaration Emit Work

Right, let’s talk about one of the single biggest “aha!” moments for speeding up your TypeScript compilation: import type. This isn’t just a fancy syntax; it’s a cheat code that tells the TypeScript compiler, “Hey, relax, we’re not actually going to run this. I just need to know what it looks like.” Think of your average import statement as ordering a full, assembled piece of IKEA furniture. The delivery truck (your bundler) shows up, you get the massive box (the module’s code), and you have to haul it inside and put it together (include it in your JavaScript output). Now, imagine if you could just call the company, describe the furniture, and have them fax you the assembly instructions (the type definitions) without the physical box ever showing up at your door. That’s what import type does. It only brings over the type information and leaves absolutely no trace in your final JavaScript. This is a huge win because it reduces the amount of code the compiler and your bundler have to process.

26.3 Project References for Large Monorepos

Right, so your monorepo has gotten big. The node_modules directory has its own gravitational pull, running tsc feels like you’re asking your laptop to calculate the meaning of life, and you’re pretty sure you just saw the progress bar actually get slower. Welcome. This is where TypeScript’s Project References come in, and they are about to become your new best friend. They’re not magic, but they’re the closest thing we have to a free lunch in the TypeScript world.

26.2 Incremental Compilation: tsbuildinfo Files

Right, let’s talk about one of the few things TypeScript got genuinely, unambiguously right for performance: incremental compilation. If you’ve ever sat there, drumming your fingers, waiting for a full project rebuild after changing a single comma, this is the feature that stops that particular brand of madness. At its core, incremental compilation is a simple concept: instead of rebuilding the entire world from scratch every time you run tsc, the compiler saves a little file of its own homework—a .tsbuildinfo file—that tells it exactly what work it doesn’t need to redo. It’s the compiler’s version of “I’ll just leave these parts out and only fix the bit I messed up.”

26.1 Diagnosing Slow TypeScript Builds: --diagnostics and --extendedDiagnostics

Right, so your builds are slow. You’ve felt that creeping dread as you hit tsc and go make a coffee, only to return and find it’s still chugging away. Welcome to the club. The first rule of TypeScript Performance Club is: you don’t just guess what’s wrong. You use the tools designed to tell you. That’s where --diagnostics and its more verbose cousin, --extendedDiagnostics, come in. Think of them as the MRI machine for your ailing compilation process.

26. TypeScript Performance and Compilation Speed

37.8 Benchmarking Best Practices and Avoiding Compiler Tricks

Right, let’s get our hands dirty. Benchmarking in Go is deceptively simple, which is precisely why so many people get it subtly, tragically wrong. The testing package gives you just enough rope to hang yourself with, and the compiler—oh, the clever, clever compiler—is actively looking for a reason to snip your code into oblivion. Our job is to outsmart it, to force it to show us the real performance cost, not the cost of a cleverly optimized mirage.
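A minimal sketch of the "don't let the compiler snip your code" idea, assuming a hypothetical fib workload. Storing the result in a package-level sink is one common way to keep the call from being optimized away; testing.Benchmark is used here only so the example runs as a plain program, whereas a real benchmark would normally be a BenchmarkXxx function in a _test.go file.

```go
package main

import (
	"fmt"
	"testing"
)

// fib is a hypothetical workload; any pure function would do.
func fib(n int) int {
	if n < 2 {
		return n
	}
	return fib(n-1) + fib(n-2)
}

// sink is package-level so the compiler cannot prove the result is unused.
var sink int

// benchFib has the shape of an ordinary Go benchmark function.
func benchFib(b *testing.B) {
	var r int
	for i := 0; i < b.N; i++ {
		r = fib(20) // keep the result instead of discarding it
	}
	sink = r // publish the result so the loop body stays live
}

func main() {
	// testing.Benchmark runs the function with increasing b.N
	// until the timing is stable, then reports ns/op.
	res := testing.Benchmark(benchFib)
	fmt.Println(res)
}
```

The numbers printed will vary by machine, so treat them as relative measurements rather than absolutes.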

37.7 String Interning and bytes.Buffer vs strings.Builder

Right, let’s talk about strings. You love them, I love them, the Go runtime tolerates them. They’re the duct tape of our programs, holding everything together until they suddenly become the number one reason your elegant service is now gasping for memory like a fish on a sidewalk. The fundamental problem is that strings in Go are immutable. This is a fantastic feature for concurrency and safety, but a real pain when you’re building them up in a hot loop. Every time you write s += "new piece", you’re not just appending; you’re allocating a whole new string, copying both s and "new piece" into it, and then sending the old s off to be cleaned up by the garbage collector (GC). Do this a few thousand times and your GC is going to be working overtime, putting a serious damper on your throughput.
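To make the s += cost concrete, here is a small sketch comparing naive concatenation with strings.Builder. The function names and sample input are made up for illustration; the point is only that the builder grows one buffer instead of allocating a fresh string on every append.

```go
package main

import (
	"fmt"
	"strings"
)

// joinNaive allocates a new string on every iteration and copies
// everything accumulated so far into it.
func joinNaive(parts []string) string {
	s := ""
	for _, p := range parts {
		s += p
	}
	return s
}

// joinBuilder sizes one buffer up front and appends into it.
func joinBuilder(parts []string) string {
	total := 0
	for _, p := range parts {
		total += len(p)
	}
	var b strings.Builder
	b.Grow(total) // a single allocation for the whole result
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String()
}

func main() {
	parts := []string{"GET ", "/health", " 200"}
	fmt.Println(joinNaive(parts) == joinBuilder(parts)) // prints "true"
}
```

Both functions produce the same string; the difference only shows up in allocation counts, which a benchmark run with -benchmem should reveal.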

37.6 Reducing Allocations: sync.Pool, Value Types, and Preallocating Slices

Right, let’s talk about allocations. In the world of Go, allocations are like trips to the garbage can: you have to do them, but if you’re running back and forth every five seconds, you’re not getting any real work done. The garbage collector is incredibly smart, but it’s not clairvoyant. Every time you escape to the heap, you’re giving it more work to do later, which means eventually, it will have to stop your world (or at least a big part of it) to clean up your mess.
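Two of the techniques named in the heading can be sketched in a few lines. The helper names (squares, formatLine, bufPool) are hypothetical; the sketch assumes a hot path that builds a slice of known size and formats short strings repeatedly.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// squares preallocates capacity so append never has to grow the slice.
func squares(n int) []int {
	out := make([]int, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, i*i)
	}
	return out
}

// bufPool hands out reusable buffers instead of allocating one per call.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// formatLine borrows a buffer, uses it, and returns it to the pool.
func formatLine(key string, value int) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // pooled buffers may hold old data
	defer bufPool.Put(buf)
	fmt.Fprintf(buf, "%s=%d", key, value)
	return buf.String() // String copies, so the buffer is safe to reuse
}

func main() {
	s := squares(4)
	fmt.Println(s, len(s), cap(s))
	fmt.Println(formatLine("hits", 3))
}
```

Whether a pool pays off depends on how hot the path really is, so it is worth measuring before and after.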

37.5 Escape Analysis: go build -gcflags -m

Alright, let’s get our hands dirty with one of Go’s coolest party tricks: escape analysis. This isn’t some abstract academic concept; it’s the compiler’s way of making a crucial decision for you: “Should this variable live on the stack, nice and cheap, or does it need to escape to the heap, the land of garbage collection and slower allocations?” To see the compiler’s thought process laid bare, we use the -gcflags="-m" flag. Running go build -gcflags="-m" your_file.go will spit out a torrent of messages telling you exactly what escapes and, more importantly, why. Let’s decode this output together.
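As a small, hedged illustration of the stack-versus-heap decision, the two hypothetical constructors below differ only in whether they hand back a value or a pointer. Building this file with -gcflags="-m" should typically report the pointer version's local as moved to the heap, though the exact wording and inlining decisions vary between Go versions.

```go
package main

import "fmt"

type point struct{ x, y int }

// newPointValue returns a copy, so the local can usually stay on the stack.
func newPointValue() point {
	p := point{x: 1, y: 2}
	return p
}

// newPointPtr returns the address of a local. The value must outlive
// the call frame, so the compiler typically moves p to the heap.
func newPointPtr() *point {
	p := point{x: 1, y: 2}
	return &p
}

func main() {
	v := newPointValue()
	p := newPointPtr()
	fmt.Println(v.x+v.y, p.x+p.y) // prints "3 3"
}
```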

37.4 go tool trace: Goroutine and Scheduler Traces

Alright, let’s get our hands dirty with go tool trace. You’ve probably been staring at CPU and memory profiles until your eyes cross, wondering why your beautifully concurrent Go application isn’t going as fast as it should. Sometimes, the problem isn’t what your code is doing, but how and when the goroutines are being scheduled to do it. That’s where the execution tracer comes in. It’s like getting a top-down view of a busy highway system; a CPU profile just tells you which cars are revving their engines the hardest.

37.3 go tool pprof: Reading Profiles and Flame Graphs

Right, let’s get our hands dirty. You’ve just run your Go service under pprof, you’ve captured a profile, and now you’re staring at a terminal prompt or a scary-looking SVG. It feels like you’ve been handed the blueprints to a skyscraper written in a foreign language. Don’t panic. We’re going to learn that language together. The first thing to internalize is that pprof is not a single tool; it’s a Swiss Army knife with a dozen blades. The most common profiles you’ll grab are the CPU profile and the Heap (memory) profile. They answer two fundamentally different questions: “What is burning my CPU time?” and “Where is my memory getting allocated?”.

37.2 net/http/pprof: Live Profiling of Running Servers

Right, so you’ve got a service running. It’s chugging along, but something’s off. Maybe it’s a bit sluggish under load, or perhaps memory usage is doing a concerning impression of a ski jump. You need to see what’s happening right now, on its terms, in production. You don’t get to stop the world and attach a debugger. This is where net/http/pprof becomes your best friend—a Swiss Army knife that’s mostly sharp blades for introspection.

37.1 pprof: CPU and Memory Profiling

Right, let’s talk about pprof. This isn’t some abstract academic concept; it’s the scalpel you use when your application starts coughing up blood. You don’t just “think” your code is slow; you know it, with data. pprof is how you get that data. It’s the single most powerful tool in the Go programmer’s profiling arsenal, and it’s built right into the standard library. The designers at Google, for all their quirks, absolutely nailed this one.
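A minimal sketch of grabbing that data from inside a program, using runtime/pprof from the standard library. busyWork and captureCPUProfile are hypothetical helpers, and the profile is written to an in-memory buffer purely to keep the example self-contained; in practice you would more likely write to a file and open it with go tool pprof.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
)

// sink keeps the workload's result live.
var sink int

// busyWork is a hypothetical CPU-bound workload.
func busyWork(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i % 7
	}
	return sum
}

// captureCPUProfile profiles work() and returns the encoded profile bytes.
func captureCPUProfile(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	work()
	pprof.StopCPUProfile() // flushes the profile into buf
	return buf.Bytes(), nil
}

func main() {
	data, err := captureCPUProfile(func() { sink = busyWork(50000000) })
	if err != nil {
		fmt.Println("profiling failed:", err)
		return
	}
	fmt.Printf("captured %d bytes of CPU profile\n", len(data))
}
```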

37. Performance Profiling and Optimization in Go

43.8 Using the Well-Architected Tool for Workload Reviews

Right, so you’ve decided to be a responsible adult and actually review your AWS architecture instead of just crossing your fingers and hoping the bill doesn’t hit five figures this month. Good for you. The Well-Architected Framework is your guide, but staring at a 60-page PDF is a special kind of torture. Enter the Well-Architected Tool. This isn’t some clunky, on-premises software you have to install; it’s a service in your AWS console that finally makes this framework feel usable. Think of it as the difference between reading the theory of aerodynamics and having a flight simulator.

43.7 Sustainability: Understanding Impact, Establishing Goals, Maximizing Utilization

Alright, let’s talk sustainability. You’ve probably heard it called “green IT” and pictured someone hugging a tree while their CI/CD pipeline deploys a carbon-spewing monolith. It’s more nuanced than that. In the AWS context, sustainability is about squeezing every last drop of useful work out of the energy your systems consume. It’s not just good for the planet; it’s a fantastic proxy for cost efficiency and performance. Waste less energy, pay less money. It’s a beautiful, beautiful alignment of incentives.

43.6 Cost Optimization: Cloud Financial Management, Expenditure Awareness, Optimizing Resources

Right, let’s talk about money. Because if you’re not paying attention to this, you’re not just building on AWS, you’re donating to it. The cloud’s biggest trick is making cost an abstract, after-the-fact concept. You spin up a monster instance for a two-hour task, forget about it, and get a bill that looks like a phone number. Cost Optimization is the pillar where we grow up, put on our big-kid pants, and start treating the cloud like the powerful, pay-as-you-go tool it is, not an infinite magic money pit.

43.5 Performance Efficiency: Selecting the Right Resource Types and Sizes

Right, let’s talk about making your stuff fast without making your bill terrifying. Performance Efficiency isn’t about throwing the biggest, most expensive instance at every problem until it goes away. That’s the architectural equivalent of using a rocket launcher to open a jar of pickles—it works, but the cleanup is horrific and your landlord will be furious. It’s about being smart, picking the right tool for the job, and knowing that in AWS, the “right tool” changes about every six months.

43.4 Reliability: Foundations, Workload Architecture, Change Management, Failure Management

Right, let’s talk about keeping your stuff running. Not just “it didn’t crash” running, but “it actually does what you told users it would do” running. That’s Reliability. The Framework breaks this pillar down into four sensible, if slightly dry-sounding, areas. Let’s breathe some life into them.

Foundations

Before you even think about your fancy application code, you need to build on stable ground. This is the unsexy, absolutely critical plumbing of your AWS existence. It’s mostly about your Network and IAM. Get these wrong, and your beautifully architected microservice is just a very expensive, very confused brick.

43.3 Security: Identity, Detective Controls, Infrastructure Protection, Data Protection

Right, let’s talk security. Not the “change your password every 90 days” kind of corporate nonsense, but the real, gritty, “how do I keep my digital crown jewels from ending up on a hacker forum” kind. The AWS Well-Architected Framework’s Security Pillar isn’t a checklist; it’s a mindset. It’s about assuming breach, limiting blast radius, and automating the heck out of everything because you, my friend, have better things to do than manually check CloudTrail logs at 3 AM. We’ll break it down into its core areas, but remember, they’re all interconnected. A failure in one is a failure in all.

43.2 Operational Excellence: IaC, Small Frequent Changes, Observability

Look, let’s be honest. “Operational Excellence” sounds like a corporate buzzword your manager would put on a motivational poster next to a picture of a mountain. But in the AWS universe, it’s the secret sauce. It’s the difference between you owning your infrastructure and your infrastructure owning you. It’s about building a system that doesn’t just work, but that you can actually operate without needing a PhD in caffeine consumption and a team of on-call wizards. We’re going to focus on three practices that make this real: treating your infrastructure like code, making changes so small they’re almost boring, and having such good observability you feel like you’ve got x-ray vision.

43.1 The Six Pillars: Operational Excellence, Security, Reliability, Performance, Cost, Sustainability

Right, let’s talk about the Well-Architected Framework. You’ve probably seen the logo on a thousand AWS slides. It’s not just marketing fluff; it’s a shockingly useful mental checklist to stop you from building a Rube Goldberg machine of cloud infrastructure that collapses the second a pigeon lands on it. Think of these six pillars not as a test you pass, but as a set of questions you should be constantly asking yourself. Because if you’re not, I promise you, your bill and your pager duty roster are.

43. AWS Well-Architected Framework

44.7 Controller Manager and Scheduler Tuning Flags

Right, so you’ve got your cluster up, your pods are running, but something just feels… sluggish. Deployments take a geological age to roll out, or your nodes are sitting there half-asleep while pods languish in “Pending” purgatory. Before you start yelling at the autoscaler, let’s talk about the two brainstems of your control plane: the Controller Manager and the Scheduler. They’re the anxious, overworked organizers of your cluster, and sometimes you need to adjust their caffeine intake.

44.6 Image Pull Optimization: Pre-Pulling and Image Streaming

Right, let’s talk about getting your container images onto your nodes. This is one of those things you blissfully ignore until it isn’t working, and then it becomes the single most infuriating bottleneck in your entire deployment. A slow ImagePull can turn a rapid, 30-second rollout into a minutes-long agonizing wait, or worse, cause your shiny new Pod to fail and get stuck in ImagePullBackOff hell. We’re going to fix that. We’re going to make your image pulls so efficient it’ll make the container registry blush.

44.5 Node Local DNSCache: Eliminating DNS Bottlenecks

Right, let’s talk about one of the most common, yet most insidious, performance killers in Kubernetes: DNS latency. You’ve probably seen it. Your application isn’t CPU-bound, it’s not memory-bound, but it just feels… sluggish. A request comes in, and it spends half its life just trying to figure out where to go. That’s DNS for you. It’s the phone book of the internet, and in a dynamic environment like K8s, you’re looking up numbers constantly. Every service discovery call, every database connection string resolution, every call to an external API: it all goes through the cluster’s DNS resolver. And by default, that means every lookup from every single pod makes a network trip to the shared kube-dns/CoreDNS service. This creates a massive bottleneck at the cluster level, a shared point of contention for every microservice chatty enough to rival a royal court.

44.4 Reducing Pod Startup Latency

Right, let’s talk about pod startup latency. You’ve deployed your masterpiece, hit that kubectl apply -f command, and are now waiting. And waiting. And… why is this taking so long? It feels like your pod is waiting for a background check before it can run a simple web server. I’ve been there. The truth is, a pod’s journey from “Pending” to “Running” is a gauntlet of bureaucratic checks, and our job is to grease the wheels.

44.3 etcd Performance: SSD Requirements and Compaction

Right, let’s talk about the brain of your Kubernetes cluster: etcd. If the API server is the charismatic frontman of the band, etcd is the meticulous, hyper-organized manager in the back without whom the whole tour collapses into chaos. It’s a distributed key-value store, and its sole job is to remember the state of absolutely everything in your cluster. And because we’re asking it to do this consistently and quickly, it gets… particular. Performance-wise, if your etcd is unhappy, your entire cluster is unhappy. Pods won’t schedule, deployments will hang, and you’ll be left staring at a kubectl get pods that hasn’t updated in minutes.

44.2 API Server Performance: Rate Limiting and Caching

Alright, let’s talk about the brain of your Kubernetes cluster: the API Server. It’s the grand central station for every single request, from kubectl get pods to the kubelet checking in on what it should be running. And like any good central station, it can get completely overwhelmed if you let everyone stampede through at once. That’s where rate limiting and caching come in. They’re the bouncers and the express lanes that keep this whole operation from collapsing into a fireball of 429 Too Many Requests errors.

44.1 Kubernetes at Scale: Tested Limits and Real-World Numbers

Right, let’s talk about scale. You’ve probably seen the eye-watering, “look-at-me” conference talk numbers from Google or Netflix about running eleventy-billion pods. That’s great for them. We live in the real world, where your cluster isn’t running on a planet-sized data center and your CFO has questions about the cloud bill. So let’s get practical. What actually breaks first when you push a Kubernetes cluster, and what can you do about it? Forget the theory; these are the pressure points I’ve seen burst in production.

44. Kubernetes Performance Tuning

38.8 netstat and ss: Socket and Network Connection Statistics

Alright, let’s get our hands dirty with the plumbing of your network. When things are slow, flaky, or just plain broken, you need to know what’s talking to what. Forget the fancy GUI tools that try to pretty this up; we’re going straight to the source. For decades, the go-to for this has been netstat. It’s the old guard, and it’ll get the job done. But on modern Linux systems, we have a faster, more informative successor: ss. Think of netstat as your reliable but slightly creaky old toolbox, and ss as the shiny new socket wrench set that does the same job but better and faster. I’ll show you both, because you’ll still see netstat everywhere, but you should default to ss.

38.7 lsof: Listing Open Files, Sockets, and Processes

Right, let’s talk about lsof. It’s one of those tools you’ll use once a year and think, “Why don’t I use this magnificent beast more often?” Its name, lsof, is a classic piece of Unix utility naming: brutally literal. It stands for “list open files.” But on a Unix-like system, everything is a file. And I mean everything. Your hard drive? File. Your network connection? File (a socket, technically). That running process? You get the idea. This means lsof is your ultimate backstage pass to see what every process on your system is actually doing—what it’s reading, writing, and talking to. It’s the debugger’s equivalent of a truth serum.

38.6 ltrace: Tracing Library Calls

Right, let’s talk about ltrace. If strace is the tool you use to see what your program is doing (its syscalls), ltrace is the tool you use to see what it’s thinking. It shows you all the library calls it’s making. This is where you go when your program isn’t segfaulting but is, instead, just being profoundly weird or slow. It’s like eavesdropping on your application’s conversation with the shared libraries it depends on.

38.5 strace: Tracing System Calls for Debugging

Right, let’s talk about strace. You’ve probably hit a wall where your application is doing… something… but you have no earthly idea what. It’s not logging, it’s not crashing, it’s just sitting there, taunting you. This is where strace becomes your best friend. It’s the debugger of last resort, the ultimate truth-teller. It shows you the raw conversation between your program and the Linux kernel—every file it opens, every network call it makes, every time it asks the system for the time of day. It doesn’t lie.

38.4 perf: Linux Performance Counters and Profiling

Right, let’s talk about perf. If you’re serious about figuring out why your code is slow, stalling on cache misses, or just generally misbehaving on Linux, this is your new best friend. It’s not a tool; it’s a sprawling ecosystem of tools built into the kernel, and it’s so powerful it’s almost absurd that it’s free. Forget guessing. We’re moving to evidence-based profiling. Think of perf as a high-speed data recorder for your CPU. It can tell you which lines of code are getting executed millions of times, what’s causing those pesky cache misses, and where the kernel is spending its time on your behalf. It does this using Performance Monitoring Units (PMUs) – hardware counters on the CPU itself. This means it’s incredibly low-overhead. We’re not talking about adding print statements here; we’re talking about directly querying the processor’s internal statistics.

38.3 sar: System Activity Reporter and Historical Data

Right, let’s talk about sar. This is the tool you use when you get that 3 AM alert about a server being “slow” and you need to figure out what actually happened six hours ago. While top or htop show you the glorious, burning dumpster fire of the present moment, sar is the historian who kept meticulous, if slightly dry, notes on the entire blaze. It’s part of the sysstat package, and if you’re not installing that by default on every Linux system you touch, we need to have a different conversation first.

38.2 iostat: Disk I/O Throughput and Utilization

Right, let’s talk about iostat. You’ve probably felt it—that creeping suspicion that your application is just… waiting. Waiting on what? The disk. It’s the slowest part of the whole party, and when it’s struggling, everything grinds to a halt. top might show you a sky-high %wa (I/O wait), but that’s just the headline. iostat is the full, gritty investigative report. It tells you which disk is the problem child and exactly what kind of tantrum it’s throwing.

38.1 vmstat: Virtual Memory, Swap, CPU, and Block I/O Statistics

Alright, let’s talk about vmstat. It’s one of those old-school Unix tools that has stubbornly refused to die, and for good reason: it gives you a shockingly comprehensive, low-overhead snapshot of what your system’s core components are doing at any given moment. The name stands for “virtual memory statistics,” but that’s a bit of a misdirection. It’s like calling a Swiss Army knife a “blade holder.” Sure, virtual memory stats are in there, but you’re also getting CPU, swap, and block I/O, all in one dense, text-based punch.

38. Performance Monitoring Tools

36.7 Contributing to Hugo: Setting Up the Dev Environment

Right, you want to peek under the hood and maybe even tweak the engine. Good for you. Setting up Hugo’s dev environment isn’t the mystical ritual some projects make it out to be, but it does have a few quirks you need to get right, or you’ll be chasing phantom errors for hours. I’ve been there, and my goal is to make sure you aren’t. First things first: you absolutely must use the version of Go that Hugo specifies. This isn’t a gentle suggestion; it’s the law around these parts. Hugo uses Go modules and leverages specific features of the language that can change between minor versions. Using the wrong version is the single biggest cause of “but it compiles on my machine” problems.

36.6 How partialCached Works Internally

Right, let’s pull back the curtain on partialCached. You’re probably using it because you heard it’s a “performance win,” and it is, but you need to understand its particular brand of magic to avoid its particular brand of heartbreak. Think of it not as a smarter partial, but as a slightly lazy, forgetful, but very efficient clone of your partial. Its core purpose is brutally simple: avoid re-rendering the same template with the same input data more than once during a single site build. The key words there are “same input” and “single site build.” This isn’t a persistent cache between builds; it’s a short-term memory for the duration of the hugo command you just ran.
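The "same input, single build" idea can be sketched as a small in-memory memoization table. This is an illustration of the concept only, not Hugo's actual implementation: partialCache, cacheKey, and the render callback are all hypothetical names, and the lock is held during rendering purely to keep the sketch short.

```go
package main

import (
	"fmt"
	"sync"
)

// cacheKey identifies one rendering: the partial's name plus a variant
// string standing in for "the same input".
type cacheKey struct{ name, variant string }

// partialCache lives only as long as the process, i.e. one build.
type partialCache struct {
	mu      sync.Mutex
	entries map[cacheKey]string
}

func newPartialCache() *partialCache {
	return &partialCache{entries: make(map[cacheKey]string)}
}

// get returns the cached output for (name, variant), calling render
// only the first time that key is seen.
func (c *partialCache) get(name, variant string, render func() string) string {
	key := cacheKey{name: name, variant: variant}
	c.mu.Lock()
	defer c.mu.Unlock()
	if out, ok := c.entries[key]; ok {
		return out
	}
	out := render()
	c.entries[key] = out
	return out
}

func main() {
	cache := newPartialCache()
	calls := 0
	render := func() string { calls++; return "<footer>site footer</footer>" }

	for i := 0; i < 3; i++ {
		cache.get("footer.html", "", render) // rendered once, then reused
	}
	cache.get("footer.html", "blog", render) // new variant, rendered again

	fmt.Println(calls) // prints "2"
}
```

The heartbreak case falls out of the same picture: if the output really depends on something that is not part of the key, every caller gets whatever the first caller rendered.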

36.5 The livereload WebSocket for the Dev Server

Right, let’s talk about the magic trick that makes Hugo’s development server so damn useful. You save a file, you flick your eyes to the browser, and the page is just… updated. No frantic mashing of Cmd+R. It feels like the future. It’s not magic, of course; it’s a clever, slightly cantankerous system built on WebSockets, and understanding how it works will save you from pulling your hair out when it occasionally decides to take a coffee break.

36.4 Parallel Rendering with Goroutines

Right, let’s talk about how Hugo actually builds your site without taking a geological epoch to do it. You’ve probably run hugo and been pleasantly surprised by how fast it is, especially compared to… well, pretty much every other static site generator. The secret sauce isn’t magic; it’s a disciplined, aggressive use of concurrency via Go’s goroutines. Think of your site as a giant pile of pages that all need to be rendered. A naive approach would be to grind through them one at a time, in a single, sad, linear thread. If you have 500 pages and each takes 100ms, that’s 50 seconds. Yawn. Hugo looks at that pile and says, “I’ve got 8 CPU cores and a need for speed,” and it fans that work out across hundreds or even thousands of goroutines.
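The fan-out idea can be sketched with a small worker pool. To be clear about the swap: this uses a bounded number of workers (one per CPU) rather than one goroutine per page, and it is a generic pattern, not Hugo's real rendering code. renderPage and renderAll are hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
	"sync"
)

// renderPage is a hypothetical stand-in for template execution.
func renderPage(title string) string {
	return "<h1>" + strings.ToUpper(title) + "</h1>"
}

// renderAll renders every title using a fixed number of worker goroutines.
func renderAll(titles []string, workers int) []string {
	if workers < 1 {
		workers = 1
	}
	out := make([]string, len(titles))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is written by exactly one worker,
				// so no extra locking is needed on out.
				out[i] = renderPage(titles[i])
			}
		}()
	}

	for i := range titles {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return out
}

func main() {
	pages := renderAll([]string{"home", "about", "posts"}, runtime.NumCPU())
	fmt.Println(pages)
}
```

Results land in input order because each worker writes to its own slot, even though the pages may finish rendering in any order.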

36.3 Template Compilation and Caching

Right, let’s get into the engine room. You’ve told Hugo to build your site, and it’s staring at your templates. It doesn’t just slap your data into some text files and call it a day. Oh no. It performs a multi-stage parsing process that is, frankly, a big part of why it’s so damn fast on rebuilds. The secret sauce here is a combination of aggressive, intelligent caching and a parse step that turns your templates and partials into in-memory parse trees, which Go’s template engine then executes directly. Yes, you read that right. Your HTML templates are parsed once and then reused for every page that needs them. Let that sink in for a moment.

36.2 Content Ingestion: Reading, Parsing, and Front Matter Decoding

Right, let’s get our hands dirty. You’ve told Hugo where your content is, and you’ve run hugo server. The first thing it does is the most crucial: it has to actually read your files and figure out what the hell they are. This isn’t just a simple file copy; it’s a full-on archaeological dig, and Hugo is the over-caffeinated professor who has to categorize every artifact before the museum (your public directory) opens.

36.1 Hugo's Source Code Architecture

Alright, let’s pull back the curtain. You don’t need to know this to use Hugo, but if you’re here, you’re the kind of person who hates magic boxes. You want to know which lever does what, just in case the box starts smoking. I respect that. Hugo’s architecture is a fascinating study in pragmatic design—a blend of brilliant engineering and “well, it works, so we’re keeping it.” At its core, Hugo is a stateless, sequential build pipeline. I say “stateless” because between runs, it doesn’t retain any memory of the previous build. It reads everything from the source filesystem every single time. This is both its greatest strength (simplicity, reliability) and a potential weakness for enormous sites (though the Go templating engine is so blisteringly fast it often doesn’t matter).

36. Hugo Internals: How the Build Pipeline Works

28.7 Hugo's --navigateToChanged for Fast Dev Iteration

Let’s be honest: you’re not here to hunt for the page you just edited every time you fix a typo. You want speed. You want to see your change, and only your change, reflected instantly. That’s the dream, and Hugo’s --navigateToChanged flag is the closest you’ll get to a direct portal to that dream. When you save a content file, hugo server points your browser at the page you just changed on the next live reload. It’s not magic, but it’s such a clever piece of engineering that it feels like it.

28.6 Large Site Strategies: Pruning, Sections, and Draft Exclusion

Alright, let’s get our hands dirty. You’ve hit that point, haven’t you? The point where running hugo server feels less like a build step and more like you’ve just asked your laptop to calculate every prime number. A site with thousands of pages will do that. It’s not Hugo’s fault; it’s just math. But we’re not here to complain about math, we’re here to cheat at it. The core strategy is brutally simple: build less stuff. Hugo is wonderfully, blissfully literal. It will happily build every single page it can find, regardless of whether you need it right now. Your job is to tell it what to ignore.

28.5 Optimizing Remote Resource Fetching

Right, let’s talk about fetching stuff from the internet. You’ve probably got a Hugo site that pulls in data from some remote API, or maybe you’re building an image gallery from a CDN. It’s fantastic until you run hugo server and go make a coffee while it decides to re-fetch every single resource, every single time. This is the digital equivalent of your friend who tells the same long-winded story whenever you see them. We’re going to fix that.

28.4 The File Cache: resources/, hugo_cache/

Right, let’s talk about the file cache. This isn’t some magical, abstract layer; it’s literally a directory on your disk where Hugo stashes stuff it thinks it might need again. Its entire reason for being is to stop Hugo from doing the same expensive work over and over. Think of it less like a “cache” and more like Hugo’s workshop whiteboard—it’s covered in half-scribbled calculations so it doesn’t have to re-derive the Pythagorean theorem every time it needs to build a page.

28.3 partialCached: The Single Biggest Performance Win

Alright, let’s get down to brass tacks. If you’re building a site of any real size with Hugo, you’ve probably noticed your build times starting to creep up from a blink to a coffee break. You’re running hugo server and then changing a single Markdown file, only to watch Hugo thoughtfully re-render… well, everything. It’s polite, but it’s also absurdly inefficient. This is where partialCached comes in, and it is, without a shred of hyperbole, the single most effective tool in your arsenal for slamming the brakes on runaway build times. Forget magic tricks; this is simple, brutal, and effective engineering.

28.2 Measuring Build Time: hugo --templateMetrics and --templateMetricsHints

Right, let’s get our hands dirty. You’ve probably noticed Hugo is fast, but maybe your site has grown, and that initial speed has started to feel a bit… theoretical. Before you start randomly tweaking things in a panic, you need to know what’s actually slow. Throwing --templateMetrics and --templateMetricsHints at Hugo is like switching from a polite conversation about the weather to getting a full diagnostic readout from a jet engine. It’s brutally honest, occasionally terrifying, and exactly what you need.

28.1 Why Hugo Is Fast: Parallel Rendering and In-Memory Caching

Right, let’s get into the good stuff. You’ve probably heard that Hugo is “blazingly fast.” It’s not just marketing fluff; it’s the architectural hill the framework’s designers decided to die on, and frankly, I respect the commitment. While other generators are busy waiting for a database or re-compiling the same JavaScript for the tenth time, Hugo has already finished building your entire site and is now just sitting there, smugly, wondering what to do with all its free time. The secret sauce is a ruthless, almost obsessive, focus on two things: doing as much work as possible in parallel and keeping everything it possibly can in memory.

28. Hugo Build Performance and Caching

14.7 Common Partials: head.html, header.html, footer.html, SEO meta tags

Right, let’s talk about the workhorses of your template directory. These are the files you’ll include on nearly every page, the ones that handle the repetitive, soul-crushing boilerplate so you don’t have to. We’re going to make them smart, then stitch them together into something that doesn’t suck.

The head.html Partial: Your Page’s First Impression

This little guy is arguably the most important. It’s not seen by your users, but it’s read very carefully by browsers and search engines. A messy head is like showing up to a job interview with your shirt on inside-out. Let’s get it right.

14.6 Organizing Partials: Nested Directories

Alright, let’s talk about organizing these partials. You’ve got a few of them now, and your templates directory is starting to look like my desk on a bad day: a chaotic mess where finding anything requires archaeological skills. We need a system. The moment you have more than a handful of partials, you’ll feel the pain. Is header.html for the blog or the store? Is that card.html for a product, a user profile, or a cat photo? Throwing them all into a single flat directory is a rookie move that scales horribly. The solution, like in any good codebase, is to use directories to create namespaces. We’re going to nest them.

14.5 Returning Values from Partials with return (Hugo 0.111+)

Right, so you’ve been building up your partials, making them nice and reusable, and then you hit a wall. You need a snippet of logic to not just output some HTML, but to actually give you back a value—a string, a slice, a boolean, something you can use in the parent template’s logic. You’ve probably tried the old trick of using .Scratch.Set and felt a little dirty about it. I don’t blame you. It worked, but it was clunky and indirect, like passing a note through three friends to ask someone out.

14.4 partialCached: Dramatic Build Performance Improvement

Right, so you’ve met {{ partial }} and you’re probably thinking, “These are great! I can break my site into logical, reusable chunks.” And you’d be right. But you’re about to hit a wall. A big, slow, frustrating wall. Every time you change a single line of CSS, Hugo has to rebuild your entire site’s structure because it can’t be sure that change didn’t affect the header partial used on every single page. It’s maddening.

14.3 Passing Context to Partials: The Dot and Custom Dicts

Right, so you’ve broken your UI into partials. Good for you. Now you’re staring at a template that looks like this, wondering how the heck you get data into that isolated little island: {{ template "user-card" }}. It renders, but it’s a ghost town. Your brilliant user-card partial is starving for data. This is where you stop just including a partial and start calling it with arguments. Welcome to the main event.

The Dot: Passing the Entire Context

The simplest way to feed your partial is to just hand it your entire current context (a.k.a. “the dot”). You do this by passing the dot as the final argument to the template call.

14.2 Calling Partials: partial and partialCached

Alright, let’s talk about getting these partial templates onto your page. You’ve defined these reusable fragments, these beautiful little components of HTML, and now you need to actually, you know, use them. Hugo gives you two main ways to do this: partial and partialCached. One is your reliable workhorse, the other is a performance-obsessed specialist. Choosing the right one is the difference between a swift, elegant site and one that’s constantly doing unnecessary heavy lifting.

14.1 The layouts/partials/ Directory

Right, let’s talk about the layouts/partials/ directory. This isn’t just another folder Hugo tells you to use; this is your new best friend, your toolbox, your secret weapon against the soul-crushing dread of repeated code. Think of partials as the reusable fragments of your site’s UI. That header you copy-paste into every single layout? That’s a partial. The footer that’s identical on every page? Partial. That complex “related articles” component you spent three days building? For the love of all that is holy, make it a partial.

14. Partial Templates: Reusable Fragments

17. useMemo

18. useCallback

19. React.memo

27. Kafka Performance Tuning

30. Performance Profiling with React DevTools

1. What WebAssembly Is and Why It Was Created

13. The N+1 Problem

14. DataLoader: Batching and Caching

15. Font Optimization

15. User-Defined Functions: UDFs, Pandas UDFs, and Performance

2. WASM vs JavaScript

22. Persisted Queries

24. Performance Tuning: Partitioning, Caching, Skew, and AQE

25. Performance Profiling and Optimization

28. Core Web Vitals and Performance Optimization

69.8 mypyc: Compiling mypy-Annotated Code

Right, so you’ve written some beautiful, type-annotated Python with mypy. It’s clean, it’s correct, and it runs… well, like Python. You get the safety of static types but the speed of a dynamically typed language, which is to say, not fast. Enter mypyc. This isn’t some magic wand; it’s a compiler that takes your meticulously annotated code and translates it into C extensions, giving you a very real shot at performance that’s an order of magnitude better. Think of it as your reward for being pedantic about types.

69.7 Numba: JIT Compilation for Numerical Code

Right, so you’ve hit the wall. You’ve vectorized your NumPy code, you’ve tried every trick in the book, and your inner loops are still crawling because Python, bless its heart, is still interpreting every single operation. You could drop down to C and write a full extension module, but that feels like taking a sledgehammer to a walnut. What if you could just… tell Python to compile this one specific function to machine code? Enter Numba. It’s a Just-In-Time (JIT) compiler that takes your anemic Python functions and injects them with pure, unadulterated speed, often getting you within spitting distance of hand-written C.

69.6 PyPy: JIT Compilation and Compatibility Trade-offs

Right, so you’ve heard the whispers. “PyPy makes your Python code magically faster.” And it’s true, it often does. But it’s not magic; it’s a Just-In-Time compiler, and like any powerful tool, it comes with a very specific set of instructions and, more importantly, trade-offs. My job is to make you understand both the ‘how’ and the ‘why’ so you can decide if it’s the right tool for your particular job.

69.5 Writing a CPython Extension Module in C

Right, so you want to talk to Python directly, in its mother tongue: C. You’re tired of pure Python’s speed limits, or maybe you need to wrap some arcane C library that doesn’t have a decent Python binding. Welcome to the club. Writing a CPython extension module is the most “bare-metal” way to do this. It’s powerful, it’s fast, and it will absolutely, 100%, make you appreciate the cleanliness of Python syntax. We’re about to get our hands dirty.

69.4 Cython: Annotating Python for C-Speed Compilation

Right, so you want to go fast. You’ve got a Python function that’s become the bottleneck, grinding your elegant script to a halt in a loop of a million iterations. Rewriting it all in C sounds like a nightmare of buffer overflows and segfaults. Enter Cython. It’s not magic, but it’s the closest thing we have to a “go faster” button for Python. The deal is simple: you take your perfectly good Python code and give the Cython compiler a few hints—type annotations, mostly—about what things are. In return, it transpiles your Python into fiercely optimized C code, which then gets compiled into a native binary extension module you can import directly. It’s Python wearing a C-shaped performance skin suit.

69.3 cffi: C Foreign Function Interface

Right, so you’ve decided you need to talk to some C code. Maybe you’re tired of Python’s speed in a particular hot loop, or maybe you’re staring at a dusty, ancient library that does exactly what you need but has never heard of a list comprehension. You’ve probably heard of the old ways: writing a full C extension is a fantastic way to spend a weekend learning about reference counting and the perils of the Python GIL, while ctypes feels like trying to convince a very pedantic bouncer at a club to let your data structures in.

69.2 ctypes: Calling C Libraries from Pure Python

Alright, let’s talk about ctypes. This is the part where Python, the friendly neighborhood interpreter, puts on a leather jacket, smashes a window, and just starts using the C library sitting right there on the desktop. No compilers, no extension modules, just pure, unadulterated dynamic linking. It’s shockingly powerful and, at times, shockingly janky. I love it and you should too, but you should also know what you’re getting into.

69.1 When to Reach for a C Extension

Look, sometimes you just have to get down to the metal. Python is brilliant, but let’s be honest: it’s not always fast. When you’ve optimized your algorithms, used vectorized NumPy operations, and your profiler is still pointing a trembling finger at one critical inner loop, it’s time to talk about writing a C extension. This isn’t for the faint of heart. It’s the power tool in your shed—incredibly effective, but you can also take your foot off if you’re not careful.

69. Cython, ctypes, C Extensions, and PyPy

67.9 Avoiding Global Lookups and Repeated Attribute Access

Alright, let’s talk about one of the most common, and frankly, easiest-to-fix performance drains I see in the wild: the double-whammy of global lookups and repeated attribute access. This isn’t about fancy algorithmic wizardry; this is about cleaning up the sloppy, lazy code we all write at 2 AM so it doesn’t embarrass us in the light of day. Think of your code’s namespace as a series of concentric circles. The innermost circle is your local scope. It’s the VIP lounge: getting in there is fast, cheap, and easy. The outer circles are the module’s global scope and, beyond it, the built-in names like len or str. That’s the parking lot. Every time you reference a name out there, the interpreter has to leave the comfy VIP lounge, trudge out to the parking lot, and yell for it. A single global lookup isn’t cripplingly slow, but do it enough times inside a tight loop and it starts to add up like a bar tab at a developer conference.
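To make the VIP-lounge idea concrete, here is a minimal sketch. The function names norms_global and norms_local are made up for illustration; the only real API used is math.sqrt. The second version binds the global-plus-attribute lookup to local names once, before the loop. Whether that is measurably faster depends on your Python version and workload, so time it before you believe it.

```python
import math


def norms_global(points):
    # Each iteration looks up the global name `math`, then its `.sqrt`
    # attribute, then the `.append` attribute on `result`.
    result = []
    for x, y in points:
        result.append(math.sqrt(x * x + y * y))
    return result


def norms_local(points):
    # Hypothetical rewrite: do the lookups once and keep them in locals.
    sqrt = math.sqrt
    result = []
    append = result.append
    for x, y in points:
        append(sqrt(x * x + y * y))
    return result


points = [(3, 4), (5, 12), (8, 15)]
# Both versions compute the same values; only the lookup pattern differs.
assert norms_global(points) == norms_local(points)
```

The trade-off is readability: the local aliases only earn their keep inside a genuinely hot loop.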

67.8 String Concatenation Performance: join() vs +=

Right, let’s settle this. You’ve probably heard, in some hushed, serious tone, that you should “never, ever use += for string concatenation in Python.” And you’ve probably nodded along, thinking it’s one of those sacred rules, like not using goto. But here’s the thing: like most absolute rules in programming, it’s a simplification. A useful one, but a simplification nonetheless. The reality is more interesting, and knowing why is what separates you from someone who just parrots dogma.
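For reference, these are the two contenders, sketched with made-up helper names (build_with_plus and build_with_join). Both produce the same string. The difference is that += may build a new string on every pass, while join sizes the result once. CPython can sometimes extend the string in place when nothing else references it, which is part of why the "never use +=" rule is a simplification rather than a law.

```python
def build_with_plus(parts):
    # Repeated += : simple to read, but each step may copy the
    # accumulated string so far.
    out = ""
    for part in parts:
        out += part
    return out


def build_with_join(parts):
    # join() sees all the pieces up front and builds the result in one go.
    return "".join(parts)


words = ["spam", "eggs", "ham"]
# Same output either way; only the cost profile differs.
assert build_with_plus(words) == build_with_join(words)
```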

67.7 Python-Specific Wins: Local Variables, Attribute Lookup, Slots

Right, let’s talk about making your Python code less… pokey. We’ve all been there. You’ve written something beautiful, it’s logically pristine, and then you run it. And you go get a coffee. And you come back. And it’s still chugging away. Before you start rewriting the whole thing in Rust (a noble, if dramatic, impulse), let’s look at some of the low-hanging, Python-specific fruit you can pluck for some easy speed wins.

67.6 Algorithmic Optimization: Big-O Thinking in Python

Look, we need to talk. Your code is slow. It’s not your fault—well, maybe it is a little, but we can fix it. You’ve probably been micro-optimizing: swapping out lists for deques, using local_vars, and trying to save nanoseconds. That’s like rearranging the deck chairs on the Titanic. The real iceberg, the one that will sink your entire application, is a bad algorithm. And the only way to spot it is to start thinking in terms of Big-O notation.
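Here is a small sketch of the kind of iceberg in question. The function names are hypothetical. Both find the items of a that also appear in b, but the first scans the list b for every element of a, while the second pays once to build a set and then gets average constant-time membership checks.

```python
def common_items_quadratic(a, b):
    # `x in b` walks the list b each time: roughly len(a) * len(b) steps.
    return [x for x in a if x in b]


def common_items_linear(a, b):
    # Build the set once (one pass over b); each `x in b_set` check is
    # then constant time on average: roughly len(a) + len(b) steps.
    # Assumes the items are hashable.
    b_set = set(b)
    return [x for x in a if x in b_set]


a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8]
# Same answer, very different growth as the inputs get large.
assert common_items_quadratic(a, b) == common_items_linear(a, b)
```

No amount of local-variable trickery will rescue the first version once both lists have a few hundred thousand entries.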

67.5 py-spy: Sampling Profiler for Production

Alright, let’s talk about something that actually works: py-spy. This is the profiling tool you use when your application is on fire in production and you can’t just restart it with cProfile attached. It’s a sampling profiler, which is a fancy way of saying it peeks at what your Python process is doing, at regular intervals, without your code having to know it’s being watched. It’s like a wildlife documentary filmmaker hiding in a bush, not a stage actor performing for a camera. The key thing here is that it’s low-overhead and safe to run on live, production systems.

67.4 memory_profiler: Tracking Memory Usage

Right, let’s talk about memory. It’s the one resource your application can keep politely asking the operating system for, right up until things get awkward and it gets killed. Most performance tutorials focus on CPU time, but memory bloat is a silent killer. It slows everything down thanks to garbage collection, and it can lead to catastrophic, opaque crashes. So we’re going to stop guessing and start measuring with memory_profiler. This tool is like putting a fuel gauge on your code’s gas-guzzling SUV.

67.3 line_profiler: Line-By-Line Timing

Alright, let’s get our hands dirty. You’ve probably used cProfile or timeit and found yourself staring at a function name, thinking, “Great, I know my_awful_function() is the problem. Now, which of the 200 lines of spaghetti inside it is actually causing the pain?” This is the universal frustration of function-level profilers. They get you to the crime scene, but not the specific bullet casing. Enter line_profiler. This beautiful tool is the detective that goes in and tells you exactly how many times each line was executed and, crucially, how long each one took on average. It’s the difference between knowing “the engine is broken” and knowing “spark plug #3 is misfiring.” We’re about to perform some open-heart surgery on your code, and this is our MRI machine.

67.2 cProfile and pstats: Function-Level Profiling

Right, so you’ve written some code. It works. You’re feeling pretty good about yourself. But is it fast? Or does it secretly run like a dog walking on its hind legs—technically impressive that it works at all, but you can’t help but wince while watching it? Guessing which part is slow is a fantastic way to be wrong and waste an afternoon. We don’t guess. We measure. And for that, we bring in the heavy artillery: cProfile.
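As a taste of what measuring looks like, here is a minimal sketch using only the standard library. slow_sum and profile_call are made-up names for illustration; cProfile.Profile, pstats.Stats, sort_stats, and print_stats are the real API.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # Deliberately naive work so there is something to measure.
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    # Run func under cProfile and return (result, text report).
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and show only the top five entries.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


result, report = profile_call(slow_sum, 100_000)
print(report)
```

The report lists call counts and per-function timings; the exact numbers will vary from machine to machine and run to run.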

67.1 timeit: Micro-Benchmarking Code Snippets

Right, let’s talk about timeit. You’ve probably had a thought like, “Is method A faster than method B?” and then, like a chump, wrapped it in a time.time() call and ran it once. I’ve been there. The results are a lie. Your operating system is a chaotic, beautiful mess of processes fighting for CPU time, and your one-off measurement just captured a moment when a background antivirus scan decided to sigh heavily. We need to do better. timeit is how we do better. It’s the statistical sledgehammer we use to smash uncertainty about tiny, repetitive code.
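A minimal sketch of the idea, assuming you just want a per-call estimate for a short snippet. best_time is a hypothetical helper; timeit.repeat is the real standard-library call. It runs the statement many times per trial, repeats the trial several times, and keeps the fastest trial, since the slower ones mostly measure whatever else your machine was doing.

```python
import timeit


def best_time(stmt, setup="pass", repeat=5, number=10_000):
    # timeit.repeat returns one total duration per trial,
    # each trial executing `stmt` `number` times.
    trials = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
    # Take the minimum trial and convert to seconds per single execution.
    return min(trials) / number


setup = 'parts = ["x"] * 100'
join_time = best_time('"".join(parts)', setup=setup)
loop_time = best_time('s = ""\nfor p in parts:\n    s += p', setup=setup)

print(f"join: {join_time:.3e} s per call")
print(f"+= loop: {loop_time:.3e} s per call")
```

The absolute numbers are machine-specific; what matters is comparing the two under the same conditions.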