39.4 Huge Pages: Transparent Huge Pages and Static Huge Pages

Alright, let’s talk about huge pages. Your CPU’s memory management unit (MMU) translates every virtual address your program touches into a physical address. To avoid walking the page tables on every access, it caches recent translations in the Translation Lookaside Buffer (TLB). Think of it as the CPU’s favorite contacts list: a hit is nearly free, while a miss means a multi-level page-table walk. The problem? This list is tiny, anywhere from a few dozen to a few thousand entries depending on the CPU and the TLB level. Not “forgets more than three items” tiny, but tiny next to a multi-gigabyte working set.

On x86-64, memory is normally managed in 4KiB chunks (pages), and each TLB entry maps one page. If you’re working with a massive chunk of memory, say a multi-gigabyte in-memory database, the TLB can only cover a sliver of it at any moment, so it constantly thrashes: entries get evicted and the CPU keeps paying for page-table walks. It’s like trying to find a specific word in a massive encyclopedia by only being able to index one page at a time. It’s a performance nightmare.

Huge pages are the obvious solution: instead of 4KiB pages, we use 2MiB or even 1GiB pages (the sizes x86-64 supports; other architectures offer different ones). Now, a single TLB entry can cover a massive swath of memory. The TLB can breathe easy, misses become much rarer, and your application’s average memory access latency drops. It’s a free lunch, right? Well, mostly. The kitchen (the OS) has to prepare it, and there are two ways to order: you can call ahead and reserve them (static), or you can hope the chef is feeling proactive and makes them on the fly (transparent). Let’s break down both.

Static (Explicit) Huge Pages

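This is the old-school, brute-force method. You tell the OS up front, ideally at boot time, “Hey, I want 1024 of your finest 2MiB huge pages ready at all times.” The kernel then sets aside that many physically contiguous 2MiB blocks (2GiB in total) in a dedicated pool. Nothing else can use it. It’s your private, reserved parking garage. You can also grow the pool at runtime, but once memory is fragmented the kernel may not find enough free contiguous 2MiB blocks, which is why boot time is the reliable option.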

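You configure the pool size with the vm.nr_hugepages sysctl (in /etc/sysctl.conf or at runtime), or with the hugepages= parameter on the kernel command line:

# /etc/sysctl.conf (or at runtime: sudo sysctl -w vm.nr_hugepages=1024)
vm.nr_hugepages=1024

# Or on the kernel command line (e.g. GRUB_CMDLINE_LINUX in /etc/default/grub)
hugepages=1024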

Now, your application has to ask for them explicitly. The classic way is to create a file on a hugetlbfs mount (systemd-based distributions typically mount one at /dev/hugepages) and mmap() it. Yes, it’s a bit weird: you’re “opening a file” that’s actually a gateway to this reserved huge page memory.

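#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>   // close(), unlink()

// Assumes the default huge page size is 2MiB (see Hugepagesize in /proc/meminfo)
#define HPAGE_SIZE (2 * 1024 * 1024)
#define HPAGE_FILE "/dev/hugepages/tutorial"  // must live on a hugetlbfs mount

int main(void) {
    // Create a file on the hugetlbfs mount; its memory comes from the huge page pool
    int fd = open(HPAGE_FILE, O_CREAT | O_RDWR, 0600);
    if (fd < 0) {
        perror("open failed");
        return 1;
    }

    // Map exactly one huge page
    void *addr = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (addr == MAP_FAILED) {
        perror("mmap failed");  // typically ENOMEM if the pool has no free pages
        close(fd);
        unlink(HPAGE_FILE);
        return 1;
    }

    printf("Huge page mapped at %p\n", addr);

    // Use your fantastically fast memory...
    snprintf((char *)addr, HPAGE_SIZE, "Hello from a huge page!");

    // Cleanup: unmap, close, and remove the file so the page returns to the pool
    munmap(addr, HPAGE_SIZE);
    close(fd);
    unlink(HPAGE_FILE);
    return 0;
}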

The Pitfall: The “reserved” part is the killer. If you reserve 1024 huge pages but your app only uses 10, the other ~2GiB sits idle, and nothing else on the system can use it until you shrink vm.nr_hugepages again. It’s inflexible, it requires foresight, and configuring the pool requires root access. This is why the kernel developers thought, “There has to be a better way.”

Transparent Huge Pages (THP)

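Enter THP, the kernel’s attempt to be helpful. The idea is brilliant: for anonymous memory (think heap and stack), the kernel tries to back suitably aligned 2MiB regions with a huge page right at page-fault time, and a background thread, khugepaged, later collapses runs of existing 4KiB pages into huge pages behind your back. Your application doesn’t need to change a single line of code. It just gets the performance boost. Magic!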

You can check and control its behavior via /sys/kernel/mm/transparent_hugepage/enabled:

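cat /sys/kernel/mm/transparent_hugepage/enabled
# [always] madvise never    (the bracketed value is the active mode)

# To enable it system-wide (the default on many distros)
echo always | sudo tee /sys/kernel/mm/transparent_hugepage/enabled

# Or, to be more conservative and only use it when an app explicitly asks via madvise
echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled

# sysfs changes don't survive a reboot; to make one stick, add
# transparent_hugepage=madvise (or always/never) to the kernel command line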

You can even suggest to the kernel that a specific memory range is a good candidate:

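// addr must be page-aligned (e.g. returned by mmap()); plain malloc() doesn't guarantee that
if (madvise(addr, length, MADV_HUGEPAGE) == -1) {
    perror("madvise(MADV_HUGEPAGE)");  // EINVAL on a misaligned addr or a kernel without THP
}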

The Gotcha: Ah, but the kernel is not a perfect chef. This magic has a cost. To find free, physically contiguous 2MiB blocks, the kernel may have to compact memory, either synchronously inside a page fault (governed by /sys/kernel/mm/transparent_hugepage/defrag) or in the background, and khugepaged spends CPU time scanning and collapsing pages. Both can add latency and CPU overhead at unpredictable times. I’ve seen it cause noticeable performance jitter on latency-sensitive workloads. The worst part? If a single 4KiB page in a 2MiB physical block is pinned or otherwise can’t be moved, compaction can’t turn that block into a huge page. The kernel’s eagerness can also lead to memory bloat: touching a single byte in a fresh, aligned 2MiB region can commit a full 2MiB of RAM.

So, Which One Should You Use?

Here’s the brutal truth, straight from the trenches:

Use Static Huge Pages for predictable, high-performance, dedicated workloads. Databases like Oracle and PostgreSQL often recommend this. You know exactly how much memory they’ll need, you want the huge pages guaranteed to be there when you fault them in, and you can’t afford the kernel’s background shenanigans. You pay for it up front in configuration complexity and potentially idle RAM.
Use Transparent Huge Pages for general-purpose systems or applications where you can’t be bothered to micromanage memory. It’s “good enough” for most things and provides a decent boost for free. But for anything where consistent microsecond-level latency is critical, test with it disabled. The jitter is real.
The Pro Move: Use madvise mode and explicitly tag your critical memory regions with MADV_HUGEPAGE, giving the kernel a strong hint without letting it run wild over your entire address space. This gives you most of the benefit with much less of the unpredictability.

Neither solution is perfect because memory management is a fundamentally hard problem. Static is a sledgehammer; transparent is a helpful but occasionally overzealous intern. Your job is to choose the right tool for the job, and now you know enough to make that call.