27.7 EKS Blueprints: Opinionated Terraform and CDK Modules for EKS

Right, so you’ve decided you want an EKS cluster. Good for you. You’ve also decided you don’t want to spend the next three weeks hand-crafting Terraform or CloudFormation for the VPC, IAM roles, node groups, add-ons, and all the other fiddly bits that AWS requires. You’re smarter than that. This is where EKS Blueprints comes in—it’s a collection of opinionated, pre-packaged modules for Terraform and CDK that aims to get you from zero to a fully-functional, production-ready cluster in a shockingly small amount of code. It’s like a brilliant but stubborn architect who says, “Trust me, I’ve already made all the hard decisions for you.”

27.6 Karpenter: Next-Generation Node Autoscaler for EKS

Alright, let’s talk about Karpenter. Forget everything you thought you knew about autoscaling in Kubernetes, because this thing is a different beast entirely. The old Cluster Autoscaler (CAS) was like trying to parallel park a cruise ship—it worked, eventually, but it was slow, clunky, and you had to pre-define every single parking spot (node group) you might ever need. Karpenter is like teleportation. You say “I need a node with 4 CPUs and 16GB of RAM,” and it materializes the perfect instance for the job, often before the pod scheduler has even finished its cry for help. It’s not just scaling; it’s provisioning, and it does it with terrifying speed and efficiency.
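To make the “ask for a shape, get an instance” idea concrete, here is a toy sketch of just-in-time instance selection. It is not Karpenter’s real scheduling logic; the catalog, the instance names, and the prices are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpu: int
    memory_gib: int
    hourly_usd: float  # hypothetical price, for illustration only


# Hypothetical catalog: names and prices are made up, not real AWS quotes.
CATALOG = [
    InstanceType("small", 2, 8, 0.10),
    InstanceType("medium", 4, 16, 0.20),
    InstanceType("large", 8, 32, 0.40),
]


def pick_instance(vcpu: int, memory_gib: int) -> Optional[InstanceType]:
    """Return the cheapest catalog entry that satisfies the request, if any."""
    fits = [i for i in CATALOG if i.vcpu >= vcpu and i.memory_gib >= memory_gib]
    return min(fits, key=lambda i: i.hourly_usd) if fits else None


if __name__ == "__main__":
    # A pending pod asks for 4 CPUs and 16 GiB of RAM.
    print(pick_instance(4, 16))
```

The contrast with node groups is that nothing here is pre-declared per workload: the request drives the choice, rather than the pod waiting for a matching group to exist.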

27.5 AWS Load Balancer Controller: ALB and NLB from Kubernetes Ingress and Service

Alright, let’s talk about getting traffic into your EKS cluster. You’ve got your pods running, your services defined, and now you need the outside world to actually see them. You could manually create an Application Load Balancer (ALB) or Network Load Balancer (NLB) in the AWS console every time you need one, but that would be tedious, error-prone, and frankly, a betrayal of the entire GitOps, declarative ethos we’re living in. Enter the AWS Load Balancer Controller (ALB Controller, for short—its name is a bit of a mouthful, as it handles both ALBs and NLBs).
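As a rough sketch of what “declarative” looks like here, the snippet below builds an Ingress manifest of the kind the controller watches for. The function name and the example service are hypothetical, and it assumes an IngressClass named alb already exists in the cluster.

```python
import json


def alb_ingress(name: str, service: str, port: int, internet_facing: bool = True) -> dict:
    """Build an Ingress manifest that asks the controller for an ALB."""
    scheme = "internet-facing" if internet_facing else "internal"
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": name,
            "annotations": {
                # Public or private load balancer.
                "alb.ingress.kubernetes.io/scheme": scheme,
                # Route straight to pod IPs rather than node ports.
                "alb.ingress.kubernetes.io/target-type": "ip",
            },
        },
        "spec": {
            "ingressClassName": "alb",  # assumes this IngressClass exists
            "rules": [{
                "http": {
                    "paths": [{
                        "path": "/",
                        "pathType": "Prefix",
                        "backend": {"service": {"name": service, "port": {"number": port}}},
                    }]
                }
            }],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, so this output can be applied as-is.
    print(json.dumps(alb_ingress("web", "web-svc", 80), indent=2))
```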

27.4 IAM Roles for Service Accounts (IRSA): Pod-Level IAM Permissions

Right, let’s talk about giving your pods an identity. Because by default, your pods running in EKS have no IAM identity of their own. They’re the digital equivalent of a hermit living off-grid: the only AWS credentials within reach are whatever the worker node’s instance role happens to carry. You could solve this the old, terrible way: grant the massive, terrifying IAM permissions your app needs to the EC2 instance role of the worker node. Then every pod on that node, from your mission-critical app to that random busybox pod you forgot about, inherits those god-like powers. This is a security nightmare waiting to happen, and we’re not doing that.

27.3 EKS Add-Ons: VPC CNI, CoreDNS, kube-proxy, EBS CSI Driver

Right, let’s talk about EKS add-ons. This is where AWS tries to make your life easier by managing some of the core components of your Kubernetes cluster for you. Think of them as the official, blessed-by-AWS versions of things you’d otherwise have to go find, install, and update yourself. It’s a good idea, mostly. We’ll cover the big four: VPC CNI, CoreDNS, kube-proxy, and the EBS CSI Driver. The first thing you need to know is that these aren’t magic. Under the hood, an EKS add-on is essentially AWS using its API to deploy a specific, validated version of a Helm chart or a manifest into your cluster’s kube-system namespace on your behalf. The value isn’t in the initial install—you could do that in five minutes. The value is in the ongoing management. AWS will tell you when new versions are available and handle the (mostly) safe rollout for you. It’s one less thing on your plate.

27.2 Node Groups: Managed Node Groups, Self-Managed, and Fargate Profiles

Alright, let’s talk about the actual compute in your cluster: the nodes. In EKS, you’ve got three main flavors for getting your worker nodes running: Managed Node Groups (MNGs), self-managed nodes (usually via your own Auto Scaling Group, the aws-auth ConfigMap, and some CloudFormation voodoo), and the serverless oddball, Fargate. Each has a superpower and a corresponding kryptonite. Your job is to pick which trade-off you want to live with.

Managed Node Groups: The Easy Button (Mostly)

This is AWS saying, “Look, you have enough to worry about. Let me handle the grimy details of the EC2 instances for you.” And 90% of the time, you should listen. An MNG isn’t just an Auto Scaling Group (ASG) that EKS knows about; it’s a tightly integrated abstraction that handles a ton of boilerplate for you.

27.1 EKS Control Plane: Managed API Server and etcd

Right, let’s talk about the brain of your EKS cluster: the control plane. When you hear “managed,” your brain might conjure images of AWS handling all the tedious bits while you kick back. And for the most part, that’s true. But “managed” doesn’t mean “magic.” It means “we run the fiddly bits you probably don’t want to, and you still need to know how they work so you don’t accidentally set the whole thing on fire.”

39.7 KEDA on AKS: Event-Driven Scaling with Azure Services

Right, so you’ve got your AKS cluster humming along. You’ve probably set up the Horizontal Pod Autoscaler (HPA) to scale based on CPU or memory, and you’re feeling pretty good about yourself. And you should. But let’s be honest: most of the interesting stuff that happens in the cloud isn’t a slow, steady trickle of CPU load. It’s a sudden, screaming torrent of events. A million messages piling up in a Service Bus queue. A thousand new blobs dropped in a Storage account. A massive backlog in Azure Event Hubs. Your statically-provisioned pods are just sitting there, blissfully unaware of the incoming tidal wave. This is where KEDA, the Kubernetes Event-Driven Autoscaler, comes in to save the day. It’s the nervous system that connects your pod scale to the actual work that needs to be done.
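The core arithmetic behind queue-driven scaling is small enough to sketch. This is a simplified model under stated assumptions, not KEDA’s implementation: in a real cluster KEDA feeds the metric to the HPA and handles the zero-to-one activation itself. The function and parameter names are hypothetical.

```python
import math


def desired_replicas(queue_length: int, target_per_replica: int,
                     min_replicas: int = 0, max_replicas: int = 10) -> int:
    """Replicas needed so each one handles about target_per_replica messages."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    wanted = math.ceil(queue_length / target_per_replica)
    # Clamp to the configured floor and ceiling.
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # 23 messages waiting, each replica should own about 5 of them.
    print(desired_replicas(23, 5))
```

Note the floor of zero: scaling on work rather than CPU is what makes it reasonable to run no pods at all when the queue is empty.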

39.6 AKS Add-Ons: Monitoring, Policy, and Ingress

Right, let’s talk about AKS add-ons. This is where Azure tries to save you from yourself, or at least from the sheer drudgery of wiring up the same open-source projects for the thousandth time. The idea is simple: click a checkbox (or flip a --enable-whatever flag) and Azure will install, configure, and manage a core component on your cluster for you. It’s tempting to just enable everything. Don’t. Be strategic. Some are brilliant time-savers; others… well, let’s just say you might want to bring your own.

39.5 Azure Disk and Azure Files CSI Drivers

Right, let’s talk about getting data into your pods. You’ve built this brilliant, ephemeral, self-healing microservice. Fantastic. Now it needs to read a config file. Or write a log. Or store a user’s uploaded picture of their cat. Suddenly, your perfectly stateless world needs state. This is where the Container Storage Interface (CSI) drivers come in—they’re the well-mannered bouncers that translate your pod’s polite request for storage into the specific API calls Azure understands.

39.4 Azure CNI and Kubenet Networking

Right, let’s talk networking. This is where the rubber meets the road in Kubernetes, and where Azure’s “managed” service starts to feel a lot more… hands-on. You have two primary choices here: the basic kubenet and the more advanced, VNet-integrated Azure CNI. Your choice isn’t just about IP addresses; it’s a fundamental decision about how tightly you want your cluster woven into the fabric of your Azure Virtual Network (VNet). Choose poorly, and you’ll be dealing with a special kind of IP address hell. Let’s get into it.
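To put a number on “IP address hell,” here is a back-of-the-envelope sketch of subnet sizing. It assumes the traditional Azure CNI mode, where every pod takes an address from the node subnet, a default of 30 pods per node, and one spare node for upgrade surge; overlay modes behave differently. The function names are hypothetical.

```python
def azure_cni_ips(nodes: int, max_pods: int = 30, surge_nodes: int = 1) -> int:
    """Addresses needed when every node AND every pod takes a VNet IP."""
    total_nodes = nodes + surge_nodes
    return total_nodes + total_nodes * max_pods


def kubenet_ips(nodes: int, surge_nodes: int = 1) -> int:
    """With kubenet only the nodes take VNet IPs; pods sit behind NAT."""
    return nodes + surge_nodes


def smallest_subnet_prefix(needed: int) -> int:
    """Smallest subnet (largest prefix) that fits, given Azure reserves 5 IPs per subnet."""
    for prefix in range(29, 7, -1):  # /29 is the smallest subnet Azure allows
        if 2 ** (32 - prefix) - 5 >= needed:
            return prefix
    raise ValueError("no subnet between /8 and /29 is large enough")


if __name__ == "__main__":
    for label, ips in (("Azure CNI", azure_cni_ips(50)), ("kubenet", kubenet_ips(50))):
        print(f"{label}: {ips} addresses, needs at least a /{smallest_subnet_prefix(ips)}")
```

For the same 50-node cluster the two models differ by well over an order of magnitude in address consumption, which is why this decision belongs at design time rather than after the subnet is full.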

39.3 Azure Active Directory Integration and Managed Identities

Right, let’s talk about identity. It’s the single most important and, let’s be honest, most frequently botched part of any cloud deployment. You can have the most beautifully architected app, but if it can’t talk to its database, it’s just a very expensive error message. In the old days, we’d be slinging secrets and connection strings into environment variables like we were throwing confetti. Don’t do that. It’s 2024, and we have better ways. Specifically, for your AKS cluster, we have Azure Active Directory (AAD, since renamed Microsoft Entra ID) integration and the absolute game-changer that is Managed Identities.

39.2 Node Pools and Virtual Machine Scale Sets

Right, let’s talk about the actual workers in your AKS cluster. You’ve got your control plane managed by Azure (which is a blessing, trust me), but the nodes—the VMs where your pods actually run—are your responsibility. And in AKS, you don’t manage individual nodes; you manage node pools, which are backed by the real hero (or sometimes villain) of Azure compute: Virtual Machine Scale Sets (VMSS). Think of a node pool as a group of identical worker bees. They all have the same CPU, memory, OS, and often, the same labels and taints. You define the hive, and Azure scales the number of bees for you. Under the hood, this hive is a VMSS. It’s the Azure infrastructure service that allows you to create and manage a group of identical, load-balanced VMs. The AKS team chose VMSS because it’s the native way to get fast, reliable scaling and automated repairs. Trying to do this with individual VMs would be a nightmare they wisely decided to spare you.

39.1 AKS Cluster Creation with az CLI and Terraform

Alright, let’s get our hands dirty. You’re about to create an AKS cluster, which is essentially you renting a fully-managed Kubernetes control plane from Microsoft. The magic here is that they handle the API server, scheduler, etcd, and all those other finicky control plane components that you really don’t want to get a 3 AM page about. You just manage the worker nodes. It’s a fantastic division of labor. Now, you’ve got two primary paths to make this happen: the quick-and-dirty az CLI for when you need to test something now, and the sober, responsible Terraform path for when you need something repeatable, version-controlled, and actually sane. We’ll do both. Strap in.

38.7 GKE Autopilot: Fully Managed Node Infrastructure

Alright, let’s talk about GKE Autopilot. You’ve dipped your toes into standard GKE, you’ve provisioned your node pools, and you’ve probably spent a non-zero amount of time staring at kubectl top nodes wondering if you’ve allocated enough CPU to your coredns pods. Autopilot is Google’s answer to that particular flavor of existential dread. It’s their “fully managed” node infrastructure mode, which is a fancy way of saying: “You handle the pods, we’ll handle the boring, expensive, and complex part—the actual VMs they run on.”

38.6 GKE Persistent Disk and Filestore CSI Drivers

Right, let’s talk about storage. It’s the part of Kubernetes everyone secretly dreads. You can’t just kubectl scale your database (well, you can, but you really, really shouldn’t). Your applications, however, have needs. They need to remember things. For that, we turn to the unsung heroes of the GKE world: the CSI drivers. Specifically, the ones for Persistent Disk (your go-to block storage) and Filestore (a managed NFS service for when you need a shared filesystem). Google was kind enough to build these for us and ship them as a default, integrated part of GKE. This is a massive win, because if you’ve ever had to manage a CSI driver yourself, you know it’s about as fun as debugging a YAML indentation error at 2 AM.

38.5 Cloud Load Balancing Integration

Right, let’s talk about getting traffic into your cluster. You’ve built this brilliant, distributed application, and now you need to show it to the world. This is where GKE’s integration with Google Cloud’s Load Balancers goes from “nice-to-have” to “why would you ever do it any other way?” The magic here is that GKE doesn’t just work with Cloud Load Balancing; it automates it. You don’t manually create load balancers, health checks, or backend services in the Google Cloud console. You declare what you want—a public HTTP(S) service, an internal service, SSL offloading—and GKE talks to Google’s control plane to build the real, global infrastructure for you. It’s like having a brilliant, hyper-competent intern who you just tell “make this app available” and they handle the 47-step checklist without bothering you.

38.4 GKE Networking: VPC-Native Clusters and Alias IPs

Right, let’s talk networking. This is where most people’s eyes glaze over, but stick with me: it’s also where you’ll solve your most baffling problems and prevent your future self from sending angry emails to past you. GKE’s networking model, specifically the “VPC-native” bit, is one of those things Google got genuinely right. It saves you from a world of self-inflicted pain. The old way, which GKE tellingly calls “routes-based,” was a bit of a kludge. It worked by programmatically creating a custom static route in your VPC for every node’s Pod range. You’d spin up a 500-node cluster, and suddenly your project had hundreds of extra routes. It was a management nightmare, slow to propagate, and, crucially, hit hard route quotas. It was absurd. Thankfully, VPC-native is now the default, and you should never, ever pick routes-based for a new cluster.
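One practical consequence of VPC-native clusters is that the Pod secondary range caps your node count. The sketch below works through that arithmetic, assuming GKE’s default of carving a /24 alias range out of the Pod range for each node (the default when the per-node Pod limit is 110). The function name is hypothetical.

```python
import ipaddress


def max_nodes(pod_range_cidr: str, per_node_prefix: int = 24) -> int:
    """How many per-node alias ranges fit inside the Pod secondary range."""
    pod_range = ipaddress.ip_network(pod_range_cidr)
    if per_node_prefix < pod_range.prefixlen:
        return 0  # the Pod range is smaller than a single node's slice
    return 2 ** (per_node_prefix - pod_range.prefixlen)


if __name__ == "__main__":
    for cidr in ("10.0.0.0/14", "10.4.0.0/20"):
        print(cidr, "->", max_nodes(cidr), "nodes")
```

A /20 looks roomy at 4,096 addresses, yet under these defaults it tops out at 16 nodes, so size the Pod range for the cluster you expect to have, not the one you are creating today.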

38.3 Workload Identity: Linking Kubernetes Service Accounts to GCP IAM

Right, let’s talk about Workload Identity. This is, without a doubt, the single most important security feature you’ll configure on GKE. It solves a problem that used to be a total nightmare: how do you give your Pods access to other Google Cloud services—like a Cloud Storage bucket or a BigQuery dataset—without being a complete maniac? The old way was to either: a) download a JSON service account key, bake it into a Kubernetes Secret, and pray to the ops gods it never leaked (it always did), or b) give the node pool’s service account absurdly broad permissions, effectively turning every Pod on your node into a privileged user. Both options are terrible. The first is a key management disaster, and the second is like giving every person in a building the master key to the city. Google rightfully decided this was clown shoes and built a better way.
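The “better way” boils down to two strings that have to agree: an IAM member that names the Kubernetes service account, and an annotation on that service account that names the Google service account. Here is a minimal sketch of both; the project, namespace, and account names are hypothetical placeholders.

```python
def workload_identity_member(project_id: str, namespace: str, ksa_name: str) -> str:
    """IAM member to grant roles/iam.workloadIdentityUser on the Google service account."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"


def ksa_annotation(gsa_name: str, project_id: str) -> dict:
    """Annotation that tells GKE which Google service account the pod should act as."""
    return {"iam.gke.io/gcp-service-account": f"{gsa_name}@{project_id}.iam.gserviceaccount.com"}


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(workload_identity_member("my-project", "payments", "api"))
    print(ksa_annotation("payments-api", "my-project"))
```

No key file appears anywhere in this flow, which is the whole point: there is nothing to bake into a Secret and nothing to leak.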

38.2 Node Pools: Spot VMs, GPU Nodes, and Preemptible Instances

Alright, let’s talk about the real workhorses of your GKE cluster: the node pools. Think of your cluster as a nightclub; the control plane is the bouncer and manager, but the node pools are the actual dance floors where your pods (the patrons) get down to business. You don’t want just one type of dance floor. You need a VIP section, a cheap area for the rowdy crowd, and maybe a special room with fancy equipment. That’s what node pools are for.

38.1 GKE Autopilot vs Standard Mode

Alright, let’s settle this. You’re standing at the GKE console, about to create a cluster, and you’re hit with the first big choice: Standard or Autopilot? This isn’t just a checkbox; it’s a fundamental decision about who’s driving the bus: you or Google. Let’s break it down without the marketing fluff.

The Core Philosophical Divide

Think of GKE Standard as a powerful company car. They hand you the keys, a full tank of gas, and say, “Have fun!” You’re responsible for driving it, maintaining it, and paying for the gas you use, whether you drive 100 miles or let it idle in the garage all week. You have near-total control, for better and worse.

37.8 EKS Cost Optimization: Spot Instances and Karpenter

Right, let’s talk about saving money. Because let’s be honest, the only thing more terrifying than your Kubernetes cluster melting down is the bill for the cluster that’s sitting there doing nothing. AWS is happy to sell you on-demand instances that you pay for 24/7, but we’re smarter than that. We’re going to harness two of AWS’s most powerful cost-saving tools: Spot Instances and Karpenter. One is a deeply discounted fire sale on compute capacity, and the other is the brilliant, ruthless robot that knows how to shop it.

37.7 EKS Add-Ons: CoreDNS, kube-proxy, Amazon VPC CNI

Right, let’s talk about the three amigos that AWS graciously pre-installs for you on every EKS cluster: CoreDNS, kube-proxy, and the Amazon VPC CNI. Think of them less as optional “add-ons” and more as the “operating system” of your cluster. Without them, your cluster is a very expensive, very confused computer that can’t talk to itself or the outside world. AWS manages the installation and versioning of these for you, which is mostly a blessing, but as we’ll see, sometimes a curse in disguise.

37.6 EKS Networking: VPC CNI and Security Groups for Pods

Alright, let’s talk networking. This is where the rubber meets the road in EKS, and frankly, where most people get their knickers in a twist. You’ve got your shiny new cluster, but until your pods can actually talk to each other and the outside world, it’s just a very expensive, very abstract art installation. The two big players here are the VPC CNI plugin and Security Groups for Pods. One provides the fundamental plumbing, the other gives you a much-needed security scalpel. Let’s get our hands dirty.

37.5 EBS and EFS CSI Drivers for Persistent Storage

Right, let’s talk storage. Because your fancy pods are ephemeral, and while that’s great for cattle, not pets, your precious application data needs to live somewhere more permanent than a container’s short, brutal life. You can’t just chmod 777 your way out of this one. In the old, barbaric days of EKS, you’d use the in-tree kubernetes.io/aws-ebs volume plugin that was baked into Kubernetes itself, and bolt EFS on through a plain NFS mount or an external provisioner. The in-tree plugin is now deprecated and scheduled for a not-so-tearful goodbye. The future, and frankly the present, is the Container Storage Interface (CSI).
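In practice, the move to CSI shows up as a different provisioner name on your StorageClass. The sketch below builds one for the EBS CSI driver; the class name, the gp3 default, and the binding mode are illustrative choices rather than requirements.

```python
import json


def ebs_storage_class(name: str, volume_type: str = "gp3") -> dict:
    """StorageClass backed by the EBS CSI driver instead of the in-tree plugin."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "ebs.csi.aws.com",  # the in-tree name was kubernetes.io/aws-ebs
        "parameters": {"type": volume_type},
        # Wait until a pod is scheduled so the volume lands in that pod's AZ.
        "volumeBindingMode": "WaitForFirstConsumer",
        "allowVolumeExpansion": True,
    }


if __name__ == "__main__":
    # kubectl accepts JSON, so this output can be applied directly.
    print(json.dumps(ebs_storage_class("fast-ebs"), indent=2))
```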

37.4 AWS Load Balancer Controller and ALB Integration

Alright, let’s talk about getting traffic into your EKS cluster. You’ve got pods, they’re running your brilliant application, but they’re useless if users can’t reach them. You might be thinking, “It’s Kubernetes, I’ll just create a Service of type LoadBalancer and call it a day.” And you’d be right… sort of. On AWS, that classic move gets you, by default, a Classic Load Balancer, the previous-generation product AWS would rather you stopped using. Add the right annotation and you get a Network Load Balancer (NLB) instead. And while NLBs are fantastic for raw performance and preserving the client IP, they’re a bit like a sledgehammer: powerful but not always the right tool for the job, especially for HTTP-based services.

37.3 IAM Roles for Service Accounts (IRSA)

Alright, let’s talk about IAM Roles for Service Accounts, or IRSA. This is, without a doubt, one of the best things to happen to Kubernetes on AWS. Before IRSA, giving a pod permissions to, say, access an S3 bucket was a bit of a nightmare. You’d have to give the EC2 instance running your worker nodes a massive IAM role with all the permissions any pod on that node could ever need. It was the equivalent of handing out the master key to the entire building to every single tenant. Horrifying from a security perspective, and a compliance auditor’s worst nightmare.
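What replaces the master key is an IAM role whose trust policy names exactly one Kubernetes service account. Here is a minimal sketch of that trust policy; the account ID, issuer URL, namespace, and service account name are hypothetical placeholders, and the function name is invented for the example.

```python
import json


def irsa_trust_policy(account_id: str, oidc_issuer_url: str,
                      namespace: str, service_account: str) -> dict:
    """Trust policy that lets one service account, and only that one, assume the role."""
    issuer = oidc_issuer_url.replace("https://", "", 1)  # IAM wants it without the scheme
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Federated": f"arn:aws:iam::{account_id}:oidc-provider/{issuer}"},
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {"StringEquals": {
                # Pin the role to a single namespace/service-account pair.
                f"{issuer}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                f"{issuer}:aud": "sts.amazonaws.com",
            }},
        }],
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    policy = irsa_trust_policy(
        "111122223333",
        "https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE",
        "default", "s3-reader",
    )
    print(json.dumps(policy, indent=2))
```

The other half of the handshake is annotating the service account with eks.amazonaws.com/role-arn set to the role’s ARN, so pods using it receive that role’s credentials instead of the node’s.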

37.2 EKS Node Groups: Managed, Self-Managed, and Fargate

Alright, let’s talk about the actual compute power in your EKS cluster: the nodes. This is where your pods actually run, and AWS gives you three distinct flavors to choose from. Picking the right one isn’t just a technicality; it’s the difference between a smooth ride and a part-time job you never applied for.

Managed Node Groups: Your Default Choice

If you’re not a masochist, start here. An EKS Managed Node Group (MNG) is AWS saying, “Hey, we’ll handle the Kubernetes worker node boilerplate for you.” They provision the underlying EC2 instances, register them with your cluster, and (this is the killer feature) manage the node lifecycle, including automated rolling updates and terminations.

37.1 EKS Cluster Creation: eksctl, Terraform, and the Console

Alright, let’s get our hands dirty. Creating an EKS cluster feels like it should be a one-click affair, right? It’s “managed” after all. And then you see the console form with roughly 47 dropdowns and realize, ah, this is AWS’s version of “managed”—they manage the control plane, you manage the configuration headache. Don’t panic. We’ve got three main paths out of this jungle: the AWS Console (for the masochists and the curious), eksctl (for people who value their time), and Terraform (for those of us who need to build something repeatable and robust). I’ll walk you through all three, but I’m not going to pretend they’re all equally admirable.

...