37.1 EKS Cluster Creation: eksctl, Terraform, and the Console

Alright, let’s get our hands dirty. Creating an EKS cluster feels like it should be a one-click affair, right? It’s “managed” after all. And then you see the console form with roughly 47 dropdowns and realize, ah, this is AWS’s version of “managed”—they manage the control plane, you manage the configuration headache. Don’t panic. We’ve got three main paths out of this jungle: the AWS Console (for the masochists and the curious), eksctl (for people who value their time), and Terraform (for those of us who need to build something repeatable and robust). I’ll walk you through all three, but I’m not going to pretend they’re all equally admirable.

The Quick and Dirty: `eksctl` from Weaveworks

If you just need a cluster now to test something, eksctl is your best friend. It’s a brilliant CLI tool that wraps the chaotic AWS API into a sane command. It’s so good it almost feels like cheating. You can go from zero to a fully functional cluster in one command. Here’s the classic:

eksctl create cluster --name my-brilliant-cluster --region us-east-1 --nodegroup-name ng-standard --node-type t3.medium --nodes 2

Boom. It creates your CloudFormation stacks, your VPC, your control plane, your node group—everything. But the real magic is in using a config file. Trust me, you will graduate to this immediately. It makes your cluster declarative and version-controllable.

# cluster-config.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-better-cluster
  region: us-west-2
  version: "1.28"

iam:
  withOIDC: true # <- Crucial for installing things like AWS Load Balancer Controller later

managedNodeGroups:
  - name: managed-ng
    instanceTypes: ["t3.medium"]
    minSize: 2
    maxSize: 5
    desiredCapacity: 2
    volumeSize: 20
    labels: { role: workloads }
    tags: # Applied to the EKS nodegroup resource and its EC2 instances
      owner: "my-team"
      purpose: "experiment"

Run it with eksctl create cluster -f cluster-config.yaml. The withOIDC: true is non-negotiable if you plan on using IAM roles for service accounts (IRSA), which you absolutely should. It creates an IAM OpenID Connect identity provider for your cluster, which is the secure way to grant AWS permissions to your pods. Forgetting this is a classic pitfall that you’ll only discover hours later when your pod can’t access an S3 bucket.
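If you want to confirm the OIDC provider is really there before you spend those hours, here is a minimal sketch. It assumes the AWS CLI is installed and authenticated, and the function names (oidc_id_from_issuer, has_oidc_provider) are made up for this example. The issuer URL in the comment is a placeholder, not a real cluster.

```shell
# Sketch: check that a cluster's OIDC issuer is registered as an IAM provider.
# Function names are hypothetical; cluster name and region are whatever you pass in.

# Extract the trailing ID from an issuer URL such as
# https://oidc.eks.us-west-2.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E
oidc_id_from_issuer() {
  printf '%s\n' "${1##*/}"
}

# Exit 0 if IAM lists a provider whose ARN contains the cluster's OIDC ID.
# $1 = cluster name, $2 = region
has_oidc_provider() {
  issuer=$(aws eks describe-cluster --name "$1" --region "$2" \
    --query "cluster.identity.oidc.issuer" --output text) || return 1
  aws iam list-open-id-connect-providers --output text \
    | grep -q "$(oidc_id_from_issuer "$issuer")"
}
```

Something like has_oidc_provider my-better-cluster us-west-2 && echo "IRSA ready" would do as a quick check. If it fails on a cluster you created without withOIDC, eksctl utils associate-iam-oidc-provider --cluster my-better-cluster --approve should add the provider after the fact.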

The Industrial-Strength Choice: Terraform

For anything resembling a real production environment, you’re not manually clicking buttons or running one-off scripts. You’re using Terraform (or maybe CloudFormation, but you seem like a person of taste). The terraform-aws-eks module is the community standard for a reason—it codifies all the best practices and painful-to-write boilerplate.

Why is this better? It ensures your entire AWS footprint—the VPC, the IAM roles, the security groups, the EKS cluster itself—is created in a single, coherent, predictable operation. Here’s a minimal module block. Note how we’re explicitly defining the subnet IDs; you should have already created these in a separate networking module because you’re not a savage.

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.0"

  cluster_name    = "my-terraform-cluster"
  cluster_version = "1.28"

  vpc_id     = module.vpc.vpc_id          # You built this separately, right?
  subnet_ids = module.vpc.private_subnets # Prefer private subnets for nodes

  cluster_endpoint_public_access = true # Fine for learning, maybe not for prod

  eks_managed_node_groups = {
    default = {
      min_size     = 2
      max_size     = 5
      desired_size = 2

      instance_types = ["t3.medium"]
      capacity_type  = "SPOT" # Be cheap. Spot is fantastic for stateless workloads.
    }
  }
}

# Critical Output: You'll need this to update your kubeconfig
output "cluster_endpoint" {
  description = "Endpoint for your EKS control plane"
  value       = module.eks.cluster_endpoint
}

output "cluster_certificate_authority_data" {
  description = "Base64 encoded CA certificate for the cluster"
  value       = module.eks.cluster_certificate_authority_data
}

After a terraform apply, you’ll need to authenticate. The module outputs the necessary data, but you can also use the AWS CLI: aws eks update-kubeconfig --region <region-code> --name my-terraform-cluster. The Terraform module’s huge advantage is that it sets up the OIDC provider and creates all the necessary IAM roles and policies for you, baked right into the process.
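The apply-then-connect sequence can be sketched as a small script. This is one possible wrapper, not an official workflow: the run and connect_cluster names and the DRY_RUN switch are inventions for this example, and it assumes terraform, the AWS CLI, and kubectl are all on your PATH.

```shell
# Sketch: apply the Terraform config, then point kubectl at the new cluster.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1 = region, $2 = cluster name. Stops at the first step that fails.
connect_cluster() {
  run terraform apply &&
  run aws eks update-kubeconfig --region "$1" --name "$2" &&
  run kubectl get nodes
}
```

Try DRY_RUN=1 first to see what it would do, then call connect_cluster with your own region and my-terraform-cluster. If kubectl get nodes lists your two nodes as Ready, the kubeconfig is wired up correctly.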

The Console UI: For When You Just Want to Feel the Pain

I’m only showing you this so you understand the sheer number of decisions AWS makes you justify. Open the EKS console, click “Add cluster”, then “Create”. You’ll be presented with a form that asks for everything eksctl does for you automatically. You must choose a VPC and subnets (hope you have a sane networking setup already). You must configure cluster endpoint access, logging (which costs extra), and select a Kubernetes version. Then you have to create a node group in a separate, equally convoluted form where you define the AMI, the instance type, the SSH key, the scaling configuration, and the tags.

It’s a useful learning exercise to see all the moving parts once. Then never do it again for any real purpose. The margin for error is immense, it’s not reproducible, and it’s painfully slow. The only thing it’s genuinely good for is inspecting existing clusters created by other means.

The Gotchas They Don’t Tell You About

First, cluster version upgrades. You can’t jump versions. You must upgrade one minor version at a time. If you’re on 1.25, you must go to 1.26 before 1.27. Plan for this. eksctl and Terraform can handle it, but it’s not instantaneous.
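To make the one-step-at-a-time rule concrete, here is a tiny helper that lists every version you have to pass through. It is only a sketch: upgrade_path is a made-up name, and it assumes both versions are written as 1.<minor>.

```shell
# Print each minor version to step through, in order.
# Assumes both arguments look like 1.<minor> (hypothetical helper).
upgrade_path() {
  from=${1#1.}
  to=${2#1.}
  path=""
  minor=$((from + 1))
  while [ "$minor" -le "$to" ]; do
    path="${path:+$path }1.$minor"
    minor=$((minor + 1))
  done
  printf '%s\n' "$path"
}
```

Calling upgrade_path 1.25 1.27 prints 1.26 1.27, which is the order you would feed to something like eksctl upgrade cluster --name my-better-cluster --version 1.26 --approve, and then again for 1.27.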

Second, a node group is not the workers themselves. A managed node group is an EKS wrapper around an Auto Scaling Group, and the nodes are just EC2 instances in that ASG. If you have issues, you need to check all three layers: the ASG, the EC2 instances, and the Kubernetes node object. Learn the commands kubectl get nodes and kubectl describe node <node-name>.
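Here is a rough sketch of walking those layers for one misbehaving node. The function names are hypothetical, and it assumes kubectl and the AWS CLI are both configured for the cluster’s account. The trick it relies on is that a node’s spec.providerID on EKS looks like aws:///us-west-2a/i-0123456789abcdef0, so the instance ID is the last path segment.

```shell
# Pull the EC2 instance ID out of a node's providerID
# (for example aws:///us-west-2a/i-0123456789abcdef0).
instance_id_from_provider_id() {
  printf '%s\n' "${1##*/}"
}

# Show the Kubernetes, EC2, and Auto Scaling views of a single node.
# $1 = node name as shown by kubectl get nodes
inspect_node() {
  provider_id=$(kubectl get node "$1" -o jsonpath='{.spec.providerID}') || return 1
  instance_id=$(instance_id_from_provider_id "$provider_id")
  kubectl describe node "$1"
  aws ec2 describe-instance-status --instance-ids "$instance_id"
  aws autoscaling describe-auto-scaling-instances --instance-ids "$instance_id"
}
```

Run inspect_node <node-name> and compare the three outputs. A node that is NotReady in Kubernetes but healthy in EC2 points you at the kubelet or networking, while an instance the ASG marks unhealthy is about to be replaced anyway.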

Finally, networking is everything. The AWS VPC CNI plugin assigns an IP address from your VPC to every pod. This is simple but has a major pitfall: you can run out of IP addresses in your subnet long before you run out of compute capacity. Plan your subnets with large CIDR blocks (like /19) or look into using custom networking to decouple pod IPs from your node subnets. It’s a more advanced setup, but it’s one of the few ways to run a large cluster without tearing your hair out.
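The arithmetic is worth doing once by hand. The sketch below assumes the default VPC CNI mode (no prefix delegation) and uses AWS’s published limits for a t3.medium, which are 3 network interfaces with 6 IPv4 addresses each. The function names are made up, and nodes_per_subnet takes the pessimistic view that every node has all of its ENIs attached and full.

```shell
# Max pods per node: each ENI keeps one address for itself, and two
# host-network pods (kube-proxy, aws-node) need no pod IP.
# $1 = ENIs per instance, $2 = IPv4 addresses per ENI
max_pods() {
  echo $(( $1 * ($2 - 1) + 2 ))
}

# Usable addresses in a subnet: AWS reserves 5 in every subnet.
# $1 = prefix length, e.g. 19 for a /19
usable_ips() {
  echo $(( (1 << (32 - $1)) - 5 ))
}

# Worst case: how many nodes fit if each one holds all of its ENI addresses.
# $1 = prefix length, $2 = ENIs per instance, $3 = IPv4 addresses per ENI
nodes_per_subnet() {
  echo $(( $(usable_ips "$1") / ($2 * $3) ))
}
```

With those numbers, max_pods 3 6 gives 17 pods per t3.medium. A /24 has 251 usable addresses, so nodes_per_subnet 24 3 6 tops out at 13 nodes, while a /19 (8187 usable addresses) fits 454. That gap is why a small subnet runs dry while your instances still have CPU to spare.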
