25.5 Operator SDK: Scaffold, Build, and Deploy Operators
Right, so you’ve decided to build an Operator. Welcome to the club. We’ve moved past the philosophical debate of “should you?” and landed squarely in the “how the hell do you?” phase. The Operator SDK is your power tool for this job. It doesn’t just generate a skeleton; it gives you a full-blown exoskeleton, complete with best practices and structure baked in. Think of it as scaffold new-project but for the incredibly specific world of Kubernetes automation. Let’s get our hands dirty.
Initializing Your Operator Project
First things first, you need the Operator SDK CLI. I’ll assume you’ve installed it (a release binary or Homebrew both work) and are ready to go. We’re going to initialize a new project for an operator that manages a fictional resource, a Blorg. Don’t ask what a Blorg does; it’s a brilliant example and that’s all that matters.
The init command sets up the entire foundation. Notice we’re using the Go plugin, which is the default, so the command doesn’t need a --plugins flag. While the SDK supports Ansible and Helm, the Go operator gives you the most power and, consequently, the most rope with which to hang yourself. We choose it deliberately.
mkdir blorg-operator && cd blorg-operator
operator-sdk init --domain example.com --repo github.com/example/blorg-operator --project-version=3
This command does a ton of heavy lifting. It creates the go.mod file, the Makefile, the PROJECT file, and the config/ directory of Kustomize manifests, and, crucially, generates the main.go file that will be the entry point for your Operator’s manager. The api/ and controllers/ directories show up later, when you create your first API. The --domain is vital; it’s the suffix of your future API group (e.g., blorg.example.com). The --project-version=3 is non-negotiable. The v3 project layout is a significant, sanity-preserving improvement over previous versions. Trust me, you want this.
Creating Your First API (and its CRD)
Now, the main event: defining your custom resource. This is where you stop being a Kubernetes user and start being a Kubernetes extender. The create api command is a two-for-one deal: it scaffolds the Custom Resource Definition (the what) and the Controller (the how).
operator-sdk create api --group blorg --version v1alpha1 --kind Blorg --resource=true --controller=true
Let’s break down those flags. --group is joined with the --domain from init to form the API group (blorg.example.com), and --version completes the apiVersion (blorg.example.com/v1alpha1). You start with v1alpha1 because this is experimental. When the API has stabilized and you can commit to backwards compatibility, you’ll promote it to v1beta1 and eventually v1. --kind is the name of your resource, like Pod or Deployment. --resource and --controller being true means “generate the code for both, please.”
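To make the naming concrete, here is a small sketch of how the flags combine. deriveNames is a hypothetical helper written for this illustration, not something the SDK provides, and its pluralization is deliberately naive (lowercase the kind and append "s"); the real tooling handles irregular plurals.

```go
package main

import (
	"fmt"
	"strings"
)

// apiNames holds the identifiers derived from the scaffold flags.
type apiNames struct {
	APIGroup   string // group + domain
	APIVersion string // what goes in a manifest's apiVersion field
	CRDName    string // metadata.name of the CRD: <plural>.<api group>
}

// deriveNames is a hypothetical helper (not part of the Operator SDK) that
// mirrors how --domain, --group, --version and --kind fit together.
func deriveNames(domain, group, version, kind string) apiNames {
	apiGroup := group + "." + domain
	// Assumption: naive pluralization, good enough for "Blorg".
	plural := strings.ToLower(kind) + "s"
	return apiNames{
		APIGroup:   apiGroup,
		APIVersion: apiGroup + "/" + version,
		CRDName:    plural + "." + apiGroup,
	}
}

func main() {
	n := deriveNames("example.com", "blorg", "v1alpha1", "Blorg")
	fmt.Println("API group: ", n.APIGroup)
	fmt.Println("apiVersion:", n.APIVersion)
	fmt.Println("CRD name:  ", n.CRDName)
}
```

The apiVersion it derives is the value your sample manifests will need, and the CRD name is what you would look for in kubectl get crds.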
This command generates two critical files:
api/v1alpha1/blorg_types.go: This is where you define the spec (the desired state users provide) and the status (the observed state your operator writes). It’s your data model.
controllers/blorg_controller.go: This is where you write the business logic: the reconciliation loop that makes the actual state match the desired state.
You must edit blorg_types.go immediately after generation. The tool puts in placeholder structs. Let’s define a halfway useful BlorgSpec.
// api/v1alpha1/blorg_types.go
type BlorgSpec struct {
	// The number of desired replicas for the Blorg's deployment.
	// This is a classic example; your operator will create a Deployment to run something.
	// +kubebuilder:validation:Minimum=1
	// +kubebuilder:validation:Maximum=10
	ReplicaCount int32 `json:"replicaCount"`

	// The image for the container to run. Please don't use latest. I'm trusting you.
	Image string `json:"image"`

	// A delightful message the Blorg will display. Because every demo needs a string field.
	Message string `json:"message,omitempty"`
}

type BlorgStatus struct {
	// The actual number of replicas running.
	AvailableReplicas int32 `json:"availableReplicas"`

	// A simple status condition.
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}
After you edit the types, you must run make generate (to refresh the generated DeepCopy methods) and make manifests. The latter uses the magic of the controller-gen tool to read the Go types and the // +kubebuilder comments to generate the actual CRD YAML in config/crd/bases/. Those comments are how you add validation, set defaults, and control pruning behavior. It’s a brilliant piece of design that keeps your API definition and documentation in a single place.
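If it helps to see what those markers buy you, here is a toy model of the checks the API server ends up performing once the generated schema is installed. validateBlorgSpec is hypothetical and hand-written for this illustration; the real validation is done by the API server from the CRD's OpenAPI schema, not by your Go code. It also assumes controller-gen's usual behavior of treating fields without omitempty as required.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// validateBlorgSpec is a hypothetical stand-in for the schema checks the API
// server applies to a Blorg's spec. It only models "required" and the
// Minimum/Maximum bounds on replicaCount.
func validateBlorgSpec(raw []byte) error {
	var spec map[string]interface{}
	if err := json.Unmarshal(raw, &spec); err != nil {
		return fmt.Errorf("spec is not a valid JSON object: %w", err)
	}
	// Assumption: fields declared without omitempty are listed as required.
	for _, key := range []string{"replicaCount", "image"} {
		if _, ok := spec[key]; !ok {
			return fmt.Errorf("spec.%s is required", key)
		}
	}
	// encoding/json decodes every JSON number into a float64.
	replicas, ok := spec["replicaCount"].(float64)
	if !ok {
		return errors.New("spec.replicaCount must be a number")
	}
	// Mirrors +kubebuilder:validation:Minimum=1 and Maximum=10.
	if replicas < 1 || replicas > 10 {
		return fmt.Errorf("spec.replicaCount must be between 1 and 10, got %v", replicas)
	}
	return nil
}

func main() {
	docs := []string{
		`{"replicaCount": 2, "image": "nginx:1.25"}`,
		`{"replicaCount": 0, "image": "nginx:1.25"}`,
		`{"replicaCount": 2}`,
	}
	for _, doc := range docs {
		fmt.Printf("%s -> %v\n", doc, validateBlorgSpec([]byte(doc)))
	}
}
```

The practical upshot is that a bad replicaCount gets rejected at kubectl apply time, before your Reconcile method ever sees it.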
The Reconciliation Loop: Where the Magic Happens
Open controllers/blorg_controller.go. Your life now revolves around the Reconcile method. This function is called every time a Blorg resource is created, updated, or deleted. It’s also called when an object you own (like your Deployment) changes, provided the controller watches that type: add Owns(&appsv1.Deployment{}) to the builder chain in SetupWithManager, because the scaffold only watches Blorg itself. Your job here is to look at the current state of the world and then make changes to drive it toward the desired state defined in the Blorg’s .spec.
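Before touching the real API, it can help to see the shape of the idea in miniature. The sketch below is a toy, standard-library-only model: the cluster type and the reconcile function are invented for this illustration and are not controller-runtime types. What it tries to show is that a reconciler looks only at current state versus desired state, never at which event woke it up, so running it twice is harmless.

```go
package main

import "fmt"

// cluster is a toy in-memory stand-in for the API server. It maps a
// Deployment name to its replica count.
type cluster struct {
	deployments map[string]int32
}

// reconcile drives the toy cluster toward the desired replica count for name.
// It returns true when it wants to be called again (here: right after a create).
func reconcile(c *cluster, name string, desired int32) (requeue bool) {
	current, exists := c.deployments[name]
	switch {
	case !exists:
		c.deployments[name] = desired
		return true // created; come back and verify
	case current != desired:
		c.deployments[name] = desired // drift detected; correct it
		return false
	default:
		return false // already converged; nothing to do
	}
}

func main() {
	c := &cluster{deployments: map[string]int32{}}

	fmt.Println("requeue:", reconcile(c, "blorg-sample", 2), "state:", c.deployments)
	fmt.Println("requeue:", reconcile(c, "blorg-sample", 2), "state:", c.deployments)

	// Someone scales the Deployment by hand; the next pass puts it back.
	c.deployments["blorg-sample"] = 5
	fmt.Println("requeue:", reconcile(c, "blorg-sample", 2), "state:", c.deployments)
}
```

The real loop has more failure modes (network errors, conflicts, deletions), but the create, compare, and correct skeleton is the same.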
The SDK gives you a basic skeleton, but a realistic loop for our Blorg would look something like this:
// Excerpt from controllers/blorg_controller.go. Besides what the scaffold
// already imports, this needs:
//   appsv1 "k8s.io/api/apps/v1"
//   apierrors "k8s.io/apimachinery/pkg/api/errors"
//   "k8s.io/apimachinery/pkg/types"

// Without this marker, make manifests will not grant the operator access to Deployments.
// +kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete
func (r *BlorgReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Named logger rather than log so the log package is not shadowed.
	logger := log.FromContext(ctx)
	logger.Info("Reconciling Blorg", "blorg", req.NamespacedName)

	// 1. Fetch the Blorg instance
	blorg := &blorgv1alpha1.Blorg{}
	if err := r.Get(ctx, req.NamespacedName, blorg); err != nil {
		// A NotFound error means the Blorg is already gone; nothing left to do.
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// 2. Check if the Blorg is being deleted
	if !blorg.DeletionTimestamp.IsZero() {
		// Handle cleanup here
		return ctrl.Result{}, nil
	}

	// 3. Create or Update the Deployment
	deployment := &appsv1.Deployment{}
	err := r.Get(ctx, types.NamespacedName{Name: blorg.Name, Namespace: blorg.Namespace}, deployment)
	if err != nil && apierrors.IsNotFound(err) {
		// Doesn't exist? Create it.
		logger.Info("Creating Deployment")
		dep := r.createDeploymentForBlorg(blorg)
		if err := r.Create(ctx, dep); err != nil {
			return ctrl.Result{}, err
		}
		// We'll need to re-reconcile after creation
		return ctrl.Result{Requeue: true}, nil
	} else if err != nil {
		return ctrl.Result{}, err
	}

	// Deployment exists, check if it needs an update. This sketch only reconciles
	// the replica count; an image change would need the same treatment.
	// The API server defaults spec.replicas, so the pointer is set on a fetched Deployment.
	desiredReplicas := blorg.Spec.ReplicaCount
	if *deployment.Spec.Replicas != desiredReplicas {
		logger.Info("Updating Deployment replica count", "old", *deployment.Spec.Replicas, "new", desiredReplicas)
		deployment.Spec.Replicas = &desiredReplicas
		if err := r.Update(ctx, deployment); err != nil {
			return ctrl.Result{}, err
		}
	}

	// 4. Update the Blorg's status
	blorg.Status.AvailableReplicas = deployment.Status.AvailableReplicas
	if err := r.Status().Update(ctx, blorg); err != nil {
		logger.Error(err, "Failed to update Blorg status")
		return ctrl.Result{}, err
	}

	logger.Info("Reconciliation complete")
	return ctrl.Result{}, nil
}
The key pitfall here, the one that gets everyone, is forgetting to set the controller reference. You must mark the Deployment as “owned” by the Blorg. This is what tells Kubernetes, “if this Blorg disappears, please garbage collect this Deployment.” You do this in your createDeploymentForBlorg function:
func (r *BlorgReconciler) createDeploymentForBlorg(b *blorgv1alpha1.Blorg) *appsv1.Deployment {
	dep := &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{
			Name:      b.Name,
			Namespace: b.Namespace,
			// This is the crucial part!
			OwnerReferences: []metav1.OwnerReference{
				*metav1.NewControllerRef(b, blorgv1alpha1.GroupVersion.WithKind("Blorg")),
			},
		},
		Spec: ...,
	}
	return dep
}
If you forget this, deleting your Blorg will do nothing, leaving orphaned Deployments everywhere. Your operator will be a bad citizen, and we will all frown disapprovingly.
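To see why the owner reference matters, here is a deliberately tiny model of cascading deletion. Everything in it is invented for illustration: the object type, the deleteWithGC function, and the use of names where real Kubernetes matches owners by UID. It is a sketch of the behavior, not of how the garbage collector is implemented.

```go
package main

import "fmt"

// object is a toy stand-in for a Kubernetes object: a name plus the name of
// its owner. An empty Owner means "no owner reference was set".
type object struct {
	Name  string
	Owner string
}

// deleteWithGC removes the named object and then, like a much-simplified
// garbage collector, removes every object whose owner no longer exists.
// It repeats until nothing changes so that chains of ownership are handled.
func deleteWithGC(objs []object, name string) []object {
	live := make(map[string]bool)
	var kept []object
	for _, o := range objs {
		if o.Name == name {
			continue
		}
		kept = append(kept, o)
		live[o.Name] = true
	}

	changed := true
	for changed {
		changed = false
		var next []object
		for _, o := range kept {
			if o.Owner != "" && !live[o.Owner] {
				delete(live, o.Name) // owner is gone; collect the dependent
				changed = true
				continue
			}
			next = append(next, o)
		}
		kept = next
	}
	return kept
}

func main() {
	objs := []object{
		{Name: "blorg-sample"},
		{Name: "owned-deployment", Owner: "blorg-sample"},
		{Name: "unowned-deployment"}, // created without an owner reference
	}
	for _, o := range deleteWithGC(objs, "blorg-sample") {
		fmt.Println("still around:", o.Name)
	}
}
```

The Deployment that forgot its owner reference is the one still standing after the Blorg is gone, which is exactly the orphan problem described above.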
Building, Deploying, and Testing
With the code written, it’s time to build and deploy. The SDK’s Makefile is your best friend.
# Build and push the container image
make docker-build docker-push IMG="example.com/blorg-operator:v0.0.1"
# Generate the deployment manifests and deploy them
make deploy IMG="example.com/blorg-operator:v0.0.1"
This deploy target is fantastic. It applies the CRDs, RBAC roles/bindings, and the Deployment for your operator itself. All the boilerplate you’d dread writing by hand is done for you.
Now, create a sample Blorg to see it in action:
# config/samples/blorg_v1alpha1_blorg.yaml
apiVersion: blorg.example.com/v1alpha1
kind: Blorg
metadata:
  name: blorg-sample
spec:
  replicaCount: 2
  image: nginx:1.25
  message: "My first Blorg!"
Apply it with kubectl apply -f config/samples/. Watch the magic happen. Run kubectl get deployments and you should see your blorg-sample deployment with 2 replicas. This is the payoff. You’ve just automated a Kubernetes primitive.
The final, critical piece of advice: write tests. The SDK scaffolds integration tests (make test) built on envtest, which starts a real kube-apiserver and etcd on your machine. It’s not a full cluster (there is no kubelet and none of the built-in controllers run, so a Deployment you create will never actually get Pods), but it’s enough to test your reconciliation logic against the real API. Testing operators is notoriously tricky, and this tooling is a gift. Use it. Your future self, debugging at 2 a.m., will thank you.
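One habit that pairs well with envtest, offered here as a suggestion rather than anything the SDK scaffolds: pull small decisions out of Reconcile into pure functions so they can be checked without any API server at all. The helper below, needsReplicaUpdate, is hypothetical and uses only the standard library; the table-driven loop in main stands in for what would normally live in a _test.go file.

```go
package main

import "fmt"

// needsReplicaUpdate reports whether a Deployment's replica count has to be
// changed to match the desired value. current is a pointer because
// Deployment.Spec.Replicas is a *int32; nil is treated as "unset" and always
// needs an update.
func needsReplicaUpdate(current *int32, desired int32) bool {
	return current == nil || *current != desired
}

// int32Ptr is a small convenience for building test inputs.
func int32Ptr(v int32) *int32 { return &v }

func main() {
	cases := []struct {
		name    string
		current *int32
		desired int32
		want    bool
	}{
		{"in sync", int32Ptr(2), 2, false},
		{"scaled by hand", int32Ptr(5), 2, true},
		{"unset", nil, 2, true},
	}
	for _, tc := range cases {
		got := needsReplicaUpdate(tc.current, tc.desired)
		status := "ok"
		if got != tc.want {
			status = "MISMATCH"
		}
		fmt.Printf("%-15s got=%-5v want=%-5v %s\n", tc.name, got, tc.want, status)
	}
}
```

Checks like these run in milliseconds, which leaves the slower envtest suite free to cover the parts that genuinely need an API server.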