25.7 controller-runtime: The Library Behind Kubebuilder
Right, so you’ve got your CRD defined. It’s a beautiful spec, full of hope and YAML. But it just sits there. It’s like building a car with no engine. The CRD is the chassis and body; the controller is the engine that makes it go. And when it comes to building controllers in Go, controller-runtime is the machine shop where you forge that engine. It’s the core library that Kubebuilder and the Operator SDK use to do the heavy lifting. Think of Kubebuilder as the fancy, pre-fab garage—it gets you started fast. But to really understand what’s going on under the hood, you need to know controller-runtime.
The Core Loop: It’s All About Reconciling
At its heart, a controller is a simple, obsessive-compulsive loop. It watches for changes to a specific kind of object and, for each change, calls a function named Reconcile; the watching and the calling together are what people mean by the reconcile loop. The goal of that function is singular: look at the current state of the world and then do whatever it takes to make it match the desired state you described in your custom resource.
The desired state is in your MyApp resource’s .spec field. The current state is… everything else in the cluster. Your job in the Reconcile function is to bridge that gap. Did the user just create a MyApp? I should create a Deployment and a Service. Did they update the image field? I should roll out that new image. Did someone manually delete my Deployment? I should notice it’s missing and recreate it. Note that the request you receive carries only a namespace and a name, not a description of what changed, so you always work from the full current state. Reconcile can also be triggered more often than you expect (on every relevant change, on periodic cache resyncs, on retries after an error), which is why it has to be written to be idempotent. Running it five times with the same input should have the exact same effect as running it once.
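To make the idempotency requirement concrete, here is a minimal sketch that uses no Kubernetes libraries at all. The Cluster and MyAppSpec types and this standalone Reconcile function are hypothetical stand-ins invented for illustration, not controller-runtime types; the toy cluster is just a map from Deployment name to image.

```go
package main

import "fmt"

// Cluster is a toy stand-in for cluster state: Deployment name -> image.
// It is NOT a controller-runtime type; it exists only for this sketch.
type Cluster struct {
	Deployments map[string]string
	Writes      int // counts create/update calls so idempotency is visible
}

// MyAppSpec is a hypothetical desired state for one MyApp.
type MyAppSpec struct {
	Name  string
	Image string
}

// Reconcile compares desired state with current state and writes only
// when they differ, so calling it repeatedly with the same input is safe.
func Reconcile(c *Cluster, want MyAppSpec) {
	have, exists := c.Deployments[want.Name]
	switch {
	case !exists:
		c.Deployments[want.Name] = want.Image // create
		c.Writes++
	case have != want.Image:
		c.Deployments[want.Name] = want.Image // update
		c.Writes++
	}
	// Otherwise the world already matches the spec: do nothing.
}

func main() {
	c := &Cluster{Deployments: map[string]string{}}
	spec := MyAppSpec{Name: "web", Image: "nginx:1.27"}

	// Five reconciles with the same input cause exactly one write.
	for i := 0; i < 5; i++ {
		Reconcile(c, spec)
	}
	fmt.Println(c.Deployments["web"], c.Writes) // prints "nginx:1.27 1"

	// Someone deletes the Deployment by hand; the next reconcile restores it.
	delete(c.Deployments, "web")
	Reconcile(c, spec)
	fmt.Println(c.Deployments["web"], c.Writes) // prints "nginx:1.27 2"
}
```

The point of the Writes counter is that the function decides what to do from the observed state alone, never from a memory of what it did last time.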
Here’s the skeleton of a Reconcile function. This is the boilerplate you’ll write a thousand times.
func (r *MyAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// 1. Fetch the custom resource instance that triggered this reconciliation.
	myApp := &myapiv1alpha1.MyApp{}
	if err := r.Get(ctx, req.NamespacedName, myApp); err != nil {
		// The resource might have been deleted, so we'll ignore "not found" errors.
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// 2. Your logic goes here! For example, manage a Deployment...
	found := &appsv1.Deployment{}
	err := r.Get(ctx, types.NamespacedName{Name: myApp.Name, Namespace: myApp.Namespace}, found)
	if err != nil && apierrors.IsNotFound(err) {
		// It doesn't exist? Great, let's create it.
		dep := r.constructDeploymentForMyApp(myApp)
		if err := r.Create(ctx, dep); err != nil {
			return ctrl.Result{}, err
		}
		// We created something, probably best to requeue to check it worked.
		return ctrl.Result{Requeue: true}, nil
	} else if err != nil {
		// Some other error? Yikes. Returning it logs it and requeues with backoff.
		return ctrl.Result{}, err
	}

	// 3. Check if the Deployment spec matches what we want. If not, update it.
	if !r.checkDeploymentSpec(found, myApp) {
		updatedDep := r.constructDeploymentForMyApp(myApp)
		found.Spec = updatedDep.Spec
		if err := r.Update(ctx, found); err != nil {
			return ctrl.Result{}, err
		}
	}

	// 4. If all is well, we return an empty result with no error.
	// The reconciler will come back if something changes, thanks to our Watches.
	return ctrl.Result{}, nil
}
Setting Up Watches: Telling the Controller What to Care About
You don’t want your Reconcile function called for every Pod in the cluster. That would be chaos. You tell controller-runtime exactly which object types to watch using the SetupWithManager function. This is where the magic of filtering happens. The most important watch is, obviously, for your custom resource. But here’s the real power: you can also watch for changes to owned objects, like the Deployment you created.
func (r *MyAppReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&myapiv1alpha1.MyApp{}).  // Watch for changes to MyApp
		Owns(&appsv1.Deployment{}).   // And if any Deployment I own changes, requeue its owner MyApp
		Complete(r)
}
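This Owns watch is a masterpiece of design. If someone updates the Pod template in your Deployment, the controller reads the Deployment’s owner reference, figures out which MyApp owns it, and queues a reconciliation for that MyApp. This is how it self-heals. There is one condition: the Deployment must actually carry an owner reference pointing at the MyApp, which you typically set with controllerutil.SetControllerReference before calling Create (so make sure constructDeploymentForMyApp does that). With that in place, you get the rest for free. Cherish it.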
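The mapping from an owned object back to its owner is simple enough to sketch in plain Go. Everything below (OwnerRef, Object, Request, requestForOwner) is a simplified, hypothetical model written for this section and not the library’s implementation; it assumes the default behaviour of only following the owner reference that is marked as the controller.

```go
package main

import "fmt"

// OwnerRef is a simplified stand-in for a Kubernetes owner reference.
type OwnerRef struct {
	Kind       string
	Name       string
	Controller bool // true for the single "managing" owner
}

// Object is a toy object with just enough metadata for this sketch.
type Object struct {
	Kind      string
	Namespace string
	Name      string
	Owners    []OwnerRef
}

// Request identifies what to reconcile: a namespace/name key.
type Request struct {
	Namespace string
	Name      string
}

// requestForOwner maps an event on an owned object to a request for its
// controlling owner of the given kind. It reports false if there is none.
func requestForOwner(obj Object, ownerKind string) (Request, bool) {
	for _, ref := range obj.Owners {
		if ref.Controller && ref.Kind == ownerKind {
			// A namespaced owner lives in the same namespace as what it owns.
			return Request{Namespace: obj.Namespace, Name: ref.Name}, true
		}
	}
	return Request{}, false
}

func main() {
	owned := Object{
		Kind: "Deployment", Namespace: "prod", Name: "web",
		Owners: []OwnerRef{{Kind: "MyApp", Name: "web", Controller: true}},
	}
	stray := Object{Kind: "Deployment", Namespace: "prod", Name: "unrelated"}

	fmt.Println(requestForOwner(owned, "MyApp")) // prints "{prod web} true"
	fmt.Println(requestForOwner(stray, "MyApp")) // prints "{ } false"
}
```

The second call shows the filtering: a Deployment with no MyApp owner never reaches your Reconcile function.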
The Client and Caching: The Wizard Behind the Curtain
When you call r.Get or r.List, you’re not making a direct API call to the Kubernetes API server every single time. That would be incredibly slow and would crush the API server under the load of a few controllers. Instead, with the default client you get from the manager, reads are served from an informer cache: a local, eventually-consistent copy of the objects the controller is watching. Writes (Create, Update, Patch, Delete) still go straight to the API server.
This is why you sometimes have to Requeue after a create operation—the cache might not be instantly updated with the object you just created, so trying to fetch it immediately could fail. The cache is the reason this whole pattern is performant, but it also introduces a slight lag between an action happening and the controller seeing it. It’s a trade-off you must be aware of.
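The lag is easier to reason about with a toy model. In the sketch below, Server and Cache are hypothetical types invented for illustration; Sync stands in for watch events arriving at the cache a little after the write has already succeeded on the server.

```go
package main

import "fmt"

// Server is a toy API server: the source of truth.
type Server struct {
	objects map[string]string
}

// Create writes directly to the server, as writes do in a real controller.
func (s *Server) Create(name, value string) {
	s.objects[name] = value
}

// Cache is a toy local copy that only changes when Sync is called.
type Cache struct {
	objects map[string]string
}

// Get reads from the local copy only; it never talks to the server.
func (c *Cache) Get(name string) (string, bool) {
	v, ok := c.objects[name]
	return v, ok
}

// Sync copies the server state into the cache, standing in for
// watch events being delivered.
func (c *Cache) Sync(s *Server) {
	for k, v := range s.objects {
		c.objects[k] = v
	}
}

func main() {
	server := &Server{objects: map[string]string{}}
	cache := &Cache{objects: map[string]string{}}

	server.Create("web", "nginx:1.27")

	_, ok := cache.Get("web")
	fmt.Println(ok) // prints "false": the write succeeded, the cache is stale

	cache.Sync(server)
	_, ok = cache.Get("web")
	fmt.Println(ok) // prints "true": the cache has caught up
}
```

A reconcile that runs between the Create and the Sync sees a world where the object does not exist yet, which is exactly why your logic must tolerate an "already exists" error or a second pass.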
Common Pitfalls: Where Everyone Gets Stuck
The Update Pitfall: You Get an object, modify it, and then Update it. But what if someone changed the object after you did the Get? Your Update will fail with a conflict because the resourceVersion doesn’t match. The robust way is to use a retry loop: Get, modify, try to Update, and if it fails, re-Get and try again. For the most common case, controller-runtime provides the patch API which can often avoid this.
Requeueing Too Much: It’s tempting to return ctrl.Result{Requeue: true}, nil all the time. Don’t. If you have a permanent error (like invalid user input in the .spec), requeuing won’t fix it and will just spam your logs. Log the error and return without requeuing. Only requeue for transient errors (like network flakiness) or when you’ve explicitly created a resource and need to check on it next cycle.
Not Using Finalizers: If your controller creates objects outside of the Kubernetes cluster (e.g., a cloud load balancer, a database user), you absolutely must use a finalizer. When someone deletes your MyApp resource, Kubernetes only knows to delete the Deployments and Services inside the cluster. The finalizer blocks deletion until your controller has had a chance to run its cleanup logic for those external resources. Forgetting this leaves orphaned resources lying around, and I will personally be very disappointed in you.