Right, so you’ve got your Custom Resource Definition (CRD) YAML open, and you’ve defined your spec and status structs. You’re feeling pretty good. You’re basically an API designer now. But if you just stop there and kubectl apply this thing, you’re signing up for a world of pain. You’ll have a fancy new resource that accepts anything—literally anything—thrown into its spec. A malformed configuration, a string where an integer should be, a nested object that’s seven levels deep for no reason. It’s the Kubernetes equivalent of leaving your front door wide open with a sign that says “Please Make Sensible Choices.”

This is where OpenAPI schema validation comes in. It’s the bouncer for your custom API. It checks the ID at the door and politely (or not so politely) tells invalid manifests to take a hike before they ever get stored in etcd. This isn’t just about being pedantic; it’s about preventing garbage data from breaking your Operator’s delicate logic the moment it tries to fmt.Printf an integer as a string.

The openAPIV3Schema Block: Your First Line of Defense

The magic happens under each entry of your CRD’s versions list, in schema.openAPIV3Schema. This is where you define the rules of the game. The structure is a subset of the OpenAPI v3 spec, which sounds intimidating but is really just a JSON Schema with a fancy hat.

Let’s say we’re defining a CronJob-like resource. Here’s a minimal, but effective, schema to start with:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: scheduledtasks.mycompany.com
spec:
  group: mycompany.com
  names:
    kind: ScheduledTask
    plural: scheduledtasks
    singular: scheduledtask
  scope: Namespaced
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            required: [command, schedule] # This is crucial
            properties:
              command:
                type: string
                description: The shell command to run.
              schedule:
                type: string
                pattern: '^((@(annually|yearly|monthly|weekly|daily|hourly|reboot))|(@every (\d+(ns|us|µs|ms|s|m|h))+)|((((\d+,)+\d+|(\d+(\/|-)\d+)|\d+|\*) ?){5,7}))$'
                description: Cron schedule or predefined shorthand.
              restartPolicy:
                type: string
                enum: [Never, OnFailure, Always]
                default: OnFailure
                description: What to do when the command finishes.
    # ... subresources and additional fields omitted for brevity

See what we did there? We’re not just saying spec is an object; we’re defining its contents. The command must be a string. The schedule must be a string and it must match that glorious regex pattern (which, yes, I copied from the Go robfig/cron package because I’m not a masochist). The restartPolicy can only be one of those three values, and if the user doesn’t specify one, we’ll default it to OnFailure for them. This validation happens at the API level, instantly. Try to apply a manifest with restartPolicy: Sometimes, and you’ll get a beautifully immediate error. Thank you, validation bouncer.
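To make that concrete, here is a sketch of a manifest the schema above should turn away. The resource name and the command are hypothetical; the only line that matters is restartPolicy.

```yaml
# Hypothetical ScheduledTask; name and command are made up for illustration.
apiVersion: mycompany.com/v1
kind: ScheduledTask
metadata:
  name: nightly-report
spec:
  command: "generate-report --format=pdf"
  schedule: "@daily"
  # Not one of the enum values [Never, OnFailure, Always],
  # so the API server should reject the whole object.
  restartPolicy: Sometimes
```

Swap Sometimes for Never and the same manifest should be accepted.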

The Nullable Trap and Required Fields

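Here’s a classic “why is it doing that?!” moment. You define a field as type: string and think you’re done. But then someone submits a manifest that explicitly sets that field to null. In plain JSON Schema, type: string rejects null outright. A CRD behaves a little differently: for a field that isn’t marked nullable, the API server prunes the null before defaulting, so the field ends up looking omitted (or it picks up its default, if it has one). Either way, your Operator never sees the null the user wrote, which is a fantastic source of confusion when that null was supposed to mean something.

If null is a legitimate value for a field, you need to say so explicitly with the nullable flag (an OpenAPI v3 keyword that Kubernetes supports). The usual JSON Schema workaround of a oneOf with a null type isn’t available here, because structural schemas don’t allow type inside oneOf. So nullable it is: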

someField:
  type: string
  nullable: true # Explicitly allow null values
anotherField:
  type: string
  # nullable is false by default: an explicit null is pruned
  # (or replaced by the default, if one is set) instead of being stored

Also, note the required list in the example above. This is vitally important. Placing a field under properties only defines its type if it’s present; it does not make it mandatory. You must explicitly list it under required to force the user to provide a value. Forgetting this is probably the most common schema mistake. Your Operator will be waiting for a value that never arrives.
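As a quick illustration, assuming the ScheduledTask CRD from earlier, a manifest like this one should be rejected because schedule is listed under required but missing. The resource name and command are hypothetical.

```yaml
# Hypothetical ScheduledTask that leaves out a required field.
apiVersion: mycompany.com/v1
kind: ScheduledTask
metadata:
  name: missing-schedule
spec:
  command: "echo hello"
  # schedule is omitted. It is declared under properties AND listed
  # under required, so the API server should refuse this object.
```

Remove schedule from the required list in the CRD and the same manifest would be accepted, leaving your Operator to cope with the gap.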

Defaults: Convenience with a Caveat

Defaults are fantastic for user experience. They make manifests less verbose. But you must understand how they work: defaults are applied by the Kubernetes API server, not by your Operator. This means when your Operator’s reconciliation loop gets the object, the defaulted field is already there. This is generally what you want.

The critical thing to remember is that a default is just a value plopped in if the user omits the field. It does not validate or transform user input. If a user provides an invalid value, the default won’t save them; the validation will still fail.
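Here is a rough before-and-after sketch of defaulting, again assuming the ScheduledTask CRD from earlier. The name and command are hypothetical, and server-managed metadata is left out of the second document.

```yaml
# What the user submits (restartPolicy omitted):
apiVersion: mycompany.com/v1
kind: ScheduledTask
metadata:
  name: cleanup
spec:
  command: "cleanup-cache --older-than=7d"
  schedule: "@hourly"
---
# Roughly what the API server hands back to clients and to your Operator:
apiVersion: mycompany.com/v1
kind: ScheduledTask
metadata:
  name: cleanup
spec:
  command: "cleanup-cache --older-than=7d"
  schedule: "@hourly"
  restartPolicy: OnFailure   # filled in from the schema default
```

Had the user written restartPolicy: Sometimes instead of omitting the field, no default would kick in and the enum check would fail as usual.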

The Art of the Upgrade

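You’ve shipped v1 of your CRD. Now you need to add a new field, timeoutSeconds. Strictly speaking, an optional field with a default is a backward-compatible change, so you could add it to the v1 schema in place. But CRDs are versioned entities, and if you’d rather signal the change with a new API version (or you expect less friendly changes later), this is how it goes. You add a new version (v2) to the versions list, mark it as served: true and storage: true (making it the new storage version; only one version can carry that flag), and you pick a conversion strategy. When the versions differ only by added optional fields, that’s None, which leaves the object untouched apart from rewriting its apiVersion.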

versions:
- name: v1
  served: false # Stop serving the old version (only once no client still requests v1)
  storage: false # v2 is now the storage version; existing objects stay stored as v1 until rewritten
  # ... schema for v1
- name: v2
  served: true
  storage: true
  schema:
    openAPIV3Schema:
      type: object
      properties:
        spec:
          type: object
          required: [command, schedule]
          properties:
            # ... all the v1 fields ...
            timeoutSeconds:
              type: integer
              minimum: 1
              maximum: 3600
              default: 60

The key here is that your old v1 objects will be transparently converted to v2 when they are read through the v2 API, and the new timeoutSeconds field will be defaulted to 60 on the way out. Nothing in etcd is rewritten until each object is next saved. This is why planning your schema from the start is worth the effort; making breaking changes is a much bigger headache.
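For reference, the conversion strategy lives next to versions in the CRD spec. This is only a fragment, assuming the same mycompany.com group as before. None is already the default, so spelling it out is mostly documentation for the next reader.

```yaml
spec:
  group: mycompany.com
  conversion:
    # None: the API server only rewrites apiVersion when converting
    # between versions; every other field is passed through as-is.
    strategy: None
  versions:
  # ... the v1 and v2 entries shown above ...
```

If v2 ever renames or restructures a field, None stops being enough and you would need the Webhook strategy instead.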

So, be the bouncer. Write a strict schema. Your future self, debugging at 2 a.m., will thank you for it.