Conversation

@andreaskaris andreaskaris commented Sep 3, 2025

Server-Side Apply

My investigation into Server-Side Apply:
https://andreaskaris.github.io/blog/coding/server-side-apply/

  • Server-Side Apply simplifies controller logic by using a single approach for both creating and updating resources. Instead of checking if an object exists and then choosing between Create or Update operations, controllers can use the Patch method with client.Apply for all cases.

  • This approach works particularly well for reconstructive controllers that want to declare the desired state of resources. The controller builds the complete object definition and lets the Kubernetes API server handle the differences. Field management ensures that conflicts are detected and ownership is tracked properly.

  • https://kubernetes.io/blog/2022/10/20/advanced-server-side-apply/#reconstructive-controllers: Reconstructive controllers: "This kind of controller wasn't really possible prior to SSA. The idea here is to (whenever something changes etc) reconstruct from scratch the fields of the object as the controller wishes them to be, and then apply the change to the server, letting it figure out the result. I now recommend that new controllers start out this way–it's less fiddly to say what you want an object to look like than it is to say how you want it to change. (...) To get around this downside, why not GET the object and only send your apply if the object needs it? Surprisingly, it doesn't help much – a no-op apply is not very much more work for the API server than an extra GET; and an apply that changes things is cheaper than that same apply with a preceding GET. Worse, since it is a distributed system, something could change between your GET and apply, invalidating your computation. Instead, you can use this optimization on an object retrieved from a cache–then it legitimately will reduce load on the system (at the cost of a delay when a change is needed and the cache is a bit behind)"

Get requests are served from the controller-runtime cache, so we still benefit from the caching mechanism by running r.Get first and comparing existing to desired. If existing is not found in the cache, or if the cached version of existing differs from desired, we build an SSA patch and send it to the server.

We keep comparing against and sending our full intent, meaning the full object, via SSA. Partial updates to resources with Server-Side Apply make little sense in my opinion, as we want to declare the full state of each object and we want to catch any deviations.
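As an illustration, here is a minimal sketch of this pattern. The receiver type, the matchesDesired helper (comparing only the fields we own), and the field-owner string are assumptions for the sketch, not this PR's exact code:

```go
import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// assureResource declares the full desired state of a single owned object.
// The r.Get below is served from the informer cache, so skipping the patch
// when nothing deviates reduces load on the API server.
func (r *ConfigReconciler) assureResource(ctx context.Context, desired client.Object) error {
	existing := desired.DeepCopyObject().(client.Object)
	err := r.Get(ctx, client.ObjectKeyFromObject(desired), existing)
	if err != nil && !apierrors.IsNotFound(err) {
		return err
	}
	// matchesDesired is a hypothetical helper that compares only the
	// fields this controller owns.
	if err == nil && matchesDesired(existing, desired) {
		return nil // the cache already reflects our full intent
	}
	// desired must carry TypeMeta (apiVersion/kind) for SSA to work; the
	// field owner string is illustrative.
	return r.Patch(ctx, desired, client.Apply,
		client.ForceOwnership, client.FieldOwner("bpfman-operator"))
}
```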

Status of Server-Side Apply native support in controller-runtime

Native support for Server-Side Apply in controller-runtime is still a work in progress, although it appears to be nearly complete.
Up to and including controller-runtime v0.21.0, r.Apply() was not available; instead, the recommendation was to use r.Patch:
https://pkg.go.dev/sigs.k8s.io/[email protected]/pkg/client#Client
Users of controller-runtime would therefore use constructs like:

	if err := r.Patch(ctx, resource, client.Apply, client.ForceOwnership, client.FieldOwner(bpfmanConfig.Name)); err != nil {
		return ctrl.Result{}, err
	}

A problem with this approach is that the fake client used in unit tests does not support Server-Side Apply, so an interceptor for Patch requests is needed.
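For illustration, a minimal sketch of such an interceptor using controller-runtime's fake and interceptor packages; the helper name and the create-or-update emulation details are mine, not necessarily this PR's exact code:

```go
import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/client/fake"
	"sigs.k8s.io/controller-runtime/pkg/client/interceptor"
)

// newFakeClientWithSSA builds a fake client whose Patch call emulates
// Server-Side Apply as a create-or-update, since the fake client does not
// implement apply patches itself.
func newFakeClientWithSSA(objs ...client.Object) client.WithWatch {
	return fake.NewClientBuilder().
		WithObjects(objs...).
		WithInterceptorFuncs(interceptor.Funcs{
			Patch: func(ctx context.Context, c client.WithWatch, obj client.Object, patch client.Patch, opts ...client.PatchOption) error {
				// Pass every non-apply patch through unchanged.
				if patch.Type() != types.ApplyPatchType {
					return c.Patch(ctx, obj, patch, opts...)
				}
				// Emulate apply semantics: create when absent,
				// otherwise overwrite with the desired state.
				existing := obj.DeepCopyObject().(client.Object)
				if err := c.Get(ctx, client.ObjectKeyFromObject(obj), existing); err != nil {
					if apierrors.IsNotFound(err) {
						return c.Create(ctx, obj)
					}
					return err
				}
				obj.SetResourceVersion(existing.GetResourceVersion())
				return c.Update(ctx, obj)
			},
		}).
		Build()
}
```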

r.Apply() was introduced with controller-runtime v0.22.0, which has not yet been included in the operator-sdk at the time of this writing.
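Once v0.22.0 is available, the call site should simplify to something along these lines. I am sketching from the v0.22 release notes here; the exact r.Apply signature and options (it takes an apply configuration rather than a typed object) should be verified against the v0.22 docs:

```go
// corev1ac is "k8s.io/client-go/applyconfigurations/core/v1"; the r.Apply
// call below is an assumed shape of the v0.22+ API, not verified code.
// Apply configurations only carry the fields we set, which matches SSA's
// field-ownership model.
cm := corev1ac.ConfigMap("bpfman-config", "bpfman").
	WithData(map[string]string{"example.key": "example-value"})
if err := r.Apply(ctx, cm, client.FieldOwner("bpfman-operator")); err != nil {
	return ctrl.Result{}, err
}
```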

Improved reconciliation logic

I decided to keep the reconciliation logic as is, meaning that whenever any watched object is created, changed, or deleted, we run through the entire reconciliation logic. As we are using the controller-runtime's cache (which in turn uses Kubernetes informers), load on the API server will be fairly minimal and limited to the actual r.Patch requests only (gets are served from the cache, and we only patch when we need something new or when something deviates).

Each r.Patch against any of the owned resources will always trigger another reconciliation run, which checks all owned resources.
Especially on initialization, when none of the resources exist yet (the worst-case scenario), this leads to several reconciliation runs that might seem unnecessary.
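For context, the re-triggering falls out of the owner-based watches. Schematically (type names and the API-group import path are assumptions, not copied from this PR):

```go
import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	storagev1 "k8s.io/api/storage/v1"
	ctrl "sigs.k8s.io/controller-runtime"

	bpfmaniov1alpha1 "github.com/bpfman/bpfman-operator/apis/v1alpha1" // assumed import path
)

// Because the controller Owns() the resources it patches, every successful
// r.Patch produces a watch event on an owned object, which enqueues another
// reconcile of the whole Config.
func (r *ConfigReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&bpfmaniov1alpha1.Config{}). // assumed primary resource type
		Owns(&corev1.ConfigMap{}).
		Owns(&storagev1.CSIDriver{}).
		Owns(&appsv1.DaemonSet{}).
		Complete(r)
}
```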

  • run 1 creates all objects
{"level":"info","ts":"2025-09-14T13:03:24Z","logger":"Config","msg":"Running the reconciler"}
{"level":"info","ts":"2025-09-14T13:03:24Z","logger":"Config","msg":"Adding finalizer to Config","name":"bpfman-config"}
{"level":"info","ts":"2025-09-14T13:03:24Z","logger":"Config","msg":"Running the reconciler"}
{"level":"info","ts":"2025-09-14T13:03:24Z","logger":"Config","msg":"Getting object","type":"&TypeMeta{Kind:ConfigMap,APIVersion:v1,}","namespace":"bpfman","name":"bpfman-config"}
{"level":"info","ts":"2025-09-14T13:03:24Z","logger":"Config","msg":"Patching object","type":"&TypeMeta{Kind:ConfigMap,APIVersion:v1,}","namespace":"bpfman","name":"bpfman-config"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Loading object","object":"csi.bpfman.io","path":"./config/bpfman-deployment/csidriverinfo.yaml"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Getting object","type":"&TypeMeta{Kind:CSIDriver,APIVersion:storage.k8s.io/v1,}","namespace":"","name":"csi.bpfman.io"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Patching object","type":"&TypeMeta{Kind:CSIDriver,APIVersion:storage.k8s.io/v1,}","namespace":"","name":"csi.bpfman.io"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Loading object","object":"bpfman-daemon","path":"./config/bpfman-deployment/daemonset.yaml"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Getting object","type":"&TypeMeta{Kind:DaemonSet,APIVersion:apps/v1,}","namespace":"bpfman","name":"bpfman-daemon"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Patching object","type":"&TypeMeta{Kind:DaemonSet,APIVersion:apps/v1,}","namespace":"bpfman","name":"bpfman-daemon"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Loading object","object":"bpfman-metrics-proxy","path":"./config/bpfman-deployment/metrics-proxy-daemonset.yaml"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Getting object","type":"&TypeMeta{Kind:DaemonSet,APIVersion:apps/v1,}","namespace":"bpfman","name":"bpfman-metrics-proxy"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Patching object","type":"&TypeMeta{Kind:DaemonSet,APIVersion:apps/v1,}","namespace":"bpfman","name":"bpfman-metrics-proxy"}
  • each created object triggers another full reconcile (a no-op against the API server)
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Running the reconciler"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Getting object","type":"&TypeMeta{Kind:ConfigMap,APIVersion:v1,}","namespace":"bpfman","name":"bpfman-config"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Loading object","object":"csi.bpfman.io","path":"./config/bpfman-deployment/csidriverinfo.yaml"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Getting object","type":"&TypeMeta{Kind:CSIDriver,APIVersion:storage.k8s.io/v1,}","namespace":"","name":"csi.bpfman.io"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Loading object","object":"bpfman-daemon","path":"./config/bpfman-deployment/daemonset.yaml"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Getting object","type":"&TypeMeta{Kind:DaemonSet,APIVersion:apps/v1,}","namespace":"bpfman","name":"bpfman-daemon"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Loading object","object":"bpfman-metrics-proxy","path":"./config/bpfman-deployment/metrics-proxy-daemonset.yaml"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Getting object","type":"&TypeMeta{Kind:DaemonSet,APIVersion:apps/v1,}","namespace":"bpfman","name":"bpfman-metrics-proxy"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Running the reconciler"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Getting object","type":"&TypeMeta{Kind:ConfigMap,APIVersion:v1,}","namespace":"bpfman","name":"bpfman-config"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Loading object","object":"csi.bpfman.io","path":"./config/bpfman-deployment/csidriverinfo.yaml"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Getting object","type":"&TypeMeta{Kind:CSIDriver,APIVersion:storage.k8s.io/v1,}","namespace":"","name":"csi.bpfman.io"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Loading object","object":"bpfman-daemon","path":"./config/bpfman-deployment/daemonset.yaml"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Getting object","type":"&TypeMeta{Kind:DaemonSet,APIVersion:apps/v1,}","namespace":"bpfman","name":"bpfman-daemon"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Loading object","object":"bpfman-metrics-proxy","path":"./config/bpfman-deployment/metrics-proxy-daemonset.yaml"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Getting object","type":"&TypeMeta{Kind:DaemonSet,APIVersion:apps/v1,}","namespace":"bpfman","name":"bpfman-metrics-proxy"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Running the reconciler"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Getting object","type":"&TypeMeta{Kind:ConfigMap,APIVersion:v1,}","namespace":"bpfman","name":"bpfman-config"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Loading object","object":"csi.bpfman.io","path":"./config/bpfman-deployment/csidriverinfo.yaml"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Getting object","type":"&TypeMeta{Kind:CSIDriver,APIVersion:storage.k8s.io/v1,}","namespace":"","name":"csi.bpfman.io"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Loading object","object":"bpfman-daemon","path":"./config/bpfman-deployment/daemonset.yaml"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Getting object","type":"&TypeMeta{Kind:DaemonSet,APIVersion:apps/v1,}","namespace":"bpfman","name":"bpfman-daemon"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Loading object","object":"bpfman-metrics-proxy","path":"./config/bpfman-deployment/metrics-proxy-daemonset.yaml"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Getting object","type":"&TypeMeta{Kind:DaemonSet,APIVersion:apps/v1,}","namespace":"bpfman","name":"bpfman-metrics-proxy"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Running the reconciler"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Getting object","type":"&TypeMeta{Kind:ConfigMap,APIVersion:v1,}","namespace":"bpfman","name":"bpfman-config"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Loading object","object":"csi.bpfman.io","path":"./config/bpfman-deployment/csidriverinfo.yaml"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Getting object","type":"&TypeMeta{Kind:CSIDriver,APIVersion:storage.k8s.io/v1,}","namespace":"","name":"csi.bpfman.io"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Loading object","object":"bpfman-daemon","path":"./config/bpfman-deployment/daemonset.yaml"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Getting object","type":"&TypeMeta{Kind:DaemonSet,APIVersion:apps/v1,}","namespace":"bpfman","name":"bpfman-daemon"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Loading object","object":"bpfman-metrics-proxy","path":"./config/bpfman-deployment/metrics-proxy-daemonset.yaml"}
{"level":"info","ts":"2025-09-14T13:03:25Z","logger":"Config","msg":"Getting object","type":"&TypeMeta{Kind:DaemonSet,APIVersion:apps/v1,}","namespace":"bpfman","name":"bpfman-metrics-proxy"}

I believe the above is the correct approach. There is not a whole lot of documentation available on this, but the Java Operator SDK is adamant that reconciliation should always cover all resources:
https://javaoperatorsdk.io/docs/getting-started/patterns-best-practices/

Implementing a Reconciler
Always Reconcile All Resources

Reconciliation can be triggered by events from multiple sources. It might be tempting to check the events and only reconcile the related resource or subset of resources that the controller manages. However, this is considered an anti-pattern for operators.

Why this is problematic:

    Kubernetes’ distributed nature makes it difficult to ensure all events are received
    If your operator misses some events and doesn’t reconcile the complete state, it might operate with incorrect assumptions about the cluster state
    Always reconcile all resources, regardless of the triggering event
Event Sources and Caching

During reconciliation, best practice is to reconcile all dependent resources managed by the controller. This means comparing the desired state with the actual cluster state.

The Challenge: Reading the actual state directly from the Kubernetes API Server every time would create significant load.

The Solution: Create a watch for dependent resources and cache their latest state using the Informer pattern. In JOSDK, informers are wrapped into EventSource to integrate with the framework’s eventing system via the InformerEventSource class.
Idempotency

Since all resources should be reconciled when your Reconciler is triggered, and reconciliations can be triggered multiple times for any given resource (especially with retry policies), it’s crucial that Reconciler implementations be idempotent.

Along the same lines, the controller-runtime documentation clearly states that controllers should not have different logic for create/update/delete events: https://github.com/kubernetes-sigs/controller-runtime/blob/main/FAQ.md#q-how-do-i-have-different-logic-in-my-reconciler-for-different-types-of-events-eg-create-update-delete
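In controller-runtime terms, the Reconcile signature already enforces this: the request only carries a name, not an event type, so the body has to derive everything from observed state. Schematically (desiredResources is a hypothetical helper; assureResource is sketched above):

```go
func (r *ConfigReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	config := &bpfmaniov1alpha1.Config{}
	if err := r.Get(ctx, req.NamespacedName, config); err != nil {
		// No separate delete branch: deletion is observed as absence.
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	// Reconcile every owned resource against the desired state, no matter
	// which event (create/update/delete) enqueued this request.
	for _, desired := range r.desiredResources(config) { // hypothetical helper
		if err := r.assureResource(ctx, desired); err != nil {
			return ctrl.Result{}, err
		}
	}
	return ctrl.Result{}, nil
}
```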

About owned secondary resources:

@andreaskaris andreaskaris changed the title Config follow-up: Implement Server-Side Apply WIP: Config follow-up: Implement Server-Side Apply Sep 3, 2025
@andreaskaris andreaskaris marked this pull request as draft September 3, 2025 16:42
@andreaskaris andreaskaris force-pushed the reconcile-plus-ssa branch 3 times, most recently from c97aab8 to fa02f77 on September 4, 2025 10:21
frobware pushed a commit to frobware/bpfman-operator that referenced this pull request Sep 5, 2025
…s/component-update-ocp-bpfman-agent

chore(deps): update ocp-bpfman-agent to cf30ca8
@andreaskaris andreaskaris force-pushed the reconcile-plus-ssa branch 3 times, most recently from ba3e236 to 4b5edb8 on September 14, 2025 13:35
Replace the Get/Create and Get/Update reconciliation pattern with
Server-Side Apply (SSA) using client.Apply patches. This simplifies
resource management by handling both creation and updates in a single
operation while providing better conflict resolution and field
ownership tracking.

Changes:
- Add TypeMeta fields to all Kubernetes objects for proper SSA support
- Change assureResource logic to use r.Patch with client.Apply
- Add test interceptor to handle SSA patches in fake client

Signed-off-by: Andreas Karis <[email protected]>
@andreaskaris andreaskaris marked this pull request as ready for review September 26, 2025 13:13
@andreaskaris andreaskaris changed the title WIP: Config follow-up: Implement Server-Side Apply Config follow-up: Implement Server-Side Apply Sep 26, 2025