Conversation

@fedepaol
Member

  • Adds a status cleaner job that serves as the webhook backend and removes frrstatus instances for nodes where an frrk8s instance is no longer running
  • Adds readOnlyRootFilesystem=true and allowPrivilegeEscalation=false to the frrk8s containers

@fedepaol
Member Author

/testwith openshift/frr#107

@coderabbitai

coderabbitai bot commented Nov 10, 2025

Walkthrough

Added two emptyDir volumes and corresponding mounts for FRR, and hardened both the frr and controller containers. Introduced a new frr-k8s-statuscleaner Deployment (webhook-only mode, TLS from a secret, probes, hostNetwork). Removed the previous webhook Deployment and repointed the webhook Service to the statuscleaner.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| FRR DaemonSet: volumes & security (bindata/network/frr-k8s/frr-k8s.yaml) | Added two emptyDir volumes (frr-lib, frr-tmp) and mounted them into the frr and controller containers at /var/lib/frr and /var/tmp/frr. Set allowPrivilegeEscalation: false and readOnlyRootFilesystem: true on both containers. |
| New status cleaner Deployment (bindata/network/frr-k8s/node-status-cleaner.yaml) | Added Deployment frr-k8s-statuscleaner in openshift-frr-k8s running /statuscleaner with args --webhook-mode=onlywebhook and --webhook-port=9123. Uses {{.FRRK8sImage}}, sets NAMESPACE via the downward API, mounts secret frr-k8s-webhook-server-cert at /tmp/k8s-webhook-server/serving-certs, exposes port 9123, includes HTTPS liveness/readiness probes, hostNetwork: true, control-plane tolerations, priorityClassName: system-cluster-critical, serviceAccountName: frr-k8s-daemon, and terminationGracePeriodSeconds: 10. |
| Webhook service & Deployment removal (bindata/network/frr-k8s/webhook.yaml) | Removed the apps/v1 Deployment frr-k8s-webhook-server. Updated the Service frr-k8s-webhook-service selector from component: frr-k8s-webhook-server to component: frr-k8s-statuscleaner. |
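To make the first row concrete, this is roughly what the hardening in frr-k8s.yaml amounts to; a minimal sketch of the affected pod spec fields only, with everything else (images, commands, other mounts) elided:

```yaml
# Excerpt of the frr-k8s DaemonSet pod spec; only the fields touched by this PR.
spec:
  containers:
    - name: frr
      securityContext:
        allowPrivilegeEscalation: false  # no privilege gains at runtime
        readOnlyRootFilesystem: true     # root filesystem is immutable
      volumeMounts:
        - name: frr-lib
          mountPath: /var/lib/frr        # writable scratch space for FRR state
        - name: frr-tmp
          mountPath: /var/tmp/frr        # writable scratch space for FRR temp files
    - name: controller
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
      volumeMounts:
        - name: frr-lib
          mountPath: /var/lib/frr
        - name: frr-tmp
          mountPath: /var/tmp/frr
  volumes:
    - name: frr-lib
      emptyDir: {}
    - name: frr-tmp
      emptyDir: {}
```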

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Check that the Service selector update and Deployment removal leave no dangling references.
  • Verify the secret mount and TLS/probe configuration for the statuscleaner (port 9123, cert path); a sketch of the Deployment follows this list.
  • Confirm FRR and controller containers function with readOnlyRootFilesystem: true and required writable paths provided by the new emptyDir mounts.
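As a reference for the second and third checkpoints, here is a condensed sketch of the statuscleaner Deployment assembled from the walkthrough above; the container name, toleration shape, and probe delay values are assumptions, and node-status-cleaner.yaml remains the authoritative manifest:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frr-k8s-statuscleaner
  namespace: openshift-frr-k8s
spec:
  selector:
    matchLabels:
      component: frr-k8s-statuscleaner
  template:
    metadata:
      labels:
        component: frr-k8s-statuscleaner  # must match the webhook Service selector
    spec:
      hostNetwork: true
      priorityClassName: system-cluster-critical
      serviceAccountName: frr-k8s-daemon
      terminationGracePeriodSeconds: 10
      tolerations:
        - key: node-role.kubernetes.io/control-plane  # assumed key/shape
          operator: Exists
          effect: NoSchedule
      containers:
        - name: statuscleaner               # container name assumed
          image: "{{.FRRK8sImage}}"
          command: ["/statuscleaner"]
          args:
            - --webhook-mode=onlywebhook
            - --webhook-port=9123
            - --disable-cert-rotation=true  # cert lifecycle handled via the secret
          env:
            - name: NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace  # downward API
          ports:
            - name: webhook
              containerPort: 9123
          livenessProbe:                    # delay value illustrative
            httpGet:
              path: /healthz
              port: 9123
              scheme: HTTPS
            initialDelaySeconds: 10
          readinessProbe:
            httpGet:
              path: /healthz
              port: 9123
              scheme: HTTPS
            initialDelaySeconds: 10
          volumeMounts:
            - name: cert
              mountPath: /tmp/k8s-webhook-server/serving-certs
              readOnly: true
      volumes:
        - name: cert
          secret:
            secretName: frr-k8s-webhook-server-cert
```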

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between ae4e0e0 and 5872e43.

📒 Files selected for processing (3)
  • bindata/network/frr-k8s/frr-k8s.yaml (3 hunks)
  • bindata/network/frr-k8s/node-status-cleaner.yaml (1 hunks)
  • bindata/network/frr-k8s/webhook.yaml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • bindata/network/frr-k8s/node-status-cleaner.yaml
  • bindata/network/frr-k8s/frr-k8s.yaml
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

- Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • bindata/network/frr-k8s/webhook.yaml
🔇 Additional comments (2)
bindata/network/frr-k8s/webhook.yaml (2)

16-43: ValidatingWebhookConfiguration correctly references service.

The webhook configuration correctly continues to reference frr-k8s-webhook-service. Since the service is being repurposed to route to the new statuscleaner deployment, this indirection is the right design pattern—the webhook configuration doesn't need changes.
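Schematically, the indirection looks like the fragment below; the webhook name is a placeholder, since only the Service reference matters here:

```yaml
# Schematic only; the actual webhook names and paths live in webhook.yaml.
webhooks:
  - name: example.frr-k8s.io           # placeholder, not the real name
    clientConfig:
      service:
        name: frr-k8s-webhook-service  # unchanged; the Service now selects the statuscleaner pods
        namespace: openshift-frr-k8s
```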


14-14: Service selector change verified as correct.

The statuscleaner Deployment (node-status-cleaner.yaml) provides the required component: frr-k8s-statuscleaner label at the pod template level (line 18) and exposes port webhook (line 46). The Service annotation referencing frr-k8s-webhook-server-cert is consistent with the secret name mounted by the statuscleaner container (line 87 of node-status-cleaner.yaml). The webhook configuration is properly aligned with no issues identified.
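Put together, the repurposed Service boils down to a selector swap; a minimal sketch, where the annotation key and port numbers are assumptions based on the review notes (the pod serves the webhook on 9123):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: frr-k8s-webhook-service
  namespace: openshift-frr-k8s
  annotations:
    # Assumed OpenShift serving-cert annotation; the review only confirms the secret name.
    service.beta.openshift.io/serving-cert-secret-name: frr-k8s-webhook-server-cert
spec:
  selector:
    component: frr-k8s-statuscleaner  # was: frr-k8s-webhook-server
  ports:
    - port: 443         # assumed service port
      targetPort: 9123  # webhook port exposed by the statuscleaner pod
```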


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci
Contributor

openshift-ci bot commented Nov 10, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fedepaol

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 10, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 05d6f46 and 292eb0c.

📒 Files selected for processing (3)
  • bindata/network/frr-k8s/frr-k8s.yaml (3 hunks)
  • bindata/network/frr-k8s/node-status-cleaner.yaml (1 hunks)
  • bindata/network/frr-k8s/webhook.yaml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

- Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • bindata/network/frr-k8s/node-status-cleaner.yaml
  • bindata/network/frr-k8s/webhook.yaml
  • bindata/network/frr-k8s/frr-k8s.yaml
🪛 YAMLlint (1.37.1)
bindata/network/frr-k8s/frr-k8s.yaml

[error] 179-179: syntax error: expected <block end>, but found '-'

(syntax)

🔇 Additional comments (4)
bindata/network/frr-k8s/frr-k8s.yaml (2)

162-163: Verify readOnlyRootFilesystem=true compatibility with FRR runtime behavior.

The frr container now has readOnlyRootFilesystem: true added alongside the new emptyDir volumes for /var/lib/frr and /var/tmp/frr. Confirm that the FRR daemon can successfully initialize and write all necessary state/logs to these volumes, and that it does not require write access to other root filesystem paths.

If the frr daemon requires write access to paths outside the mounted emptyDirs (e.g., /etc/frr), the readOnlyRootFilesystem setting may need adjustment or additional volume mounts.
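If such a path surfaced, the fix would follow the same pattern as the existing scratch mounts; a hypothetical fragment (the frr-etc volume and the need for a writable /etc/frr are assumptions, not something this PR establishes):

```yaml
# Hypothetical addition, only needed if FRR writes under /etc/frr at runtime.
spec:
  containers:
    - name: frr
      volumeMounts:
        - name: frr-etc
          mountPath: /etc/frr  # writable despite the read-only root filesystem
  volumes:
    - name: frr-etc
      emptyDir: {}
```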


48-51: Security hardening additions approved for frr container.

The addition of allowPrivilegeEscalation: false and readOnlyRootFilesystem: true, combined with emptyDir volumes for scratch space at /var/lib/frr and /var/tmp/frr, strengthens the security posture of the FRR daemon container. This aligns well with the PR's security hardening objectives.

Also applies to: 162-163

bindata/network/frr-k8s/webhook.yaml (1)

14-14: Service selector correctly routes to status cleaner deployment.

The selector change from frr-k8s-webhook-server to frr-k8s-statuscleaner aligns with the architectural refactoring to consolidate webhook serving into the status cleaner component. Ensure that the status cleaner deployment in node-status-cleaner.yaml includes the label component: frr-k8s-statuscleaner in its pod template to match this selector.

bindata/network/frr-k8s/node-status-cleaner.yaml (1)

1-82: Webhook configuration and security posture review.

The status cleaner deployment is appropriately configured for webhook serving:

  • HTTPS probes at /healthz with proper initial delays and timeouts (lines 48-62)
  • TLS certificate mounting from secret at /tmp/k8s-webhook-server/serving-certs (lines 75-78)
  • --disable-cert-rotation=true flag (line 29) is appropriate since certificate lifecycle is managed externally via the secret
  • Tolerations for control-plane nodes (lines 68-73) enable webhook availability on cluster-critical nodes
  • privileged SCC annotation (line 21) matches the frr-k8s DaemonSet requirement

Resource requests are minimal but appropriate for a webhook component (10m CPU, 50Mi memory).
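Those requests correspond to the following fragment of the container spec (limits, if any, are not mentioned in the review):

```yaml
resources:
  requests:
    cpu: 10m
    memory: 50Mi
```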

@fedepaol fedepaol force-pushed the alignfrrk8supstream branch 2 times, most recently from 57efca2 to ae4e0e0 on November 10, 2025 at 15:15
The status cleaner pod is in charge of removing stale frrnodestates
resources from those nodes where frrk8s is no longer running.

At the same time, it acts as the backend for the validation webhook, so
the configuration of the webhook now refers to it.

Signed-off-by: Federico Paolinelli <[email protected]>
readOnlyRootFilesystem prevents containers from writing to the root filesystem,
reducing attack surface and improving security posture by limiting potential
malicious file modifications and ensuring immutable container runtime.

allowPrivilegeEscalation=false prevents containers from gaining additional
privileges beyond those initially granted, further hardening the security
posture by blocking privilege escalation attacks.

Signed-off-by: Federico Paolinelli <[email protected]>
@fedepaol fedepaol force-pushed the alignfrrk8supstream branch from ae4e0e0 to 5872e43 on November 11, 2025 at 14:15
@openshift-ci
Contributor

openshift-ci bot commented Nov 11, 2025

@fedepaol: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/security | 5872e43 | link | false | /test security |
| ci/prow/4.21-upgrade-from-stable-4.20-e2e-azure-ovn-upgrade | 5872e43 | link | false | /test 4.21-upgrade-from-stable-4.20-e2e-azure-ovn-upgrade |
| ci/prow/e2e-metal-ipi-ovn-dualstack-bgp | 5872e43 | link | true | /test e2e-metal-ipi-ovn-dualstack-bgp |
| ci/prow/e2e-metal-ipi-ovn-ipv6-ipsec | 5872e43 | link | true | /test e2e-metal-ipi-ovn-ipv6-ipsec |
| ci/prow/e2e-azure-ovn-upgrade | 5872e43 | link | true | /test e2e-azure-ovn-upgrade |
| ci/prow/frrk8s-e2e | 5872e43 | link | false | /test frrk8s-e2e |
| ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw | 5872e43 | link | true | /test e2e-metal-ipi-ovn-dualstack-bgp-local-gw |
| ci/prow/4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-upgrade | 5872e43 | link | false | /test 4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-upgrade |
| ci/prow/4.21-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade | 5872e43 | link | false | /test 4.21-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade |
| ci/prow/hypershift-e2e-aks | 5872e43 | link | true | /test hypershift-e2e-aks |
| ci/prow/e2e-aws-ovn-upgrade | 5872e43 | link | true | /test e2e-aws-ovn-upgrade |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
