@jsafrane
Contributor

@jsafrane jsafrane commented Sep 25, 2025

AWS EBS CSI driver should support MutableCSINodeAllocatableCount feature.

When the feature gate is enabled:

  • Set CSIDriver nodeAllocatableUpdatePeriodSeconds to 10 minutes (600), so kubelet knows it should periodically refresh the attachment count from the driver.
  • Enable the feature gate in the external-attacher to propagate attach errors to the scheduler / KCM.

To be tested together with openshift/api#2502 and after the rebase to 1.34. Testing with 1.33 won't hurt, but it will have to be retested after the rebase...

cc @openshift/storage

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Sep 25, 2025
@openshift-ci-robot

openshift-ci-robot commented Sep 25, 2025

@jsafrane: This pull request references STOR-2627 which is a valid jira issue.

In response to this:

AWS EBS CSI driver should support MutableCSINodeAllocatableCount feature.

Set CSIDriver nodeAllocatableUpdatePeriodSeconds to 10 minutes when the feature gate is enabled. Kubelet will then call NodeGetInfo every 10 minutes to update the attach limit of EBS volumes.

cc @openshift/storage

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci
Contributor

openshift-ci bot commented Sep 25, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jsafrane

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 25, 2025
@jsafrane jsafrane force-pushed the add-node-allocatable-sync-interval branch from a8b38c3 to d741e5c Compare September 25, 2025 12:01
@jsafrane jsafrane force-pushed the add-node-allocatable-sync-interval branch 4 times, most recently from ca7b756 to 960a598 Compare September 30, 2025 14:02
@Phaow
Contributor

Phaow commented Oct 22, 2025

/test e2e-aws-csi

cfg.ExtraControlPlaneControllers = append(cfg.ExtraControlPlaneControllers, ctrl)
}

cfg.ExtraReplacementsFunc = func() []string {
Contributor

Hi @jsafrane, I ran some tests enabling nodeAllocatableUpdatePeriodSeconds in the EBS CSI driver and found that we still need openshift/csi-external-attacher#89, which contains kubernetes-csi/external-attacher@462cd54 (needed for the MutableCSINodeAllocatableCount feature gate argument in the external-attacher). In addition, it seems cfg.ExtraReplacementsFunc only applies the replacement once, at cluster install: when I enable TechPreviewNoUpgrade afterwards, the CSIDriver is not updated. I need to delete the CSIDriver to trigger a reconcile, after which it is recreated with nodeAllocatableUpdatePeriodSeconds: 600. This would be an issue on upgrade or when enabling the feature gate. Could you check when you get a chance? Thank you! ^^

# After enabling TechPreviewNoUpgrade, the CSIDriver is not updated
$ oc get csidriver ebs.csi.aws.com -oyaml
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  annotations:
    csi.openshift.io/managed: "true"
    operator.openshift.io/spec-hash: d37d616b853448c2d019d19e3987ebfb751f3da09de52be009406f4d8f8fd7d6
  creationTimestamp: "2025-10-23T06:09:33Z"
  name: ebs.csi.aws.com
  resourceVersion: "86418"
  uid: 8ac66b43-1f14-4be7-9d4b-1e65f29ed14a
spec:
  attachRequired: true
  fsGroupPolicy: File
  podInfoOnMount: false
  requiresRepublish: false
  seLinuxMount: true
  storageCapacity: false
  volumeLifecycleModes:
  - Persistent

# Delete the CSIDriver to trigger a reconcile
$ oc delete csidriver ebs.csi.aws.com
csidriver.storage.k8s.io "ebs.csi.aws.com" deleted
$ oc get csidriver ebs.csi.aws.com -oyaml
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  annotations:
    csi.openshift.io/managed: "true"
    operator.openshift.io/spec-hash: d37d616b853448c2d019d19e3987ebfb751f3da09de52be009406f4d8f8fd7d6
  creationTimestamp: "2025-10-23T07:01:02Z"
  name: ebs.csi.aws.com
  resourceVersion: "116117"
  uid: 84d59f30-7b5f-4059-9cb5-a02089c2093b
spec:
  attachRequired: true
  fsGroupPolicy: File
  nodeAllocatableUpdatePeriodSeconds: 600
  podInfoOnMount: false
  requiresRepublish: false
  seLinuxMount: true
  storageCapacity: false
  volumeLifecycleModes:
  - Persistent

Contributor Author

You are correct, the external-attacher rebase is necessary too.

it seems cfg.ExtraReplacementsFunc only applies the replacement once, at cluster install: when I enable TechPreviewNoUpgrade afterwards, the CSIDriver is not updated

I'll look into this. The operator should restart when something changes the FeatureGates...

AWS EBS CSI driver should support MutableCSINodeAllocatableCount feature.

Set CSIDriver nodeAllocatableUpdatePeriodSeconds to 10 minutes when the
feature gate is enabled.

Kubelet will then call NodeGetInfo every 10 minutes to update the
attach limit of EBS volumes.
@jsafrane jsafrane force-pushed the add-node-allocatable-sync-interval branch 2 times, most recently from 03dd4d7 to 0c680e0 Compare October 31, 2025 08:54
@jsafrane jsafrane force-pushed the add-node-allocatable-sync-interval branch from 0c680e0 to 708693b Compare October 31, 2025 16:32
@openshift-ci
Contributor

openshift-ci bot commented Oct 31, 2025

@jsafrane: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/smb-operator-e2e | 960a598 | link | false | /test smb-operator-e2e |
| ci/prow/e2e-azure-csi | 708693b | link | true | /test e2e-azure-csi |
| ci/prow/okd-scos-e2e-aws-ovn | 708693b | link | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/e2e-azure | 708693b | link | true | /test e2e-azure |
| ci/prow/verify-deps | 708693b | link | true | /test verify-deps |
| ci/prow/hypershift-aws-e2e-external | 708693b | link | true | /test hypershift-aws-e2e-external |
| ci/prow/e2e-azure-ovn-upgrade | 708693b | link | true | /test e2e-azure-ovn-upgrade |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@jsafrane
Contributor Author

jsafrane commented Nov 3, 2025

/hold
for openshift/library-go#2044.
That change will re-apply the CSIDriver when nodeAllocatableUpdatePeriodSeconds has been cleared by the API server.
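
One possible reading of the problem, consistent with the identical spec-hash annotation in both `oc get csidriver` outputs above, is that an apply step which compares only the stored hash never notices that the API server dropped the field. The sketch below illustrates that idea with a simplified stand-in type. The type and function names are hypothetical, and this is not the actual library-go implementation or the change in openshift/library-go#2044:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// csiDriver is a simplified, hypothetical stand-in for the live
// storage.k8s.io/v1 CSIDriver object.
type csiDriver struct {
	specHash                           string // value of the spec-hash annotation
	nodeAllocatableUpdatePeriodSeconds *int64 // nil if the API server cleared it
}

// hashOf returns a hex SHA-256 of the desired manifest.
func hashOf(manifest string) string {
	sum := sha256.Sum256([]byte(manifest))
	return hex.EncodeToString(sum[:])
}

// needsApplyHashOnly re-applies only when the stored hash differs from
// the hash of the desired manifest.
func needsApplyHashOnly(desiredHash string, live csiDriver) bool {
	return live.specHash != desiredHash
}

// needsApply additionally compares the live field with the desired one,
// so a field dropped by the API server triggers a re-apply.
func needsApply(desiredHash string, desiredPeriod *int64, live csiDriver) bool {
	if live.specHash != desiredHash {
		return true
	}
	return !equalInt64Ptr(desiredPeriod, live.nodeAllocatableUpdatePeriodSeconds)
}

// equalInt64Ptr reports whether both pointers are nil or point to equal values.
func equalInt64Ptr(a, b *int64) bool {
	if a == nil || b == nil {
		return a == b
	}
	return *a == *b
}

func main() {
	period := int64(600)
	desiredHash := hashOf("spec with nodeAllocatableUpdatePeriodSeconds: 600")

	// The annotation matches the desired manifest, but the field is missing.
	live := csiDriver{specHash: desiredHash}

	fmt.Println("hash-only check re-applies:", needsApplyHashOnly(desiredHash, live))
	fmt.Println("field-aware check re-applies:", needsApply(desiredHash, &period, live))
}
```

In this toy model the hash-only check skips the update, which matches the observed need to delete the CSIDriver by hand, while the field-aware check re-applies it.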

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 3, 2025