Skip to content

wip: Add EC2 metadata-labeler metadata source #2591

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

sylviah23
Copy link

What type of PR is this?

/kind feature

What is this PR about? / Why do we need it?

This PR gets the Volume and ENI count of each node via the EC2 API. It then adds this information to the metadata labels of each node in the cluster, so Kubernetes can get this information without IMDS access.

How was this change tested?

Unit testing and E2E testing and manually testing with EKS cluster by adding a new node.

Does this PR introduce a user-facing change?

Add new metadata source relying on EC2.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. labels Jul 31, 2025
Copy link

linux-foundation-easycla bot commented Jul 31, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Jul 31, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jsafrane for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Copy link
Contributor

Hi @sylviah23. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jul 31, 2025
@AndrewSirenko
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 31, 2025
Copy link

github-actions bot commented Jul 31, 2025

Code Coverage Diff

File Old Coverage New Coverage Delta
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/cloud/cloud.go 87.4% 84.9% -2.5
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/cloud/metadata/k8s.go 58.3% 72.3% 14.0
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/cloud/metadata/labels.go Does not exist 57.6%
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/cloud/metadata/metadata.go 84.8% 86.5% 1.8

@mdzraf
Copy link
Member

mdzraf commented Jul 31, 2025

/test pull-aws-ebs-csi-driver-image-test

@mdzraf
Copy link
Member

mdzraf commented Aug 1, 2025

/test pull-aws-ebs-csi-driver-image-test

Fixed issue with ECR repo being full.

Copy link
Contributor

@ConnorJC3 ConnorJC3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First review round: I have not manually tested yet, intend to do that in my next review. I also don't think I was as in depth with looking at some parts (in particular the tests) as I should have been, so I might have missed some bugs lurking there.

The comments about the leader election are probably the most critical comments in this review - as written, this code will keep running after the leader election is switched away, causing two or more instances of the driver to simultaneously run the patcher and race each other.

Copy link
Member

@torredil torredil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution and adding comprehensive code coverage on the first revision @sylviah23 . Overall, this is looking solid 🙂.

The leader election code needs some work, as Connor suggested #2591 (comment), leveraging the CSI lib-utils wrapper will be a nice improvement. That code is battle tested and already addresses some of the core challenges we need to solve implementing this enhancement.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 7, 2025
@AndrewSirenko
Copy link
Contributor

Ran pre_allocated and karpenter scale tests with 2500 PVCs (therefore 25-40 nodes) and setting metadataSources=[ec2labelskubernetes]. Everything worked and proper allocatable counts were there!

Before moving to GA we should try suddenly installing driver on a XXL node cluster to see if anything chokes.

Copy link
Contributor

@ConnorJC3 ConnorJC3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think we need to have a naming discussion, especially re the metadata source name and the patcher sidecar name

TEST_PASSED=$?
set -e
set +x
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is very duplicated, we could probably combine it?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 13, 2025
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Aug 14, 2025
@AndrewSirenko
Copy link
Contributor

I still think we need to have a naming discussion, especially re the metadata source name and the patcher sidecar name

Final names: metadata-labeler for both metadata source and new container.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 15, 2025
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 15, 2025
@AndrewSirenko AndrewSirenko changed the title add ec2 metadata source Add EC2 metadata-labeler metadata source Aug 15, 2025
@sylviah23
Copy link
Author

@AndrewSirenko LabelRefreshTime in cmd/main.go should be changed to 60 * time.Minute

@AndrewSirenko AndrewSirenko changed the title Add EC2 metadata-labeler metadata source wip: Add EC2 metadata-labeler metadata source Aug 19, 2025
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 19, 2025
@AndrewSirenko AndrewSirenko force-pushed the patcher branch 2 times, most recently from 3cf55ea to 3d2fb62 Compare August 19, 2025 20:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants