-
Notifications
You must be signed in to change notification settings - Fork 840
Add Readiness/Liveness EC2 API call probe to Controller Service #2590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Code Coverage DiffThis PR does not change the code coverage |
/retest |
Nitpick: Can we add "Fixes #1551" to the PR description? Also this isn't a breaking change, that should be removed from the changelog entry. |
77e6ea6
to
3d5f467
Compare
Can we switch to warning during changelog? I fear some customers with misconfigured drivers will be very concerned for a second after upgrading 😓 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
3d5f467
to
35d1a57
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
35d1a57
to
7f4de30
Compare
/hold holding for release |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly lgtm, one question: Have we tested what this looks like for a timeout scenario? Do we need to raise this value to avoid the dry run cutting out?
Have tested for timeout scenario (purposefull broke my policy AND also tested breaking networking) However good callout, I'll raise timeout to 10s for live and readiness, and then raise liveness failure threshold to 10 (so pod becomes not ready first, THEN starts restarting) |
7f4de30
to
5dd713b
Compare
/hold For final manual test |
What type of PR is this?
/kind feature
What is this PR about? / Why do we need it?
Lets customers realize that their EBS CSI Driver ebs-csi-controller pods aren't actually ready due to some Auth/networking error.
Should really help customer troubleshooting for new customers (and ensuring networking and auth issues aren't root cause of user problems)
Should work for other COs because this is done through CSI Probe RPC
Fixes #1551
How was this change tested?
Deploy driver without right IAM role, see not ready
Add IAM role, see ready
Remove role again, see driver eventually become not ready
Add role back and driver becomes ready again
Does this PR introduce a user-facing change?