-
Notifications
You must be signed in to change notification settings - Fork 1.2k
🐛 Don't block on Get when queue is shutdown (2nd try) #3337
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🐛 Don't block on Get when queue is shutdown (2nd try) #3337
Conversation
item := <-w.get | ||
|
||
return item.Key, item.Priority, w.shutdown.Load() | ||
select { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved from #3332 (comment)
This was way to tricky to debug :)
Once PQ was enabled per default, this test started to fail:
Describe("controller", func() { |
- manager stop failed with "failed waiting for all runnable to end within grace period"
- because not all workers of the leader election runnable group stopped
- and they didn't stop because they were stuck in GetWithPriority()
Just tricky to figure out:
- which test even caused this (I just got an apiserver stop timeout, the test itself did not fail)
- which runnable group did not stop
- which runnable did not stop
- why does it not stop (is reconcile blocking or something else)
I guess our logging could use some improvements ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, lets please do that, might also be worth to backport this. I don't fully understand it though I think, at this point we would've given them an item anyways, what difference does it make? Definitely worth go-docing that and i think it might be worth a test as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely worth go-docing that and i think it might be worth a test as well
Extended the godoc and added a unit test
I don't fully understand it though I think, at this point we would've given them an item anyways, what difference does it make?
We could give the caller only an item if we still have items in the queue, otherwise the caller is deadlocked.
Example:
- controller is started
- controller workers are calling
GetWithPriority
and are waiting for items - queue is empty and stays empty
- controller (and accordingly queue) is shutdown
- previous code:
- Shutdown() closes w.done
- workers remain blocked in l.293
- accordingly controller and runnable group cannot shutdown
- new code:
- Shutdown() closes w.done
- l.301 returns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I misread the code the last time and thought this was where we send, not where we get 🤦 Makes total sense
Signed-off-by: Stefan Büringer [email protected]
1a94201
to
5e1233d
Compare
LGTM label has been added. Git tree hash: 35717109c3ba177b710ffbbee40ef27626ceb374
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alvaroaleman, sbueringer The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/cherrypick release-0.22 |
@alvaroaleman: once the present PR merges, I will cherry-pick it on top of In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
@alvaroaleman: new pull request created: #3338 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Signed-off-by: Stefan Büringer [email protected]
Part of #2374