sbueringer
Member

Signed-off-by: Stefan Büringer [email protected]

Part of #2374

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 5, 2025
item := <-w.get

return item.Key, item.Priority, w.shutdown.Load()
select {
Member Author

Moved from #3332 (comment)

This was way too tricky to debug :)

Once the priority queue (PQ) was enabled by default, this test started to fail:

  • manager stop failed with "failed waiting for all runnable to end within grace period"
  • because not all workers of the leader election runnable group stopped
  • and they didn't stop because they were stuck in GetWithPriority()

Just tricky to figure out:

  1. which test even caused this (I just got an apiserver stop timeout, the test itself did not fail)
  2. which runnable group did not stop
  3. which runnable did not stop
  4. why does it not stop (is reconcile blocking or something else)

I guess our logging could use some improvements ...

Member Author

@alvaroaleman:

Yeah, let's please do that, might also be worth backporting this. I don't fully understand it though I think, at this point we would've given them an item anyways, what difference does it make? Definitely worth go-docing that and I think it might be worth a test as well

Member Author

Definitely worth go-docing that and I think it might be worth a test as well

Extended the godoc and added a unit test

I don't fully understand it though I think, at this point we would've given them an item anyways, what difference does it make?

We can only give the caller an item if there are still items in the queue; otherwise the caller is deadlocked.

Example:

  • controller is started
  • controller workers are calling GetWithPriority and are waiting for items
  • queue is empty and stays empty
  • controller (and accordingly queue) is shut down
  • previous code:
    • Shutdown() closes w.done
    • workers remain blocked in l.293
    • accordingly, the controller and runnable group cannot shut down
  • new code:
    • Shutdown() closes w.done
    • l.301 returns
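
To make the example above concrete, here is a minimal sketch of the pattern, not the actual controller-runtime code. The `queue` and `item` types, `newQueue`, and the zero values returned on shutdown are assumptions for illustration; only the names `get`, `done`, `shutdown`, `GetWithPriority` and the idea of selecting on both channels come from this thread.

```go
package main

import "sync/atomic"

// item is a hypothetical stand-in for the queue's internal item type.
type item struct {
	Key      string
	Priority int
}

// queue is a simplified, hypothetical model of the priority queue. Only the
// fields mentioned in this discussion (get, done, shutdown) are modeled.
type queue struct {
	get      chan item     // workers receive items from this channel
	done     chan struct{} // closed once when the queue is shut down
	shutdown atomic.Bool
}

func newQueue() *queue {
	return &queue{get: make(chan item), done: make(chan struct{})}
}

// ShutDown marks the queue as shut down and closes done exactly once,
// which wakes up every worker blocked in GetWithPriority.
func (q *queue) ShutDown() {
	if q.shutdown.CompareAndSwap(false, true) {
		close(q.done)
	}
}

// GetWithPriority blocks until an item is available or the queue is shut
// down. A bare `<-q.get` here would block forever on an empty queue, even
// after ShutDown; selecting on done as well lets the worker return.
func (q *queue) GetWithPriority() (key string, priority int, shutdown bool) {
	select {
	case it := <-q.get:
		return it.Key, it.Priority, q.shutdown.Load()
	case <-q.done:
		// Assumption: zero values plus shutdown=true signal "no item".
		return "", 0, true
	}
}
```

In this sketch a worker loop would stop as soon as `shutdown` is true, which is what allows the runnable group to finish within the grace period.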

Member

I misread the code the last time and thought this was where we send, not where we get 🤦 Makes total sense

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 5, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 35717109c3ba177b710ffbbee40ef27626ceb374

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alvaroaleman, sbueringer

Needs approval from an approver in each of these files:
  • OWNERS [alvaroaleman,sbueringer]

@alvaroaleman
Member

/cherrypick release-0.22

@k8s-infra-cherrypick-robot

@alvaroaleman: once the present PR merges, I will cherry-pick it on top of release-0.22 in a new PR and assign it to you.

In response to this:

/cherrypick release-0.22

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot merged commit bcb29e7 into kubernetes-sigs:main Oct 5, 2025
9 checks passed
@k8s-infra-cherrypick-robot

@alvaroaleman: new pull request created: #3338
