-
Notifications
You must be signed in to change notification settings - Fork 31
topology-aware: consider burstable CPU limit when picking a pool #570
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
topology-aware: consider burstable CPU limit when picking a pool #570
Conversation
97af7d0 to
21943c5
Compare
21943c5 to
967f30c
Compare
|
@askervin @fmuyassarov PTAL |
93af3c6 to
51d20ba
Compare
Add topology-aware configuration option for which topology level we try to allocate containers with unlimited burstability at. To allow sharing the existing topology level configuration str-enum move it from policy/balloons to policy and reuse it in topology- aware as well. Signed-off-by: Krisztian Litkey <[email protected]>
06b6230 to
e2f3b4c
Compare
When picking a pool for a burstable QoS class container, take into account the CPU limit, too. For limited containers prefer pools with enough free capacity for the limit. For unlimited containers prefer pools with more free capacity left, which means in practice that such containers will typically end up in the root pool, unless affinity of topology hints dictate otherwise. Signed-off-by: Krisztian Litkey <[email protected]>
Update existing burstable placement test for altered behavior. Add new test to verify CPU limit based burstable placement of limited and unlimited burstable containers. Signed-off-by: Krisztian Litkey <[email protected]>
Signed-off-by: Krisztian Litkey <[email protected]>
e2f3b4c to
192c194
Compare
|
@askervin @fmuyassarov I added configurability for what unlimited burstability means. There is a global configuration option and a per-pod and container annotation to override the global default. I think this would be now ready for review. |
askervin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, very nice, thanks @klihub!
The last thing I was left pondering is whether or not the documentation should state what is the default for the default unlimitedBurstable option when it is not explicitly given. On one hand, I would think it would be valuable to document that by default burstable containers without CPU limit will be limited to the socket level, but on the other hand, none of the options in "Configuring the Policy" section tell what's the default when omitted. Making this an exception would not look too good, either.
That's why I'd think this PR is good as is, and documenting the defaults for this and other options, if necessary, would be a topic for separate PR(s).
I was also thinking about adding a mention of the (current) default to the documentation, because it makes sense to document it somewhere. The counter-argument for putting it into the documentation would be that if we ever to decide changing it then there is a danger of stale/incorrect documentation, if we forget to update them. Now the default is both set and 'documented' in the Custom Resource Definition generated for configuration, in one place so there is less danger of it going out of sync. But this is so iffy, that I almost added it to the docs. And I still don't know which one would really make more sense. Maybe a separate chapter listing all the configuration defaults, which we could also then link from the Helm chart (docs) which is the primary way of consuming the plugins. |
What would you think about a solution somewhere in the middle ground, that is, "make generate" or something would actually generate |
If we want to document all the configuration defaults in one place, I think we would need something like that. I'm just not sure if there is enough existing tooling to do it without actually installing the CRD on a cluster, creating a CR instance, then reading out how it came out to be... |
This PR improves CPU resource allocation for burstable QoS class containers.
Currently CPU burstability is not taken into account when a pool is picked for a container. For containers with high burstability (large CPU limit - CPU request), this can result in an assignment which cannot have any or enough excess CPU capacity for bursting.
This PR changes that behavior. When picking a pool for a burstable QoS class container, take into account the CPU limit, too. For limited burstable containers, prefer pools with enough free capacity for the limit. For unlimited burstable containers, prefer pools at a topology level indicated by the new
unlimitedBurstableconfiguration option, with a default ofpackageindicating sockets. This can be overridden on a per pod or container level using the newunlimited-burstable.resource-policy.nri.ioannotation. Unlimited containers will typically end up in a pool at the configured or annotated topology level, unless affinity, memory types, or topology hints dictate otherwise.