plugin: Add new reconcile metrics #1360
e4d8f3f to 02c4f80
Some naming suggestions, otherwise LGTM.
e8be52f to c561790
Implemented with a new WaitingTimer abstraction, which we use to track the aggregate amount of time during which the queue is non-empty. The change to .golangci.yml exempts prometheus.Opts from the exhaustruct requirements.
02c4f80 to d55af20
Realized this doesn't correctly handle objects being retried with a delay: while still waiting, they'd be counted towards the queue being non-empty even though they're not ready to be processed yet. Need to fix that before merging.
See comment from Em
There are two metrics added here:
Number of reconcile workers
This mirrors the existing controller_runtime_max_concurrent_reconciles metric we have from neonvm-controller, which allows us to determine the fraction of total worker-time that we're using (a measure of saturation).
Total duration with items in the queue
This is roughly analogous to the Linux kernel's CPU PSI metric. Other metrics of saturation (using total time spent reconciling or total time in the queue) are useful, but they tend to be easy to misinterpret when saturation is heavily skewed within the interval between metric samples. So the idea here is to help get a more accurate picture.
Resolves neondatabase/cloud#27613.